Commit graph

562 commits

Author SHA1 Message Date
Daniel Kahn Gillmor
eb232ee0ab reindex: drop notmuch_param_t, use notmuch_indexopts_t instead
There are at least three places in notmuch that can trigger an
indexing action:

 * notmuch new
 * notmuch insert
 * notmuch reindex

I have plans to add some indexing options (e.g. indexing the cleartext
of encrypted parts, external filters, automated property injection)
that should properly be available in all places where indexing
happens.

I also want those indexing options to be exposed by (and constrained
by) the libnotmuch C API.

This isn't yet an API break because we've never made a release with
notmuch_param_t.

These indexing options are relevant in the listed places (and in the
libnotmuch analogues), but they aren't relevant in the other kinds of
functionality that notmuch offers (e.g. dump/restore, tagging, search,
show, reply).

So i think a generic "param" object isn't well-suited for this case.
In particular:

 * a param object sounds like it could contain parameters for some
   other (non-indexing) operation.  This sounds confusing -- why would
   i pass non-indexing parameters to a function that only does
   indexing?

 * bremner suggests online a generic param object would actually be
   passed as a list of param objects, argv-style.  In this case (at
   least in the obvious argv implementation), the params might be some
   sort of generic string.  This introduces a problem where the API of
   the library doesn't grow as new options are added, which means that
   when code outside the library tries to use a feature, it first has
   to test for it, and have code to handle it not being available.
   The indexopts approach proposed here instead makes it clear at
   compile time and at dynamic link time that there is an explicit
   dependency on that feature, which allows automated tools to keep
   track of what's needed and keeps the actual code simple.

My proposal adds the notmuch_indexopts_t as an opaque struct, so that
we can extend the list of options without causing ABI breakage.

The cost of this proposal appears to be that the "boilerplate" API
increases a little bit, with a generic constructor and destructor
function for the indexopts struct.

More patches will follow that make use of this indexopts approach.
2017-08-23 07:55:12 -03:00
Daniel Kahn Gillmor
b10ce6bc23 database: add n_d_index_file (deprecates n_d_add_message)
We need a way to pass parameters to the indexing functionality on the
first index, not just on reindexing.  The obvious place is in
notmuch_database_add_message.  But since modifying the argument list
would break both API and ABI, we needed a new name.

I considered notmuch_database_add_message_with_params(), but the
functionality we're talking about doesn't always add a message.  It
tries to index a specific file, possibly adding a message, but
possibly doing other things, like adding terms to an existing message,
or failing to deal with message objects entirely (e.g. because the
file didn't contain a message).

So i chose the function name notmuch_database_index_file.

I confess i'm a little concerned about confusing future notmuch
developers with the new name, since we already have a private
_notmuch_message_index_file function, and the two do rather different
things.  But i think the added clarity for people linking against the
future libnotmuch and the capacity for using index parameters makes
this a worthwhile tradeoff.  (that said, if anyone has another name
that they strongly prefer, i'd be happy to go with it)

This changeset also adjusts the tests so that we test whether the new,
preferred function returns bad values (since the deprecated function
just calls the new one).

We can keep the deprecated n_d_add_message function around as long as
we like, but at the next place where we're forced to break API or ABI
we can probably choose to drop the name relatively safely.

NOTE: there is probably more cleanup to do in the ruby and go bindings
to complete the deprecation directly.  I don't know those languages
well enough to attempt a fix; i don't know how to test them; and i
don't know the culture around those languages about API additions or
deprecations.
2017-08-23 07:38:37 -03:00
Yuri Volchkov
cec4a87539 database: move striping of trailing '/' into helper function
Stripping trailing character is not that uncommon
operation. Particularly, the next patch has to perform it as
well. Lets move it to the separate function to avoid code duplication.

Also the new function has a little improvement: if the character to
strip is repeated several times in the end of a string, function
strips them all.

Signed-off-by: Yuri Volchkov <yuri.volchkov@gmail.com>
2017-08-22 18:47:51 -03:00
Daniel Kahn Gillmor
55f9f6505e lib: clarify description of notmuch_database_add_message
Since we're accumulating the index when we add a new file to the
message, the semantics have slightly changed.  This tries to align the
documentation with the actual functionality.
2017-08-20 08:33:46 -03:00
Daniel Kahn Gillmor
5b93fa6e70 lib: add notmuch_message_reindex
This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
2017-08-01 21:17:47 -04:00
David Bremner
34d7753992 lib: add _notmuch_message_remove_indexed_terms
Testing will be provided via use in notmuch_message_reindex
2017-08-01 21:17:47 -04:00
David Bremner
50340bcb78 lib: add notmuch_thread_get_total_files
This is relatively inexpensive in terms of run time and implementation
cost as we are already traversing the list of messages in a thread.
2017-08-01 21:17:47 -04:00
David Bremner
8a8e2b11c2 lib: add notmuch_message_count_files
This operation is relatively inexpensive, as the needed metadata is
already computed by our lazy metadata fetching. The goal is to support
better UI for messages with multipile files.
2017-08-01 21:17:47 -04:00
David Bremner
411675a6ce lib: index message files with duplicate message-ids
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything. Values on the other hand get
overwritten, which is a bit annoying, but arguably it is not worse to
take the values (from, subject, date) from the last file indexed
rather than the first.
2017-08-01 21:17:47 -04:00
David Bremner
4fdabd636e lib: refactor notmuch_database_add_message header parsing
This function is large and hard to understand and modify. Start to
break it down into meaningful pieces.
2017-08-01 21:17:47 -04:00
David Bremner
2f94b3090c lib: factor out message-id parsing to separate file.
This is really pure C string parsing, and doesn't need to be mixed in
with the Xapian/C++ layer. Although not strictly necessary, it also
makes it a bit more natural to call _parse_message_id from multiple
compilation units.
2017-08-01 21:17:47 -04:00
David Bremner
95b52e85b2 lib/n_d_add_message: refactor test for new/ghost messages
The switch is easier to understand than the side effects in the if
test. It also potentially allows us more flexibility in breaking up
this function into smaller pieces, since passing private_status around
is icky.
2017-08-01 21:17:47 -04:00
David Bremner
4034a7cec7 lib: isolate n_d_add_message and helper functions into own file
'database.cc' is becoming a monster, and it's hard to follow what the
various static functions are used for. It turns out that about 1/3 of
this file notmuch_database_add_message and helper functions not used
by any other function. This commit isolates this code into it's own
file.

Some side effects of this refactoring:

- find_doc_ids becomes the non-static (but still private)
  _notmuch_database_find_doc_ids
- a few instances of 'string' have 'std::' prepended, avoiding the
  need for 'using namespace std;' in the new file.
2017-08-01 21:17:47 -04:00
Daniel Kahn Gillmor
d55fffffd7 fix the generated documentation output 2017-07-18 06:53:57 -03:00
Daniel Kahn Gillmor
87bdfbc91f Fix orthography 2017-07-18 06:50:44 -03:00
David Bremner
4ce7591610 lib: paper over allocation difference
In gmime 3.0 this function is "transfer none", so no deallocation is
needed (or permitted)
2017-07-14 21:23:52 -03:00
David Bremner
eeb64cdeeb lib: add version of _n_m_f_get_combinded_header for gmime 3.0
The iterator is gone, so we need a new loop structure.
2017-07-14 21:23:52 -03:00
David Bremner
439c5896b6 lib: refactor _notmuch_messsage_file_get_combined_header
We need to rewrite the loop for gmime-3.0; move the loop body to its
own function to avoid code duplication.  Keep the common exit via
"goto DONE" to make this pure code movement.  It's important to note
that the existing exit path only deallocates the iterator.
2017-07-14 21:23:52 -03:00
David Bremner
c040464a7c lib: wrap use of g_mime_utils_header_decode_date
This changes return type in gmime 3.0
2017-07-14 21:23:52 -03:00
David Bremner
cbb2d5608e lib/cli: replace use of g_mime_message_get_sender
This function changes semantics in gmime-3.0 so make a new function
that provides the same functionality in both
2017-07-14 17:58:09 -03:00
David Bremner
6dd00d6486 lib/index: add simple html filter
The filter just drops all (HTML) tags. As an enabling change, pass the
content type to the filter constructor so we can decide which scanner
to user.
2017-07-01 12:32:27 -03:00
David Bremner
64f81f95a1 lib/index.cc: generalize filter state machine
To match things more complicated than fixed strings, we need states
with multiple out arrows.
2017-07-01 12:32:17 -03:00
David Bremner
4a085a5137 lib/index: separate state table definition from scanner.
We want to reuse the scanner definition with a different table.  This
is mainly code movement, and making the state table part of the filter
struct/class.
2017-07-01 12:32:03 -03:00
David Bremner
20c15bc820 lib/index: generalize name of indexing filter
In followup commits we will generalize the functionality of this
filter to deal with other types of non-indexable content.
2017-07-01 12:31:55 -03:00
Jani Nikula
30c475c1ef build: visibility=default for library structs is no longer needed
Commit d5523ead90 ("Mark some structures in the library interface
with visibility=default attribute.") fixed some mixed visibility
issues with structs. With the symbol default visibility reversed, this
is no longer a problem.
2017-05-13 08:38:18 -03:00
Jani Nikula
bc11759dd1 build: switch to hiding libnotmuch symbols by default
The dynamic generation of the linker version script for libnotmuch
exports has grown rather complicated.

Reverse the visibility control by hiding symbols by default using
-fvisibility=hidden, and explicitly exporting symbols in notmuch.h
using #pragma GCC visibility. (We could also use __attribute__
((visibility ("default"))) for each exported function, but the pragma
is more convenient.)

The above is not quite enough alone, as it would "leak" a number of
weak symbols from Xapian and C++ standard library. Combine it with a
small static version script that filters out everything except the
notmuch_* symbols that we explicitly exposed, and the C++ RTTI
typeinfo symbols for exception handling.

Finally, as the symbol hiding test can no longer look at the generated
symbol table, switch the test to parse the functions from notmuch.h.
2017-05-12 07:17:18 -03:00
Jani Nikula
d5ed9af0e4 build: do not export compat functions from lib
Commits 9db2145272 ("lib/gen-version-script.h: add getline and
getdelim to notmuch.sym if needed") and 3242e29e57 ("build: add
canonicalize_file_name to symbols exported from libnotmuch.so")
started exporting compat functions from libnotmuch so that the cli
could use them. But we shouldn't export such functions from the
library. They are not part of our ABI. Instead, the cli should include
its own copies of the compat functions.
2017-05-11 20:41:10 -03:00
David Bremner
11d47950c1 lib: Add regexp expansion for for tags and paths
From a UI perspective this looks similar to what was already provided
for from, subject, and mid, but the implementation is quite
different. It uses the database's list of terms to construct a term
based query equivalent to the passed regular expression.
2017-05-09 07:44:29 -03:00
David Bremner
eab365c742 lib: Add regexp searching for mid: prefix
The bulk of the change is passing in the field options to the regexp
field processor, so that we can properly handle the
fallback (non-regexp case).
2017-05-09 07:44:15 -03:00
Fredrik Fornwall
e565118172 Replace index(3) with strchr(3)
The index(3) function has been deprecated in POSIX since 2001 and
removed in 2008, and most code in notmuch already calls strchr(3).

This fixes a compilation error on Android whose libc does not have
index(3).
2017-04-20 06:59:22 -03:00
David Bremner
e1c1d33f37 Merge branch 'release'
Another regexp search fix.
2017-03-29 20:58:34 -03:00
David Bremner
cb84f84878 lib: handle empty string in regexp field processors
The non-field processor behaviour is is convert the corresponding
queries into a search for the unprefixed terms. This yields pretty
surprising results so I decided to generate a query that would match
the terms (i.e. none with that prefix) generated for an empty header.
2017-03-29 20:44:32 -03:00
David Bremner
d877240f4e Merge branch 'release'
wildcard search fixes, plus release busywork
2017-03-25 11:51:03 -03:00
David Bremner
38a56b98f9 lib: only trigger phrase processing for regexp fields when needed
The argument is that if the string passed to the field processor has
no spaces, then the added quotes won't have any benefit except for
disabling wildcards. But disabling wildcards doesn't seem very useful
in the normal Xapian query parser, since they're stripped before
generating terms anyway. It does mean that the query 'from:"foo*"' will
not be precisely equivalent to 'from:foo' as it is for the non
field-processor version.
2017-03-24 09:24:13 -03:00
David Bremner
242d5a3ed5 lib: make notmuch_query_add_tag_exclude return a status value
Since this is an ABI breaking change, but we already bumped the SONAME
for the next release
2017-03-22 08:47:13 -03:00
David Bremner
3721bd45d7 lib: replace deprecated n_q_count_threads with status returning version
This function was deprecated in notmuch 0.21.  We re-use the name for
a status returning version, and deprecate the _st name.
2017-03-22 08:35:07 -03:00
David Bremner
5ce8e0b11b lib: replace deprecated n_q_count_messages with status returning version
This function was deprecated in notmuch 0.21.  We re-use the name for
a status returning version, and deprecate the _st name. One or two
remaining uses of the (removed) non-status returning version fixed at
the same time
2017-03-22 08:35:07 -03:00
David Bremner
86cbd215eb lib: replace deprecated n_q_search_messages with status returning version
This function was deprecated in notmuch 0.21.  We re-use the name for
a status returning version, and deprecate the _st name.
2017-03-22 08:35:07 -03:00
David Bremner
1e982de508 lib: replace n_query_search_threads with status returning version
This function was deprecated in notmuch 0.21. We finally remove the
deprecated API, and rename the status returning version to the simpler
name. The status returning is kept as a deprecated alias.
2017-03-22 08:28:09 -03:00
David Bremner
fc63c15833 lib: bump SONAME to libnotmuch5
We plan a sequence of ABI breaking changes. Put the SONAME change in a
separate commit to make reordering easier.
2017-03-22 08:27:58 -03:00
David Bremner
c39f6361d0 rename libutil.a to libnotmuch_util.a
Apparently some systems (MacOS?) have a system library called libutil
and the name conflict causes problems. Since this library is quite
notmuch specific, rename it to something less generic.
2017-03-18 21:37:43 -03:00
David Bremner
a8a2705222 Merge branch 'release'
Merge in memory fixes
2017-03-18 21:02:42 -03:00
Tomi Ollila
06adc27668 lib/message.cc: fix Coverity finding (use after free)
The object where pointer to `data` was received was deleted before
it was used in _notmuch_string_list_append().

Relevant Coverity messages follow:

3: extract
Assigning: data = std::__cxx11::string(message->doc.()).c_str(),
which extracts wrapped state from temporary of type std::__cxx11::string.

4: dtor_free
The internal representation of temporary of type std::__cxx11::string
is freed by its destructor.

5: use after free:
Wrapper object use after free (WRAPPER_ESCAPE)
Using internal representation of destroyed object local data.
2017-03-18 20:59:46 -03:00
David Bremner
62822a4e2d lib: clamp return value of g_mime_utils_header_decode_date to >=0
For reasons not completely understood at this time, gmime (as of
2.6.22) is returning a date before 1900 on bad date input. Since this
confuses some other software, we clamp such dates to 0,
i.e. 1970-01-01.
2017-03-15 21:58:25 -03:00
Jani Nikula
d56a801b67 lib/database: reduce try block scope to things that really need it
No need to maintain the pure C stuff within a try block, it's arguably
confusing. This also reduces indent for a bunch of code. No functional
changes.
2017-03-10 09:21:05 -04:00
Olly Betts
81bd72cebb lib: Fix RegexpPostingSource
Remove incorrect skipping to first match from init(), and add explicit
skip_to() and check() methods to work around xapian-core bug (the
check() method will also improve speed when filtering by one of
these).
2017-03-07 19:44:36 -04:00
David Bremner
dfacfe14f3 lib: query make exclude handling non-destructive
We filter added exclude at add time, rather than modifying the query by
count search. As noted in the comments, there are several ignored
conditions here.
2017-03-04 20:47:25 -04:00
David Bremner
e209b71873 lib: centralize query parsing, store results.
The main goal is to prepare the way for non-destructive (or at least
less destructive) exclude tag handling. It does this by having a
pre-parsed query available for further processing. This also allows us
to provide slightly more precise error messages.
2017-03-04 20:47:25 -04:00
Jani Nikula
f3edc5dc86 lib: use delete[] to free buffer allocated using new[]
Fix warning caught by clang:

lib/regexp-fields.cc:41:2: warning: 'delete' applied to a pointer that was allocated
      with 'new[]'; did you mean 'delete[]'? [-Wmismatched-new-delete]
        delete buffer;
        ^
              []
lib/regexp-fields.cc:37:17: note: allocated with 'new[]' here
        char *buffer = new char[len];
                       ^
2017-03-04 20:42:39 -04:00
David Bremner
6cb1c617a7 lib: add mid: as a synonym for id:
mid: is the url scheme suggested by URL 2392. We also plan to
introduce more flexible searches for mid: than are possible with
id: (in order not to break assumptions about the special behaviour of
id:, e.g. identifying at most one message).
2017-03-03 17:46:48 -04:00