Commit graph

172 commits

Author SHA1 Message Date
David Bremner
b0a11dbc38 lib/{open,message}: make some internal functions static
They are not used outside their file, so being extern seems like an oversight
2021-06-05 15:40:00 -03:00
David Bremner
b1b6798588 lib/message: mark flag2tag as const
This table is intended to be immutable
2021-05-14 06:39:12 -03:00
David Bremner
9ad19e4454 lib: directly traverse postlists in _n_message_delete
This is intended to fix the slow behaviour of "notmuch new" (and possibly
"notmuch reindex") when large numbers of files are deleted.

The underlying issue [1] seems to be the Xapian glass backend spending
a large amount of time in db.has_positions when running queries with
large-ish amounts of unflushed changes.

This commit removes two uses of Xapian queries [2], and replaces them with
an approximation of what Xapian would do after optimizing the
queries. This avoids the calls to has_positions (which are in any case
unneeded because we are only using boolean terms here).

[1] Thanks to "andres" on IRC for narrowing down the performance
bottleneck.

[2] Thanks to Olly Betts of Xapian fame for talking me through a fix
that does not require people to update Xapian.
2021-04-18 09:50:36 -03:00
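The commit message describes replacing two boolean-term queries with a direct walk over Xapian posting lists. Below is a minimal sketch of the underlying idea. It assumes each posting list is already available as a sorted vector of document ids. The names `docid` and `intersect_postlists` are hypothetical, and the real code iterates Xapian's own posting lists. Intersecting two sorted lists answers "term A AND term B" without consulting any positional data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Xapian document id.
using docid = unsigned;

// Intersect two sorted, duplicate-free posting lists, i.e. evaluate
// "term_a AND term_b" for purely boolean terms. No positional data is
// consulted, which is the point of bypassing the query machinery.
std::vector<docid> intersect_postlists(const std::vector<docid> &a,
                                       const std::vector<docid> &b)
{
    std::vector<docid> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                    // advance the list that is behind
        } else if (b[j] < a[i]) {
            ++j;
        } else {
            out.push_back(a[i]);    // docid present in both lists
            ++i;
            ++j;
        }
    }
    return out;
}
```

Each step advances at least one list, so the cost is linear in the combined list lengths.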
David Bremner
e823d05ae6 lib: support splitting mail from database location.
Introduce a new configuration value for the mail root, and use it to
locate mail messages in preference to the database.path (which
previously implied the mail messages were also in this location).

Initially only a subset of the CLI is tested in a split
configuration. Further changes will be needed for the remainder of the
CLI to work in split configurations.
2021-03-20 07:39:12 -03:00
David Bremner
1121299905 lib: publish API for notmuch_database_reopen
Include the (currently unused) mode argument which will specify which
mode to re-open the database in. Functionality and docs to be
finalized in a followup commit.
2021-03-18 08:03:36 -03:00
uncrustify
8aeba1228a lib: run uncrustify
This is the result of running

     $ uncrustify --replace --config ../devel/uncrustify.cfg *.c *.h *.cc

in the lib directory
2021-03-13 08:45:34 -04:00
David Bremner
a09293793f lib: replace use of static_cast for writable databases
static_cast is a bit tricky to understand and error-prone, so add a
second pointer to (potentially the same) Xapian database object that
we know has the right subclass.
2020-07-28 08:47:58 -03:00
David Bremner
d7d4c729ab lib: encapsulate the use of notmuch_database_t field 'mode'
The plan is to change the underlying representation.
2020-07-28 08:47:58 -03:00
David Bremner
e9867b818b lib: fix exception messages for n_m_message_*
The original generic handler had an extra '%s' in the format
string. Update tests that failed to catch this because the template to
print status strings checked 'stat', which was not set.
2020-07-22 19:52:55 -03:00
David Bremner
765ca7bc08 lib: fix return value for n_m_reindex
Also update the documentation for the behaviour of n_m_get_thread_id
that this fix relies on.
2020-07-20 08:54:42 -03:00
David Bremner
a2b90dc084 lib: handle xapian exception in n_m_remove_all_tags
At least the exception we already catch should be reported properly.
2020-07-20 08:54:42 -03:00
David Bremner
b7572ceb14 lib: add notmuch_message_has_maildir_flag_st
Initially the new function is mainly tested indirectly via the
wrapper.
2020-07-20 08:54:42 -03:00
David Bremner
b21f0fcb6a test: add regression test for notmuch_message_has_maildir_flag
This passes because the NULL return inside _ensure_maildir_flags does
not break anything. Probably this should be handled more explicitly.
2020-07-20 08:45:15 -03:00
David Bremner
2d04ed2631 lib: catch exceptions in n_m_get_flag, provide n_m_get_flag_st
It's not very nice to return FALSE for an error, so provide
notmuch_message_get_flag_st as a migration path.

Bump LIBNOTMUCH_MINOR_VERSION because the API is extended.
2020-07-18 09:52:27 -03:00
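The `_st` pattern separates "flag is clear" from "lookup failed". The sketch below is hypothetical and simplified. The names `status_t`, `message_t`, `read_flag`, `message_get_flag_st` and `message_get_flag` are illustrative, not the libnotmuch API. The status-returning variant catches the backend exception and reports it. The legacy-style wrapper can only map an error to false.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical status codes: a status is returned and the result is
// passed back through an out-parameter.
enum status_t { STATUS_SUCCESS, STATUS_NULL_POINTER, STATUS_XAPIAN_EXCEPTION };

struct message_t {
    bool flag;
    bool backend_broken;    // simulates a failing database read
};

// Stand-in for a backend call that may throw, as Xapian calls can.
static bool read_flag(const message_t *m)
{
    if (m->backend_broken)
        throw std::runtime_error("database closed");
    return m->flag;
}

// Status-returning variant: errors are reported, never folded into false.
status_t message_get_flag_st(const message_t *m, bool *is_set)
{
    if (m == nullptr || is_set == nullptr)
        return STATUS_NULL_POINTER;
    try {
        *is_set = read_flag(m);
    } catch (const std::exception &) {
        return STATUS_XAPIAN_EXCEPTION;
    }
    return STATUS_SUCCESS;
}

// Legacy-style wrapper: keeps the old bool signature, so an error and a
// cleared flag both come back as false.
bool message_get_flag(const message_t *m)
{
    bool is_set = false;
    if (message_get_flag_st(m, &is_set) != STATUS_SUCCESS)
        return false;
    return is_set;
}
```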
David Bremner
78e9b3467d lib: use COERCE_STATUS in n_m_{add,remove}_tag
Currently I don't know of a good way of testing this, but at least in
principle a Xapian exception in _notmuch_message_{add,remove}_term
would cause an abort in the library.
2020-07-14 07:31:45 -03:00
David Bremner
aa8e3f4487 lib: catch Xapian exceptions in n_m_remove_tag
The churn here is again mainly re-indentation.
2020-07-14 07:31:45 -03:00
David Bremner
33dd5fdc69 lib: catch Xapian exceptions in n_m_add_tag
This is mostly just (horizontal) code movement due to wrapping
everything in a try / catch.
2020-07-14 07:31:45 -03:00
David Bremner
96befd0dd0 lib: catch Xapian exceptions in n_m_count_files
This will require some care from the caller to check the sign, and not
just add error returns into a running total.
2020-07-14 07:31:37 -03:00
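The caveat about running totals is easier to see in a caller that checks the sign of each result before adding it. This sketch assumes, as the commit message suggests, that a negative return signals an error. The names `count_result` and `total_files` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-message result: a file count, or a negative value
// when the underlying lookup failed.
using count_result = int;

// Sum file counts, stopping at the first error instead of letting a
// negative error value silently reduce the running total.
// Returns false on error; *total is only written on success.
bool total_files(const std::vector<count_result> &counts, long *total)
{
    long sum = 0;
    for (count_result c : counts) {
        if (c < 0)
            return false;   // propagate the error, don't add it
        sum += c;
    }
    *total = sum;
    return true;
}
```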
David Bremner
00f1abfdf4 lib: catch Xapian exceptions in n_m_get_tags
This allows the function to return an error value rather than
crashing.
2020-07-14 07:12:52 -03:00
David Bremner
e404d8a51d lib: use LOG_XAPIAN_EXCEPTION in n_m_get_date
This should not change functionality, but does slightly reduce code
duplication. Perhaps more importantly it allows consistent changes to
all of the similar exception handling in message.cc.
2020-07-14 07:12:52 -03:00
David Bremner
286161b703 lib: catch exceptions in n_m_get_filenames
This is essentially copied from the change to notmuch_message_get_filename
2020-07-13 07:19:22 -03:00
David Bremner
a606cba32b lib/n_m_g_filename: catch Xapian exceptions, document NULL return
This is the same machinery as applied for

     notmuch_message_get_{thread,message}_id
2020-07-13 07:19:22 -03:00
David Bremner
9201c50204 lib/message: use LOG_XAPIAN_EXCEPTION in n_m_get_header
This is just for consistency, and a small reduction in the amount of
boilerplate.
2020-07-13 07:19:22 -03:00
David Bremner
dbdb860bb9 lib/message: catch exception in n_m_get_thread_id
This allows us to return an error value from the library.
2020-07-03 21:04:43 -03:00
David Bremner
87d462a204 lib: catch error from closed db in n_m_get_message_id
By catching it at the library top level, we can return an error value.
2020-07-03 21:03:51 -03:00
David Bremner
45cfeb2e55 lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms
strncmp looks for a prefix that matches, which is very much not what
we want here. This fixes the bug reported by Franz Fellner in
id:1588595993-ner-8.651@TPL520
2020-05-04 10:55:43 -03:00
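Two small helpers show the difference between a prefix comparison and an exact comparison. The names are hypothetical: `prefix_match` behaves like a strncmp-based literal comparison, and `exact_match` behaves like strcmp. A term that merely starts with the expected string passes the first test but not the second.

```cpp
#include <cassert>
#include <cstring>

// Prefix comparison: only the first strlen(prefix) bytes are compared,
// so any longer string that starts with `prefix` also matches.
static bool prefix_match(const char *s, const char *prefix)
{
    return std::strncmp(s, prefix, std::strlen(prefix)) == 0;
}

// Exact comparison: the whole string must equal the literal.
static bool exact_match(const char *s, const char *literal)
{
    return std::strcmp(s, literal) == 0;
}
```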
uncrustify
2b62ca2e3b lib: run uncrustify
This is the result of running

     $ uncrustify --replace --config ../devel/uncrustify.cfg *.c *.h *.cc

in the lib directory
2019-06-14 07:41:27 -03:00
Daniel Kahn Gillmor
5c3a44681f indexing: record protected subject when indexing cleartext
When indexing the cleartext of an encrypted message, record any
protected subject in the database, which should make it findable and
visible in search.

Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
2019-05-29 08:14:44 -03:00
David Bremner
75bdce7952 lib: support user prefix names in term generation
This should not change the indexing process yet as nothing calls
_notmuch_message_gen_terms with a user prefix name. On the other hand,
it should not break anything either.

_notmuch_database_prefix does a linear walk of the list of (built-in)
prefixes, followed by a logarithmic time search of the list of user
prefixes. The latter is probably not really noticeable.
2019-05-25 07:17:27 -03:00
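Below is a sketch of the two-stage lookup described above. It assumes a small unsorted table of built-in prefixes and a table of user prefixes kept sorted by name. The table contents, `prefix_t` and `lookup_prefix` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical prefix table entry: field name -> term prefix.
struct prefix_t {
    const char *name;
    const char *prefix;
};

// Built-in prefixes: small and unsorted, so a linear walk is used.
static const prefix_t builtin_prefixes[] = {
    { "tag", "K" }, { "id", "Q" }, { "thread", "G" },
};

// User prefixes: kept sorted by name so they can be binary-searched.
static std::vector<prefix_t> user_prefixes = {
    { "list", "XUlist" }, { "spam", "XUspam" },
};

// Return the term prefix for `name`, or nullptr if it is unknown.
const char *lookup_prefix(const char *name)
{
    for (const prefix_t &p : builtin_prefixes)      // O(n) walk
        if (std::strcmp(p.name, name) == 0)
            return p.prefix;

    auto it = std::lower_bound(                      // O(log m) search
        user_prefixes.begin(), user_prefixes.end(), name,
        [](const prefix_t &p, const char *key) {
            return std::strcmp(p.name, key) < 0;
        });
    if (it != user_prefixes.end() && std::strcmp(it->name, name) == 0)
        return it->prefix;
    return nullptr;
}
```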
David Bremner
97939170b3 n_m_remove_indexed_terms: reduce number of Xapian API calls.
Previously this function scanned every term attached to a given
Xapian document. It turns out we know how to read only the terms we
need to preserve (and we might have already done so). This commit
replaces many calls to Xapian::Document::remove_term with one call to
::clear_terms, and a (typically much smaller) number of calls to
::add_term. Roughly speaking this is based on the assumption that most
messages have more text than they have tags.

According to the performance test suite, this yields a roughly 40%
speedup on "notmuch reindex '*'"
2019-05-23 08:00:56 -03:00
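The cost argument can be illustrated by counting simulated backend calls for both strategies. This sketch uses a plain std::set of strings, not Xapian. The names `term_set`, `call_count` and both functions are hypothetical. The new strategy assumes every preserved term was actually present in the document.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical in-memory document: just a set of terms.
using term_set = std::set<std::string>;

// Counts of simulated backend calls, to compare the two strategies.
struct call_count {
    int remove_term = 0;
    int clear_terms = 0;
    int add_term = 0;
};

// Old approach: walk every term and remove each one not preserved.
void remove_indexed_old(term_set &doc, const term_set &preserved, call_count &c)
{
    for (auto it = doc.begin(); it != doc.end();) {
        if (preserved.count(*it)) {
            ++it;
            continue;
        }
        it = doc.erase(it);
        ++c.remove_term;
    }
}

// New approach: clear everything, then re-add the (usually few)
// preserved terms, which are assumed to be known already.
void remove_indexed_new(term_set &doc, const term_set &preserved, call_count &c)
{
    doc.clear();
    ++c.clear_terms;
    for (const std::string &t : preserved) {
        doc.insert(t);
        ++c.add_term;
    }
}
```

With many indexed words and few tags, the old strategy makes one removal per word. The new one makes a single clear plus one addition per preserved term.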
David Bremner
319dd95ebb lib: add 'body:' field, stop indexing headers twice.
The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch terms) allows matching terms that occur only in the
body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occurs in
headers (demonstrated by the new tests in T530-upgrade.sh).
2019-04-17 08:48:16 -03:00
David Bremner
0a7181dd16 lib: calculate message depth in thread
This will be used in reparenting messages without useful in-reply-to,
but with useful references
2018-09-06 08:07:13 -03:00
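Here is a minimal sketch of computing a message's depth. It assumes each message in the thread has a pointer to its parent, which is null for a root. The names `msg_node` and `message_depth` are hypothetical, and the commit message does not say how the library stores parent links.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical message node with a pointer to its parent (nullptr for a
// thread root).
struct msg_node {
    const msg_node *parent;
};

// Depth of a message in its thread: 0 for a root, 1 for a direct reply,
// and so on. Walking parent links assumes the parent chain is acyclic.
std::size_t message_depth(const msg_node *m)
{
    std::size_t depth = 0;
    for (const msg_node *p = m->parent; p != nullptr; p = p->parent)
        ++depth;
    return depth;
}
```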
David Bremner
d0b844b358 lib: read reference terms into message struct.
The plan is to use these in resolving threads.
2018-09-06 08:07:12 -03:00
David Bremner
9b568e73e1 lib/thread: sort sibling messages by date
For non-root messages, this should not change anything currently, as
the messages are already added in date order. In the future we will
add some non-root messages in a second pass out of order and the
sorting will be useful. It does fix the order of multiple
root-messages (although it is overkill for that).
2018-09-06 08:07:12 -03:00
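Sorting siblings by date is an ordinary comparison sort. The sketch below uses hypothetical types (`child_msg`, `sort_siblings_by_date`). A stable sort is used so that messages with equal timestamps keep their insertion order. It also leaves input that is already in date order unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical child entry in a thread tree.
struct child_msg {
    std::string id;
    std::time_t date;
};

// Sort siblings oldest-first; ties keep their original relative order.
void sort_siblings_by_date(std::vector<child_msg> &siblings)
{
    std::stable_sort(siblings.begin(), siblings.end(),
                     [](const child_msg &a, const child_msg &b) {
                         return a.date < b.date;
                     });
}
```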
Daniel Kahn Gillmor
6a9f26b4a0 lib: make notmuch_message_get_database() take a const notmuch_message_t*
This is technically an API change, but it is not an ABI change, and
it's merely a statement that limits what the library can do.

This is in parallel to notmuch_query_get_database(), which also takes
a const pointer.
2018-05-26 07:32:01 -07:00
Daniel Kahn Gillmor
9088db76d8 lib: expose notmuch_message_get_database()
We've had _notmuch_message_database() internally for a while, and it's
useful.  It turns out to be useful on the other side of the library
interface as well (i'll use it later in this series for "notmuch
show"), so we expose it publicly now.
2018-05-26 07:30:32 -07:00
David Bremner
f0131af6c5 lib: define specialized get_thread_id for use in thread subquery
The observation is that we are only using the messages to get their
thread_id, which is kind of a pessimal access pattern for the current
notmuch_message_get_thread_id.
2018-05-07 08:42:53 -03:00
Daniel Kahn Gillmor
6a9626a2fd cli/reindex: destroy stashed session keys when --decrypt=false
There are some situations where the user wants to get rid of the
cleartext index of a message.  For example, if they're indexing
encrypted messages normally, but suddenly they run across a message
that they really don't want any trace of in their index.

In that case, the natural thing to do is:

   notmuch reindex --decrypt=false id:whatever@example.biz

But of course, clearing the cleartext index without clearing the
stashed session key is just silly.  So we do the expected thing and
also destroy any stashed session keys while we're destroying the index
of the cleartext.

Note that stashed session keys are stored in the xapian database, but
xapian does not currently allow safe deletion (see
https://trac.xapian.org/ticket/742).

As a workaround, after removing session keys and cleartext material
from the database, the user probably should do something like "notmuch
compact" to try to purge whatever recoverable data is left in the
xapian freelist.  This problem really needs to be addressed within
xapian, though, if we want it fixed right.
2017-12-08 08:08:47 -04:00
Daniel Kahn Gillmor
4dfcc8c9b2 crypto: index encrypted parts when indexopts try_decrypt is set.
If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.

If we can decrypt, we add the property index.decryption=success.

If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.

Note that a single message may have both values of the
"index.decryption" property: "success" and "failure".  For example,
consider a message that includes multiple layers of encryption.  If we
manage to decrypt the outer layer ("index.decryption=success") but
fail on the inner layer ("index.decryption=failure"), both values are set.

Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing.  This means it will subsequently
correspond to the actual semantics of the stored index.
2017-10-21 19:53:19 -03:00
Daniel Kahn Gillmor
0bb05ff693 reindex: drop all properties named with prefix "index."
This allows us to create new properties that will be automatically set
during indexing, and cleared during re-indexing, just by choice of
property name.
2017-10-21 19:53:08 -03:00
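Because the rule is purely name-based, clearing these properties amounts to deleting a range of keys that share a prefix. The sketch assumes properties are kept in a std::multimap keyed by name, as a hypothetical stand-in for the database's property storage. The names `properties` and `drop_index_properties`, and the non-index key used in the test, are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical property store: a message may have several values per key,
// e.g. both index.decryption=success and index.decryption=failure.
using properties = std::multimap<std::string, std::string>;

// Remove every property whose key starts with "index.", as re-indexing
// would; other properties are left untouched.
void drop_index_properties(properties &props)
{
    const std::string prefix = "index.";
    // In sorted order, all keys with this prefix form one contiguous
    // range that begins at lower_bound(prefix).
    auto first = props.lower_bound(prefix);
    auto last = first;
    while (last != props.end() &&
           last->first.compare(0, prefix.size(), prefix) == 0)
        ++last;
    props.erase(first, last);
}
```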
Jani Nikula
008a5e92eb lib: convert notmuch_bool_t to stdbool internally
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
2017-10-09 22:27:16 -03:00
David Bremner
debfae20db lib: enforce that n_message_reindex takes headers from first file
Choosing only one set of headers is still a bit of a stopgap,
but this seems like a more defensible set of headers to choose.
2017-09-05 21:51:57 -03:00
David Bremner
0a40ea4b48 lib: add notmuch_message_has_maildir_flag
I considered a higher level interface where the caller passes a tag
name rather than a flag character, but the role of the "unread" tag is
particularly confusing with such an interface.
2017-08-29 21:56:21 -03:00
David Bremner
8a8fb39b0c lib/message: split n_m_maildir_flags_tags, store maildir flags
In a future commit this will allow querying maildir flags separately
from tags to allow resolving certain conflicts.
2017-08-29 21:51:10 -03:00
Daniel Kahn Gillmor
eb232ee0ab reindex: drop notmuch_param_t, use notmuch_indexopts_t instead
There are at least three places in notmuch that can trigger an
indexing action:

 * notmuch new
 * notmuch insert
 * notmuch reindex

I have plans to add some indexing options (e.g. indexing the cleartext
of encrypted parts, external filters, automated property injection)
that should properly be available in all places where indexing
happens.

I also want those indexing options to be exposed by (and constrained
by) the libnotmuch C API.

This isn't yet an API break because we've never made a release with
notmuch_param_t.

These indexing options are relevant in the listed places (and in the
libnotmuch analogues), but they aren't relevant in the other kinds of
functionality that notmuch offers (e.g. dump/restore, tagging, search,
show, reply).

So i think a generic "param" object isn't well-suited for this case.
In particular:

 * a param object sounds like it could contain parameters for some
   other (non-indexing) operation.  This sounds confusing -- why would
   i pass non-indexing parameters to a function that only does
   indexing?

 * bremner suggested online that a generic param object would actually
   be passed as a list of param objects, argv-style.  In this case (at
   least in the obvious argv implementation), the params might be some
   sort of generic string.  This introduces a problem where the API of
   the library doesn't grow as new options are added, which means that
   when code outside the library tries to use a feature, it first has
   to test for it, and have code to handle it not being available.
   The indexopts approach proposed here instead makes it clear at
   compile time and at dynamic link time that there is an explicit
   dependency on that feature, which allows automated tools to keep
   track of what's needed and keeps the actual code simple.

My proposal adds the notmuch_indexopts_t as an opaque struct, so that
we can extend the list of options without causing ABI breakage.

The cost of this proposal appears to be that the "boilerplate" API
increases a little bit, with a generic constructor and destructor
function for the indexopts struct.

More patches will follow that make use of this indexopts approach.
2017-08-23 07:55:12 -03:00
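The opaque-struct approach can be sketched as follows. The `my_indexopts` names are hypothetical and are not the libnotmuch API. Callers only hold a pointer obtained from a constructor and released by a destructor. Each option gets its own accessor, so a caller that uses a new option acquires a visible link-time dependency on that accessor.

```cpp
#include <cassert>
#include <new>

// Public-header view: declaration only. Callers never see the layout,
// so fields can be added later without changing what they compiled against.
struct my_indexopts;

// --- library side (private definition) ---
struct my_indexopts {
    bool decrypt = false;   // example option
};

// Constructor: returns nullptr on allocation failure.
my_indexopts *my_indexopts_create()
{
    return new (std::nothrow) my_indexopts();
}

// One accessor pair per option.
bool my_indexopts_get_decrypt(const my_indexopts *opts)
{
    return opts != nullptr && opts->decrypt;
}

void my_indexopts_set_decrypt(my_indexopts *opts, bool value)
{
    if (opts != nullptr)
        opts->decrypt = value;
}

// Destructor: deleting nullptr is a no-op.
void my_indexopts_destroy(my_indexopts *opts)
{
    delete opts;
}
```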
Daniel Kahn Gillmor
5b93fa6e70 lib: add notmuch_message_reindex
This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
2017-08-01 21:17:47 -04:00
David Bremner
34d7753992 lib: add _notmuch_message_remove_indexed_terms
Testing will be provided via use in notmuch_message_reindex
2017-08-01 21:17:47 -04:00
David Bremner
8a8e2b11c2 lib: add notmuch_message_count_files
This operation is relatively inexpensive, as the needed metadata is
already computed by our lazy metadata fetching. The goal is to support
better UI for messages with multiple files.
2017-08-01 21:17:47 -04:00
David Bremner
c040464a7c lib: wrap use of g_mime_utils_header_decode_date
Its return type changes in gmime 3.0.
2017-07-14 21:23:52 -03:00
Jani Nikula
30c475c1ef build: visibility=default for library structs is no longer needed
Commit d5523ead90 ("Mark some structures in the library interface
with visibility=default attribute.") fixed some mixed visibility
issues with structs. With the symbol default visibility reversed, this
is no longer a problem.
2017-05-13 08:38:18 -03:00