As reported by Sean Whitton, there are mailers (in particular the
Debian Bug Tracking System) that have sensible In-Reply-To headers,
but un-useful-for-notmuch References (in particular with the BTS, the
oldest reference is last). I looked at a sample of about 200K
messages, and only about 0.5% these had something other than a single
message-id in In-Reply-To. On this basis, if we see a single
message-id in In-Reply-To, consider that as authoritative.
If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.
If we can decrypt, we add the property index.decryption=success.
If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.
Note that a single message may have both values of the
"index.decryption" property: "success" and "failure". For example,
consider a message that includes multiple layers of encryption. If we
manage to decrypt the outer layer ("index.decryption=success"), but
fail on the inner layer ("index.decryption=failure").
Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing. This means it will subsequently
correspond to the actual semantics of the stored index.
This is currently mostly a wrapper around _notmuch_crypto_t that keeps
its internals private and doesn't expose any of the GMime API.
However, non-crypto indexing options might also be added later
(e.g. filters or other transformations).
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
There are at least three places in notmuch that can trigger an
indexing action:
* notmuch new
* notmuch insert
* notmuch reindex
I have plans to add some indexing options (e.g. indexing the cleartext
of encrypted parts, external filters, automated property injection)
that should properly be available in all places where indexing
happens.
I also want those indexing options to be exposed by (and constrained
by) the libnotmuch C API.
This isn't yet an API break because we've never made a release with
notmuch_param_t.
These indexing options are relevant in the listed places (and in the
libnotmuch analogues), but they aren't relevant in the other kinds of
functionality that notmuch offers (e.g. dump/restore, tagging, search,
show, reply).
So i think a generic "param" object isn't well-suited for this case.
In particular:
* a param object sounds like it could contain parameters for some
other (non-indexing) operation. This sounds confusing -- why would
i pass non-indexing parameters to a function that only does
indexing?
* bremner suggests online a generic param object would actually be
passed as a list of param objects, argv-style. In this case (at
least in the obvious argv implementation), the params might be some
sort of generic string. This introduces a problem where the API of
the library doesn't grow as new options are added, which means that
when code outside the library tries to use a feature, it first has
to test for it, and have code to handle it not being available.
The indexopts approach proposed here instead makes it clear at
compile time and at dynamic link time that there is an explicit
dependency on that feature, which allows automated tools to keep
track of what's needed and keeps the actual code simple.
My proposal adds the notmuch_indexopts_t as an opaque struct, so that
we can extend the list of options without causing ABI breakage.
The cost of this proposal appears to be that the "boilerplate" API
increases a little bit, with a generic constructor and destructor
function for the indexopts struct.
More patches will follow that make use of this indexopts approach.
We need a way to pass parameters to the indexing functionality on the
first index, not just on reindexing. The obvious place is in
notmuch_database_add_message. But since modifying the argument list
would break both API and ABI, we needed a new name.
I considered notmuch_database_add_message_with_params(), but the
functionality we're talking about doesn't always add a message. It
tries to index a specific file, possibly adding a message, but
possibly doing other things, like adding terms to an existing message,
or failing to deal with message objects entirely (e.g. because the
file didn't contain a message).
So i chose the function name notmuch_database_index_file.
I confess i'm a little concerned about confusing future notmuch
developers with the new name, since we already have a private
_notmuch_message_index_file function, and the two do rather different
things. But i think the added clarity for people linking against the
future libnotmuch and the capacity for using index parameters makes
this a worthwhile tradeoff. (that said, if anyone has another name
that they strongly prefer, i'd be happy to go with it)
This changeset also adjusts the tests so that we test whether the new,
preferred function returns bad values (since the deprecated function
just calls the new one).
We can keep the deprecated n_d_add_message function around as long as
we like, but at the next place where we're forced to break API or ABI
we can probably choose to drop the name relatively safely.
NOTE: there is probably more cleanup to do in the ruby and go bindings
to complete the deprecation directly. I don't know those languages
well enough to attempt a fix; i don't know how to test them; and i
don't know the culture around those languages about API additions or
deprecations.
This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything. Values on the other hand get
overwritten, which is a bit annoying, but arguably it is not worse to
take the values (from, subject, date) from the last file indexed
rather than the first.
This is really pure C string parsing, and doesn't need to be mixed in
with the Xapian/C++ layer. Although not strictly necessary, it also
makes it a bit more natural to call _parse_message_id from multiple
compilation units.
The switch is easier to understand than the side effects in the if
test. It also potentially allows us more flexibility in breaking up
this function into smaller pieces, since passing private_status around
is icky.
'database.cc' is becoming a monster, and it's hard to follow what the
various static functions are used for. It turns out that about 1/3 of
this file notmuch_database_add_message and helper functions not used
by any other function. This commit isolates this code into it's own
file.
Some side effects of this refactoring:
- find_doc_ids becomes the non-static (but still private)
_notmuch_database_find_doc_ids
- a few instances of 'string' have 'std::' prepended, avoiding the
need for 'using namespace std;' in the new file.