If you're going to store the cleartext index of an encrypted message,
in most situations you might just as well store the session key.
Storing it has efficiency and recoverability advantages.
Combined with a schedule of regular OpenPGP subkey rotation and
destruction, this can also offer security benefits, like "deletable
e-mail", which is the store-and-forward analog to "forward secrecy".
But wait, i hear you saying, i have a special need to store cleartext
indexes but it's really bad for me to store session keys! Maybe
(let's imagine) i get lots of e-mails with incriminating photos
attached, and i want to be able to search for them by the text in the
e-mail, but i don't want someone with access to the index to be
actually able to see the photos themselves.
Fret not, the next patch in this series will support your wacky
uncommon use case.
There are some situations where the user wants to get rid of the
cleartext index of a message. For example, if they're indexing
encrypted messages normally, but suddenly they run across a message
that they really don't want any trace of in their index.
In that case, the natural thing to do is:
notmuch reindex --decrypt=false id:whatever@example.biz
But of course, clearing the cleartext index without clearing the
stashed session key is just silly. So we do the expected thing and
also destroy any stashed session keys while we're destroying the index
of the cleartext.
Note that stashed session keys are stored in the xapian database, but
xapian does not currently allow safe deletion (see
https://trac.xapian.org/ticket/742).
As a workaround, after removing session keys and cleartext material
from the database, the user probably should do something like "notmuch
compact" to try to purge whatever recoverable data is left in the
xapian freelist. This problem really needs to be addressed within
xapian, though, if we want it fixed right.
The new "auto" decryption policy is not only good for "notmuch show"
and "notmuch reindex". It's also useful for indexing messages --
there's no good reason to not try to go ahead and index the cleartext
of a message that we have a stashed session key for.
This change updates the defaults and tunes the test suite to make sure
that they have taken effect.
In our consolidation of _notmuch_crypto_decrypt, the callers lost
track a little bit of whether any actual decryption was attempted.
Now that we have the more-subtle "auto" policy, it's possible that
_notmuch_crypto_decrypt could be called without having any actual
decryption take place.
This change lets the callers be a little bit smarter about whether or
not any decryption was actually attempted.
This new automatic decryption policy should make it possible to
decrypt messages that we have stashed session keys for, without
incurring a call to the user's asymmetric keys.
Future patches in this series will introduce new policies; this merely
readies the way for them.
We also convert --try-decrypt to a keyword argument instead of a boolean.
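As a keyword, the policy can take more than two values. Here is a minimal sketch of how a keyword such as --decrypt=auto might be parsed, and how "auto" limits decryption to stashed session keys. The names decrypt_policy_t, parse_decrypt_keyword, should_try_decrypt and may_use_asymmetric_keys are hypothetical, not libnotmuch's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy values; libnotmuch's own names may differ. */
typedef enum {
    DECRYPT_FALSE,  /* never decrypt */
    DECRYPT_TRUE,   /* decrypt with session keys or the user's secret keys */
    DECRYPT_AUTO    /* decrypt only with already-stashed session keys */
} decrypt_policy_t;

/* Map the value of a keyword argument (e.g. "--decrypt=auto") onto a
 * policy. Returns false, leaving *policy untouched, for unknown keywords. */
static bool
parse_decrypt_keyword (const char *keyword, decrypt_policy_t *policy)
{
    static const struct {
        const char *name;
        decrypt_policy_t value;
    } keywords[] = {
        { "false", DECRYPT_FALSE },
        { "true",  DECRYPT_TRUE },
        { "auto",  DECRYPT_AUTO },
    };

    assert (keyword != NULL && policy != NULL);
    for (size_t i = 0; i < sizeof (keywords) / sizeof (keywords[0]); i++) {
        if (strcmp (keyword, keywords[i].name) == 0) {
            *policy = keywords[i].value;
            return true;
        }
    }
    return false;
}

/* Should any decryption be attempted for this message at all? */
static bool
should_try_decrypt (decrypt_policy_t policy, bool have_session_key)
{
    switch (policy) {
    case DECRYPT_TRUE:
        return true;
    case DECRYPT_AUTO:
        return have_session_key;
    case DECRYPT_FALSE:
    default:
        return false;
    }
}

/* Only the "true" policy may fall back to the user's asymmetric keys. */
static bool
may_use_asymmetric_keys (decrypt_policy_t policy)
{
    return policy == DECRYPT_TRUE;
}
```

With this split, "auto" never touches the user's secret keys: if no session key is stashed, it simply skips decryption.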
The command-line interface for indexing (reindex, new, insert) used
--try-decrypt; and the configuration records used index.try_decrypt.
But by comparison with "show" and "reply", there doesn't seem to be
any reason for the "try" prefix.
This changeset adjusts the command-line interface and the
configuration interface.
For the moment, i've left indexopts_{set,get}_try_decrypt alone. The
subsequent changeset will address those.
When doing any decryption, if the notmuch database knows of any
session keys associated with the message in question, try them before
falling back to the user's asymmetric secret keys.
This changeset does the primary work in _notmuch_crypto_decrypt, which
grows some new parameters to handle it.
The primary advantage this patch offers is a significant speedup when
rendering large encrypted threads ("notmuch show") if session keys
happen to be cached.
Additionally, it permits message composition without access to
asymmetric secret keys ("notmuch reply"); and it permits recovering a
cleartext index when reindexing after a "notmuch restore" for those
messages that already have a session key stored.
Note that we may try multiple decryptions here (e.g. if there are
multiple session keys in the database), but we will ignore and throw
away all the GMime errors except for those that come from the last
decryption attempt. Since we don't necessarily know at the time of
the decryption that this *is* the last decryption attempt, we'll ask
for the errors each time anyway.
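The "keep only the last error" behaviour can be sketched as a loop that asks for an error on every attempt and discards the previous one. This is an illustration under assumptions: try_key_fn, try_session_keys and the toy callback are hypothetical stand-ins, not the GMime or notmuch API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decryption callback: returns true on success; on failure
 * it may store a malloc()ed error message in *error. */
typedef bool (*try_key_fn) (const char *session_key, char **error);

/* Try each stashed session key in turn. Errors from earlier attempts are
 * freed; only the error from the final attempt survives in *error.
 * Returns the index of the key that worked, or -1 if none did. */
static int
try_session_keys (const char *const *keys, size_t nkeys,
                  try_key_fn try_key, char **error)
{
    assert (error != NULL && try_key != NULL);
    *error = NULL;
    for (size_t i = 0; i < nkeys; i++) {
        /* We can't know in advance whether this is the last attempt,
         * so request the error every time and drop the previous one. */
        free (*error);
        *error = NULL;
        if (try_key (keys[i], error)) {
            free (*error);
            *error = NULL;
            return (int) i;
        }
    }
    return -1;
}

/* Toy callback for illustration only: just the key "good" decrypts. */
static bool
toy_try_key (const char *session_key, char **error)
{
    const char *msg = "bad session key: ";

    if (strcmp (session_key, "good") == 0)
        return true;
    *error = malloc (strlen (msg) + strlen (session_key) + 1);
    if (*error != NULL) {
        strcpy (*error, msg);
        strcat (*error, session_key);
    }
    return false;
}
```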
This does nothing if no session keys are stashed in the database,
which is fine. Actually stashing session keys in the database will
come as a subsequent patch.
This flag should make it easier to write the code for session-key
handling.
Note that this only works for GMime 2.6.21 and later (the session key
interface wasn't available before then). It should be fine to build
the rest of notmuch if this functionality isn't available.
Note that this also adds the "session_key" built_with() aspect to
libnotmuch.
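One way to pair a compile-time feature check with a runtime "built with" query is sketched below. HAVE_SESSION_KEYS and built_with() are hypothetical names for this illustration. The assumption is that the build system defines the macro to 1 only when GMime 2.6.21 or later is detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: the build system passes -DHAVE_SESSION_KEYS=1 when the
 * detected GMime (>= 2.6.21) offers the session key interface. */
#ifndef HAVE_SESSION_KEYS
#define HAVE_SESSION_KEYS 0
#endif

/* Report whether an optional aspect was compiled in, so callers can
 * check at runtime instead of failing later. */
static bool
built_with (const char *aspect)
{
    assert (aspect != NULL);
    if (strcmp (aspect, "session_key") == 0)
        return HAVE_SESSION_KEYS;
    return false;  /* unknown aspects are never available */
}
```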
We will use this centralized function to consolidate the awkward
behavior around different gmime versions.
It's only invoked from two places: mime-node.c's
node_decrypt_and_verify() and lib/index.cc's
_index_encrypted_mime_part().
However, those two places have some markedly distinct logic, so the
interface for this _notmuch_crypto_decrypt function is going to get a
little bit clunky. It's worthwhile, though, for the sake of keeping
these #if directives reasonably well-contained.
By default, notmuch won't try to decrypt on indexing. With this
patch, we make it possible to indicate a per-database preference using
the config variable "index.try_decrypt", which by default will be
false.
At indexing time, the database needs some way to know its internal
defaults for how to index encrypted parts. It shouldn't be contingent
on an external config file (since that can't be retrieved from the
database object itself), so we store it in the database.
This behaves similarly to the query.* configurations, which are also
stored in the database itself, so we're not introducing any new
dependencies by requiring that it be stored in the database.
If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.
If we can decrypt, we add the property index.decryption=success.
If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.
Note that a single message may have both values of the
"index.decryption" property: "success" and "failure". For example,
consider a message that includes multiple layers of encryption. If we
manage to decrypt the outer layer ("index.decryption=success") but
fail on the inner layer ("index.decryption=failure"), the message
carries both values.
Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing. This means it will subsequently
correspond to the actual semantics of the stored index.
This allows us to create new properties that will be automatically set
during indexing, and cleared during re-indexing, just by choice of
property name.
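A minimal sketch of the naming convention follows. Any property whose key starts with "index." belongs to the indexer, so a reindex can drop all of them before indexing again. The property_t type and both helper functions are hypothetical; the real properties live in the Xapian database.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Properties with this key prefix are owned by the indexer. */
#define INDEX_PROPERTY_PREFIX "index."

typedef struct {
    const char *key;
    const char *value;
} property_t;

static bool
is_index_property (const char *key)
{
    assert (key != NULL);
    return strncmp (key, INDEX_PROPERTY_PREFIX,
                    strlen (INDEX_PROPERTY_PREFIX)) == 0;
}

/* Remove every index.* property in place, keeping the others in order,
 * and return the new count. Indexing afterwards may re-add e.g.
 * index.decryption with a value matching the new index. */
static size_t
strip_index_properties (property_t *props, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!is_index_property (props[i].key))
            props[kept++] = props[i];
    }
    return kept;
}
```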
This is currently mostly a wrapper around _notmuch_crypto_t that keeps
its internals private and doesn't expose any of the GMime API.
However, non-crypto indexing options might also be added later
(e.g. filters or other transformations).
Subsequent patches will introduce a convention that properties whose
name starts with "index." will be stripped (and possibly re-added)
during re-indexing. This patch lays the groundwork for doing that.
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
This is a logical followup to "lib: index the content type of
signature parts", which will make it easier to record the message
structure of all messages.
It's useful (*) to be able to easily find messages with certain types
of signatures. Having the mimetype: prefix searches fail for some
content types is also genuinely surprising (*). Index the content type
of signature parts.
While at it, switch to the gmime convenience constants for content and
signature part indexes.
*) At least for developers of email software!
'g_object_newv' is deprecated, and prints annoying warnings. The
warnings suggest using 'g_object_new_with_properties', but that's only
available since glib 2.55 (i.e. a month ago as of this writing).
Since we don't actually pass any properties, it seems we can just call
'g_object_new'.
I considered a higher level interface where the caller passes a tag
name rather than a flag character, but the role of the "unread" tag is
particularly confusing with such an interface.
There are at least three places in notmuch that can trigger an
indexing action:
* notmuch new
* notmuch insert
* notmuch reindex
I have plans to add some indexing options (e.g. indexing the cleartext
of encrypted parts, external filters, automated property injection)
that should properly be available in all places where indexing
happens.
I also want those indexing options to be exposed by (and constrained
by) the libnotmuch C API.
This isn't yet an API break because we've never made a release with
notmuch_param_t.
These indexing options are relevant in the listed places (and in the
libnotmuch analogues), but they aren't relevant in the other kinds of
functionality that notmuch offers (e.g. dump/restore, tagging, search,
show, reply).
So i think a generic "param" object isn't well-suited for this case.
In particular:
* a param object sounds like it could contain parameters for some
other (non-indexing) operation. This sounds confusing -- why would
i pass non-indexing parameters to a function that only does
indexing?
* bremner suggested online that a generic param object would actually be
passed as a list of param objects, argv-style. In this case (at
least in the obvious argv implementation), the params might be some
sort of generic string. This introduces a problem where the API of
the library doesn't grow as new options are added, which means that
when code outside the library tries to use a feature, it first has
to test for it, and have code to handle it not being available.
The indexopts approach proposed here instead makes it clear at
compile time and at dynamic link time that there is an explicit
dependency on that feature, which allows automated tools to keep
track of what's needed and keeps the actual code simple.
My proposal adds the notmuch_indexopts_t as an opaque struct, so that
we can extend the list of options without causing ABI breakage.
The cost of this proposal appears to be that the "boilerplate" API
increases a little bit, with a generic constructor and destructor
function for the indexopts struct.
More patches will follow that make use of this indexopts approach.
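The opaque-struct pattern, including its constructor/destructor "boilerplate", looks roughly like this. It is a single-file sketch. The names indexopts_t, indexopts_create, indexopts_set_decrypt and the rest are hypothetical and do not match libnotmuch's actual API. Because callers only ever hold a pointer to an incomplete type, new fields can be added to the private struct without breaking ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* ---- public header: callers only see an incomplete type ---- */
typedef struct indexopts indexopts_t;

indexopts_t *indexopts_create (void);
void indexopts_destroy (indexopts_t *opts);
void indexopts_set_decrypt (indexopts_t *opts, bool decrypt);
bool indexopts_get_decrypt (const indexopts_t *opts);

/* ---- private implementation: callers never allocate or dereference
 * this struct, so its size and layout may change freely ---- */
struct indexopts {
    bool decrypt;
    /* future options (filters, transformations, ...) go here */
};

indexopts_t *
indexopts_create (void)
{
    /* calloc zeroes everything, so every option defaults to off;
     * returns NULL on allocation failure. */
    return calloc (1, sizeof (indexopts_t));
}

void
indexopts_destroy (indexopts_t *opts)
{
    free (opts);
}

void
indexopts_set_decrypt (indexopts_t *opts, bool decrypt)
{
    assert (opts != NULL);
    opts->decrypt = decrypt;
}

bool
indexopts_get_decrypt (const indexopts_t *opts)
{
    assert (opts != NULL);
    return opts->decrypt;
}
```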
We need a way to pass parameters to the indexing functionality on the
first index, not just on reindexing. The obvious place is in
notmuch_database_add_message. But since modifying the argument list
would break both API and ABI, we needed a new name.
I considered notmuch_database_add_message_with_params(), but the
functionality we're talking about doesn't always add a message. It
tries to index a specific file, possibly adding a message, but
possibly doing other things, like adding terms to an existing message,
or failing to deal with message objects entirely (e.g. because the
file didn't contain a message).
So i chose the function name notmuch_database_index_file.
I confess i'm a little concerned about confusing future notmuch
developers with the new name, since we already have a private
_notmuch_message_index_file function, and the two do rather different
things. But i think the added clarity for people linking against the
future libnotmuch and the capacity for using index parameters makes
this a worthwhile tradeoff. (that said, if anyone has another name
that they strongly prefer, i'd be happy to go with it)
This changeset also adjusts the tests so that we test whether the new,
preferred function returns bad values (since the deprecated function
just calls the new one).
We can keep the deprecated n_d_add_message function around as long as
we like, but at the next place where we're forced to break API or ABI
we can probably choose to drop the name relatively safely.
NOTE: there is probably more cleanup to do in the ruby and go bindings
to complete the deprecation directly. I don't know those languages
well enough to attempt a fix; i don't know how to test them; and i
don't know the culture around those languages about API additions or
deprecations.
Stripping a trailing character is a fairly common operation; in
particular, the next patch needs to do it as well. Let's move it into a
separate function to avoid code duplication.
The new function also has a small improvement: if the character to
strip is repeated several times at the end of a string, the function
strips them all.
Signed-off-by: Yuri Volchkov <yuri.volchkov@gmail.com>
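A helper with the described behaviour, stripping every trailing repetition of a character, might look like the sketch below. The name and signature are illustrative assumptions, not a copy of the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove all trailing occurrences of 'ch' from the
 * NUL-terminated string 's', in place. */
static void
strip_trailing (char *s, char ch)
{
    size_t len;

    assert (s != NULL);
    len = strlen (s);
    while (len > 0 && s[len - 1] == ch)
        s[--len] = '\0';
}
```

For example, "path///" becomes "path", while "a/b" is left unchanged.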
Since we're accumulating the index when we add a new file to the
message, the semantics have slightly changed. This tries to align the
documentation with the actual functionality.
This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
This operation is relatively inexpensive, as the needed metadata is
already computed by our lazy metadata fetching. The goal is to support
better UI for messages with multiple files.
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything. Values on the other hand get
overwritten, which is a bit annoying, but arguably it is not worse to
take the values (from, subject, date) from the last file indexed
rather than the first.
This is really pure C string parsing, and doesn't need to be mixed in
with the Xapian/C++ layer. Although not strictly necessary, it also
makes it a bit more natural to call _parse_message_id from multiple
compilation units.
The switch is easier to understand than the side effects in the if
test. It also potentially allows us more flexibility in breaking up
this function into smaller pieces, since passing private_status around
is icky.
'database.cc' is becoming a monster, and it's hard to follow what the
various static functions are used for. It turns out that about 1/3 of
this file is notmuch_database_add_message and helper functions not
used by any other function. This commit isolates this code into its own
file.
Some side effects of this refactoring:
- find_doc_ids becomes the non-static (but still private)
_notmuch_database_find_doc_ids
- a few instances of 'string' have 'std::' prepended, avoiding the
need for 'using namespace std;' in the new file.
We need to rewrite the loop for gmime-3.0; move the loop body to its
own function to avoid code duplication. Keep the common exit via
"goto DONE" to make this pure code movement. It's important to note
that the existing exit path only deallocates the iterator.
We want to reuse the scanner definition with a different table. This
is mainly code movement, and making the state table part of the filter
struct/class.
Commit d5523ead90 ("Mark some structures in the library interface
with visibility=default attribute.") fixed some mixed visibility
issues with structs. With the default symbol visibility reversed, this
is no longer a problem.
The dynamic generation of the linker version script for libnotmuch
exports has grown rather complicated.
Reverse the visibility control by hiding symbols by default using
-fvisibility=hidden, and explicitly exporting symbols in notmuch.h
using #pragma GCC visibility. (We could also use __attribute__
((visibility ("default"))) for each exported function, but the pragma
is more convenient.)
The above is not quite enough alone, as it would "leak" a number of
weak symbols from Xapian and C++ standard library. Combine it with a
small static version script that filters out everything except the
notmuch_* symbols that we explicitly exposed, and the C++ RTTI
typeinfo symbols for exception handling.
Finally, as the symbol hiding test can no longer look at the generated
symbol table, switch the test to parse the functions from notmuch.h.
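The header-side half of this scheme can be sketched as follows. The example_* names are hypothetical. The assumption is that the library is compiled with -fvisibility=hidden, so only declarations wrapped in the pragma block are exported. The small static version script mentioned above would be a separate linker input and is not shown.

```c
#include <assert.h>

/* Public API: declarations between push/pop get default visibility even
 * when the library is built with -fvisibility=hidden. */
#pragma GCC visibility push(default)
int example_public_function (int x);
#pragma GCC visibility pop

/* Internal helper: declared outside the pragma block, so under
 * -fvisibility=hidden it stays out of the shared object's dynamic
 * symbol table, while still being callable from other files of the
 * same library. */
int example_internal_helper (int x);

int
example_internal_helper (int x)
{
    return x * 2;
}

int
example_public_function (int x)
{
    assert (x >= 0);
    return example_internal_helper (x) + 1;
}
```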
Commits 9db2145272 ("lib/gen-version-script.h: add getline and
getdelim to notmuch.sym if needed") and 3242e29e57 ("build: add
canonicalize_file_name to symbols exported from libnotmuch.so")
started exporting compat functions from libnotmuch so that the cli
could use them. But we shouldn't export such functions from the
library. They are not part of our ABI. Instead, the cli should include
its own copies of the compat functions.
From a UI perspective this looks similar to what was already provided
for from, subject, and mid, but the implementation is quite
different. It uses the database's list of terms to construct a term
based query equivalent to the passed regular expression.
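The core idea can be illustrated in plain C with POSIX regex: match the regular expression against every known term and keep the matches. A real implementation would then OR those terms together into a Xapian query. In this sketch, match_terms and the in-memory term list are hypothetical stand-ins for the database's term iterator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Store in 'out' (up to out_cap entries) every term matching 'pattern'.
 * Returns the number of terms stored, or -1 if the regex is invalid.
 * The matched terms are what would be OR-ed into the final query. */
static int
match_terms (const char *pattern, const char *const *terms, size_t nterms,
             const char **out, size_t out_cap)
{
    regex_t re;
    size_t nmatched = 0;

    assert (pattern != NULL && terms != NULL && out != NULL);
    if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (size_t i = 0; i < nterms && nmatched < out_cap; i++) {
        if (regexec (&re, terms[i], 0, NULL, 0) == 0)
            out[nmatched++] = terms[i];
    }
    regfree (&re);
    return (int) nmatched;
}
```

Because matching happens against the term list rather than per document, the cost scales with the number of distinct terms, not with the number of messages.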
The index(3) function has been deprecated in POSIX since 2001 and
removed in 2008, and most code in notmuch already calls strchr(3).
This fixes a compilation error on Android whose libc does not have
index(3).
The non-field-processor behaviour is to convert the corresponding
queries into a search for the unprefixed terms. This yields pretty
surprising results so I decided to generate a query that would match
the terms (i.e. none with that prefix) generated for an empty header.
The argument is that if the string passed to the field processor has
no spaces, then the added quotes won't have any benefit except for
disabling wildcards. But disabling wildcards doesn't seem very useful
in the normal Xapian query parser, since they're stripped before
generating terms anyway. It does mean that the query 'from:"foo*"' will
not be precisely equivalent to 'from:foo', as it is for the
non-field-processor version.
This function was deprecated in notmuch 0.21. We re-use the name for
a status returning version, and deprecate the _st name. One or two
remaining uses of the (removed) non-status-returning version are
fixed at the same time.
This function was deprecated in notmuch 0.21. We finally remove the
deprecated API, and rename the status returning version to the simpler
name. The status-returning _st name is kept as a deprecated alias.
Apparently some systems (MacOS?) have a system library called libutil
and the name conflict causes problems. Since this library is quite
notmuch specific, rename it to something less generic.
The object from which the pointer `data` was obtained was destroyed
before the pointer was used in _notmuch_string_list_append().
Relevant Coverity messages follow:
  3: extract
     Assigning: data = std::__cxx11::string(message->doc.()).c_str(),
     which extracts wrapped state from temporary of type std::__cxx11::string.
  4: dtor_free
     The internal representation of temporary of type std::__cxx11::string
     is freed by its destructor.
  5: use after free:
     Wrapper object use after free (WRAPPER_ESCAPE)
     Using internal representation of destroyed object local data.
For reasons not completely understood at this time, gmime (as of
2.6.22) is returning a date before 1900 on bad date input. Since this
confuses some other software, we clamp such dates to 0,
i.e. 1970-01-01.
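A sketch of the clamp follows, assuming a 64-bit seconds-since-epoch value and using the pre-1900 threshold described above. The actual notmuch check may differ (for instance, it may clamp every negative value). The offset comes from 70 years, 17 of them leap years: 25567 days times 86400 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from 1900-01-01T00:00:00Z to 1970-01-01T00:00:00Z. */
#define SECONDS_1900_TO_1970 INT64_C(2208988800)

/* Hypothetical helper: treat any parsed timestamp earlier than 1900 as
 * bogus and clamp it to 0, i.e. 1970-01-01. */
static int64_t
clamp_bogus_date (int64_t seconds_since_epoch)
{
    if (seconds_since_epoch < -SECONDS_1900_TO_1970)
        return 0;
    return seconds_since_epoch;
}
```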
Remove incorrect skipping to first match from init(), and add explicit
skip_to() and check() methods to work around xapian-core bug (the
check() method will also improve speed when filtering by one of
these).
We filter added excludes at add time, rather than by modifying the
query at count/search time. As noted in the comments, there are several
ignored conditions here.
The main goal is to prepare the way for non-destructive (or at least
less destructive) exclude tag handling. It does this by having a
pre-parsed query available for further processing. This also allows us
to provide slightly more precise error messages.
Fix warning caught by clang:
  lib/regexp-fields.cc:41:2: warning: 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 'delete[]'? [-Wmismatched-new-delete]
          delete buffer;
          ^
                []
  lib/regexp-fields.cc:37:17: note: allocated with 'new[]' here
          char *buffer = new char[len];
                         ^
mid: is the URL scheme suggested by RFC 2392. We also plan to
introduce more flexible searches for mid: than are possible with
id: (in order not to break assumptions about the special behaviour of
id:, e.g. identifying at most one message).
The idea is that you can run
% notmuch search subject:/<your-favourite-regexp>/
% notmuch search from:/<your-favourite-regexp>/
or
% notmuch search subject:"your usual phrase search"
% notmuch search from:"usual phrase search"
This feature is only available with recent Xapian, specifically
support for field processors is needed.
It should work with bindings, since it extends the query parser.
This is easy to extend for other value slots, but currently the only
value slots are date, message_id, from, subject, and last_mod. Date is
already searchable; message_id is left for a followup commit.
This was originally written by Austin Clements, and ported to Xapian
field processors (from Austin's custom query parser) by yours truly.
The retries are hardcoded to a small number, and error handling aborts
rather than propagating errors from notmuch_database_reopen. These are both
somewhat justified by the assumption that most things that can go
wrong in Xapian::Database::reopen are rare and fatal. Here's the brief
discussion with Xapian upstream:
24-02-2017 08:12:57 < bremner> any intuition about how likely
Xapian::Database::reopen is to fail? I'm catching a
DatabaseModifiedError somewhere where handling any further errors is
tricky, and wondering about treating a failed reopen as as "the
impossible happened, stopping"
24-02-2017 16:22:34 < olly> bremner: there should not be much scope for
failure - stuff like out of memory or disk errors, which are probably a
good enough excuse to stop
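The resulting control flow can be sketched as a bounded retry loop that treats a failed reopen as fatal. Everything here is hypothetical: op_status_t stands in for catching Xapian::DatabaseModifiedError, MAX_REOPEN_RETRIES for the hardcoded small number, and the toy database simulates concurrent writers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes standing in for Xapian exceptions. */
typedef enum { OP_OK, OP_DATABASE_MODIFIED } op_status_t;

#define MAX_REOPEN_RETRIES 3  /* "a small number"; the real value may differ */

/* Run 'op'; whenever it reports that the database changed underneath us,
 * reopen and retry, at most MAX_REOPEN_RETRIES times. A failed reopen is
 * treated as "the impossible happened" and aborts. */
static op_status_t
run_with_reopen (op_status_t (*op) (void *), int (*reopen) (void *), void *ctx)
{
    op_status_t status;

    assert (op != NULL && reopen != NULL);
    status = op (ctx);
    for (int tries = 0;
         status == OP_DATABASE_MODIFIED && tries < MAX_REOPEN_RETRIES;
         tries++) {
        if (reopen (ctx) != 0) {
            fprintf (stderr, "fatal: database reopen failed\n");
            abort ();
        }
        status = op (ctx);
    }
    return status;
}

/* Toy context for illustration: the operation sees 'pending_changes'
 * concurrent modifications before it can succeed. */
typedef struct {
    int pending_changes;
    int reopens;
} toy_db_t;

static op_status_t
toy_op (void *ctx)
{
    toy_db_t *db = ctx;

    if (db->pending_changes > 0) {
        db->pending_changes--;
        return OP_DATABASE_MODIFIED;
    }
    return OP_OK;
}

static int
toy_reopen (void *ctx)
{
    toy_db_t *db = ctx;

    db->reopens++;
    return 0;
}
```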
The two g_hash_table functions (insert, add) have different behaviour
with respect to existing keys. g_hash_table_insert frees the new key,
while g_hash_table_add (which is really g_hash_table_replace in
disguise) frees the existing key. With this change 'ref' is live until
the end of the function (assuming single-threaded access to
'hash'). We can't guarantee it will continue to be live in the
future (i.e. there may be a future key duplication) so we copy it with
the allocation context passed to parse_references (in practice this is
the notmuch_message_t object whose parents we are finding).
Thanks to Tomi for the simpler approach to the problem based on
reading the fine glib manual.
Replace multiple tables with some flags in a single table. This makes
the code in notmuch_database_open_verbose a bit shorter, and it should
also make it easier to add other options to fields, e.g. regexp
searching.
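A single table with per-field flag bits, in place of several parallel tables, might look like this sketch. The flag names, the field_t layout and the helper functions are hypothetical. The prefix strings are illustrative values only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-field flags. */
enum {
    FIELD_NO_FLAGS      = 0,
    FIELD_EXTERNAL      = 1 << 0, /* usable in user queries */
    FIELD_PROBABILISTIC = 1 << 1, /* free text rather than boolean terms */
    FIELD_REGEXP        = 1 << 2  /* supports /regex/ searches */
};

typedef struct {
    const char *name;   /* user-visible prefix, e.g. "from" */
    const char *prefix; /* internal term prefix (illustrative values) */
    int flags;
} field_t;

static const field_t fields[] = {
    { "tag",     "K",        FIELD_EXTERNAL },
    { "from",    "XFROM",    FIELD_EXTERNAL | FIELD_PROBABILISTIC | FIELD_REGEXP },
    { "subject", "XSUBJECT", FIELD_EXTERNAL | FIELD_PROBABILISTIC | FIELD_REGEXP },
    { "type",    "T",        FIELD_NO_FLAGS },
};

#define NUM_FIELDS (sizeof (fields) / sizeof (fields[0]))

static const field_t *
find_field (const char *name)
{
    assert (name != NULL);
    for (size_t i = 0; i < NUM_FIELDS; i++)
        if (strcmp (fields[i].name, name) == 0)
            return &fields[i];
    return NULL;
}

/* Setup code can loop once over the table and branch on flags, instead
 * of looping over one table per kind of field. */
static size_t
count_fields_with (int flag)
{
    size_t n = 0;

    for (size_t i = 0; i < NUM_FIELDS; i++)
        if (fields[i].flags & flag)
            n++;
    return n;
}
```

Adding a new capability, such as regexp searching, then means defining one more flag bit rather than another table.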
From #xapian
olly> bremner: btw, i noticed notmuch count seems to request all the documents and then ignores them
bremner> hmm. There's something funny about the way that notmuch uses matches in general iirc
olly> it should be able to do: mset = enquire.get_mset (0, 0, notmuch->xapian_db->get_doccount ());
...
olly> get_matches_estimated() will be exact because check_at_least is the size of the database
We already depend on glib both directly and indirectly (via gmime). We
might as well make use of its facilities. Drop the embedded libsha1
and use glib for sha1 digests.
The todo comment got separated from the status it's related to at
commit 3f32fd8a1c ("Add missing comment for
NOTMUCH_STATUS_READONLY_DATABASE."). Later, commit b65ca8e0ba ("lib:
modify notmuch.h for automatic document generation") moved it, but to
the wrong place. Fix the location.
It seems that no-one tried to compile without Xapian compact support
since March of 2015, since that's when I introduced a syntax error in
that branch of the ifdef.
Given the choice of maintaining this underused branch of code, or
bumping the Xapian dependency to a version from 2011, it seems
reasonable to do the latter.
This should not change the SONAME, and therefore won't change the
dynamic linking behaviour, but it may help some users debug missing
symbols in case their libnotmuch is too old.
This is needed so that when the map is modified during traversal, and
thus unlinked by the database code, the map is not disposed of until the
iterator is done with it.