This changes some error reporting, either intentionally by reporting
the highest level missing directory, or by side effect from looking in
XDG locations when given null database location.
The main functionality will be tested when notmuch-new is converted to
support split configuration. Here only the somewhat odd case of split
mail root which is actually symlinked to the database path is tested.
Introduce a new configuration value for the mail root, and use it to
locate mail messages in preference to the database.path (which
previously implied the mail messages were also in this location).
Initially only a subset of the CLI is tested in a split
configuration. Further changes will be needed for the remainder of the
CLI to work in split configurations.
The idea is to allow reuse in n_d_create_with_config. This is
primarily code movement, with some changes in error messages to reduce
the number of input parameters.
This is slightly more tidy, but more importantly it allows for re-use
of this code in n_d_create_with_config. That re-use will be crucial
when we no longer call n_d_open_with_config from
n_d_create_with_config.
This removes duplication between the struct element and the
configuration string_map entry. Create a simple wrapper for setting
the database path that makes sure the trailing / is stripped.
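A minimal sketch of such a wrapper's core step, assuming the path is held in a writable buffer. The name strip_trailing_slashes is hypothetical and is not taken from the notmuch source:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove trailing '/' characters from path in
 * place. A lone "/" is kept so the filesystem root stays valid. */
void
strip_trailing_slashes (char *path)
{
    assert (path != NULL);
    size_t len = strlen (path);

    while (len > 1 && path[len - 1] == '/')
        path[--len] = '\0';
}
```

Normalizing once at the point of assignment means later comparisons and path concatenations do not each have to cope with an optional trailing slash.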
In the future Xapian will apparently support this more conveniently
for the cases other than READ_ONLY => READ_ONLY
Conceptually this function seems to fit better in lib/open.cc;
database.cc is still large enough that moving the function makes
sense.
This will allow re-opening in a different mode (read/write
vs. read-only) with current Xapian API. It will also prove useful when
updating the compact functions to support more flexible database
location.
Include the (currently unused) mode argument which will specify which
mode to re-open the database in. Functionality and docs to be
finalized in a followup commit.
Based on a patch from Michael J Gruber [1]. As of glib 2.67 (more
specifically [2]), including "gmime-extra.h" inside an extern "C"
block causes build failures, because glib is using C++ features.
Observe that "gmime-extra.h" is no longer needed in
notmuch-private.h, so we can simply delete that include, but
we have to correspondingly move the includes which might include
it (in particular crypto.h) out of the extern "C" block also.
This seems less fragile than only moving gmime-extra, and relying on
preprocessor sentinels to keep the deeper includes from happening.
Move the include to the outside of the extern block.
[1]: id:aee618a3d41f7889a7449aa16893e992325a909a.1613055071.git.git@grubix.eu
[2]: https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1715
The hook directory configuration needs to be kept in synch with the
other configuration information, so add scaffolding to support this at
database opening time.
This will allow client code to provide more meaningful diagnostics. In
particular it will enable "notmuch new" to continue suggesting the user
run "notmuch setup" to create a config after "notmuch new" is
transitioned to the new configuration framework.
By using an enum we can have better error detection than copy pasting
key strings around.
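A rough sketch of the idea, with illustrative names only. The example_ identifiers and the particular set of keys are assumptions, not the library's actual enum:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key enum; names and members are illustrative only. */
typedef enum {
    EXAMPLE_KEY_DATABASE_PATH,
    EXAMPLE_KEY_MAIL_ROOT,
    EXAMPLE_KEY_HOOK_DIR,
    EXAMPLE_KEY_LAST            /* sentinel: number of keys */
} example_key_t;

/* Single place where the enum is tied to its string form. */
const char *
example_key_to_string (example_key_t key)
{
    static const char *const names[EXAMPLE_KEY_LAST] = {
        [EXAMPLE_KEY_DATABASE_PATH] = "database.path",
        [EXAMPLE_KEY_MAIL_ROOT] = "database.mail_root",
        [EXAMPLE_KEY_HOOK_DIR] = "database.hook_dir",
    };

    if ((int) key < 0 || key >= EXAMPLE_KEY_LAST)
        return NULL;
    return names[key];
}

/* Reverse lookup: returns EXAMPLE_KEY_LAST for unknown strings, so a
 * mistyped key is detected rather than silently accepted. */
example_key_t
example_key_from_string (const char *name)
{
    assert (name != NULL);
    for (int i = 0; i < EXAMPLE_KEY_LAST; i++) {
        if (strcmp (name, example_key_to_string ((example_key_t) i)) == 0)
            return (example_key_t) i;
    }
    return EXAMPLE_KEY_LAST;
}
```

With this shape a misspelled enum member is a compile error, whereas a misspelled string literal would only show up at run time.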
The question of what layer this belongs in is a bit
tricky. Historically most of the keys are defined by the CLI. On the
other hand features like excludes are supported in the
library/bindings, and it makes sense to configure them from the
library as well.
The somewhat long prefix for notmuch_config_t is to avoid collisions
with the existing usage in notmuch-client.h.
Fill in the remainder of the documented functionality for
n_d_open_with_config with respect to config file location. Similar
searching of default locations for the database file still needs to be
added.
The main goal is to allow configuration information to be temporarily
overridden by a separate config file. That will require further
changes not in this commit.
The performance impact is unclear, and will depend on the balance
between number of queries and number of distinct metadata items read
on the first call to n_d_get_config.
database.cc is uncomfortably large, and some of the static data
structures do not need to be shared as much as they are.
This is a somewhat small piece to factor out, but it will turn out to
be helpful to further refactoring.
As diagnosed by Olivier Taïbi in
id:20201027100916.emry3k2wujod4xnl@galois.lan, if an exception is
thrown while the initialization is happening (e.g. if the function is
called on a closed database), then the destructor is (sometimes)
invoked on an uninitialized Xapian object.
Solve the problem by moving the setting of the destructor until after
the placement new successfully completes. It is conceivable this might
cause a memory leak, but that seems preferable to crashing, and in any
case, there seems to be nothing better to be done: if the
initialization is failing, things are in an undefined state by
definition.
Use `makefile-gmake-mode' instead of `makefile-mode' because the
former also highlights ifdef et al. while the latter does not.
"./Makefile.global" and one "Makefile.local" failed to specify any
major mode at all but doing so is necessary because Emacs does not
automatically figure out that these are Makefiles (of any flavor).
static_cast is a bit tricky to understand and error prone, so add a
second pointer to (potentially the same) Xapian database object that
we know has the right subclass.
I'm not sure what the point of modifying that right before destroying
the object is. In a future commit I want to remove that element of the
object, so simplify that task.
The API docs promise to handle relative filenames, but the code did
not do it.
Also check for files outside the mail root, as implied by the API
description.
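The two checks can be sketched as plain string handling. This is a simplified illustration, assuming the root has no trailing '/' and that paths are not otherwise normalized; relative_to_root is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper. Returns a pointer into filename giving its path
 * relative to root, or NULL if an absolute filename lies outside root.
 * A filename that is already relative is returned unchanged. */
const char *
relative_to_root (const char *root, const char *filename)
{
    assert (root != NULL && filename != NULL);

    if (filename[0] != '/')
        return filename;

    size_t root_len = strlen (root);
    if (strncmp (filename, root, root_len) != 0)
        return NULL;
    /* Reject "/mail-old/x" when root is "/mail". */
    if (filename[root_len] != '/')
        return NULL;
    return filename + root_len + 1;
}
```

The explicit separator check matters: a bare prefix comparison would accept sibling directories whose names merely start with the root's name.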
This fixes the bug reported at
id:87sgdqo0rz.fsf@tethera.net
In order to mimic the "best effort" API of Xapian to provide
information from a closed database when possible, do not
destroy the Xapian database object too early.
Because the pointer to a Xapian database is no longer nulled on close,
introduce a flag to track whether the notmuch database is open or not.
The original generic handler had an extra '%s' in the format
string. Update tests that failed to catch this because the template to
print status strings checked 'stat', which was not set.
As a side effect, we revert the switch from notmuch_bool_t to bool
here. This is because those two types are not actually compatible when
passing by reference.
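To illustrate the by-reference problem, here is a small sketch that uses an int-sized stand-in for the library boolean. The example_ names are hypothetical. Passing the address of a C99 bool to a function expecting a pointer to an int-sized type would be a type error, so the safe bridge is to receive into the library type and convert by value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an int-sized library boolean type. */
typedef int example_bool_t;

/* Out-parameter API written against the library type. */
void
example_get_enabled (example_bool_t *out)
{
    assert (out != NULL);
    *out = 1;
}

/* Safe bridge to stdbool: the temporary has exactly the type the
 * callee writes through, and the conversion happens by value. */
bool
example_get_enabled_as_bool (void)
{
    example_bool_t tmp = 0;

    example_get_enabled (&tmp);
    return tmp != 0;
}
```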
It's not very nice to return FALSE for an error, so provide
notmuch_message_get_flag_st as a migration path.
Bump LIBNOTMUCH_MINOR_VERSION because the API is extended.
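The difference between the two calling conventions can be sketched with toy types. Everything prefixed example_ is hypothetical and only mirrors the shape of a status-returning getter with an out parameter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    EXAMPLE_STATUS_SUCCESS,
    EXAMPLE_STATUS_NULL_POINTER,
    EXAMPLE_STATUS_LOOKUP_FAILED
} example_status_t;

typedef struct {
    unsigned flags;
    bool broken;                /* simulates a failing backend lookup */
} example_message_t;

/* Legacy style: an error is indistinguishable from "flag not set". */
bool
example_get_flag (const example_message_t *m, unsigned flag)
{
    assert (flag < 32);
    if (m == NULL || m->broken)
        return false;
    return (m->flags & (1u << flag)) != 0;
}

/* Status style: the result travels through is_set, errors through the
 * return value. */
example_status_t
example_get_flag_st (const example_message_t *m, unsigned flag, bool *is_set)
{
    assert (flag < 32);
    if (m == NULL || is_set == NULL)
        return EXAMPLE_STATUS_NULL_POINTER;
    if (m->broken)
        return EXAMPLE_STATUS_LOOKUP_FAILED;
    *is_set = (m->flags & (1u << flag)) != 0;
    return EXAMPLE_STATUS_SUCCESS;
}
```

Keeping the old function alongside the new one lets existing callers keep compiling while new code can tell failure apart from an unset flag.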
Currently I don't know of a good way of testing this, but at least in
principle a Xapian exception in _notmuch_message_{add,remove}_term
would cause an abort in the library.
This should not change functionality, but does slightly reduce code
duplication. Perhaps more importantly it allows consistent changes to
all of the similar exception handling in message.cc.
This will be mandatory as of Xapian 1.5. The API is also more
consistent with the FieldProcessor API, which helps code re-use a bit.
Note that this switches to using the built-in Xapian support for
prefixes on ranges (i.e. deleted code at beginning of
ParseTimeRangeProcessor::operator(), added prefix to constructor).
Another side effect of the migration is that we are generating smaller
queries, using one OP_VALUE_RANGE instead of an AND of two OP_VALUE_*
queries.
As we prepare to handle S/MIME-encrypted PKCS#7 EnvelopedData (which
is not multipart), we don't want to be limited to passing only
GMimeMultipartEncrypted MIME parts to _notmuch_crypto_decrypt.
There is no functional change here, just a matter of adjusting how we
pass arguments internally.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When we are indexing, we should treat SignedData parts the same way
that we treat a multipart object, indexing the wrapped part as a
distinct MIME object.
Unfortunately, this means doing some sort of cryptographic
verification whose results we throw away, because GMime doesn't offer
us any way to unwrap without doing signature verification.
I've opened https://github.com/jstedfast/gmime/issues/67 to request
the capability from GMime but for now, we'll just accept the
additional performance hit.
As we do this indexing, we also apply the "signed" tag, by analogy
with how we handle multipart/signed messages. These days, that kind
of change should probably be done with a property instead, but that's
a different set of changes. This one is just for consistency.
Note that we are currently *only* handling signedData parts, which are
basically clearsigned messages. PKCS#7 parts can also be
envelopedData and authEnvelopedData (which are effectively encryption
layers), and compressedData (which afaict isn't implemented anywhere,
i've never encountered it). We're laying the groundwork for indexing
these other S/MIME types here, but we're only dealing with signedData
for now.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
strncmp looks for a prefix that matches, which is very much not what
we want here. This fixes the bug reported by Franz Fellner in
id:1588595993-ner-8.651@TPL520
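A short illustration of why a prefix comparison is the wrong tool when an exact match is wanted. The function names and sample strings are made up for the example; the commit does not say which strings were being compared:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix test: true when the first strlen (key) bytes of candidate
 * equal key, even if candidate continues afterwards. */
bool
matches_as_prefix (const char *candidate, const char *key)
{
    assert (candidate != NULL && key != NULL);
    return strncmp (candidate, key, strlen (key)) == 0;
}

/* Exact test: true only when both strings are identical. */
bool
matches_exactly (const char *candidate, const char *key)
{
    assert (candidate != NULL && key != NULL);
    return strcmp (candidate, key) == 0;
}
```

For instance, "tagged" passes the prefix test against "tag" but fails the exact test.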
Xapian 1.4 is over 3 years old now (1.4.0 released 2016-06-24),
and 1.2 has been deprecated in Notmuch version 0.27 (2018-06-13).
Xapian 1.4 supports compaction, field processors and retry locking.
Conditionals checking for compaction and field processors were removed,
but users may want to disable retry locking at configure time, so that
check is kept.
Apparently doxygen needs its comments formatted in a specific way to
notice that the group is closed.
Without this fix, with doxygen 1.8.16-2 we see:
```
doxygen ./doc/doxygen.cfg
…/notmuch/lib/notmuch.h:2322: warning: end of file while inside a group
```
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The documentation for notmuch_config_list_key warns that the
returned value will be destroyed by the next call to
notmuch_config_list_key, but it neglected to mention that calling
notmuch_config_list_value would also destroy it (by calling
notmuch_config_list_key). This is surprising, and caused a use after
free bug in _setup_user_query_fields (first noticed by an OpenBSD
porter, so kudos to the OpenBSD malloc implementation). This change
fixes that use-after-free bug.
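The hazard and the defensive pattern can be shown with a toy model. This is not the library code: the example_list_t type and its accessors are hypothetical and only imitate the behaviour described above, where both accessors hand out storage that the next accessor call frees:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;
    char *scratch;              /* owned; replaced by every accessor call */
} example_list_t;

static const char *
replace_scratch (example_list_t *list, const char *src)
{
    free (list->scratch);
    list->scratch = malloc (strlen (src) + 1);
    assert (list->scratch != NULL);
    strcpy (list->scratch, src);
    return list->scratch;
}

const char *
example_list_key (example_list_t *list)
{
    return replace_scratch (list, list->key);
}

/* Fetching the value calls the key accessor internally, which
 * invalidates any key pointer the caller obtained earlier. */
const char *
example_list_value (example_list_t *list)
{
    example_list_key (list);
    return replace_scratch (list, list->value);
}

/* Safe pattern: duplicate the key before asking for the value.
 * Returns a malloc'ed "key=value" string owned by the caller. */
char *
example_format_entry (example_list_t *list)
{
    const char *key = example_list_key (list);
    char *key_copy = malloc (strlen (key) + 1);

    assert (key_copy != NULL);
    strcpy (key_copy, key);

    const char *value = example_list_value (list);  /* key now dangles */
    size_t len = strlen (key_copy) + 1 + strlen (value) + 1;
    char *out = malloc (len);

    assert (out != NULL);
    snprintf (out, len, "%s=%s", key_copy, value);
    free (key_copy);
    return out;
}
```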
When encountering a message that has been mangled in the "mixed up"
way by an intermediate MTA, notmuch should instead repair it and index
the repaired form.
When it does this, it also associates the index.repaired=mixedup
property with the message. If a problem is found with this repair
process, or an improved repair process is proposed later, this should
make it easy for people to reindex the relevant message. The property
will also hopefully make it easier to diagnose this particular problem
in the future.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When we notice a legacy-display part during indexing, it makes more
sense to avoid indexing it as part of the message body.
Given that the protected subject will already be indexed, there is no
need to index this part at all, so we skip over it.
If this happens during indexing, we set a property on the message:
index.repaired=skip-protected-headers-legacy-display
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Our _notmuch_message_crypto_potential_payload implementation could
only return a failure if bad arguments were passed to it. It is an
internal function, so if that happens it's an entirely internal bug
for notmuch.
It will be more useful for this function to return whether or not the
part is in fact a cryptographic payload, so we dispense with the
status return.
If some future change suggests adding a status return back, there are
only a handful of call sites, and no pressure to retain a stable API,
so it could be changed easily. But for now, go with the simpler
function.
We will use this return value in future patches, to make different
decisions based on whether a part is the cryptographic payload or not.
But for now, we just leave the places where it gets invoked marked
with (void) to show that the result is ignored.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This adds no functionality directly, but is a useful starting point
for adding new repair functionality.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When indexing the cleartext of an encrypted message, record any
protected subject in the database, which should make it findable and
visible in search.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This should not change the indexing process yet as nothing calls
_notmuch_message_gen_terms with a user prefix name. On the other hand,
it should not break anything either.
_notmuch_database_prefix does a linear walk of the list of (built-in)
prefixes, followed by a logarithmic time search of the list of user
prefixes. The latter is probably not really noticeable.
This will be used to avoid needing a database access to resolve a db
prefix from the corresponding UI prefix (e.g. when indexing). Arguably
the setup of the separate header map does not belong here, since it is
about indexing rather than querying, but we currently don't have any
other indexing setup to do.
Previously this function scanned every term attached to a given
Xapian document. It turns out we know how to read only the terms we
need to preserve (and we might have already done so). This commit
replaces many calls to Xapian::Document::remove_term with one call to
::clear_terms, and a (typically much smaller) number of calls to
::add_term. Roughly speaking this is based on the assumption that most
messages have more text than they have tags.
According to the performance test suite, this yields a roughly 40%
speedup on "notmuch reindex '*'"
Without this,
$ make time-test OPTIONS=--small
leads to fatal errors from too many open files.
Thanks to st-gourichon-fid for bringing this problem to my attention in IRC.
Rather than storing the lower level stdio FILE object, we store a
GMime stream. This allows both transparent decompression, and passing
the stream into GMime for parsing. As a side effect, we can let GMime
close the underlying OS stream (indeed, that stream isn't visible here
anymore).
This change is enough to get notmuch-{new,search} working, but there is still
some work required for notmuch-show, to be done in a following commit.
This is a functional change, not a straight translation, because we
are no longer directly invoking g_mime_parser_options_get_default(),
but the GMime source has indicated that the options parameter for
g_mime_parser_construct_message() is "nullable" since upstream commit
d0ebdd2ea3e6fa635a2a551c846e9bc8b6040353 (which itself precedes GMime
3.0).
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Several GMime 2.6 functions sprouted a change in the argument order in
GMime 3.0. We had a compatibility layer here to be able to handle
compiling against both GMime 2.6 and 3.0. Now that we're using 3.0
only, rip out the compatibility layer for those functions with changed
argument lists, and explicitly use the 3.0 argument lists.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Several of these #defines were not actually used in the notmuch
codebase any longer. And as of GMime 3.0, g_mime_init takes no
arguments, so we can also drop the bogus RFC2047 argument that we were
passing and then #defining away.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This means dropping GMimeCryptoContext and notmuch_config arguments.
All the argument changes are to internal functions, so this is not an
API or ABI break.
We also get to drop the #define for g_mime_3_unused.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
In _index_mime_part, we don't need to extract the content-type from
the part until just before we use it, so we also defer it lazily.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch terms) allows matching terms that occur only in the
body.
Unprefixed query terms should continue to match anywhere (header or
body) in the message.
This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query time
expansion of unprefixed query terms to the various prefixed
equivalents.
Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occurs in
headers (demonstrated by the new tests in T530-upgrade.sh).
The exact error messages returned by regerror() aren't standardized;
relying on them isn't portable. Thus, add a prefix to make clear that
the subsequent message is a regexp parsing error, and only look for this
prefix in the test suite, ignoring the rest of the message.
I can't figure out how checking the sign of a bool ever worked. The
following program demonstrates the problem (i.e. for me it prints 1).
    #include <stdio.h>
    #include <stdbool.h>

    int main(int argc, char **argv) {
        bool x;
        x = -1;  /* any nonzero value converts to 1 (true) */
        printf("x = %d\n", x);
    }
This seems to be mandated by the C99 standard 6.3.1.2.
Use explicit labels for GTypeInfo member initializers, rather than
relying on comments and ordering. This is both easier to read, and
harder to screw up. This also makes it clear that we're mis-casting
GObject class initializers for gcc.
Without this patch, g++ 8.2.0-7 produces this warning:
CXX -g -O2 lib/index.o
lib/index.cc: In function ‘GMimeFilter* notmuch_filter_discard_non_term_new(GMimeContentType*)’:
lib/index.cc:252:23: warning: cast between incompatible function types from ‘void (*)(NotmuchFilterDiscardNonTermClass*)’ {aka ‘void (*)(_NotmuchFilterDiscardNonTermClass*)’} to ‘GClassInitFunc’ {aka ‘void (*)(void*, void*)’} [-Wcast-function-type]
(GClassInitFunc) notmuch_filter_discard_non_term_class_init,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The definition of GClassInitFunc in
/usr/include/glib-2.0/gobject/gtype.h suggests that this function will
always be called with the class_data member of the GTypeInfo. We set
that value to NULL in both GObject definitions in notmuch. So we mark
it as explicitly unused.
There is no functional change here, just code cleanup.
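A small sketch of both points, using a toy stand-in for the type-info structure. The example_ types below are hypothetical and much simpler than the real GObject ones. The initializer function is declared with the full two-argument signature, so no function-pointer cast is needed, and the unused argument is marked explicitly:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*example_class_init_func) (void *klass, void *class_data);

/* Toy stand-in for a type-info structure with several members. */
typedef struct {
    size_t class_size;
    example_class_init_func class_init;
    const void *class_data;
    size_t instance_size;
} example_type_info_t;

typedef struct {
    int initialized;
} example_class_t;

/* Matches example_class_init_func exactly, so it can be stored
 * without a cast. class_data is always NULL here and is ignored. */
void
example_class_init (void *klass, void *class_data)
{
    (void) class_data;          /* explicitly unused */
    assert (klass != NULL);
    ((example_class_t *) klass)->initialized = 1;
}

/* Designated initializers: each value is labelled by member name, so
 * reordering the struct members cannot silently misassign them. */
const example_type_info_t example_info = {
    .class_size = sizeof (example_class_t),
    .class_init = example_class_init,
    .class_data = NULL,
    .instance_size = 0,
};
```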
As reported by Sean Whitton, there are mailers (in particular the
Debian Bug Tracking System) that have sensible In-Reply-To headers,
but un-useful-for-notmuch References (in particular with the BTS, the
oldest reference is last). I looked at a sample of about 200K
messages, and only about 0.5% of these had something other than a single
message-id in In-Reply-To. On this basis, if we see a single
message-id in In-Reply-To, consider that as authoritative.
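The "single message-id" test can be sketched as follows. This is a deliberately simplified scan for angle-bracketed tokens, not a full RFC 5322 parser and not the function used in notmuch; single_message_id is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true and copies the id (without brackets) into out only if
 * header contains exactly one non-empty "<...>" token that fits. */
bool
single_message_id (const char *header, char *out, size_t out_size)
{
    assert (header != NULL && out != NULL && out_size > 0);

    const char *first = NULL;
    size_t first_len = 0;
    int count = 0;

    for (const char *p = header; (p = strchr (p, '<')) != NULL;) {
        const char *end = strchr (p, '>');

        if (end == NULL)
            break;
        if (count++ == 0) {
            first = p + 1;
            first_len = (size_t) (end - first);
        }
        p = end + 1;
    }

    if (count != 1 || first_len == 0 || first_len >= out_size)
        return false;
    memcpy (out, first, first_len);
    out[first_len] = '\0';
    return true;
}
```

A caller would treat a true result as an authoritative parent and fall back to the References header otherwise.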
The idea is that if a message-id parses with this function, the MUA
generating it was probably sane, and in particular it's probably safe
to use the result as a parent from In-Reply-To.
We (finally) implement the XXX comment. It requires a bit of care not
to reparent all of the possible toplevel messages.
_notmuch_messages_has_next is not ready to be a public function yet,
since it punts on the mset case. We know in the one case it is called,
the notmuch_messages_t is just a regular list / iterator.
This is mainly to lay out the structure of the final code. The problem
isn't really solved yet, although some very simple cases are
better (hence the fixed test). We need two passes through the messages
because we need to be careful not to re-parent too many messages and
end up without any toplevel messages.
There is no public notmuch_message_list_t interface, so this is
added to the private API. We use it immediately in thread.cc;
future commits will use it further.
For non-root messages, this should not change anything currently, as
the messages are already added in date order. In the future we will
add some non-root messages in a second pass out of order and the
sorting will be useful. It does fix the order of multiple
root-messages (although it is overkill for that).
This is technically an API change, but it is not an ABI change, and
it's merely a statement that limits what the library can do.
This is in parallel to notmuch_query_get_database(), which also takes
a const pointer.
The user can already do this manually, of course, but (a) it's nice to
have a convenience function, and (b) exposing this interface means
that someone more clever with a _notmuch_string_map_t than i am can
write a more efficient version if they like, and it will just
accelerate the users of the convenience function.
We've had _notmuch_message_database() internally for a while, and it's
useful. It turns out to be useful on the other side of the library
interface as well (i'll use it later in this series for "notmuch
show"), so we expose it publicly now.
The observation is that we are only using the messages to get their
thread_id, which is kind of a pessimal access pattern for the current
notmuch_message_get_thread_id.
This change allows queries of the form
thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
This is still somewhat brute-force, but it's a big improvement over
both the shell script solution and the previous proposal [1], because it
does not build the whole thread structure just to generate a
query. A further potential optimization is to replace the calls to
notmuch with more specialized Xapian code; in particular it's not
likely that reading all of the message metadata is a win here.
[1]: id:20170820213240.20526-1-david@tethera.net
Correct URLs that have crept into the notmuch codebase with http://
when https:// is possible.
As part of this conversion, this changeset also indicates the current
preferred upstream URLs for both gmime and sup. The new URLs are
https-enabled, the old ones are not.
This also fixes T310-emacs.sh, thanks to Bremner for catching it.
At least Fedora28 triggers this Xapian bug due to some toolchain change.
https://bugzilla.redhat.com/show_bug.cgi?id=1546162
The underlying bug is fixed in xapian commit f92e2a936c1592, and
should be fixed in Xapian 1.4.6
We added several new functions, at least
notmuch_database_get_default_indexopts
notmuch_database_index_file
notmuch_indexopts_destroy
notmuch_indexopts_get_decrypt_policy
notmuch_indexopts_set_decrypt_policy
notmuch_message_count_files
notmuch_message_has_maildir_flag
notmuch_message_reindex
notmuch_message_remove_all_properties_with_prefix
notmuch_thread_get_total_files
The current behaviour is at best under-documented. The modified test in
T470-missing-headers.sh previously relied on printf doing the right
thing with NULL, which seems icky.
The use of talloc_strdup here is probably overkill, but it avoids
having to enforce that thread->authors is never mutated outside
_resolve_thread_authors_string.
Here's the configuration choice for people who want a cleartext index,
but don't want stashed session keys.
Interestingly, this "nostash" decryption policy is actually the same
policy that should be used by "notmuch show" and "notmuch reply",
since they never modify the index or database when they are invoked
with --decrypt.
We take advantage of this parallel to tune the behavior of those
programs so that we're not requesting session keys from GnuPG during
"show" and "reply" that we would then otherwise just throw away.
If you're going to store the cleartext index of an encrypted message,
in most situations you might just as well store the session key.
Doing this storage has efficiency and recoverability advantages.
Combined with a schedule of regular OpenPGP subkey rotation and
destruction, this can also offer security benefits, like "deletable
e-mail", which is the store-and-forward analog to "forward secrecy".
But wait, i hear you saying, i have a special need to store cleartext
indexes but it's really bad for me to store session keys! Maybe
(let's imagine) i get lots of e-mails with incriminating photos
attached, and i want to be able to search for them by the text in the
e-mail, but i don't want someone with access to the index to be
actually able to see the photos themselves.
Fret not, the next patch in this series will support your wacky
uncommon use case.
There are some situations where the user wants to get rid of the
cleartext index of a message. For example, if they're indexing
encrypted messages normally, but suddenly they run across a message
that they really don't want any trace of in their index.
In that case, the natural thing to do is:
notmuch reindex --decrypt=false id:whatever@example.biz
But of course, clearing the cleartext index without clearing the
stashed session key is just silly. So we do the expected thing and
also destroy any stashed session keys while we're destroying the index
of the cleartext.
Note that stashed session keys are stored in the xapian database, but
xapian does not currently allow safe deletion (see
https://trac.xapian.org/ticket/742).
As a workaround, after removing session keys and cleartext material
from the database, the user probably should do something like "notmuch
compact" to try to purge whatever recoverable data is left in the
xapian freelist. This problem really needs to be addressed within
xapian, though, if we want it fixed right.
The new "auto" decryption policy is not only good for "notmuch show"
and "notmuch reindex". It's also useful for indexing messages --
there's no good reason to not try to go ahead and index the cleartext
of a message that we have a stashed session key for.
This change updates the defaults and tunes the test suite to make sure
that they have taken effect.
In our consolidation of _notmuch_crypto_decrypt, the callers lost
track a little bit of whether any actual decryption was attempted.
Now that we have the more-subtle "auto" policy, it's possible that
_notmuch_crypto_decrypt could be called without having any actual
decryption take place.
This change lets the callers be a little bit smarter about whether or
not any decryption was actually attempted.
This new automatic decryption policy should make it possible to
decrypt messages that we have stashed session keys for, without
incurring a call to the user's asymmetric keys.
Future patches in this series will introduce new policies; this merely
readies the way for them.
We also convert --try-decrypt to a keyword argument instead of a boolean.
The command-line interface for indexing (reindex, new, insert) used
--try-decrypt; and the configuration records used index.try_decrypt.
But by comparison with "show" and "reply", there doesn't seem to be
any reason for the "try" prefix.
This changeset adjusts the command-line interface and the
configuration interface.
For the moment, i've left indexopts_{set,get}_try_decrypt alone. The
subsequent changeset will address those.
When doing any decryption, if the notmuch database knows of any
session keys associated with the message in question, try them before
falling back to the default asymmetric crypto.
This changeset does the primary work in _notmuch_crypto_decrypt, which
grows some new parameters to handle it.
The primary advantage this patch offers is a significant speedup when
rendering large encrypted threads ("notmuch show") if session keys
happen to be cached.
Additionally, it permits message composition without access to
asymmetric secret keys ("notmuch reply"); and it permits recovering a
cleartext index when reindexing after a "notmuch restore" for those
messages that already have a session key stored.
Note that we may try multiple decryptions here (e.g. if there are
multiple session keys in the database), but we will ignore and throw
away all the GMime errors except for those that come from the last
decryption attempt. Since we don't necessarily know at the time of
the decryption that this *is* the last decryption attempt, we'll ask
for the errors each time anyway.
This does nothing if no session keys are stashed in the database,
which is fine. Actually stashing session keys in the database will
come as a subsequent patch.
This flag should make it easier to write the code for session-key
handling.
Note that this only works for GMime 2.6.21 and later (the session key
interface wasn't available before then). It should be fine to build
the rest of notmuch if this functionality isn't available.
Note that this also adds the "session_key" built_with() aspect to
libnotmuch.
We will use this centralized function to consolidate the awkward
behavior around different gmime versions.
It's only invoked from two places: mime-node.c's
node_decrypt_and_verify() and lib/index.cc's
_index_encrypted_mime_part().
However, those two places have some markedly distinct logic, so the
interface for this _notmuch_crypto_decrypt function is going to get a
little bit clunky. It's worthwhile, though, for the sake of keeping
these #if directives reasonably well-contained.
By default, notmuch won't try to decrypt on indexing. With this
patch, we make it possible to indicate a per-database preference using
the config variable "index.try_decrypt", which by default will be
false.
At indexing time, the database needs some way to know its internal
defaults for how to index encrypted parts. It shouldn't be contingent
on an external config file (since that can't be retrieved from the
database object itself), so we store it in the database.
This behaves similarly to the query.* configurations, which are also
stored in the database itself, so we're not introducing any new
dependencies by requiring that it be stored in the database.
If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.
If we can decrypt, we add the property index.decryption=success.
If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.
Note that a single message may have both values of the
"index.decryption" property: "success" and "failure". For example,
consider a message that includes multiple layers of encryption: we
might manage to decrypt the outer layer ("index.decryption=success"),
but fail on the inner layer ("index.decryption=failure").
Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing. This means it will subsequently
correspond to the actual semantics of the stored index.
This allows us to create new properties that will be automatically set
during indexing, and cleared during re-indexing, just by choice of
property name.
This is currently mostly a wrapper around _notmuch_crypto_t that keeps
its internals private and doesn't expose any of the GMime API.
However, non-crypto indexing options might also be added later
(e.g. filters or other transformations).
Subsequent patches will introduce a convention that properties whose
name starts with "index." will be stripped (and possibly re-added)
during re-indexing. This patch lays the groundwork for doing that.
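The naming convention reduces to a prefix check on the property key. Here is a minimal sketch that compacts an array of keys, dropping those under "index."; the function names are hypothetical and the real code operates on database properties, not a plain array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INDEX_PREFIX "index."

/* True for keys that belong to the automatically managed namespace. */
bool
is_index_property (const char *key)
{
    assert (key != NULL);
    return strncmp (key, INDEX_PREFIX, strlen (INDEX_PREFIX)) == 0;
}

/* Removes "index." keys from keys[0..n) in place, preserving the order
 * of the remaining entries. Returns the new number of entries. */
size_t
strip_index_properties (const char **keys, size_t n)
{
    size_t kept = 0;

    assert (keys != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (! is_index_property (keys[i]))
            keys[kept++] = keys[i];
    }
    return kept;
}
```

Here a prefix comparison is the intended behaviour: the trailing dot in the prefix keeps a key such as "indexed" from being swept up by accident.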
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
This is a logical followup to "lib: index the content type of
signature parts", which will make it easier to record the message
structure of all messages.
It's useful (*) to be able to easily find messages with certain types
of signatures. Having the mimetype: prefix searches fail for some
content types is also genuinely surprising (*). Index the content type
of signature parts.
While at it, switch to the gmime convenience constants for content and
signature part indexes.
*) At least for developers of email software!
'g_object_newv' is deprecated, and prints annoying warnings. The
warnings suggest using 'g_object_new_with_properties', but that's only
available since glib 2.55 (i.e. a month ago as of this writing).
Since we don't actually pass any properties, it seems we can just call
'g_object_new'.
I considered a higher level interface where the caller passes a tag
name rather than a flag character, but the role of the "unread" tag is
particularly confusing with such an interface.
There are at least three places in notmuch that can trigger an
indexing action:
* notmuch new
* notmuch insert
* notmuch reindex
I have plans to add some indexing options (e.g. indexing the cleartext
of encrypted parts, external filters, automated property injection)
that should properly be available in all places where indexing
happens.
I also want those indexing options to be exposed by (and constrained
by) the libnotmuch C API.
This isn't yet an API break because we've never made a release with
notmuch_param_t.
These indexing options are relevant in the listed places (and in the
libnotmuch analogues), but they aren't relevant in the other kinds of
functionality that notmuch offers (e.g. dump/restore, tagging, search,
show, reply).
So i think a generic "param" object isn't well-suited for this case.
In particular:
* a param object sounds like it could contain parameters for some
other (non-indexing) operation. This sounds confusing -- why would
i pass non-indexing parameters to a function that only does
indexing?
* bremner suggests online that a generic param object would actually be
passed as a list of param objects, argv-style. In this case (at
least in the obvious argv implementation), the params might be some
sort of generic string. This introduces a problem where the API of
the library doesn't grow as new options are added, which means that
when code outside the library tries to use a feature, it first has
to test for it, and have code to handle it not being available.
The indexopts approach proposed here instead makes it clear at
compile time and at dynamic link time that there is an explicit
dependency on that feature, which allows automated tools to keep
track of what's needed and keeps the actual code simple.
My proposal adds the notmuch_indexopts_t as an opaque struct, so that
we can extend the list of options without causing ABI breakage.
The cost of this proposal appears to be that the "boilerplate" API
increases a little bit, with a generic constructor and destructor
function for the indexopts struct.
More patches will follow that make use of this indexopts approach.
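The opaque-struct idea described above can be sketched roughly as follows. All names here (example_indexopts_t and its functions, and the single "decrypt" field) are hypothetical and chosen for illustration; they are not the actual libnotmuch declarations. The point is only that callers hold a pointer to an incomplete type, so fields can be added later without changing the ABI.

```c
#include <stdbool.h>
#include <stdlib.h>

/* "Public header" part: callers only ever see an incomplete type. */
typedef struct example_indexopts example_indexopts_t;

example_indexopts_t *example_indexopts_create (void);
void example_indexopts_set_decrypt (example_indexopts_t *opts, bool decrypt);
bool example_indexopts_get_decrypt (const example_indexopts_t *opts);
void example_indexopts_destroy (example_indexopts_t *opts);

/* "Private" part: the layout lives in the library only, so new fields
 * can be appended here without breaking already-compiled callers. */
struct example_indexopts {
    bool decrypt; /* hypothetical option */
};

/* Generic constructor: all options start out zeroed (disabled). */
example_indexopts_t *
example_indexopts_create (void)
{
    return calloc (1, sizeof (example_indexopts_t));
}

void
example_indexopts_set_decrypt (example_indexopts_t *opts, bool decrypt)
{
    if (opts)
        opts->decrypt = decrypt;
}

bool
example_indexopts_get_decrypt (const example_indexopts_t *opts)
{
    return opts ? opts->decrypt : false;
}

/* Generic destructor: the "boilerplate" cost mentioned above. */
void
example_indexopts_destroy (example_indexopts_t *opts)
{
    free (opts);
}
```

Each new option then costs one setter and one getter in the public API, which is what makes the dependency visible at compile time and at link time.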
We need a way to pass parameters to the indexing functionality on the
first index, not just on reindexing. The obvious place is in
notmuch_database_add_message. But since modifying the argument list
would break both API and ABI, we needed a new name.
I considered notmuch_database_add_message_with_params(), but the
functionality we're talking about doesn't always add a message. It
tries to index a specific file, possibly adding a message, but
possibly doing other things, like adding terms to an existing message,
or failing to deal with message objects entirely (e.g. because the
file didn't contain a message).
So i chose the function name notmuch_database_index_file.
I confess i'm a little concerned about confusing future notmuch
developers with the new name, since we already have a private
_notmuch_message_index_file function, and the two do rather different
things. But i think the added clarity for people linking against the
future libnotmuch and the capacity for using index parameters makes
this a worthwhile tradeoff. (that said, if anyone has another name
that they strongly prefer, i'd be happy to go with it)
This changeset also adjusts the tests so that we test whether the new,
preferred function returns bad values (since the deprecated function
just calls the new one).
We can keep the deprecated n_d_add_message function around as long as
we like, but at the next place where we're forced to break API or ABI
we can probably choose to drop the name relatively safely.
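The relationship between the deprecated and the preferred entry point can be sketched like this. The types and function names below are hypothetical stand-ins with simplified signatures, not the real libnotmuch prototypes; the sketch only shows a deprecated function forwarding to the new one with default (NULL) options, which is why testing the new function also covers the old one.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the library types. */
typedef struct example_indexopts example_indexopts_t;

typedef enum {
    EXAMPLE_STATUS_SUCCESS = 0,
    EXAMPLE_STATUS_NULL_POINTER
} example_status_t;

/* New, preferred entry point: takes explicit indexing options. */
example_status_t
example_index_file (const char *filename, const example_indexopts_t *opts)
{
    (void) opts; /* options are ignored in this sketch */

    if (filename == NULL)
        return EXAMPLE_STATUS_NULL_POINTER;

    /* Real indexing work would happen here. */
    return EXAMPLE_STATUS_SUCCESS;
}

/* Deprecated entry point, kept for compatibility: it simply calls the
 * new function with NULL (default) options. */
example_status_t
example_add_message (const char *filename)
{
    return example_index_file (filename, NULL);
}
```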
NOTE: there is probably more cleanup to do in the ruby and go bindings
to complete the deprecation directly. I don't know those languages
well enough to attempt a fix; i don't know how to test them; and i
don't know the culture around those languages about API additions or
deprecations.
Stripping a trailing character is not that uncommon an
operation. In particular, the next patch has to perform it as
well. Let's move it to a separate function to avoid code duplication.
Also, the new function has a small improvement: if the character to
strip is repeated several times at the end of a string, the function
strips them all.
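A minimal sketch of the behaviour described above, assuming an in-place operation on a NUL-terminated string. The name strip_trailing_char is hypothetical and the function in the patch may differ in name and signature.

```c
#include <string.h>

/* Hypothetical helper: remove every trailing occurrence of `ch` from
 * the NUL-terminated string `str`, modifying it in place. */
void
strip_trailing_char (char *str, char ch)
{
    size_t len = strlen (str);

    /* Walk backwards while the last character matches, so repeated
     * trailing characters are all removed. */
    while (len > 0 && str[len - 1] == ch)
        str[--len] = '\0';
}
```

With this sketch, stripping '/' from "/home/user/mail///" leaves "/home/user/mail", and a string consisting only of the stripped character becomes empty.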
Signed-off-by: Yuri Volchkov <yuri.volchkov@gmail.com>
Since we're accumulating the index when we add a new file to the
message, the semantics have slightly changed. This tries to align the
documentation with the actual functionality.
This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
This operation is relatively inexpensive, as the needed metadata is
already computed by our lazy metadata fetching. The goal is to support
better UI for messages with multiple files.