This will be mandatory as of Xapian 1.5. The API is also more
consistent with the FieldProcessor API, which helps code re-use a bit.
Note that this switches to using the built-in Xapian support for
prefixes on ranges (i.e. deleted code at beginning of
ParseTimeRangeProcessor::operator(), added prefix to constructor).
Another side effect of the migration is that we are generating smaller
queries, using one OP_VALUE_RANGE instead of an AND of two OP_VALUE_*
queries.
As we prepare to handle S/MIME-encrypted PKCS#7 EnvelopedData (which
is not multipart), we don't want to be limited to passing only
GMimeMultipartEncrypted MIME parts to _notmuch_crypto_decrypt.
There is no functional change here, just a matter of adjusting how we
pass arguments internally.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When we are indexing, we should treat SignedData parts the same way
that we treat a multipart object, indexing the wrapped part as a
distinct MIME object.
Unfortunately, this means doing some sort of cryptographic
verification whose results we throw away, because GMime doesn't offer
us any way to unwrap without doing signature verification.
I've opened https://github.com/jstedfast/gmime/issues/67 to request
the capability from GMime but for now, we'll just accept the
additional performance hit.
As we do this indexing, we also apply the "signed" tag, by analogy
with how we handle multipart/signed messages. These days, that kind
of change should probably be done with a property instead, but that's
a different set of changes. This one is just for consistency.
Note that we are currently *only* handling signedData parts, which are
basically clearsigned messages. PKCS#7 parts can also be
envelopedData and authEnvelopedData (which are effectively encryption
layers), and compressedData (which afaict isn't implemented anywhere,
i've never encountered it). We're laying the groundwork for indexing
these other S/MIME types here, but we're only dealing with signedData
for now.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
strncmp looks for a prefix that matches, which is very much not what
we want here. This fixes the bug reported by Franz Fellner in
id:1588595993-ner-8.651@TPL520
Xapian 1.4 is over 3 years old now (1.4.0 released 2016-06-24),
and 1.2 has been deprecated in Notmuch version 0.27 (2018-06-13).
Xapian 1.4 supports compaction, field processors and retry locking;
conditionals checking compaction and field processors were removed
but user may want to disable retry locking at configure time so it
is kept.
Apparently doxygen needs its comments formatted in a specific way to
notice that the group is closed.
Without this fix, with doxygen 1.8.16-2 we see:
```
doxygen ./doc/doxygen.cfg
…/notmuch/lib/notmuch.h:2322: warning: end of file while inside a group
```
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The documentation for notmuch_config_list_key warns that that the
returned value will be destroyed by the next call to
notmuch_config_list_key, but it neglected to mention that calling
notmuch_config_list_value would also destroy it (by calling
notmuch_config_list_key). This is surprising, and caused a use after
free bug in _setup_user_query_fields (first noticed by an OpenBSD
porter, so kudos to the OpenBSD malloc implementation). This change
fixes that use-after-free bug.
When encountering a message that has been mangled in the "mixed up"
way by an intermediate MTA, notmuch should instead repair it and index
the repaired form.
When it does this, it also associates the index.repaired=mixedup
property with the message. If a problem is found with this repair
process, or an improved repair process is proposed later, this should
make it easy for people to reindex the relevant message. The property
will also hopefully make it easier to diagnose this particular problem
in the future.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When we notice a legacy-display part during indexing, it makes more
sense to avoid indexing it as part of the message body.
Given that the protected subject will already be indexed, there is no
need to index this part at all, so we skip over it.
If this happens during indexing, we set a property on the message:
index.repaired=skip-protected-headers-legacy-display
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Our _notmuch_message_crypto_potential_payload implementation could
only return a failure if bad arguments were passed to it. It is an
internal function, so if that happens it's an entirely internal bug
for notmuch.
It will be more useful for this function to return whether or not the
part is in fact a cryptographic payload, so we dispense with the
status return.
If some future change suggests adding a status return back, there are
only a handful of call sites, and no pressure to retain a stable API,
so it could be changed easily. But for now, go with the simpler
function.
We will use this return value in future patches, to make different
decisions based on whether a part is the cryptographic payload or not.
But for now, we just leave the places where it gets invoked marked
with (void) to show that the result is ignored.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This adds no functionality directly, but is a useful starting point
for adding new repair functionality.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When indexing the cleartext of an encrypted message, record any
protected subject in the database, which should make it findable and
visible in search.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This should not change the indexing process yet as nothing calls
_notmuch_message_gen_terms with a user prefix name. On the other hand,
it should not break anything either.
_notmuch_database_prefix does a linear walk of the list of (built-in)
prefixes, followed by a logarithmic time search of the list of user
prefixes. The latter is probably not really noticable.
This will be used to avoid needing a database access to resolve a db
prefix from the corresponding UI prefix (e.g. when indexing). Arguably
the setup of the separate header map does not belong here, since it is
about indexing rather than querying, but we currently don't have any
other indexing setup to do.
Previously this functioned scanned every term attached to a given
Xapian document. It turns out we know how to read only the terms we
need to preserve (and we might have already done so). This commit
replaces many calls to Xapian::Document::remove_term with one call to
::clear_terms, and a (typically much smaller) number of calls to
::add_term. Roughly speaking this is based on the assumption that most
messages have more text than they have tags.
According to the performance test suite, this yields a roughly 40%
speedup on "notmuch reindex '*'"
Without this,
$ make time-test OPTIONS=--small
leads to fatal errors from too many open files.
Thanks to st-gourichon-fid for bringing this problem to my attention in IRC.
Rather than storing the lower level stdio FILE object, we store a
GMime stream. This allows both transparent decompression, and passing
the stream into GMime for parsing. As a side effect, we can let GMime
close the underlying OS stream (indeed, that stream isn't visible here
anymore).
This change is enough to get notmuch-{new,search} working, but there is still
some work required for notmuch-show, to be done in a following commit.
This is a functional change, not a straight translation, because we
are no longer directly invoking g_mime_parser_options_get_default(),
but the GMime source has indicated that the options parameter for
g_mime_parser_construct_message() is "nullable" since upstream commit
d0ebdd2ea3e6fa635a2a551c846e9bc8b6040353 (which itself precedes GMime
3.0).
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Several GMime 2.6 functions sprouted a change in the argument order in
GMime 3.0. We had a compatibility layer here to be able to handle
compiling against both GMime 2.6 and 3.0. Now that we're using 3.0
only, rip out the compatibility layer for those functions with changed
argument lists, and explicitly use the 3.0 argument lists.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Several of these #defines were not actually used in the notmuch
codebase any longer. And as of GMime 3.0, g_mime_init takes no
arguments, so we can also drop the bogus RFC2047 argument that we were
passing and then #defining away.
signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This means dropping GMimeCryptoContext and notmuch_config arguments.
All the argument changes are to internal functions, so this is not an
API or ABI break.
We also get to drop the #define for g_mime_3_unused.
signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
In _index_mime_part, we don't need to extract the content-type from
the part until just before we use it, so we also defer it lazily.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch) terms allows matching terms that occur only in the
body.
Unprefixed query terms should continue to match anywhere (header or
body) in the message.
This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query time
expension of unprefixed query terms to the various prefixed
equivalent.
Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occur in
headers (demonstrated by the new tests in T530-upgrade.sh)
The exact error messages returned by regerror() aren't standardized;
relying on them isn't portable. Thus, add a a prefix to make clear that
the subsequent message is a regexp parsing error, and only look for this
prefix in the test suite, ignoring the rest of the message.
I can't figure out how checking the sign of a bool ever worked. The
following program demonstrates the problem (i.e. for me it prints 1).
#include <stdio.h>
#include <stdbool.h>
int main(int argc, char **argv) {
bool x;
x = -1;
printf("x = %d\n", x);
}
This seems to be mandated by the C99 standard 6.3.1.2.
Use explicit labels for GTypeInfo member initializers, rather than
relying on comments and ordering. This is both easier to read, and
harder to screw up. This also makes it clear that we're mis-casting
GObject class initializers for gcc.
Without this patch, g++ 8.2.0-7 produces this warning:
CXX -g -O2 lib/index.o
lib/index.cc: In function ‘GMimeFilter* notmuch_filter_discard_non_term_new(GMimeContentType*)’:
lib/index.cc:252:23: warning: cast between incompatible function types from ‘void (*)(NotmuchFilterDiscardNonTermClass*)’ {aka ‘void (*)(_NotmuchFilterDiscardNonTermClass*)’} to ‘GClassInitFunc’ {aka ‘void (*)(void*, void*)’} [-Wcast-function-type]
(GClassInitFunc) notmuch_filter_discard_non_term_class_init,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The definition of GClassInitFunc in
/usr/include/glib-2.0/gobject/gtype.h suggests that this function will
always be called with the class_data member of the GTypeInfo. We set
that value to NULL in both GObject definitions in notmuch. So we mark
it as explicitly unused.
There is no functional change here, just code cleanup.
As reported by Sean Whitton, there are mailers (in particular the
Debian Bug Tracking System) that have sensible In-Reply-To headers,
but un-useful-for-notmuch References (in particular with the BTS, the
oldest reference is last). I looked at a sample of about 200K
messages, and only about 0.5% these had something other than a single
message-id in In-Reply-To. On this basis, if we see a single
message-id in In-Reply-To, consider that as authoritative.
The idea is that if a message-id parses with this function, the MUA
generating it was probably sane, and in particular it's probably safe
to use the result as a parent from In-Reply-to.
We (finally) implement the XXX comment. It requires a bit of care not
to reparent all of the possible toplevel messages.
_notmuch_messages_has_next is not ready to be a public function yet,
since it punts on the mset case. We know in the one case it is called,
the notmuch_messages_t is just a regular list / iterator.