When encountering a message that has been mangled in the "mixed up"
way by an intermediate MTA, notmuch should instead repair it and index
the repaired form.
When it does this, it also associates the index.repaired=mixedup
property with the message. If a problem is found with this repair
process, or an improved repair process is proposed later, this should
make it easy for people to reindex the relevant message. The property
will also hopefully make it easier to diagnose this particular problem
in the future.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When we notice a legacy-display part during indexing, it makes more
sense to avoid indexing it as part of the message body.
Given that the protected subject will already be indexed, there is no
need to index this part at all, so we skip over it.
If this happens during indexing, we set a property on the message:
index.repaired=skip-protected-headers-legacy-display
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Our _notmuch_message_crypto_potential_payload implementation could
only return a failure if bad arguments were passed to it. It is an
internal function, so if that happens it's an entirely internal bug
for notmuch.
It will be more useful for this function to return whether or not the
part is in fact a cryptographic payload, so we dispense with the
status return.
If some future change suggests adding a status return back, there are
only a handful of call sites, and no pressure to retain a stable API,
so it could be changed easily. But for now, go with the simpler
function.
We will use this return value in future patches, to make different
decisions based on whether a part is the cryptographic payload or not.
But for now, we just leave the places where it gets invoked marked
with (void) to show that the result is ignored.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This adds no functionality directly, but is a useful starting point
for adding new repair functionality.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When indexing the cleartext of an encrypted message, record any
protected subject in the database, which should make it findable and
visible in search.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This should not change the indexing process yet as nothing calls
_notmuch_message_gen_terms with a user prefix name. On the other hand,
it should not break anything either.
_notmuch_database_prefix does a linear walk of the list of (built-in)
prefixes, followed by a logarithmic time search of the list of user
prefixes. The latter is probably not really noticable.
This will be used to avoid needing a database access to resolve a db
prefix from the corresponding UI prefix (e.g. when indexing). Arguably
the setup of the separate header map does not belong here, since it is
about indexing rather than querying, but we currently don't have any
other indexing setup to do.
Previously this functioned scanned every term attached to a given
Xapian document. It turns out we know how to read only the terms we
need to preserve (and we might have already done so). This commit
replaces many calls to Xapian::Document::remove_term with one call to
::clear_terms, and a (typically much smaller) number of calls to
::add_term. Roughly speaking this is based on the assumption that most
messages have more text than they have tags.
According to the performance test suite, this yields a roughly 40%
speedup on "notmuch reindex '*'"
Without this,
$ make time-test OPTIONS=--small
leads to fatal errors from too many open files.
Thanks to st-gourichon-fid for bringing this problem to my attention in IRC.
Rather than storing the lower level stdio FILE object, we store a
GMime stream. This allows both transparent decompression, and passing
the stream into GMime for parsing. As a side effect, we can let GMime
close the underlying OS stream (indeed, that stream isn't visible here
anymore).
This change is enough to get notmuch-{new,search} working, but there is still
some work required for notmuch-show, to be done in a following commit.
This is a functional change, not a straight translation, because we
are no longer directly invoking g_mime_parser_options_get_default(),
but the GMime source has indicated that the options parameter for
g_mime_parser_construct_message() is "nullable" since upstream commit
d0ebdd2ea3e6fa635a2a551c846e9bc8b6040353 (which itself precedes GMime
3.0).
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Several GMime 2.6 functions sprouted a change in the argument order in
GMime 3.0. We had a compatibility layer here to be able to handle
compiling against both GMime 2.6 and 3.0. Now that we're using 3.0
only, rip out the compatibility layer for those functions with changed
argument lists, and explicitly use the 3.0 argument lists.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Several of these #defines were not actually used in the notmuch
codebase any longer. And as of GMime 3.0, g_mime_init takes no
arguments, so we can also drop the bogus RFC2047 argument that we were
passing and then #defining away.
signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This means dropping GMimeCryptoContext and notmuch_config arguments.
All the argument changes are to internal functions, so this is not an
API or ABI break.
We also get to drop the #define for g_mime_3_unused.
signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
In _index_mime_part, we don't need to extract the content-type from
the part until just before we use it, so we also defer it lazily.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch) terms allows matching terms that occur only in the
body.
Unprefixed query terms should continue to match anywhere (header or
body) in the message.
This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query time
expension of unprefixed query terms to the various prefixed
equivalent.
Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occur in
headers (demonstrated by the new tests in T530-upgrade.sh)
The exact error messages returned by regerror() aren't standardized;
relying on them isn't portable. Thus, add a a prefix to make clear that
the subsequent message is a regexp parsing error, and only look for this
prefix in the test suite, ignoring the rest of the message.
I can't figure out how checking the sign of a bool ever worked. The
following program demonstrates the problem (i.e. for me it prints 1).
#include <stdio.h>
#include <stdbool.h>
int main(int argc, char **argv) {
bool x;
x = -1;
printf("x = %d\n", x);
}
This seems to be mandated by the C99 standard 6.3.1.2.
Use explicit labels for GTypeInfo member initializers, rather than
relying on comments and ordering. This is both easier to read, and
harder to screw up. This also makes it clear that we're mis-casting
GObject class initializers for gcc.
Without this patch, g++ 8.2.0-7 produces this warning:
CXX -g -O2 lib/index.o
lib/index.cc: In function ‘GMimeFilter* notmuch_filter_discard_non_term_new(GMimeContentType*)’:
lib/index.cc:252:23: warning: cast between incompatible function types from ‘void (*)(NotmuchFilterDiscardNonTermClass*)’ {aka ‘void (*)(_NotmuchFilterDiscardNonTermClass*)’} to ‘GClassInitFunc’ {aka ‘void (*)(void*, void*)’} [-Wcast-function-type]
(GClassInitFunc) notmuch_filter_discard_non_term_class_init,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The definition of GClassInitFunc in
/usr/include/glib-2.0/gobject/gtype.h suggests that this function will
always be called with the class_data member of the GTypeInfo. We set
that value to NULL in both GObject definitions in notmuch. So we mark
it as explicitly unused.
There is no functional change here, just code cleanup.
As reported by Sean Whitton, there are mailers (in particular the
Debian Bug Tracking System) that have sensible In-Reply-To headers,
but un-useful-for-notmuch References (in particular with the BTS, the
oldest reference is last). I looked at a sample of about 200K
messages, and only about 0.5% these had something other than a single
message-id in In-Reply-To. On this basis, if we see a single
message-id in In-Reply-To, consider that as authoritative.
The idea is that if a message-id parses with this function, the MUA
generating it was probably sane, and in particular it's probably safe
to use the result as a parent from In-Reply-to.
We (finally) implement the XXX comment. It requires a bit of care not
to reparent all of the possible toplevel messages.
_notmuch_messages_has_next is not ready to be a public function yet,
since it punts on the mset case. We know in the one case it is called,
the notmuch_messages_t is just a regular list / iterator.
This is mainly to lay out the structure of the final code. The problem
isn't really solved yet, although some very simple cases are
better (hence the fixed test). We need two passes through the messages
because we need to be careful not to re-parent too many messages and
end up without any toplevel messages.
There is no public notmuch_message_list_t public interface, so to this
is added to the private API. We use it immediately in thread.cc;
future commits will use it further.
For non-root messages, this should not should anything currently, as
the messages are already added in date order. In the future we will
add some non-root messages in a second pass out of order and the
sorting will be useful. It does fix the order of multiple
root-messages (although it is overkill for that).
This is technically an API change, but it is not an ABI change, and
it's merely a statement that limits what the library can do.
This is in parallel to notmuch_query_get_database(), which also takes
a const pointer.
The user can already do this manually, of course, but (a) it's nice to
have a convenience function, and (b) exposing this interface means
that someone more clever with a _notmuch_string_map_t than i am can
write a more efficient version if they like, and it will just
accelerate the users of the convenience function.
We've had _notmuch_message_database() internally for a while, and it's
useful. It turns out to be useful on the other side of the library
interface as well (i'll use it later in this series for "notmuch
show"), so we expose it publicly now.