database.cc is uncomfortably large, and some of the static data
structures do not need to be shared as much as they are.
This is a somewhat small piece to factor out, but it will turn out to
be helpful to further refactoring.
static_cast is a bit tricky to understand and error prone, so add a
second pointer to (potentially the same) Xapian database object that
we know has the right subclass.
I'm not sure what the point of modifying that right before destroying
the object is. In a future commit I want to remove that element of the
object, so simplify that task.
In order to mimic the "best effort" API of Xapian to provide
information from a closed database when possible, do not
destroy the Xapian database object too early.
Because the pointer to a Xapian database is no longer nulled on close,
introduce a flag to track whether the notmuch database is open or not.
This will be mandatory as of Xapian 1.5. The API is also more
consistent with the FieldProcessor API, which helps code re-use a bit.
Note that this switches to using the built-in Xapian support for
prefixes on ranges (i.e. deleted code at beginning of
ParseTimeRangeProcessor::operator(), added prefix to constructor).
Another side effect of the migration is that we are generating smaller
queries, using one OP_VALUE_RANGE instead of an AND of two OP_VALUE_*
queries.
Xapian 1.4 is over 3 years old now (1.4.0 released 2016-06-24),
and 1.2 has been deprecated in Notmuch version 0.27 (2018-06-13).
Xapian 1.4 supports compaction, field processors and retry locking;
conditionals checking compaction and field processors were removed
but user may want to disable retry locking at configure time so it
is kept.
This should not change the indexing process yet as nothing calls
_notmuch_message_gen_terms with a user prefix name. On the other hand,
it should not break anything either.
_notmuch_database_prefix does a linear walk of the list of (built-in)
prefixes, followed by a logarithmic time search of the list of user
prefixes. The latter is probably not really noticable.
This will be used to avoid needing a database access to resolve a db
prefix from the corresponding UI prefix (e.g. when indexing). Arguably
the setup of the separate header map does not belong here, since it is
about indexing rather than querying, but we currently don't have any
other indexing setup to do.
Several of these #defines were not actually used in the notmuch
codebase any longer. And as of GMime 3.0, g_mime_init takes no
arguments, so we can also drop the bogus RFC2047 argument that we were
passing and then #defining away.
signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch) terms allows matching terms that occur only in the
body.
Unprefixed query terms should continue to match anywhere (header or
body) in the message.
This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query time
expension of unprefixed query terms to the various prefixed
equivalent.
Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occur in
headers (demonstrated by the new tests in T530-upgrade.sh)
This change allows queries of the form
thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
This is still somewhat brute-force, but it's a big improvement over
both the shell script solution and the previous proposal [1], because it
does not build the whole thread structure just generate a
query. A further potential optimization is to replace the calls to
notmuch with more specialized Xapian code; in particular it's not
likely that reading all of the message metadata is a win here.
[1]: id:20170820213240.20526-1-david@tethera.net
This change allows queries of the form
thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
This is still somewhat brute-force, but it's a big improvement over
both the shell script solution and the previous proposal [1], because it
does not build the whole thread structure just generate a
query. A further potential optimization is to replace the calls to
notmuch with more specialized Xapian code; in particular it's not
likely that reading all of the message metadata is a win here.
[1]: id:20170820213240.20526-1-david@tethera.net
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
Stripping trailing character is not that uncommon
operation. Particularly, the next patch has to perform it as
well. Lets move it to the separate function to avoid code duplication.
Also the new function has a little improvement: if the character to
strip is repeated several times in the end of a string, function
strips them all.
Signed-off-by: Yuri Volchkov <yuri.volchkov@gmail.com>
'database.cc' is becoming a monster, and it's hard to follow what the
various static functions are used for. It turns out that about 1/3 of
this file notmuch_database_add_message and helper functions not used
by any other function. This commit isolates this code into it's own
file.
Some side effects of this refactoring:
- find_doc_ids becomes the non-static (but still private)
_notmuch_database_find_doc_ids
- a few instances of 'string' have 'std::' prepended, avoiding the
need for 'using namespace std;' in the new file.
From a UI perspective this looks similar to what was already provided
for from, subject, and mid, but the implementation is quite
different. It uses the database's list of terms to construct a term
based query equivalent to the passed regular expression.
This function was deprecated in notmuch 0.21. We re-use the name for
a status returning version, and deprecate the _st name. One or two
remaining uses of the (removed) non-status returning version fixed at
the same time
mid: is the url scheme suggested by URL 2392. We also plan to
introduce more flexible searches for mid: than are possible with
id: (in order not to break assumptions about the special behaviour of
id:, e.g. identifying at most one message).
the idea is that you can run
% notmuch search subject:/<your-favourite-regexp>/
% notmuch search from:/<your-favourite-regexp>/
or
% notmuch search subject:"your usual phrase search"
% notmuch search from:"usual phrase search"
This feature is only available with recent Xapian, specifically
support for field processors is needed.
It should work with bindings, since it extends the query parser.
This is easy to extend for other value slots, but currently the only
value slots are date, message_id, from, subject, and last_mod. Date is
already searchable; message_id is left for a followup commit.
This was originally written by Austin Clements, and ported to Xapian
field processors (from Austin's custom query parser) by yours truly.
The two g_hash_table functions (insert, add) have different behaviour
with respect to existing keys. g_hash_table_insert frees the new key,
while g_hash_table_add (which is really g_hash_table_replace in
disguise) frees the existing key. With this change 'ref' is live until
the end of the function (assuming single-threaded access to
'hash'). We can't guarantee it will continue to be live in the
future (i.e. there may be a future key duplication) so we copy it with
the allocation context passed to parse_references (in practice this is
the notmuch_message_t object whose parents we are finding).
Thanks to Tomi for the simpler approach to the problem based on
reading the fine glib manual.
Replace multiple tables with some flags in a single table. This makes
the code in notmuch_database_open_verbose a bit shorter, and it should
also make it easier to add other options to fields, e.g. regexp
searching.
It seems that no-one tried to compile without Xapian compact support
since March of 2015, since that's when I introduced a syntax error in
that branch of the ifdef.
Given the choice of maintaining this underused branch of code, or
bumping the Xapian dependency to a version from 2011, it seems
reasonable to do the latter.
We want to be able to query the properties directly, like:
notmuch count property:foo=bar
which should return a count of messages where the property with key
"foo" has value equal to "bar".
I believe the current one is misleading, because in my experiments
Xapian did not add : when prefix and term were both upper case. Indeed,
it's hard to see how it could, because prefixes are added at a layer
above Xapian in our code. See _notmuch_message_add_term for an example.
Also try to explain why this is a good idea. As far as I can ascertain,
this is more of an issue for a system trying to work with an unknown set
of prefixes. Since notmuch has a fixed set of prefixes, and we can
hopefully be trusted not to add XGOLD and XGOLDEN as prefixes, it is
harder for problems to arise.
_notmuch_database_log clears the log buffer each time. Rather than
introducing more complicated semantics about for this function, provide
a second function that does not clear the buffer. This is mainly a
convenience function for callers constructing complex or multi-line log
messages.
The changes to query.cc are to make sure that the common code path of
the new function is tested.