This replaces two instances of Xapian::Query::MatchAll with the
equivalent but thread-safe alternative Xapian::Query(std::string()).
Xapian::Query::MatchAll maintains an internal pointer to a refcounted
Xapian::Internal::QueryTerm.
None of this is thread-safe but that wouldn't be an issue if
Xapian::Query::MatchAll wasn't static. Because it's static, the
refcounting goes awry when Notmuch is called from multiple threads.
This is actually documented by Xapian:
4715de3a9f/xapian-core/include/xapian/query.h (L65)
While static, Xapian::Query::MatchNothing is safe because it doesn't
maintain an internal object and as such, doesn't use references.
Two best-effort tests making use of TSan were added to showcase the
issue (I couldn't figure out a way to deterministically reproduce it
without making an unmaintainable mess).
First, when two databases are created in parallel, a query that uses
Xapian::Query::MatchAll is made (lib/query.cc), resulting in the
following backtrace on a segfault:
#0 0x00007ffff76822af in Xapian::Query::get_terms_begin (this=0x7fffe80137f0) at api/query.cc:141
#1 0x00007ffff7f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) at lib/query.cc:176
#2 0x00007ffff7f93784 in _notmuch_query_ensure_parsed_xapian (query=0x7fffe80137c0) at lib/query.cc:225
#3 0x00007ffff7f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) at lib/query.cc:260
#4 0x00007ffff7f93bfe in _notmuch_query_search_documents (query=0x7fffe80137c0, type=0x7ffff7fa9b1e "mail", out=0x7ffff666da18) at lib/query.cc:361
#5 0x00007ffff7f93ba4 in notmuch_query_search_messages (query=0x7fffe80137c0, out=0x7ffff666da18) at lib/query.cc:349
#6 0x00007ffff7f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, progress_notify=0x0, closure=0x0) at lib/database.cc:934
#7 0x00007ffff7fa110f in notmuch_database_create_with_config (database_path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x7ffff7faab3c "", profile=0x0, database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:754
#8 0x00007ffff7fa0d6f in notmuch_database_create_verbose (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:653
#9 0x00007ffff7fa0ceb in notmuch_database_create (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637
...
Second, some queries would make use of Xapian::Query::MatchAll
(lib/regexp-fields.cc), resulting in the following backtrace on a
segfault:
#0 0x00007f629828b690 in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
#1 0x00007f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms (this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434
#2 0x00007f629828b69f in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
#3 0x00007f6298282571 in Xapian::Query::get_unique_terms_begin (this=0x7f628800dcd8) at api/query.cc:166
#4 0x00007f629841a59b in Xapian::Weight::Internal::accumulate_stats (this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86
#5 0x00007f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, nowait=true, total_stats=...) at matcher/localsubmatch.cc:172
#6 0x00007f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237
#7 0x00007f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, order_=Xapian::Enquire::ASCENDING, sort_key_=0, sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of length 0, capacity 0, have_sorter=false, have_mdecider=false) at matcher/multimatch.cc:353
#8 0x00007f629826fcba in Xapian::Enquire::Internal::get_mset (this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:569
#9 0x00007f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:937
#10 0x00007f6298be529a in _notmuch_query_search_documents (query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at lib/query.cc:447
#11 0x00007f6298be4ae8 in notmuch_query_search_messages (query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349
...
Printing Xapian::Query::MatchAll->internal.px->_refs in these
circumstances can help quickly identifying this scenario.
This is motivated by some test frameworks (like Rust's Cargo) that
runs unit tests in parallel and would easily encounter this issue,
unless client code gates every call to Notmuch behind a lock.
This is what can be expected from the tests when they fail:
== stderr ==
+==================
+WARNING: ThreadSanitizer: data race (pid=207931)
+ Read of size 1 at 0x7b10000001a0 by thread T2:
+ #0 memcpy <null> (libtsan.so.2+0x62506)
+ #1 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] <null> (libxapian.so.30+0x872b3)
+
+ Previous write of size 8 at 0x7b10000001a0 by thread T1:
+ #0 operator new(unsigned long) <null> (libtsan.so.2+0x8ba83)
+ #1 Xapian::Query::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) <null> (libxapian.so.30+0x855cd)
...
a1d139de ("lib: add sexp: prefix to Xapian (infix) query parser.",
2022-04-09) introduced sfsexp infix queries. This requires the infix
preprocessor to be built in in a way which does not require sfsexp when
notmuch is built without it.
Make the preprocessor throw a Xapian error in this case (and fix the
build).
Signed-off-by: Michael J Gruber <git@grubix.eu>
It will be convenient not to have to construct a notmuch query object
when parsing subqueries, so the commit rewrites the query
expansion (currently only used for thread:{} queries) using only
Xapian. As a bonus it seems about 15% faster in initial experiments.
When dealing with recursive queries (i.e. thread:{foo}) it turns out
to be useful just to deal with the underlying Xapian objects, and not
wrap them in notmuch objects.
The previous code had the somewhat bizarre effect that the (notmuch
specific) query string was "*" (interpreted as MatchAll) and the
allegedly parsed xapian_query was "MatchNothing".
This commit also reduces code duplication.
There is not much of a parser here yet, but it already does some
useful error reporting. Most functionality sketched in the
documentation is not implemented yet; detailed documentation will
follow with the implementation.
Set the parsing syntax when the (notmuch) query object is
created. Initially the library always returns a trivial query that
matches all messages when using s-expression syntax.
It seems better to select the syntax at query creation time because
the lazy parsing is an implementation detail.
At least Fedora28 triggers this Xapian bug due to some toolchain change .
https://bugzilla.redhat.com/show_bug.cgi?id=1546162
The underlying bug is fixed in xapian commit f92e2a936c1592, and
should be fixed in Xapian 1.4.6
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
Commit d5523ead90 ("Mark some structures in the library interface
with visibility=default attribute.") fixed some mixed visibility
issues with structs. With the symbol default visibility reversed, this
is no longer a problem.
This function was deprecated in notmuch 0.21. We re-use the name for
a status returning version, and deprecate the _st name. One or two
remaining uses of the (removed) non-status returning version fixed at
the same time
This function was deprecated in notmuch 0.21. We finally remove the
deprecated API, and rename the status returning version to the simpler
name. The status returning is kept as a deprecated alias.
We filter added exclude at add time, rather than modifying the query by
count search. As noted in the comments, there are several ignored
conditions here.
The main goal is to prepare the way for non-destructive (or at least
less destructive) exclude tag handling. It does this by having a
pre-parsed query available for further processing. This also allows us
to provide slightly more precise error messages.
From #xapian
olly> bremner: btw, i noticed notmuch count see ms to request all the documents and then ignores them
bremner> hmm. There's something funny about the way that notmuch uses matches in general iirc
olly> it should be able to do: mset = enquire.get_mset (0, 0, notmuch->xapian_db->get_doccount ());
...
olly> get_matches_estimated() will be exact because check_at_least is the size of the database
_notmuch_database_log clears the log buffer each time. Rather than
introducing more complicated semantics about for this function, provide
a second function that does not clear the buffer. This is mainly a
convenience function for callers constructing complex or multi-line log
messages.
The changes to query.cc are to make sure that the common code path of
the new function is tested.
Many of the external links found in the notmuch source can be resolved
using https instead of http. This changeset addresses as many as i
could find, without touching the e-mail corpus or expected outputs
found in tests.
Cleaned the following whitespace in lib/* files:
lib/index.cc: 1 line: trailing whitespace
lib/database.cc 5 lines: 8 spaces at the beginning of line
lib/notmuch-private.h: 4 lines: 8 spaces at the beginning of line
lib/message.cc: 1 line: trailing whitespace
lib/sha1.c: 1 line: empty lines at the end of file
lib/query.cc: 2 lines: 8 spaces at the beginning of line
lib/gen-version-script.sh: 1 line: trailing whitespace
It's already kindof gross that this is hardcoded in two different
places. We will also need these later in field processors calling back
into the query parser.
Publicly we are only exposing the non-ghost documents (of "type"
"mail"). But internally we might want to inspect the ghost messages
as well.
This changeset adds two new private interfaces to queries to recover
information about alternate document types.
Although I think it's a pretty bad idea to continue using the old API,
this allows both a more gentle transition for clients of the library,
and allows us to break one monolithic change into a series
These functions are all just accessors, and it's pretty clear they don't
modify the query struct. This also fixes one warning I created when I
introduced status.c.
This is not supposed to change any functionality from an end user
point of view. Note that it will eliminate some output to stderr. The
query debugging output is left as is; it doesn't really fit with the
current primitive logging model. The remaining "bad" fprintf will need
an internal API change.
The default is actually exact if no checkatleast parameter is
specified. This change makes that explicit, mainly for documentation,
but also to be safe in the unlikely event of a change of default.
[ commit message rewritten by db based on id:87lho0nlkk.fsf@nikula.org
]
This at least allows distinguishing between out of memory and Xapian
exceptions. Adding finer grained status codes would allow different
Xapian exceptions to be preserved.
Adding wrappers allows people to transition gradually to the new API,
at the cost of bloating the library API a bit.
The queries don't really work after a database is closed, and we would
like them to be freed if the database is destroyed.
Acknowledged-by: David Bremner <david@tethera.net>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Add NOTMUCH_EXCLUDE_FLAG to notmuch_exclude_t so that it can
cover all four values of search --exclude in the cli.
Previously the way to avoid any message being marked excluded was to
pass in an empty list of excluded tags: since we now have an explicit
option we might as well honour it.
The enum is in a slightly strange order as the existing FALSE/TRUE
options correspond to the new
NOTMUCH_EXCLUDE_FLAG/NOTMUCH_EXCLUDE_TRUE options so this means we do
not need to bump the version number.
Indeed, an example of this is that the cli count and show still use
FALSE/TRUE and still work.
Using char instead of int allows for simpler definitions of the
DOCIDSET macros so the code is easier to understand and consistent with
respect to memory-usage. Estimated reduction of memory-usage for
bitmap about 8 times.
When the exclude tags contain a tag that does not occur anywhere in
the Xapian database the exclusion fails. We modify the way the query
is constructed to `work around' this. (In fact the new code is cleaner
anyway.)
It also seems to fix another exclusion failure bug reported by
jrollins but we have not yet worked out why it helps in that case.
Allow query debugging to be enabled at run-time by setting the
NOTMUCH_DEBUG_QUERY environment variable to a non-empty string.
Previously, enabling query debugging required recompiling, but parsed
queries are often useful for tracking down bugs in situations where
recompiling is inconvenient.
Add the NOTMUCH_MESSAGE_FLAG_EXCLUDED flag to
notmuch_query_search_threads. Implemented by inspecting the tags
directly in _notmuch_thread_create/_thread_add_message rather than as
a Xapian query for speed reasons.
Note notmuch_thread_get_matched_messages now returns the number of
non-excluded matching messages. This API is not totally desirable but
fixing it means breaking binary compatibility so we delay that.
Add a flag NOTMUCH_MESSAGE_FLAG_EXCLUDED which is set by
notmuch_query_search_messages for excluded messages. Also add an
option omit_excluded_messages to the search that we do not want the
excludes at all.
This exclude flag will be added to notmuch_query_search threads in the
next patch.
This is useful for tags like "deleted" and "spam" that people
generally want to exclude from query results. These exclusions will
be overridden if a tag is explicitly mentioned in a query.
Add function notmuch_query_count_threads() to get the number of threads
matching a search. This is done by performing a search and figuring out the
number of unique thread IDs in the matching messages, a significantly
heavier operation than notmuch_query_count_messages().
Signed-off-by: Jani Nikula <jani@nikula.org>
As of gcc 4.6, there are new warnings from -Wattributes along the lines of:
warning: ‘_notmuch_messages’ declared with greater visibility
than the type of its field ‘_notmuch_messages::iterator’
[-Wattributes]
To squelch these, we decorate all such containing structs with
__attribute__((visibility("default"))). We take care to let only the
C++ compiler see this, (since the C compiler would otherwise warn
about ignored visibility attributes on types).
Don't require the caller of _notmuch_doc_id_set_init to pass in a
correct bound; instead compute it from the array. This simplifies the
caller and makes this interface easier to use correctly.
Remove the repeated "sizeof (doc_ids->bitmap[0])" that bothered cworth
by instead defining macros to compute the word and bit offset of a
given bit in the doc ID set bitmap.