Commit graph

901 commits

Author SHA1 Message Date
David Bremner
4f82acce17 lib/string_map: simulate stable sorting
qsort(3) does not promise stability, and recent versions of glibc have
been showing more unstable behaviour [2]. Michael Gruber observed [1] test
breakage due to changing output order for message properties.

We provide a sorting order of (key,value) pairs that _looks_ stable by
breaking ties based on value if keys are equal. Internally there may
be some instability in the case of duplicate (key,value) pairs, but it
should not be observable via the iterator API.

[1]: id:CAA19uiSHjVFmwH0pMC7WwDYCOSzu3yqNbuYhu3ZMeNNRh313eA@mail.gmail.com
[2]: id:87msv3i44u.fsf@oldenburg.str.redhat.com
2023-11-28 09:19:21 -04:00
David Bremner
1c10d91d8e Pass error message from GLib ini parser to CLI
The function _notmuch_config_load_from_file is only called in two
places in open.cc. Update internal API to match the idiom in open.cc.
Adding a newline is needed for consistency with other status strings.

Based in part on a patch [1] from Eric Blake.

[1]: id:20230906153402.101471-1-eblake@redhat.com
2023-09-23 08:34:48 -03:00
David Bremner
b6f144abe1 lib/n_d_remove_message: do not remove unique filename
It is wasteful to remove a filename term when the whole message
document is about to be removed from the database. Profiling with perf
shows this takes a significant portion of the time when cleaning up
removed files in the database.

The logic of n_d_remove_message becomes a bit more convoluted here in
order to make the change minimal.

It is possible that this function can be further optimized, since the
expansion of filename terms into filenames is probably not needed
here.
2023-07-22 07:15:59 -03:00
David Bremner
d93d49b6ae lib/message: check message type before deleting document
It isn't really clear how this worked before. Traversing the terms of
a document after deleting it from the database seems likely to be
undefined behaviour at best
2023-07-22 07:11:46 -03:00
David Bremner
a62b8a95c0 doc/lib: clarify ownership for notmuch_database_get_revision
The ownership is implicit in the const declaration (I think!), but
that does not show up in the doxygen generated API docs.
2023-07-09 12:08:28 -03:00
David Bremner
a554690d6a lib: index attachments with mime types matching index.as_text
Instead of skipping indexing all attachments, we check of a (user
configured) mime type that is indexable as text.
2023-04-02 19:24:43 -03:00
David Bremner
3f5809bf28 lib: parse index.as_text
We pre-parse into a list of compiled regular expressions to avoid
calling regexc on the hot (indexing) path.  As explained in the code
comment, this cannot be done lazily with reasonable error reporting,
at least not without touching a lot of the code in index.cc.
2023-04-02 19:22:36 -03:00
David Bremner
c6733a45c8 lib: add config key INDEX_AS_TEXT
Higher level processing as a list of regular expressions and
documentation will follow.
2023-04-02 19:21:37 -03:00
Kevin Boulain
6273966d0b lib: replace some uses of Query::MatchAll with a thread-safe alternative
This replaces two instances of Xapian::Query::MatchAll with the
equivalent but thread-safe alternative Xapian::Query(std::string()).
Xapian::Query::MatchAll maintains an internal pointer to a refcounted
Xapian::Internal::QueryTerm.

None of this is thread-safe but that wouldn't be an issue if
Xapian::Query::MatchAll wasn't static. Because it's static, the
refcounting goes awry when Notmuch is called from multiple threads.
This is actually documented by Xapian:
4715de3a9f/xapian-core/include/xapian/query.h (L65)

While static, Xapian::Query::MatchNothing is safe because it doesn't
maintain an internal object and as such, doesn't use references.

Two best-effort tests making use of TSan were added to showcase the
issue (I couldn't figure out a way to deterministically reproduce it
without making an unmaintainable mess).

First, when two databases are created in parallel, a query that uses
Xapian::Query::MatchAll is made (lib/query.cc), resulting in the
following backtrace on a segfault:
  #0  0x00007ffff76822af in Xapian::Query::get_terms_begin (this=0x7fffe80137f0) at api/query.cc:141
  #1  0x00007ffff7f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) at lib/query.cc:176
  #2  0x00007ffff7f93784 in _notmuch_query_ensure_parsed_xapian (query=0x7fffe80137c0) at lib/query.cc:225
  #3  0x00007ffff7f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) at lib/query.cc:260
  #4  0x00007ffff7f93bfe in _notmuch_query_search_documents (query=0x7fffe80137c0, type=0x7ffff7fa9b1e "mail", out=0x7ffff666da18) at lib/query.cc:361
  #5  0x00007ffff7f93ba4 in notmuch_query_search_messages (query=0x7fffe80137c0, out=0x7ffff666da18) at lib/query.cc:349
  #6  0x00007ffff7f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, progress_notify=0x0, closure=0x0) at lib/database.cc:934
  #7  0x00007ffff7fa110f in notmuch_database_create_with_config (database_path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x7ffff7faab3c "", profile=0x0, database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:754
  #8  0x00007ffff7fa0d6f in notmuch_database_create_verbose (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:653
  #9  0x00007ffff7fa0ceb in notmuch_database_create (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637
  ...

Second, some queries would make use of Xapian::Query::MatchAll
(lib/regexp-fields.cc), resulting in the following backtrace on a
segfault:
  #0  0x00007f629828b690 in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #1  0x00007f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms (this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434
  #2  0x00007f629828b69f in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #3  0x00007f6298282571 in Xapian::Query::get_unique_terms_begin (this=0x7f628800dcd8) at api/query.cc:166
  #4  0x00007f629841a59b in Xapian::Weight::Internal::accumulate_stats (this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86
  #5  0x00007f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, nowait=true, total_stats=...) at matcher/localsubmatch.cc:172
  #6  0x00007f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237
  #7  0x00007f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, order_=Xapian::Enquire::ASCENDING, sort_key_=0, sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of length 0, capacity 0, have_sorter=false, have_mdecider=false) at matcher/multimatch.cc:353
  #8  0x00007f629826fcba in Xapian::Enquire::Internal::get_mset (this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:569
  #9  0x00007f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:937
  #10 0x00007f6298be529a in _notmuch_query_search_documents (query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at lib/query.cc:447
  #11 0x00007f6298be4ae8 in notmuch_query_search_messages (query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349
  ...

Printing Xapian::Query::MatchAll->internal.px->_refs in these
circumstances can help quickly identifying this scenario.

This is motivated by some test frameworks (like Rust's Cargo) that
runs unit tests in parallel and would easily encounter this issue,
unless client code gates every call to Notmuch behind a lock.

This is what can be expected from the tests when they fail:
   == stderr ==
  +==================
  +WARNING: ThreadSanitizer: data race (pid=207931)
  +  Read of size 1 at 0x7b10000001a0 by thread T2:
  +    #0 memcpy <null> (libtsan.so.2+0x62506)
  +    #1 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] <null> (libxapian.so.30+0x872b3)
  +
  +  Previous write of size 8 at 0x7b10000001a0 by thread T1:
  +    #0 operator new(unsigned long) <null> (libtsan.so.2+0x8ba83)
  +    #1 Xapian::Query::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) <null> (libxapian.so.30+0x855cd)
  ...
2023-03-31 08:11:39 -03:00
Kevin Boulain
fb55ff28a2 lib/message-property: sync removed properties to the database
_notmuch_message_remove_all_properties wasn't syncing the message back
to the database but was still invalidating the metadata, giving the
impression the properties had actually been removed.

Also move the metadata invalidation to _notmuch_message_remove_terms
to be closer to what's done in _notmuch_message_modify_property and
_notmuch_message_remove_term.
2023-03-30 08:01:09 -03:00
Kevin Boulain
568f6bc3c2 lib/message-property: catch xapian exceptions
Since libnotmuch exposes a C interface there's no way for clients to
catch this.
Inspired by what's done for tags (see notmuch_message_remove_tag).
2023-03-30 07:08:47 -03:00
Kevin Boulain
d86e03c786 lib/notmuch: update example
Likely missed in 86cbd215e, when notmuch_query_search_messages_st was
renamed to notmuch_query_search_messages.
2023-02-27 08:34:38 -04:00
David Bremner
09f2ad8e85 lib: add better diagnostics for over long filenames.
Previously we just crashed with an internal error. With this change,
the caller can handle it better. Update notmuch-new so that it doesn't
crash with "unknown error code" because of this change.
2023-02-20 09:22:32 -04:00
David Bremner
1d5d0ae686 lib/message: move xapian call inside try/catch block in _n_m_delete
The call to delete_document can throw exceptions (and can happen in
practice [1]), so catch the exception and extract the error
message. As a side effect, also move the call to _n_m_has_term inside
the try/catch. This should not change anything as that function
already traps any Xapian exceptions.

[1]: id:wwuk039sk2p.fsf@chaotikum.eu
2022-12-27 11:59:46 -04:00
David Bremner
16d92abf9f lib/database: propagate status code from _notmuch_message_delete
_notmuch_message_delete can return (at least)
NOTMUCH_STATUS_XAPIAN_EXCEPTION, which we should not ignore.
2022-12-27 11:59:29 -04:00
David Bremner
2e5ef69fbf lib: add field processor for lastmod: prefix
By sharing the existing logic used by the sexp query parser, this
allows negative lastmod revisions to be interpreted as relative to the
most recent revision.
2022-09-03 08:43:33 -03:00
David Bremner
93c602a82f lib: factor out lastmod range handling from sexp parser.
This will permit the re-use of the same logic in the infix query
parser. The location of the shared code in the infix side is for
consistency with the other shared parsing logic. It will make more
sense when a Xapian field processor is added for the lastmod prefix.
2022-09-03 08:36:53 -03:00
David Bremner
606d9b02e4 lib/sexp: provide relative lastmod queries
Test the relatively trivial logic changes for the sexp query parser
first before refactoring that logic to share with the infix query
parser.
2022-09-03 08:36:53 -03:00
David Bremner
84e4e130e2 lib/open: create database path in some cases
There is some duplication of code here, but not all of the locations
valid to find a database make sense to create. Furthermore we nead two
passes, so the control flow in _choose_database_path would get a bit
convoluted.
2022-09-03 08:24:43 -03:00
David Bremner
8ba3057d01 lib/open: return non-SUCCESS on missing database path
This simplifies the logic of creating the directory path when it doesn't
exist.
2022-09-03 08:24:43 -03:00
David Bremner
25e2790e30 lib/open: refactor call to mkdir into function
This makes the error handling available for re-use. Using
g_mkdir_with_parents also handles the case of a pre-existing
directory. This introduces new functionality, namely creating the
parent directories, which will be useful for creating directories like
'.local/share/notmuch/default'.
2022-09-03 08:24:43 -03:00
David Bremner
6a9ae99099 lib/sexp: add parameter expansion for regex and wildcard
Fix the bug reported at [1].

The parameter expansion for regex and wildcard modifiers has to be
done a bit differently, because their arguments are not s-expressions
defining complete Xapian queries.

[1]: id:87o7yxqxy6.fsf@code.pm
2022-07-01 08:37:00 -03:00
David Bremner
e7ffb74041 lib/sexp: allow * as alias for "" in range searches.
It can be tedious to use "" inside of a string, e.g. in a shell script.
2022-06-25 19:49:55 -03:00
David Bremner
7863234586 lib/sexp: special case "" as an argument in lastmod ranges.
Support this syntax for constincy with (data from to) ranges.
2022-06-25 19:49:55 -03:00
David Bremner
6f749dd24a lib: check for writable db in n_m_tags_maildir_flags
The database needs to be writable because the list of stored file
names will change in general.
2022-06-25 16:06:34 -03:00
David Bremner
3f27cce71f lib: add NOTMUCH_STATUS_CLOSED_DATABASE, use in _n_d_ensure_writable
In order for a database to actually be writeable, it must be the case that it
is open, not just the correct type of Xapian object. By explicitely
checking, we are able to provide better error reporting, in particular
for the previously broken test in T566-lib-message.
2022-06-25 16:06:18 -03:00
David Bremner
7e654e2a45 lib: Add missing private status values.
These were missed when the corresponding status codes were added.
2022-06-25 16:06:10 -03:00
David Bremner
e70df92085 lib/tag: handle NULL argument to notmuch_tags_valid
Make the behaviour when passed NULL consistent with
notmuch_filenames_valid. The library already passes the result of
notmuch_message_get_tags without checking for NULL, so it should be
handled.
2022-06-25 16:05:45 -03:00
David Bremner
879ec9d76a lib/message: check return status from _n_m_add_{path,folder}_terms
Mainly to propagate information about Xapian exceptions.
2022-06-25 12:55:02 -03:00
David Bremner
b102d0ad11 lib/message: check return status of _n_m_{add,remove}_term
Xapian exceptions are not something that can be ignored, in general.
2022-06-25 12:55:02 -03:00
David Bremner
7d5a9bd3ae lib: define macro NODISCARD
In either C++17 (or later) mode, or when running cppcheck, this can be
used to selectively generate warnings about discarded return values.
2022-06-25 12:55:02 -03:00
David Bremner
bc80ff829a lib/message: drop _notmuch_message_get_thread_id_only
This function has been unused since commit 4083fd8.
2022-06-25 12:55:02 -03:00
David Bremner
f48d2e2ff8 lib/message: catch exceptions in _n_m_add_term
Some code movement is needed to make sure the cache is only
invalidated when the Xapian operation succeeds.
2022-06-25 12:55:02 -03:00
David Bremner
4f8a2d2253 lib/message: use false from stdbool.h
As far as I know, this is just a style / consistency thing, unless
notmuch code starts defining FALSE inconsistently with false.
2022-05-26 08:30:00 -03:00
David Bremner
6810881705 lib: fix uninitialized field in message objects.
Initially reported by Eliza Vasquez [1] (via valgrind).

[1]: id:87o7zxj086.fsf@eliza.
2022-05-26 08:09:32 -03:00
Michael J Gruber
785f9d656d fix build without sfsexp
a1d139de ("lib: add sexp: prefix to Xapian (infix) query parser.",
2022-04-09) introduced sfsexp infix queries. This requires the infix
preprocessor to be built in in a way which does not require sfsexp when
notmuch is built without it.

Make the preprocessor throw a Xapian error in this case (and fix the
build).

Signed-off-by: Michael J Gruber <git@grubix.eu>
2022-04-15 14:17:31 -03:00
David Bremner
a1d139de4d lib: add sexp: prefix to Xapian (infix) query parser.
This is analogous to the "infix" prefix provided by the s-expression
based query parser.
2022-04-15 08:25:46 -03:00
David Bremner
8ed6a172b3 lib: do not phrase parse prefixed bracketed subexpressions
Since Xapian does not preserve quotes when passing the subquery to a
field processor, we have to make a guess as to what the user
intended. Here the added assumption is that a string surrounded by
parens is not intended to be a phrase.
2022-03-19 07:27:29 -03:00
David Bremner
7f8af14bdc lib: bump minor version to 6.
One new status value and one configuration value added.
2022-01-29 18:13:26 -04:00
David Bremner
2c1d1107f5 lib: strip trailing '/' from pathnames (sexp queries).
This changes makes the sexp query parser consistent with the infix one
in ignoring trailing '/'. Here we do a bit better and ignore any
number of trailing '/'.
2022-01-27 07:48:27 -04:00
David Bremner
c62c22c9fb lib: drop trailing slash for path and folder searches (infix)
This resolves an old bug reported by David Edmondson in 2014. The fix
is only needed for the "boolean" case, as probabilistic / phrase
searching already ignores punctuation.

This fix is only for the infix (xapian provided) query parser.

[1]: id:cunoasuolcv.fsf@gargravarr.hh.sledj.net
2022-01-27 07:48:27 -04:00
David Bremner
0a32741fce lib/parse-sexp: handle lastmod queries.
This particular choice of converting strings to integers requires C++11.
2022-01-26 07:41:02 -04:00
David Bremner
77ab961a1d lib/parse-sexp: support actual date queries.
The default argument processing overlaps somewhat with what is already
done in _notmuch_date_strings_to_query, but we can give more specific
error messages for the s-expression context.

The extra generality of _sexp_parse_range will be useful when we
implement additional range prefixes (at least 'lastmod' is needed).
2022-01-26 07:41:02 -04:00
David Bremner
bf3cc5eed2 lib/date: factor out date range parsing.
This will allow re-using the same logic in the s-expression parser.
2022-01-26 07:41:02 -04:00
David Bremner
303f207a54 lib/parse-sexp: support zero argument date queries
These are not too practical, although they may simplify some user
query generation code. Mainly this adds a new prefix keyword to the
parser.
2022-01-26 07:41:02 -04:00
David Bremner
2786aa4d54 lib/database: delete stemmer on destroy
Commit [0] left the stemmer object accessible, but did not add
de-allocation code to notmuch_database_destroy. This commit corrects
that oversight.

Leak originally reported by Austin Ray [1].

[0]: 3202e0d1fe
[1]: id:20220105224538.m36lnjn7rf3ieonc@athena
2022-01-22 21:14:29 -04:00
David Bremner
df7c5acd75 lib/config: move g_key_File_get_string before continue
In [1] Austin Ray reported some memory leaks in
notmuch_database_open. One of those leaks is caused by jumping to the
next key without freeing val. This change avoids that leak.

[1]: id:20220105224538.m36lnjn7rf3ieonc@athena
2022-01-22 21:14:29 -04:00
David Bremner
79936ac93e lib/config: add known config key "show.extra_headers"
Used in a following commit to enable including extra headers beyond
the default in structured output.
2022-01-18 08:09:14 -04:00
David Bremner
fad2e7540b lib/open: no default mail root in split configurations
If we know the configuration is split, but there is no mail root
defined, this indicates a (lack of) configuration error. Currently
this can only arise in XDG configurations.
2022-01-15 15:59:39 -04:00
David Bremner
64212c7b91 lib/config: make sure the config map exists when loading defaults
We should not rely on one of the other "_notmuch_config_load_*"
functions being called before this one.
2022-01-15 15:59:27 -04:00