This change addresses two known issues with large sets of changes to
the database. The first is that as reported by Steven Allen [1],
notmuch commits are not "flushed" when they complete, which means that
if there is an open transaction when the database closes (or e.g. the
program crashes) then all changes since the last commit will be
discarded (nothing is irrecoverably lost for "notmuch new", as the
indexing process just restarts next time it is run). This does not
really "fix" the issue reported in [1]; that seems rather difficult
given how transactions work in Xapian. On the other hand, with the
default settings, this should mean one only loses less than a minutes
worth of work. The second issue is the occasionally reported "storm"
of disk writes when notmuch finishes. I don't yet have a test for
this, but I think committing as we go should reduce the amount of work
when finalizing the database.
[1]: id:20151025210215.GA3754@stebalien.com
Unfortunately, it doesn't make a difference if we call
cancel_transaction or not, all uncommited changes are discarded if
there is an open (unflushed) transaction.
Since most memory allocation is (ultimately) in the talloc context
defined by a notmuch_database_t pointer, this gives a more complete
view of memory still allocated at program shutdown.
We also change the talloc report in notmuch.c to mode "a" to avoid
clobbering the newly reported log.
This prevents the message document getting multiple thread-id terms
when there are multiple files with the same message-id.
This change shifts some thread ids, requiring adjustments to other tests.
Although this default worked for "notmuch config get", it didn't work
most other places. Restore the previous functionality, with the
wrinkle that XDG locations will shadow $HOME/mail if they exist.
This fixes a bug reported by Jack Kamm in id:87eeefdc8b.fsf@gmail.com
In principle this could be done without depending on C++11 features,
but these features should be available since gcc 4.8.1, and this
localized usage is easy to replace if it turns out to be problematic
for portability.
Prior to 0.32, notmuch had the (undocumented) behaviour that it
expanded a relative value of database.path with respect to $HOME. In
0.32 this was special cased for database.path but broken for
database.mail_root, which causes problems for at least notmuch-new
when database.path is set to a relative path.
The change in T030-config.sh reflects a user visible, but hopefully
harmless behaviour change; the expanded form of the paths will now be
printed by notmuch config.
When compat canonicalize_file_name was introduced, it was limited to
C code only because it was used by C code only during that time.
>From 5ec6fd4d, (lib/open: check for split configuration when creating
database., 2021-02-16), lib/open.cc, which is C++, relies on the
existent of canonicalize_file_name.
However, we can't blindly enable canonicalize_file_name for C++ code,
because different implementation has different additional signature for
C++ and users can arbitrarily add -DHAVE_CANONICALIZE_FILE_NAME=0 to
{C,CXX}FLAGS.
Let's move our implementation into a util library.
Helped-by: Tomi Ollila <tomi.ollila@iki.fi>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Ignoring this return value seems like a bad idea in general, and in
particular it has been hiding one or more bugs related to handling
long directory names.
This is intended to fix the slow behaviour of "notmuch new" (and possibly
"notmuch reindex") when large numbers of files are deleted.
The underlying issue [1] seems to be the Xapian glass backend spending
a large amount of time in db.has_positions when running queries with
large-ish amounts of unflushed changes.
This commit removes two uses of Xapian queries [2], and replaces them with
an approximation of what Xapian would do after optimizing the
queries. This avoids the calls to has_positions (which are in any case
un-needed because we are only using boolean terms here).
[1] Thanks to "andres" on IRC for narrowing down the performance
bottleneck.
[2] Thanks to Olly Betts of Xapian fame for talking me a through a fix
that does not require people to update Xapian.
Since the library searches in several locations for a config file, the
caller does not know which of these is chosen in the usual case of
passing NULL as a config file. This changes provides an API for the
caller to retrieve the name of the config file chosen. It will be
tested in a following commit.
Eventually we want to do all opening of databases in the top
level (main function). This means that detection of missing databases
needs to move out of subcommands. It also requires updating the
library to use the new NO_DATABASE status code.
This matches functionality in the the CLI function
notmuch_config_get_database_path, which was previously used in the CLI
code for all calls to open a database.
The layer of shims here seems a bit wasteful compared to just calling
the corresponding string map functions directly, but it allows control
over the API (calling with notmuch_database_t *) and flexibility for
future changes.
_trial_open can't know if the PATH_ERROR return value will cause the
error message to be returned from the library, so it's up the caller
to clean up if not.
Like the hook directory, we primarily need a way to communicate this
directory between various components, but we may as well let the user
configure it.
Most of the diff is generalizing choose_hook_dir to work for both
backup and hook directories.
Choose sibling directory of xapian database, as .notmuch may not
exist.
libgen.h is already used in debugger.c, so it is not a new dependency
/ potential portability problem.
This changes some error reporting, either intentionally by reporting
the highest level missing directory, or by side effect from looking in
XDG locations when given null database location.
The main functionality will be tested when notmuch-new is converted to
support split configuration. Here only the somewhat odd case of split
mail root which is actually symlinked to the database path is tested.
Introduce a new configuration value for the mail root, and use it to
locate mail messages in preference to the database.path (which
previously implied the mail messages were also in this location.
Initially only a subset of the CLI is tested in a split
configuration. Further changes will be needed for the remainder of the
CLI to work in split configurations.
The idea is to allow reuse in n_d_create_with_config. This is
primarily code movement, with some changes in error messages to reduce
the number of input parameters.
This is slightly more tidy, but more importantly it allows for re-use
of this code in n_d_create_with_config. That re-use will be crucial
when we no longer call n_d_open_with_config from
n_d_create_with_config.
This removes duplication between the struct element and the
configuration string_map entry. Create a simple wrapper for setting
the database path that makes sure the trailing / is stripped.