Since the file size will have changed, there is no performance benefit
to calling fdatasync. Somewhat surprisingly, using fdatasync
apparently causes portability problems on FreeBSD.
The main goal is to support gzipped output for future internal
calls (e.g. from notmuch-new) to notmuch_database_dump.
The additional dependency is not very heavy since xapian already pulls
in zlib.
We want the dump to be "atomic", in the sense that after running the
dump file is either present and complete, or not present. This avoids
certain classes of mishaps involving overwriting a good backup with a
bad or partial one.
Apart from the status codes for format mismatches, the non-zero exit
status codes have been arbitrary. Make the cli consistently return
either EXIT_SUCCESS or EXIT_FAILURE.
This allows specifying config file as a top level argument to notmuch,
and generally makes it possible to override config file options in
main(), without having to touch the subcommands.
If the config file does not exist, one will be created for the notmuch
main command and setup and help subcommands. Help is special in this
regard; the config is created just to avoid errors about missing
config, but it will not be saved.
This also makes notmuch config the talloc context for subcommands.
We now have a notmuch_config_is_new() function to query whether a
config was created or not. Change the notmuch_config_open() is_new
parameter into boolean create_new to determine whether the function
should create a new config if one doesn't exist. This reduces the
complexity of the API.
This switches the new batch-tag format away from using a home-grown
hex-encoding scheme for message IDs in the dump to simply using Xapian
queries with Xapian quoting syntax.
This has a variety of advantages beyond presenting a cleaner and more
consistent interface. Foremost is that it will dramatically simplify
the quoting for batch tagging, which shares the same input format.
While the hex-encoding is no better or worse for the simple ID queries
used by dump/restore, it becomes onerous for general-purpose queries
used in batch tagging. It also better handles strange cases like
"id:foo and bar", since this is no longer syntactically valid.
When we switch to using regular Xapian queries in the dump format, \n
will cause problems, so we disallow it. Specially, while Xapian can
quote and parse queries containing \n without difficultly, quoted
queries containing \n still span multiple lines, which breaks the
line-orientedness of the dump format. Strictly speaking, we could
still round-trip these, but it would significantly complicate restore
as well as scripts that deal with tag dumps. This complexity would
come at absolutely no benefit: because of the RFC 2822 unfolding
rules, no amount of standards negligence can produce a message with a
message ID containing a line break (not even Outlook can do it!).
Hence, we simply disallow it.
sup is the old format, and remains the default, at least until
restore is converted to parse this format.
Each line of the batch-tag format is modelled on the syntax of notmuch tag:
- "notmuch tag" is omitted from the front of the line
- The dump format only uses query strings of a single message-id.
- Each space seperated tag/message-id is 'hex-encoded' to remove
trouble-making characters.
- It is permitted (and will be useful) for there to be no tags before
the query.
In particular this format won't have the same problem with e.g. spaces
in message-ids or tags; they will be round-trip-able.
The syntax --output=filename is a smaller change than deleting the
output argument completely, and conceivably useful e.g. when running
notmuch under a debugger.
It has been a long-standing issue that notmuch_database_open doesn't
return any indication of why it failed. This patch changes its
prototype to return a notmuch_status_t and set an out-argument to the
database itself, like other functions that return both a status and an
object.
In the interest of atomicity, this also updates every use in the CLI
so that notmuch still compiles. Since this patch does not update the
bindings, the Python bindings test fails.
Asking xapian to sort the messages for us causes suboptimal IO patterns. This
would be useful, if we only wanted the first few results, but since we want
everything anyway, this is pessimization.
On 2011-10-29, a measurement on a 372981 messages instance showed that wall
time can be reduced from 28 minutes (sorted by Message-ID) to 15 minutes
(unsorted).
Timings on 189605 messages:
$ time notmuch.old dump
19.48user 5.83system 12:10.42elapsed 3%CPU (0avgtext+0avgdata 110656maxresident)k
3629584inputs+22720outputs (33major+7073minor)pagefaults 0swaps
$ echo 3 > /proc/sys/vm/drop_caches
$ time notmuch.new
14.89user 1.20system 3:23.58elapsed 7%CPU (0avgtext+0avgdata 46032maxresident)k
1256264inputs+22464outputs (43major+1990minor)pagefaults 0swaps
previously we deleted the subcommand name from argv before passing to
the subcommand. In this version, the deletion is done in the actual
subcommands. Although this causes some duplication of code, it allows
us to be more flexible about how we parse command line arguments in
the subcommand, including possibly using off-the-shelf routines like
getopt_long that expect the name of the command in argv[0].
We print an intentionally non-specific message on stderr, since it
isn't clear if there will be some global output file argument to
replace.
We update the test suite atomically, since it relies on having the
same text in two files.
The main motivation here is allow the fast dumping of tag data for
messages having certain tags. In practice it seems too slow to pipe
dump to grep.
All dump-restore tests should be working now, so we update test/dump-restore
accordingly
We rename 'has_more' to 'valid' so that it can function whether
iterating in a forward or reverse direction. We also rename
'advance' to 'move_to_next' to setup parallel naming with
the proposed functions 'move_to_first', 'move_to_last', and
'move_to_previous'.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and like whilst updating the index.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
The library interface now allows the caller to do incremental searches,
(such as one page of results at a time). Next we'll just need to hook
this up to "notmuch search" and the emacs interface.
All of the following commands:
notmuch dump
notmuch reply
notmuch restore
notmuch search
notmuch show
notmuch tag
were calling notmuch_database_open with an argument of NULL. This was
a legitimate call until the recent addition of configuration, after
which it is expected that all commands will lookup the correct path in
the configuration file. So fix all these commands to do that.
Also, while touching all of these commands, we fix them to use the
talloc context that is passed in rather than creating a local talloc
context. We also switch from using goto for return values, to doing
direct returns as soon as an error is detected, (which can be leak
free thanks to talloc).
Now that the client sources are alone here in their own directory,
(with all the library sources down inside the lib directory), we can
break the client up into multiple files without mixing the files up.
The hope is that these smaller files will be easier to manage and
maintain.