Rather than shelling out once per message to get the list of files
corresponding to tags, it is much faster (although potentially a bit
memory intensive) to read them all at once.
The original nmbug format (now called version 0) creates 1
subdirectory of 'tags/' per message. This causes problems for more
than (roughly) 100k messages.
Version 1 introduces 2 layers of hashed directories. This scheme was
chose to balance the number of subdirectories with the number of extra
directories (and git objects) created via hashing.
This should be upward compatible in the sense that old repositories
will continue to work with the updated notmuch-git.
Commits or checkouts that modify a large fraction of the messages in
the database should be relatively rare (and in some automated process,
probably non-existent). For initial setup, where such operations are
expected, the user can pass --force.
This is probably more convenient than always passing a command line
argument.
Use notmuch-config for consistency with other notmuch CLI tools.
Now that there is something relevant in the config files, test the
--config option.
The previous defaults were not suitable for personal (i.e. not
bugtracking for notmuch development) use.
Provide two ways for the user to select nmbug compatible defaults;
command line argument and checking the name of the script.
If the private index file matches a previously known revision of the
database, we can update the index incrementally using the recorded
lastmod counter. This is typically much faster than a full update,
although it could be slower in the case of large changes to the
database.
The "git-read-tree HEAD" is also a bottleneck, but unfortunately
sometimes is needed. Cache the index checksum and hash to reduce the
number of times the operation is run. The overall design is a
simplified version of the PrivateIndex class.
Perf will show which binaries are using the CPU cycles, and standard
python profilers will show which python functions, but neither is
great at finding which call to an external binary is taking time, or
locating I/O hotspots.
Unlike the (current) infix query parser provided by Xapian, the
notmuch specific sexp query parser supports prefixed wildcard queries,
so use those. In addition to being somewhat faster, this avoids
needing to escape all of the user's tags to pass via the shell.
Although the code required to support both new and old environment
variables is small, it complicates the semantics of configuration, and
make the documentation harder to follow.
For folks that want to start versioning a new tag-space, instead of
cloning one that someone else has already started.
The empty-blob hash-object call avoids errors like:
$ nmbug commit
error: invalid object 100644 e69de29bb2 for
'tags/...'
fatal: git-write-tree: error building trees
'git HASH(0x9ef3eb8) write-tree' exited with nonzero value
David Bremner suggested [1]:
$ git hash-object -w /dev/null
instead of my Python version of:
$ git hash-object -w --stdin <&-
but I expect that closing stdin is more portable than the /dev/null
path (which doesn't exist on Windows, for example).
The --bare init and use of NMBGIT as the work tree (what could go
wrong with an empty commit?) are suggestions from Michal Sojka [2].
[1]: id:87y4vu6uvf.fsf@maritornes.cs.unb.ca
http://thread.gmane.org/gmane.mail.notmuch.general/18626/focus=18720
[2]: id:87a93a5or2.fsf@resox.2x.cz
http://thread.gmane.org/gmane.mail.notmuch.general/19495/focus=19767
Debian stable had python 3.4.2 3 releases ago (approximately 6 years
ago), so attempting to keep track of the changes in python is probably
no longer worthwhile. We already require python 3.5 for the
python-cffi bindings (although those are not yet used in notmuch-git).