Now each caller of notmuch_message_get_tags only gets a new iterator,
instead of a whole new list. In principle this could cause problems
with iterating while modifying tags, but through the magic of talloc
references, we keep the old tag list alive even after the cache in the
message object is invalidated.
This reduces my index search from the 3.102 seconds before the unified
metadata pass to 1.811 seconds (1.7X faster). Combined with the
thread search optimization in b3caef1f06,
that makes this query 2.5X faster than when I started.
Even if the caller never uses the file names, there is little cost to
simply fetching the file name terms. However, retrieving the full
paths requires additional database work, so the expansion from terms
to full paths is performed lazily.
This also simplifies clearing the filename cache, since that's now
handled by the generic metadata cache code.
This further reduces my inbox search from 3.102 seconds before the
unified metadata pass to 2.206 seconds (1.4X faster).
Replace _notmuch_convert_tags with this and simplify
_create_filenames_for_terms_with_prefix. This will also come in handy
shortly to get the message file name list.
This replaces the guts of the filename list and tag list, making those
interfaces simple iterators over the generic string list. The
directory, message filename, and tags-related code now build generic
string lists and then wraps them in specific iterators. The real wins
come in later patches, when we use these for even more generic
functionality.
As a nice side-effect, this also eliminates the annoying dependency on
GList in the tag list.
This performs a single pass over a message's term list to fetch the
thread ID, message ID, and reply-to, rather than requiring a pass for
each. Xapian decompresses the term list anew for each iteration, so
this reduces the amount of time spent decompressing message metadata.
This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X
faster).
Such as:
mkdir build
cd build
../configure
make
This is implemented by having the configure script set a srcdir
variable in Makefile.config, and then sprinkling $(srcdir) into
various make rules. We also use vpath directives to convince GNU make
to find the source files from the original source directory.
Don't require the caller of _notmuch_doc_id_set_init to pass in a
correct bound; instead compute it from the array. This simplifies the
caller and makes this interface easier to use correctly.
Remove the repeated "sizeof (doc_ids->bitmap[0])" that bothered cworth
by instead defining macros to compute the word and bit offset of a
given bit in the doc ID set bitmap.
With talloc, we were already freeing all memory by the time we exited
the loop, but that didn't help with excess use of memory inside the
loop, (which was mostly from tallocing some objects with the incorrect
parent).
Thanks to Andrew Tridgell for sitting next to me and teaching me to
use talloc_report_full to find these leaks.
A new "folder:" prefix in the query string can now be used to match
the directories in which mail files are stored.
The addition of this feature causes the recently added
search-by-folder tests to now pass.
Using the local talloc context ensures that the memory we are using
here will be freed shortly, (rather than hanging on for a long time
with the notmuch database object).
This reduces thread search's 1+2t Xapian queries (where t is the
number of matched threads) to 1+t queries and constructs exactly one
notmuch_message_t for each message instead of 2 to 3.
notmuch_query_search_threads eagerly fetches the docids of all
messages matching the user query instead of lazily constructing
message objects and fetching thread ID's from term lists.
_notmuch_thread_create takes a seed docid and the set of all matched
docids and uses a single Xapian query to expand this docid to its
containing thread, using the matched docid set to determine which
messages in the thread match the user query instead of using a second
Xapian query.
This reduces the amount of time required to load my inbox from 4.523
seconds to 3.025 seconds (1.5X faster).
We really want to change the thread subject at the same time we set
the date, (if the sort order indicates this is necessary). The
previous code for setting the thread subject was sensitive on the
query sort when adding matching messages. An independent bug fix is
about to change that query sort order, so we remove the dependency on
it here.
This was a misfeature where notmuch had extra code that just threw
away legitimate information. It was never indexing an initial "Re"
term in a subject. But some users have legitimately wanted to search
for this term.
The original code was written this way merely for strict compatiblity
with the indexing performed by sup, but we're not taking advantage of
that now anyway.
This is to prevent notmuch from destroying any information the user
has encoded as flags in the maildir filename. Tests are also added to
the test suite to verify the documented behavior.
Some people use notmuch with non-maildir files, (for example, email
messages in MH format, or else cool things like using sluk[*] to suck
down feeds into a format that notmuch can index).
To better support uses like that, don't do any renaming for files that
are not in a directory named either "new" or "cur".
[*] https://github.com/krl/sluk/
I had originally hoped for better semantics, such as doing nothing in
non-maildir directories, and preserving unknown maildir flags that
happen to be present.
We could still do those things, of course, but for now, remove them
from the documentation since the implementation does not do these
things yet.
It is totally legitimate for a non-maildir directory to be named "new"
(and not have a directory next to it named "cur"). To support this
case at least, be silent about any rename failure.
If a filename has no maildir info at all, (that is, it does not
contain the sequence ":2,"), we consider this distinct from a filename
with an empty maildir info, (the ":2," separator is present, but no
flags characters follow).
Specifically, we regard a missing info field as providing no
information, so tags will remain unchanged. On the other hand, an info
field that is present but has no flags set will cause various tags to
be cleared, (or in the case of "unread", added).
This fixes the "remove info" case of the maildir-sync tests in the
test suite.
Previously the documentation of notmuch_message_maildir_flags_to_tags
suggested that the presence of a flag would cause tags to be added,
(or in the case of "unread", removed). But the case of absent maildir
flags was not explicitly described.
What we actually want, is that for supported flags, the absence of the
flag in all messages causes the corresponding tag to be removed,
(or in the case of "unread", added). So document that explicitly.
This is the case recently added to the test suite as a failing test,
(so we'll need to do bug fixing before the documentation is honest
here).
We have tests to ensure that when the notmuch library renames a file
that that rename takes place immediately in the database, (without
requiring something like "notmuch new" to notice the change).
This was working when the code was first added, but recently broke in
the reworking of the maildir-synchronization interface since the
tags_to_maildir_flags function can no longer assume that it is being
called as part of _notmuch_message_sync.
Fortunately, the fix is as simple as adding an explicit call to
_notmuch_message_sync.
As documented, this function now iterates over all filenames for the
message, computing a logical OR of the flags set on the filenames,
then uses the final result to set tags on the message.
This change fixes 3 of the 10 maildir-sync tests that have been
failing since being added.
This augments the existing notmuch_message_get_filename by allowing
the caller access to all filenames in the case of multiple files for a
single message.
To support this, we split the iterator (notmuch_filenames_t) away from
the list storage (notmuch_filename_list_t) where previously these were
a single object (notmuch_filenames_t). Then, whenever the user asks
for a file or filename, the message object lazily creates a complete
notmuch_filename_list_t and then:
For notmuch_message_get_filename, returns the first filename
in the list.
For notmuch_message_get_filenames, creates and returns a new
iterator for the filename list.
The new implementation is simply a talloc-based list of strings. The
former support (a list of database terms with a common prefix) is
implemented by simply pre-iterating over the terms and populating the
list. This should provide no performance disadvantage as callers of
thigns like notmuch_directory_get_child_files are very likely to
always iterate over all filenames anyway.
This new implementation of notmuch_filenames_t is in preparation for
adding API to query all of the filenames for a single message.
This rather ugly hack was recently obviated by the removal of the
notmuch_database_set_maildir_sync function. Now, clients must make
explicit calls to do any syncrhonization between maildir flags and
tags. So the library no longer needs to worry about doing inconsistent
synchronization while a message is only partially added.
Instead of having an API for setting a library-wide flag for
synchronization (notmuch_database_set_maildir_sync) we instead
implement maildir synchronization with two new library functions:
notmuch_message_maildir_flags_to_tags
and notmuch_message_tags_to_maildir_flags
These functions are nicely documented here, (though the implementation
does not quite match the documentation yet---as plainly evidenced by
the current results of the test suite).
Tags in a notmuch database affect all messages with the identical
message-ID. But maildir tags affect individual files. And since
multiple files can contain the identical message-ID, there is not a
one-to-one correspondence between messages affected by tags and flags.
This is particularly dangerous with the 'T' (== "trashed") maildir
flag and the corresponding "deleted" tag in the notmuch
database. Since these flags/tags are often used to trigger
irreversible deletion operations, the lack of one-to-one
correspondence can be potentially dangerous.
For example, consider the following sequence:
1. A third-party application is used to identify duplicate messages
in the mail store, and mark all-but-one of each duplicate with
the 'T' flag for subsequent deletion.
2. A "notmuch new" operation reads that 'T' flag, adding the
"deleted" flag to the corresponding messages within the notmuch
database.
3. A subsequent notmuch operation, (such as a "notmuch dump; notmuch
restore" cycle) synchronized the "deleted" tag back to the mail
store, applying the 'T' flag to all(!) filenames with duplicate
message IDs.
4. A third-party application reads the 'T' flags and irreversibly
deletes all mail messages which had any duplicates(!).
In order to avoid this scenario, we simply refuse to synchronize the
'T' flag with the "deleted" tag. Instead, applications can set 'T' and
act on it to delete files, or can set "deleted" and act on it to
delete files. But in either case the semantics are clear and there is
never dangerous propagation through the one-to-many mapping of notmuch
message objects to files.
This adds group [maildir] and key 'synchronize_flags' to the
configuration file. Its value enables (true) or diables (false) the
synchronization between notmuch tags and maildir flags. By default,
the synchronization is disabled.
This patch allows bi-directional synchronization between maildir
flags and certain tags. The flag-to-tag mapping is defined by flag2tag
array.
The synchronization works this way:
1) Whenever notmuch new is executed, the following happens:
o New messages are tagged with configured new_tags.
o For new or renamed messages with maildir info present in the file
name, the tags defined in flag2tag are either added or removed
depending on the flags from the file name.
2) Whenever notmuch tag (or notmuch restore) is executed, a new set of
flags based on the tags is constructed for every message and a new
file name is prepared based on the old file name but with the new
flags. If the flags differs and the old message was in 'new'
directory then this is replaced with 'cur' in the new file name. If
the new and old file names differ, the file is renamed and notmuch
database is updated accordingly.
The rename happens before the database is updated. In case of crash
between rename and database update, the next run of notmuch new
brings the database in sync with the mail store again.
This prevents any of the private functions from being leaked out
through the library interface (at least when compiling with a
recent-enough gcc to support the visibility pragma).
These various functions and data are all used only locally, so should
be marked static. Ensuring we get these right will avoid us accidentally
leaking unintended symbols through the library interface.
This increment is for the recently-added functions:
notmuch_query_get_query_string
notmuch_query_get_sort
These were recently added to the library interface, but the library
version was not incremented at that time, (shame on me).
Hi,
If I want to build Debian package, it fails with the following message:
ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
make[1]: *** [install-lib] Error 1
The reason is that I build the package as a non-root user and make
install invokes ldconfig unconditionally. The following patch contains a
workaround, but I think that a more correct solution would be to check
the condition LIBDIR_IN_LDCONFIG directly when make install is invoked
rather than in configure as it is done now.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Previously, if the underlying search_messages hit an exception and returned
NULL, this function would ignore that and return a non-NULL, (but empty)
threads object. Fix this to properly propagate the error.
Thanks to the new git-based test suite, it's easy to run the whole
test suite in valgrind, (simply "make test OPTIONS="--valgrind"), and
doing so showed this obvious use-after-free bug, (triggered by the
thread-order tests).