Xapian::TermIterator::operator* returns std::string which is destroyed
as soon as (*i).c_str() finishes. The remembered pointer 'term' then
references invalid memory.
Signed-off-by: Vladimir Marek <vlmarek@volny.cz>
Previously, thread.cc built up a list of all messages, then
proceeded to tear it apart to transform it into a list of
top-level messages. Now we simply build a new list of top-level
messages.
This simplifies the interface to _notmuch_message_add_reply,
eliminates the pointer acrobatics from
_resolve_thread_relationships, and will enable us to do things
with the list of all messages in the following patches.
Previously, notmuch new only synchronized maildir flags to tags for
files with a maildir "info" part. Since messages in new/ don't have
an info part, notmuch would ignore them for flag-to-tag
synchronization.
This patch makes notmuch consider messages in new/ to be legitimate
maildir messages that simply have no maildir flags set. The most
visible effect of this is that such messages now automatically get the
unread tag.
Previously, we synchronized flags to tags for any message that looked
like it had maildir flags in its file name, regardless of whether it
was in a maildir-like directory structure. This was asymmetric with
tag-to-flag synchronization, which only applied to messages in
directories named new/ and cur/ (introduced by 95dd5fe5).
This change makes our interpretation stricter and addresses this
asymmetry by only synchronizing flags to tags for messages in
directories named new/ or cur/. It also prepares us to treat messages
in new/ as maildir messages, even though they lack maildir flags.
This way notmuch_message_maildir_flags_to_tags can call it. It makes
more sense for this to be just above all of the maildir
synchronization code rather than mixed in the middle.
Previously, if passed a filename with a directory that did not exist
in the database, _notmuch_message_remove_filename would needlessly
create that directory document. Fix it so that doesn't happen.
Now _notmuch_database_filename_to_direntry takes a flags argument and
can indicate if the necessary directory documents do not exist.
Again, callers have been updated, but retain their original behavior.
This is a rebase and cleanup of Istvan Marko's patch from
id:m3pqnj2j7a.fsf@zsu.kismala.com
Search retrieves these headers for every message in the search
results. Previously, this required opening and parsing every message
file. Storing them directly in the database significantly reduces IO
and computation, speeding up search by between 50% and 10X.
Taking full advantage of this requires a database rebuild, but it will
fall back to the old behavior for messages that do not have headers
stored in the database.
Previously, the functions notmuch_database_find_message() and
notmuch_database_find_message_by_filename() functions did not properly
report error condition to the library user.
For more information, read the thread on the notmuch mailing list
starting with my mail "id:871uv2unfd.fsf@gmail.com"
Make these functions accept a pointer to 'notmuch_message_t' as argument
and return notmuch_status_t which may be used to check for any error
condition.
restore: Modify for the new notmuch_database_find_message()
new: Modify for the new notmuch_database_find_message_by_filename()
Previously, notmuch_database_remove_message would remove the message
file name, sync the change to the message document, re-find the
message document, and then delete it if there were no more file names.
An interruption after sync'ing would result in a file-name-less,
permanently un-removable zombie message that would produce errors and
odd results in searches. We could wrap this in an atomic section, but
it's much simpler to eliminate the round-about approach and just
delete the message document instead of sync'ing it if we removed the
last filename.
Previously, this function would synchronize the folder list even if
removing the file name failed. Now it returns immediately if removing
the file name fails.
Add removal of all ZXFOLDER terms to removal of all XFOLDER terms for
each message filename removal.
The existing filename-list reindexing will put all the needed terms
back in. Test search-folder-coherence now passes.
Signed-off-by:Mark Anderson <ma.skies@gmail.com>
Various typo fixes in comments within the source code.
Signed-off-by: Pieter Praet <pieter@praet.org>
Edited-by: Carl Worth <cworth@cworth.org> Restricted to just
source-code comments, (and fixed fix of "descriptios" to "descriptors"
rather than "descriptions").
As of gcc 4.6, there are new warnings from -Wattributes along the lines of:
warning: ‘_notmuch_messages’ declared with greater visibility
than the type of its field ‘_notmuch_messages::iterator’
[-Wattributes]
To squelch these, we decorate all such containing structs with
__attribute__((visibility("default"))). We take care to let only the
C++ compiler see this, (since the C compiler would otherwise warn
about ignored visibility attributes on types).
gcc (at least as of version 4.6.0) is kind enough to point these out to us,
(when given -Wunused-but-set-variable explicitly or implicitly via -Wunused
or -Wall).
One of these cases was a legitimately unused variable. Two were simply
variables (named ignored) we were assigning only to squelch a warning about
unused function return values. I don't seem to be getting those warnings
even without setting the ignored variable. And the gcc docs. say that the
correct way to squelch that warning is with a cast to (void) anyway.
Now each caller of notmuch_message_get_tags only gets a new iterator,
instead of a whole new list. In principle this could cause problems
with iterating while modifying tags, but through the magic of talloc
references, we keep the old tag list alive even after the cache in the
message object is invalidated.
This reduces my index search from the 3.102 seconds before the unified
metadata pass to 1.811 seconds (1.7X faster). Combined with the
thread search optimization in b3caef1f06,
that makes this query 2.5X faster than when I started.
Even if the caller never uses the file names, there is little cost to
simply fetching the file name terms. However, retrieving the full
paths requires additional database work, so the expansion from terms
to full paths is performed lazily.
This also simplifies clearing the filename cache, since that's now
handled by the generic metadata cache code.
This further reduces my inbox search from 3.102 seconds before the
unified metadata pass to 2.206 seconds (1.4X faster).
Replace _notmuch_convert_tags with this and simplify
_create_filenames_for_terms_with_prefix. This will also come in handy
shortly to get the message file name list.
This replaces the guts of the filename list and tag list, making those
interfaces simple iterators over the generic string list. The
directory, message filename, and tags-related code now build generic
string lists and then wraps them in specific iterators. The real wins
come in later patches, when we use these for even more generic
functionality.
As a nice side-effect, this also eliminates the annoying dependency on
GList in the tag list.
This performs a single pass over a message's term list to fetch the
thread ID, message ID, and reply-to, rather than requiring a pass for
each. Xapian decompresses the term list anew for each iteration, so
this reduces the amount of time spent decompressing message metadata.
This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X
faster).
A new "folder:" prefix in the query string can now be used to match
the directories in which mail files are stored.
The addition of this feature causes the recently added
search-by-folder tests to now pass.
This reduces thread search's 1+2t Xapian queries (where t is the
number of matched threads) to 1+t queries and constructs exactly one
notmuch_message_t for each message instead of 2 to 3.
notmuch_query_search_threads eagerly fetches the docids of all
messages matching the user query instead of lazily constructing
message objects and fetching thread ID's from term lists.
_notmuch_thread_create takes a seed docid and the set of all matched
docids and uses a single Xapian query to expand this docid to its
containing thread, using the matched docid set to determine which
messages in the thread match the user query instead of using a second
Xapian query.
This reduces the amount of time required to load my inbox from 4.523
seconds to 3.025 seconds (1.5X faster).
This is to prevent notmuch from destroying any information the user
has encoded as flags in the maildir filename. Tests are also added to
the test suite to verify the documented behavior.
Some people use notmuch with non-maildir files, (for example, email
messages in MH format, or else cool things like using sluk[*] to suck
down feeds into a format that notmuch can index).
To better support uses like that, don't do any renaming for files that
are not in a directory named either "new" or "cur".
[*] https://github.com/krl/sluk/
It is totally legitimate for a non-maildir directory to be named "new"
(and not have a directory next to it named "cur"). To support this
case at least, be silent about any rename failure.
If a filename has no maildir info at all, (that is, it does not
contain the sequence ":2,"), we consider this distinct from a filename
with an empty maildir info, (the ":2," separator is present, but no
flags characters follow).
Specifically, we regard a missing info field as providing no
information, so tags will remain unchanged. On the other hand, an info
field that is present but has no flags set will cause various tags to
be cleared, (or in the case of "unread", added).
This fixes the "remove info" case of the maildir-sync tests in the
test suite.
We have tests to ensure that when the notmuch library renames a file
that that rename takes place immediately in the database, (without
requiring something like "notmuch new" to notice the change).
This was working when the code was first added, but recently broke in
the reworking of the maildir-synchronization interface since the
tags_to_maildir_flags function can no longer assume that it is being
called as part of _notmuch_message_sync.
Fortunately, the fix is as simple as adding an explicit call to
_notmuch_message_sync.
As documented, this function now iterates over all filenames for the
message, computing a logical OR of the flags set on the filenames,
then uses the final result to set tags on the message.
This change fixes 3 of the 10 maildir-sync tests that have been
failing since being added.
This augments the existing notmuch_message_get_filename by allowing
the caller access to all filenames in the case of multiple files for a
single message.
To support this, we split the iterator (notmuch_filenames_t) away from
the list storage (notmuch_filename_list_t) where previously these were
a single object (notmuch_filenames_t). Then, whenever the user asks
for a file or filename, the message object lazily creates a complete
notmuch_filename_list_t and then:
For notmuch_message_get_filename, returns the first filename
in the list.
For notmuch_message_get_filenames, creates and returns a new
iterator for the filename list.
This rather ugly hack was recently obviated by the removal of the
notmuch_database_set_maildir_sync function. Now, clients must make
explicit calls to do any syncrhonization between maildir flags and
tags. So the library no longer needs to worry about doing inconsistent
synchronization while a message is only partially added.
Instead of having an API for setting a library-wide flag for
synchronization (notmuch_database_set_maildir_sync) we instead
implement maildir synchronization with two new library functions:
notmuch_message_maildir_flags_to_tags
and notmuch_message_tags_to_maildir_flags
These functions are nicely documented here, (though the implementation
does not quite match the documentation yet---as plainly evidenced by
the current results of the test suite).
Tags in a notmuch database affect all messages with the identical
message-ID. But maildir tags affect individual files. And since
multiple files can contain the identical message-ID, there is not a
one-to-one correspondence between messages affected by tags and flags.
This is particularly dangerous with the 'T' (== "trashed") maildir
flag and the corresponding "deleted" tag in the notmuch
database. Since these flags/tags are often used to trigger
irreversible deletion operations, the lack of one-to-one
correspondence can be potentially dangerous.
For example, consider the following sequence:
1. A third-party application is used to identify duplicate messages
in the mail store, and mark all-but-one of each duplicate with
the 'T' flag for subsequent deletion.
2. A "notmuch new" operation reads that 'T' flag, adding the
"deleted" flag to the corresponding messages within the notmuch
database.
3. A subsequent notmuch operation, (such as a "notmuch dump; notmuch
restore" cycle) synchronized the "deleted" tag back to the mail
store, applying the 'T' flag to all(!) filenames with duplicate
message IDs.
4. A third-party application reads the 'T' flags and irreversibly
deletes all mail messages which had any duplicates(!).
In order to avoid this scenario, we simply refuse to synchronize the
'T' flag with the "deleted" tag. Instead, applications can set 'T' and
act on it to delete files, or can set "deleted" and act on it to
delete files. But in either case the semantics are clear and there is
never dangerous propagation through the one-to-many mapping of notmuch
message objects to files.
This adds group [maildir] and key 'synchronize_flags' to the
configuration file. Its value enables (true) or diables (false) the
synchronization between notmuch tags and maildir flags. By default,
the synchronization is disabled.
This patch allows bi-directional synchronization between maildir
flags and certain tags. The flag-to-tag mapping is defined by flag2tag
array.
The synchronization works this way:
1) Whenever notmuch new is executed, the following happens:
o New messages are tagged with configured new_tags.
o For new or renamed messages with maildir info present in the file
name, the tags defined in flag2tag are either added or removed
depending on the flags from the file name.
2) Whenever notmuch tag (or notmuch restore) is executed, a new set of
flags based on the tags is constructed for every message and a new
file name is prepared based on the old file name but with the new
flags. If the flags differs and the old message was in 'new'
directory then this is replaced with 'cur' in the new file name. If
the new and old file names differ, the file is renamed and notmuch
database is updated accordingly.
The rename happens before the database is updated. In case of crash
between rename and database update, the next run of notmuch new
brings the database in sync with the mail store again.
Previously we were using Xapian's add_document to allocate document ID
values for notmuch_message_t objects. This had the drawback of adding
a partially constructed mail document to the database. If notmuch was
subsequently interrupted before fully populating this document, then
later runs would be quite confused when seeing the partial documents.
There are reports from the wild of people hitting internal errors of
the form "Message ... has no thread ID" for example, (which is
currently an unrecoverable error).
We fix this by manually allocating document IDs without adding
documents. With this change, we never call Xapian's add_document
method, but only replace_document with either the current document ID
of a message or a new one that we have allocated.
message->authors contains the author's name (as we want to print it)
get / set methods are declared in notmuch-private.h
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
We rename 'has_more' to 'valid' so that it can function whether
iterating in a forward or reverse direction. We also rename
'advance' to 'move_to_next' to setup parallel naming with
the proposed functions 'move_to_first', 'move_to_last', and
'move_to_previous'.
The sequential identifiers have the advantage of being guaranteed to
be unique (until we overflow a 64-bit unsigned integer), and also take
up half as much space in the "notmuch search" output (16 columns
rather than 32).
This change also has the side effect of fixing a bug where notmuch
could block on /dev/random at startup (waiting for some entropy to
appear). This bug was hit hard by the test suite, (which could easily
exhaust the available entropy on common systems---resulting in large
delays of the test suite).