To fully complete the ghost-on-removal-when-shared-thread-exists
proposal, we need to clear all ghost messages when the last active
message is removed from a thread.
Amended by db: Remove the last test of T530, as it no longer makes sense
if we are garbage collecting ghost messages.
There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.
Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
implement ghost-on-removal, the solution to T590-thread-breakage.sh
that just adds a ghost message after removing each message.
It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.
Note that _resolve_message_id_to_thread_id already introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.
This adds a new document value that stores the revision of the last
modification to message metadata, where the revision number increases
monotonically with each database commit.
An alternative would be to store the wall-clock time of the last
modification of each message. In principle this is simpler and has
the advantage that any process can determine the current timestamp
without support from libnotmuch. However, even assuming a computer's
clock never goes backward and ignoring clock skew in networked
environments, this has a fatal flaw. Xapian uses (optimistic)
snapshot isolation, which means reads can be concurrent with writes.
Given this, consider the following time line with a write and two read
transactions:
write |-X-A--------------|
read 1 |---B---|
read 2 |---|
The write transaction modifies message X and records the wall-clock
time of the modification at A. The writer hangs around for a while
and later commits its change. Read 1 is concurrent with the write, so
it doesn't see the change to X. It does some query and records the
wall-clock time of its results at B. Transaction read 2 later starts
after the write commits and queries for changes since wall-clock time
B (say the reads are performing an incremental backup). Even though
read 1 could not see the change to X, read 2 is told (correctly) that
X has not changed since B, the time of the last read. In fact, X
changed before wall-clock time A, but the change was not visible until
*after* wall-clock time B, so read 2 misses the change to X.
This is tricky to solve in full-blown snapshot isolation, but because
Xapian serializes writes, we can use a simple, monotonically
increasing database revision number. Furthermore, maintaining this
revision number requires no more IO than a wall-clock time solution
because Xapian already maintains statistics on the upper (and lower)
bound of each value stream.
Previously, we updated the database copy of a message on every call to
_notmuch_message_sync, even if nothing had changed. In particular,
this always happens on a thaw, so a freeze/thaw pair with no
modifications between still caused a database update.
We only modify message documents in a handful of places, so keep track
of whether the document has been modified and only sync it when
necessary. This will be particularly important when we add message
revision tracking.
You may wonder why _notmuch_message_file_open_ctx has two parameters.
This is because we need sometime to use a ctx which is a
notmuch_message_t. While we could get the database from this, there is
no easy way in C to tell type we are getting.
This is not supposed to change any functionality from an end user
point of view. Note that it will eliminate some output to stderr. The
query debugging output is left as is; it doesn't really fit with the
current primitive logging model. The remaining "bad" fprintf will need
an internal API change.
Tamas Szakaly points out [1] that the bug fixed in 51b073c still
exists in at least one place. This change follows the suggestion of
[2] and creates a block scope temporary std::string to avoid the rules
of iterators temporaries.
[1]: id:20141226113755.GA64154@pamparam
[2]: id:20141226230655.GA41992@pamparam
This updates the message abstraction to support ghost messages: it
adds a message flag that distinguishes regular messages from ghost
messages, and an internal function for initializing a newly created
(blank) message as a ghost message.
In the interest of robustness, avoid undefined behavior of
sortable_unserialise if the date value is missing. This shouldn't
happen now, but ghost messages will have blank date values.
Previously, this was performed by notmuch_database_add_message. This
happens to be the only caller currently (which is why this was safe),
but we're about to introduce more callers, and it makes more sense to
put responsibility for ID compression in the lower-level function
rather than requiring each caller to handle it.
Previously, there was no protection against a caller invoking an
operation on an old database version that would effectively corrupt
the database by treating it like a newer version.
According to notmuch.h, any caller that opens the database in
read/write mode is supposed to check if the database needs upgrading
and perform an upgrade if it does. This would protect against this,
but nobody (even the CLI) actually does this.
However, with features, it's easy to protect against incompatible
operations on a fine-grained basis. This lightweight change allows
callers to safely operate on old database versions, while preventing
specific operations that would corrupt the database with an
informative error message.
Commit 567bcbc2 introduced support for storing various headers in
document values. However, doing so in a backwards-compatible way
meant that genuinely empty header values could not be distinguished
from the old behavior of not storing the headers at all, so these
required parsing the original message.
Now that we have database features, new databases can declare that all
messages have header values, so if we have this feature flag, we can
use the stored header value even if it's the empty string.
This requires slight cleanup to notmuch_message_get_header, since the
code previously couldn't distinguish between empty headers and headers
that are never stored in the database (previously this distinction
didn't matter).
Previously, we invalidated stored message metadata in
_notmuch_message_add_term and _notmuch_message_remove_term, but not in
_notmuch_message_gen_terms. This doesn't currently result in any bugs
because of our limited uses of _notmuch_message_gen_terms, but it may
could cause trouble in the future.
As noted in devel/STYLE, every private library function should start
with _notmuch. This patch corrects function naming that did not adhere
to this style in lib/notmuch-private.h. In particular, the old function
names that now begin with _notmuch are
notmuch_sha1_of_file
notmuch_sha1_of_string
notmuch_message_file_close
notmuch_message_file_get_header
notmuch_message_file_open
notmuch_message_get_author
notmuch_message_set_author
Signed-off-by: Charles Celerier <cceleri@cs.stanford.edu>
This adds a 100 termpos gap between all phrases indexed by
_notmuch_message_gen_terms. This fixes a bug where terms from the end
of one header and the beginning of another header could match together
in a single phrase and a separate bug where term positions of
un-prefixed terms overlapped.
This fix only affects newly indexed messages. Messages that are
already indexed won't benefit from this fix without re-indexing, but
the fix won't make things any worse for existing messages.
In xapian terms, convert folder: prefix from probabilistic to boolean
prefix, matching the paths, relative from the maildir root, of the
message files, ignoring the maildir new and cur leaf directories.
folder:foo matches all message files in foo, foo/new, and foo/cur.
folder:foo/new does *not* match message files in foo/new.
folder:"" matches all message files in the top level maildir and its
new and cur subdirectories.
This change constitutes a database change: bump the database version
and add database upgrade support for folder: terms. The upgrade also
adds path: terms.
Finally, fix the folder search test for literal folder: search, as
some of the folder: matching capabilities are lost in the
probabilistic to boolean prefix change.
The path: prefix is a literal boolean prefix matching the paths,
relative from the maildir root, of the message files.
path:foo matches all message files in foo (but not in foo/new or
foo/cur).
path:foo/new matches all message files in foo/new.
path:"" matches all message files in the top level maildir.
path:foo/** matches all message files in foo and recursively in all
subdirectories of foo.
path:** matches all message files recursively, i.e. all messages.
Currently if a Xapian exception happens in notmuch_message_get_header,
the exception is not caught leading to crash. In
notmuch_message_get_date the exception is caught, but an internal error
is raised, again leading to crash.
This patch fixes the error handling by making both functions catch the
Xapian exceptions, print an error and return NULL or 0.
The 'notmuch->exception_reported' is also set, as is done elsewhere,
even if I don't really get the idea of that field.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@iki.fi>
notmuch_message_tags_to_maildir_flags() unconditionally moves messages from
maildir directory "new/" to maildir directory "cur/", which makes messages lose
their "new" status in the MUA. However some users want to keep this "new"
status after, for instance, an auto-tagging of new messages.
However, as Austin mentioned and according to the maildir specification,
messages living in "new/" are not allowed to have flags, even if mutt allows it
to happen. For this reason, this patch prevents moving messages from "new/" to
"cur/", only if no flags have to be changed. It's hopefully enough to satisfy
mutt (and maybe other MUAs showing the "new" status) users checking the "new"
status.
Changelog:
* v2: Fix bool type as well as NULL returned despite having no errors (Austin
Clements)
* v4: Tag the related test (contributed by Michal Sojka) as working
Signed-off-by: Louis Rilling <l.rilling@av7.net>
[Condition for keeping messages in new/ was extended to satisfy all
tests from the previous patch. -Michal Sojka]
[Added by David Bremner, to keep the tests passing at each commit]
update insert tests for new maildir synchronization rules
As of id:1355952747-27350-4-git-send-email-sojkam1@fel.cvut.cz
we are more conservative about moving messages from ./new to ./cur.
This updates the insert tests to match
Xapian::TermIterator::operator* returns std::string which is destroyed
as soon as (*i).c_str() finishes. The remembered pointer 'term' then
references invalid memory.
Signed-off-by: Vladimir Marek <vlmarek@volny.cz>
Previously, thread.cc built up a list of all messages, then
proceeded to tear it apart to transform it into a list of
top-level messages. Now we simply build a new list of top-level
messages.
This simplifies the interface to _notmuch_message_add_reply,
eliminates the pointer acrobatics from
_resolve_thread_relationships, and will enable us to do things
with the list of all messages in the following patches.
Previously, notmuch new only synchronized maildir flags to tags for
files with a maildir "info" part. Since messages in new/ don't have
an info part, notmuch would ignore them for flag-to-tag
synchronization.
This patch makes notmuch consider messages in new/ to be legitimate
maildir messages that simply have no maildir flags set. The most
visible effect of this is that such messages now automatically get the
unread tag.
Previously, we synchronized flags to tags for any message that looked
like it had maildir flags in its file name, regardless of whether it
was in a maildir-like directory structure. This was asymmetric with
tag-to-flag synchronization, which only applied to messages in
directories named new/ and cur/ (introduced by 95dd5fe5).
This change makes our interpretation stricter and addresses this
asymmetry by only synchronizing flags to tags for messages in
directories named new/ or cur/. It also prepares us to treat messages
in new/ as maildir messages, even though they lack maildir flags.
This way notmuch_message_maildir_flags_to_tags can call it. It makes
more sense for this to be just above all of the maildir
synchronization code rather than mixed in the middle.
Previously, if passed a filename with a directory that did not exist
in the database, _notmuch_message_remove_filename would needlessly
create that directory document. Fix it so that doesn't happen.
Now _notmuch_database_filename_to_direntry takes a flags argument and
can indicate if the necessary directory documents do not exist.
Again, callers have been updated, but retain their original behavior.
This is a rebase and cleanup of Istvan Marko's patch from
id:m3pqnj2j7a.fsf@zsu.kismala.com
Search retrieves these headers for every message in the search
results. Previously, this required opening and parsing every message
file. Storing them directly in the database significantly reduces IO
and computation, speeding up search by between 50% and 10X.
Taking full advantage of this requires a database rebuild, but it will
fall back to the old behavior for messages that do not have headers
stored in the database.
Previously, the functions notmuch_database_find_message() and
notmuch_database_find_message_by_filename() functions did not properly
report error condition to the library user.
For more information, read the thread on the notmuch mailing list
starting with my mail "id:871uv2unfd.fsf@gmail.com"
Make these functions accept a pointer to 'notmuch_message_t' as argument
and return notmuch_status_t which may be used to check for any error
condition.
restore: Modify for the new notmuch_database_find_message()
new: Modify for the new notmuch_database_find_message_by_filename()
Previously, notmuch_database_remove_message would remove the message
file name, sync the change to the message document, re-find the
message document, and then delete it if there were no more file names.
An interruption after sync'ing would result in a file-name-less,
permanently un-removable zombie message that would produce errors and
odd results in searches. We could wrap this in an atomic section, but
it's much simpler to eliminate the round-about approach and just
delete the message document instead of sync'ing it if we removed the
last filename.
Previously, this function would synchronize the folder list even if
removing the file name failed. Now it returns immediately if removing
the file name fails.
Add removal of all ZXFOLDER terms to removal of all XFOLDER terms for
each message filename removal.
The existing filename-list reindexing will put all the needed terms
back in. Test search-folder-coherence now passes.
Signed-off-by:Mark Anderson <ma.skies@gmail.com>
Various typo fixes in comments within the source code.
Signed-off-by: Pieter Praet <pieter@praet.org>
Edited-by: Carl Worth <cworth@cworth.org> Restricted to just
source-code comments, (and fixed fix of "descriptios" to "descriptors"
rather than "descriptions").
As of gcc 4.6, there are new warnings from -Wattributes along the lines of:
warning: ‘_notmuch_messages’ declared with greater visibility
than the type of its field ‘_notmuch_messages::iterator’
[-Wattributes]
To squelch these, we decorate all such containing structs with
__attribute__((visibility("default"))). We take care to let only the
C++ compiler see this, (since the C compiler would otherwise warn
about ignored visibility attributes on types).
gcc (at least as of version 4.6.0) is kind enough to point these out to us,
(when given -Wunused-but-set-variable explicitly or implicitly via -Wunused
or -Wall).
One of these cases was a legitimately unused variable. Two were simply
variables (named ignored) we were assigning only to squelch a warning about
unused function return values. I don't seem to be getting those warnings
even without setting the ignored variable. And the gcc docs. say that the
correct way to squelch that warning is with a cast to (void) anyway.
Now each caller of notmuch_message_get_tags only gets a new iterator,
instead of a whole new list. In principle this could cause problems
with iterating while modifying tags, but through the magic of talloc
references, we keep the old tag list alive even after the cache in the
message object is invalidated.
This reduces my index search from the 3.102 seconds before the unified
metadata pass to 1.811 seconds (1.7X faster). Combined with the
thread search optimization in b3caef1f06,
that makes this query 2.5X faster than when I started.
Even if the caller never uses the file names, there is little cost to
simply fetching the file name terms. However, retrieving the full
paths requires additional database work, so the expansion from terms
to full paths is performed lazily.
This also simplifies clearing the filename cache, since that's now
handled by the generic metadata cache code.
This further reduces my inbox search from 3.102 seconds before the
unified metadata pass to 2.206 seconds (1.4X faster).