If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.
If we can decrypt, we add the property index.decryption=success.
If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.
Note that a single message may have both values of the
"index.decryption" property: "success" and "failure". For example,
consider a message that includes multiple layers of encryption. If we
manage to decrypt the outer layer ("index.decryption=success"), but
fail on the inner layer ("index.decryption=failure").
Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing. This means it will subsequently
correspond to the actual semantics of the stored index.
This allows us to create new properties that will be automatically set
during indexing, and cleared during re-indexing, just by choice of
property name.
C99 stdbool turned 18 this year. There really is no reason to use our
own, except in the library interface for backward
compatibility. Convert the lib internally to stdbool.
I considered a higher level interface where the caller passes a tag
name rather than a flag character, but the role of the "unread" tag is
particularly confusing with such an interface.
There are at least three places in notmuch that can trigger an
indexing action:
* notmuch new
* notmuch insert
* notmuch reindex
I have plans to add some indexing options (e.g. indexing the cleartext
of encrypted parts, external filters, automated property injection)
that should properly be available in all places where indexing
happens.
I also want those indexing options to be exposed by (and constrained
by) the libnotmuch C API.
This isn't yet an API break because we've never made a release with
notmuch_param_t.
These indexing options are relevant in the listed places (and in the
libnotmuch analogues), but they aren't relevant in the other kinds of
functionality that notmuch offers (e.g. dump/restore, tagging, search,
show, reply).
So i think a generic "param" object isn't well-suited for this case.
In particular:
* a param object sounds like it could contain parameters for some
other (non-indexing) operation. This sounds confusing -- why would
i pass non-indexing parameters to a function that only does
indexing?
* bremner suggests online a generic param object would actually be
passed as a list of param objects, argv-style. In this case (at
least in the obvious argv implementation), the params might be some
sort of generic string. This introduces a problem where the API of
the library doesn't grow as new options are added, which means that
when code outside the library tries to use a feature, it first has
to test for it, and have code to handle it not being available.
The indexopts approach proposed here instead makes it clear at
compile time and at dynamic link time that there is an explicit
dependency on that feature, which allows automated tools to keep
track of what's needed and keeps the actual code simple.
My proposal adds the notmuch_indexopts_t as an opaque struct, so that
we can extend the list of options without causing ABI breakage.
The cost of this proposal appears to be that the "boilerplate" API
increases a little bit, with a generic constructor and destructor
function for the indexopts struct.
More patches will follow that make use of this indexopts approach.
This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
This operation is relatively inexpensive, as the needed metadata is
already computed by our lazy metadata fetching. The goal is to support
better UI for messages with multipile files.
Commit d5523ead90 ("Mark some structures in the library interface
with visibility=default attribute.") fixed some mixed visibility
issues with structs. With the symbol default visibility reversed, this
is no longer a problem.
The index(3) function has been deprecated in POSIX since 2001 and
removed in 2008, and most code in notmuch already calls strchr(3).
This fixes a compilation error on Android whose libc does not have
index(3).
This function was deprecated in notmuch 0.21. We re-use the name for
a status returning version, and deprecate the _st name. One or two
remaining uses of the (removed) non-status returning version fixed at
the same time
The object where pointer to `data` was received was deleted before
it was used in _notmuch_string_list_append().
Relevant Coverity messages follow:
3: extract
Assigning: data = std::__cxx11::string(message->doc.()).c_str(),
which extracts wrapped state from temporary of type std::__cxx11::string.
4: dtor_free
The internal representation of temporary of type std::__cxx11::string
is freed by its destructor.
5: use after free:
Wrapper object use after free (WRAPPER_ESCAPE)
Using internal representation of destroyed object local data.
For reasons not completely understood at this time, gmime (as of
2.6.22) is returning a date before 1900 on bad date input. Since this
confuses some other software, we clamp such dates to 0,
i.e. 1970-01-01.
The retries are hardcoded to a small number, and error handling aborts
than propagating errors from notmuch_database_reopen. These are both
somewhat justified by the assumption that most things that can go
wrong in Xapian::Database::reopen are rare and fatal. Here's the brief
discussion with Xapian upstream:
24-02-2017 08:12:57 < bremner> any intuition about how likely
Xapian::Database::reopen is to fail? I'm catching a
DatabaseModifiedError somewhere where handling any further errors is
tricky, and wondering about treating a failed reopen as as "the
impossible happened, stopping"
24-02-2017 16:22:34 < olly> bremner: there should not be much scope for
failure - stuff like out of memory or disk errors, which are probably a
good enough excuse to stop
Many of the external links found in the notmuch source can be resolved
using https instead of http. This changeset addresses as many as i
could find, without touching the e-mail corpus or expected outputs
found in tests.
Cleaned the following whitespace in lib/* files:
lib/index.cc: 1 line: trailing whitespace
lib/database.cc 5 lines: 8 spaces at the beginning of line
lib/notmuch-private.h: 4 lines: 8 spaces at the beginning of line
lib/message.cc: 1 line: trailing whitespace
lib/sha1.c: 1 line: empty lines at the end of file
lib/query.cc: 2 lines: 8 spaces at the beginning of line
lib/gen-version-script.sh: 1 line: trailing whitespace
To fully complete the ghost-on-removal-when-shared-thread-exists
proposal, we need to clear all ghost messages when the last active
message is removed from a thread.
Amended by db: Remove the last test of T530, as it no longer makes sense
if we are garbage collecting ghost messages.
There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.
Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
implement ghost-on-removal, the solution to T590-thread-breakage.sh
that just adds a ghost message after removing each message.
It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.
Note that _resolve_message_id_to_thread_id already introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.
This adds a new document value that stores the revision of the last
modification to message metadata, where the revision number increases
monotonically with each database commit.
An alternative would be to store the wall-clock time of the last
modification of each message. In principle this is simpler and has
the advantage that any process can determine the current timestamp
without support from libnotmuch. However, even assuming a computer's
clock never goes backward and ignoring clock skew in networked
environments, this has a fatal flaw. Xapian uses (optimistic)
snapshot isolation, which means reads can be concurrent with writes.
Given this, consider the following time line with a write and two read
transactions:
write |-X-A--------------|
read 1 |---B---|
read 2 |---|
The write transaction modifies message X and records the wall-clock
time of the modification at A. The writer hangs around for a while
and later commits its change. Read 1 is concurrent with the write, so
it doesn't see the change to X. It does some query and records the
wall-clock time of its results at B. Transaction read 2 later starts
after the write commits and queries for changes since wall-clock time
B (say the reads are performing an incremental backup). Even though
read 1 could not see the change to X, read 2 is told (correctly) that
X has not changed since B, the time of the last read. In fact, X
changed before wall-clock time A, but the change was not visible until
*after* wall-clock time B, so read 2 misses the change to X.
This is tricky to solve in full-blown snapshot isolation, but because
Xapian serializes writes, we can use a simple, monotonically
increasing database revision number. Furthermore, maintaining this
revision number requires no more IO than a wall-clock time solution
because Xapian already maintains statistics on the upper (and lower)
bound of each value stream.
Previously, we updated the database copy of a message on every call to
_notmuch_message_sync, even if nothing had changed. In particular,
this always happens on a thaw, so a freeze/thaw pair with no
modifications between still caused a database update.
We only modify message documents in a handful of places, so keep track
of whether the document has been modified and only sync it when
necessary. This will be particularly important when we add message
revision tracking.
You may wonder why _notmuch_message_file_open_ctx has two parameters.
This is because we need sometime to use a ctx which is a
notmuch_message_t. While we could get the database from this, there is
no easy way in C to tell type we are getting.
This is not supposed to change any functionality from an end user
point of view. Note that it will eliminate some output to stderr. The
query debugging output is left as is; it doesn't really fit with the
current primitive logging model. The remaining "bad" fprintf will need
an internal API change.
Tamas Szakaly points out [1] that the bug fixed in 51b073c still
exists in at least one place. This change follows the suggestion of
[2] and creates a block scope temporary std::string to avoid the rules
of iterators temporaries.
[1]: id:20141226113755.GA64154@pamparam
[2]: id:20141226230655.GA41992@pamparam
This updates the message abstraction to support ghost messages: it
adds a message flag that distinguishes regular messages from ghost
messages, and an internal function for initializing a newly created
(blank) message as a ghost message.
In the interest of robustness, avoid undefined behavior of
sortable_unserialise if the date value is missing. This shouldn't
happen now, but ghost messages will have blank date values.
Previously, this was performed by notmuch_database_add_message. This
happens to be the only caller currently (which is why this was safe),
but we're about to introduce more callers, and it makes more sense to
put responsibility for ID compression in the lower-level function
rather than requiring each caller to handle it.
Previously, there was no protection against a caller invoking an
operation on an old database version that would effectively corrupt
the database by treating it like a newer version.
According to notmuch.h, any caller that opens the database in
read/write mode is supposed to check if the database needs upgrading
and perform an upgrade if it does. This would protect against this,
but nobody (even the CLI) actually does this.
However, with features, it's easy to protect against incompatible
operations on a fine-grained basis. This lightweight change allows
callers to safely operate on old database versions, while preventing
specific operations that would corrupt the database with an
informative error message.
Commit 567bcbc2 introduced support for storing various headers in
document values. However, doing so in a backwards-compatible way
meant that genuinely empty header values could not be distinguished
from the old behavior of not storing the headers at all, so these
required parsing the original message.
Now that we have database features, new databases can declare that all
messages have header values, so if we have this feature flag, we can
use the stored header value even if it's the empty string.
This requires slight cleanup to notmuch_message_get_header, since the
code previously couldn't distinguish between empty headers and headers
that are never stored in the database (previously this distinction
didn't matter).
Previously, we invalidated stored message metadata in
_notmuch_message_add_term and _notmuch_message_remove_term, but not in
_notmuch_message_gen_terms. This doesn't currently result in any bugs
because of our limited uses of _notmuch_message_gen_terms, but it may
could cause trouble in the future.
As noted in devel/STYLE, every private library function should start
with _notmuch. This patch corrects function naming that did not adhere
to this style in lib/notmuch-private.h. In particular, the old function
names that now begin with _notmuch are
notmuch_sha1_of_file
notmuch_sha1_of_string
notmuch_message_file_close
notmuch_message_file_get_header
notmuch_message_file_open
notmuch_message_get_author
notmuch_message_set_author
Signed-off-by: Charles Celerier <cceleri@cs.stanford.edu>
This adds a 100 termpos gap between all phrases indexed by
_notmuch_message_gen_terms. This fixes a bug where terms from the end
of one header and the beginning of another header could match together
in a single phrase and a separate bug where term positions of
un-prefixed terms overlapped.
This fix only affects newly indexed messages. Messages that are
already indexed won't benefit from this fix without re-indexing, but
the fix won't make things any worse for existing messages.
In xapian terms, convert folder: prefix from probabilistic to boolean
prefix, matching the paths, relative from the maildir root, of the
message files, ignoring the maildir new and cur leaf directories.
folder:foo matches all message files in foo, foo/new, and foo/cur.
folder:foo/new does *not* match message files in foo/new.
folder:"" matches all message files in the top level maildir and its
new and cur subdirectories.
This change constitutes a database change: bump the database version
and add database upgrade support for folder: terms. The upgrade also
adds path: terms.
Finally, fix the folder search test for literal folder: search, as
some of the folder: matching capabilities are lost in the
probabilistic to boolean prefix change.