This performs a single pass over a message's term list to fetch the
thread ID, message ID, and reply-to, rather than requiring a pass for
each. Xapian decompresses the term list anew for each iteration, so
this reduces the amount of time spent decompressing message metadata.
This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X
faster).
Such as:
mkdir build
cd build
../configure
make
This is implemented by having the configure script set a srcdir
variable in Makefile.config, and then sprinkling $(srcdir) into
various make rules. We also use vpath directives to convince GNU make
to find the source files from the original source directory.
Don't require the caller of _notmuch_doc_id_set_init to pass in a
correct bound; instead compute it from the array. This simplifies the
caller and makes this interface easier to use correctly.
Remove the repeated "sizeof (doc_ids->bitmap[0])" that bothered cworth
by instead defining macros to compute the word and bit offset of a
given bit in the doc ID set bitmap.
With talloc, we were already freeing all memory by the time we exited
the loop, but that didn't help with excess use of memory inside the
loop, (which was mostly from tallocing some objects with the incorrect
parent).
Thanks to Andrew Tridgell for sitting next to me and teaching me to
use talloc_report_full to find these leaks.
A new "folder:" prefix in the query string can now be used to match
the directories in which mail files are stored.
The addition of this feature causes the recently added
search-by-folder tests to now pass.
Using the local talloc context ensures that the memory we are using
here will be freed shortly, (rather than hanging on for a long time
with the notmuch database object).
This reduces thread search's 1+2t Xapian queries (where t is the
number of matched threads) to 1+t queries and constructs exactly one
notmuch_message_t for each message instead of 2 to 3.
notmuch_query_search_threads eagerly fetches the docids of all
messages matching the user query instead of lazily constructing
message objects and fetching thread ID's from term lists.
_notmuch_thread_create takes a seed docid and the set of all matched
docids and uses a single Xapian query to expand this docid to its
containing thread, using the matched docid set to determine which
messages in the thread match the user query instead of using a second
Xapian query.
This reduces the amount of time required to load my inbox from 4.523
seconds to 3.025 seconds (1.5X faster).
We really want to change the thread subject at the same time we set
the date, (if the sort order indicates this is necessary). The
previous code for setting the thread subject was sensitive on the
query sort when adding matching messages. An independent bug fix is
about to change that query sort order, so we remove the dependency on
it here.
This was a misfeature where notmuch had extra code that just threw
away legitimate information. It was never indexing an initial "Re"
term in a subject. But some users have legitimately wanted to search
for this term.
The original code was written this way merely for strict compatiblity
with the indexing performed by sup, but we're not taking advantage of
that now anyway.
This is to prevent notmuch from destroying any information the user
has encoded as flags in the maildir filename. Tests are also added to
the test suite to verify the documented behavior.
Some people use notmuch with non-maildir files, (for example, email
messages in MH format, or else cool things like using sluk[*] to suck
down feeds into a format that notmuch can index).
To better support uses like that, don't do any renaming for files that
are not in a directory named either "new" or "cur".
[*] https://github.com/krl/sluk/
I had originally hoped for better semantics, such as doing nothing in
non-maildir directories, and preserving unknown maildir flags that
happen to be present.
We could still do those things, of course, but for now, remove them
from the documentation since the implementation does not do these
things yet.
It is totally legitimate for a non-maildir directory to be named "new"
(and not have a directory next to it named "cur"). To support this
case at least, be silent about any rename failure.
If a filename has no maildir info at all, (that is, it does not
contain the sequence ":2,"), we consider this distinct from a filename
with an empty maildir info, (the ":2," separator is present, but no
flags characters follow).
Specifically, we regard a missing info field as providing no
information, so tags will remain unchanged. On the other hand, an info
field that is present but has no flags set will cause various tags to
be cleared, (or in the case of "unread", added).
This fixes the "remove info" case of the maildir-sync tests in the
test suite.
Previously the documentation of notmuch_message_maildir_flags_to_tags
suggested that the presence of a flag would cause tags to be added,
(or in the case of "unread", removed). But the case of absent maildir
flags was not explicitly described.
What we actually want, is that for supported flags, the absence of the
flag in all messages causes the corresponding tag to be removed,
(or in the case of "unread", added). So document that explicitly.
This is the case recently added to the test suite as a failing test,
(so we'll need to do bug fixing before the documentation is honest
here).
We have tests to ensure that when the notmuch library renames a file
that that rename takes place immediately in the database, (without
requiring something like "notmuch new" to notice the change).
This was working when the code was first added, but recently broke in
the reworking of the maildir-synchronization interface since the
tags_to_maildir_flags function can no longer assume that it is being
called as part of _notmuch_message_sync.
Fortunately, the fix is as simple as adding an explicit call to
_notmuch_message_sync.
As documented, this function now iterates over all filenames for the
message, computing a logical OR of the flags set on the filenames,
then uses the final result to set tags on the message.
This change fixes 3 of the 10 maildir-sync tests that have been
failing since being added.
This augments the existing notmuch_message_get_filename by allowing
the caller access to all filenames in the case of multiple files for a
single message.
To support this, we split the iterator (notmuch_filenames_t) away from
the list storage (notmuch_filename_list_t) where previously these were
a single object (notmuch_filenames_t). Then, whenever the user asks
for a file or filename, the message object lazily creates a complete
notmuch_filename_list_t and then:
For notmuch_message_get_filename, returns the first filename
in the list.
For notmuch_message_get_filenames, creates and returns a new
iterator for the filename list.
The new implementation is simply a talloc-based list of strings. The
former support (a list of database terms with a common prefix) is
implemented by simply pre-iterating over the terms and populating the
list. This should provide no performance disadvantage as callers of
thigns like notmuch_directory_get_child_files are very likely to
always iterate over all filenames anyway.
This new implementation of notmuch_filenames_t is in preparation for
adding API to query all of the filenames for a single message.
This rather ugly hack was recently obviated by the removal of the
notmuch_database_set_maildir_sync function. Now, clients must make
explicit calls to do any syncrhonization between maildir flags and
tags. So the library no longer needs to worry about doing inconsistent
synchronization while a message is only partially added.
Instead of having an API for setting a library-wide flag for
synchronization (notmuch_database_set_maildir_sync) we instead
implement maildir synchronization with two new library functions:
notmuch_message_maildir_flags_to_tags
and notmuch_message_tags_to_maildir_flags
These functions are nicely documented here, (though the implementation
does not quite match the documentation yet---as plainly evidenced by
the current results of the test suite).
Tags in a notmuch database affect all messages with the identical
message-ID. But maildir tags affect individual files. And since
multiple files can contain the identical message-ID, there is not a
one-to-one correspondence between messages affected by tags and flags.
This is particularly dangerous with the 'T' (== "trashed") maildir
flag and the corresponding "deleted" tag in the notmuch
database. Since these flags/tags are often used to trigger
irreversible deletion operations, the lack of one-to-one
correspondence can be potentially dangerous.
For example, consider the following sequence:
1. A third-party application is used to identify duplicate messages
in the mail store, and mark all-but-one of each duplicate with
the 'T' flag for subsequent deletion.
2. A "notmuch new" operation reads that 'T' flag, adding the
"deleted" flag to the corresponding messages within the notmuch
database.
3. A subsequent notmuch operation, (such as a "notmuch dump; notmuch
restore" cycle) synchronized the "deleted" tag back to the mail
store, applying the 'T' flag to all(!) filenames with duplicate
message IDs.
4. A third-party application reads the 'T' flags and irreversibly
deletes all mail messages which had any duplicates(!).
In order to avoid this scenario, we simply refuse to synchronize the
'T' flag with the "deleted" tag. Instead, applications can set 'T' and
act on it to delete files, or can set "deleted" and act on it to
delete files. But in either case the semantics are clear and there is
never dangerous propagation through the one-to-many mapping of notmuch
message objects to files.
This adds group [maildir] and key 'synchronize_flags' to the
configuration file. Its value enables (true) or diables (false) the
synchronization between notmuch tags and maildir flags. By default,
the synchronization is disabled.
This patch allows bi-directional synchronization between maildir
flags and certain tags. The flag-to-tag mapping is defined by flag2tag
array.
The synchronization works this way:
1) Whenever notmuch new is executed, the following happens:
o New messages are tagged with configured new_tags.
o For new or renamed messages with maildir info present in the file
name, the tags defined in flag2tag are either added or removed
depending on the flags from the file name.
2) Whenever notmuch tag (or notmuch restore) is executed, a new set of
flags based on the tags is constructed for every message and a new
file name is prepared based on the old file name but with the new
flags. If the flags differs and the old message was in 'new'
directory then this is replaced with 'cur' in the new file name. If
the new and old file names differ, the file is renamed and notmuch
database is updated accordingly.
The rename happens before the database is updated. In case of crash
between rename and database update, the next run of notmuch new
brings the database in sync with the mail store again.
This prevents any of the private functions from being leaked out
through the library interface (at least when compiling with a
recent-enough gcc to support the visibility pragma).
These various functions and data are all used only locally, so should
be marked static. Ensuring we get these right will avoid us accidentally
leaking unintended symbols through the library interface.
This increment is for the recently-added functions:
notmuch_query_get_query_string
notmuch_query_get_sort
These were recently added to the library interface, but the library
version was not incremented at that time, (shame on me).
Hi,
If I want to build Debian package, it fails with the following message:
ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
make[1]: *** [install-lib] Error 1
The reason is that I build the package as a non-root user and make
install invokes ldconfig unconditionally. The following patch contains a
workaround, but I think that a more correct solution would be to check
the condition LIBDIR_IN_LDCONFIG directly when make install is invoked
rather than in configure as it is done now.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Previously, if the underlying search_messages hit an exception and returned
NULL, this function would ignore that and return a non-NULL, (but empty)
threads object. Fix this to properly propagate the error.
Thanks to the new git-based test suite, it's easy to run the whole
test suite in valgrind, (simply "make test OPTIONS="--valgrind"), and
doing so showed this obvious use-after-free bug, (triggered by the
thread-order tests).
Various users were confused as to why they couldn't run notmuch
immediately after "make install", (with linker errors saying that
libnotmuch.so could not be found). The errors came from two different
causes:
1. The user had installed to a system library directory, but had not
yet run ldconfig.
2. The user had installed to some non-system directory, and had not
set the LD_LIBRARY_PATH variable.
With this change we fix both problems (on Linux) without the user
having to do anything additional. We first use ldconfig to find the
system library directories. If the user is installing to one of these,
then we run ldconfig as part of "make install".
For case (2) we use the -rpath and --enable-new-dtags linker options
to install a DT_RUNPATH entry in the binary. This entry tells the
dynamic linker where to find libnotmuch. Without the
--enable-new-dtags option only a DT_RPATH option would be installed,
(which has the drawback of not allowing any override with the
LD_LIBRARY_PATH variable).
Distributions (such as Debian and Fedora) don't want to see binaries
packaged with a DT_RPATH or DT_RUNPATH entry. This should be avoided
automatically as long as the packages install to standard locations,
(such as /usr/lib).
Scott Henson reported an internal error that occurred when he tried to
add a message that referenced another message with a message ID well
over 300 characters in length. The bug here was running into a Xapian
limit for the length of metadata key names, (which is even more
restrictive than the Xapian limit for the length of terms).
We fix this by noticing long message ID values and instead using a
message ID of the form "notmuch-sha1-<sha1_sum_of_message_id>". That
is, we use SHA1 to generate a compressed, (but still unique), version
of the message ID.
We add support to the test suite to exercise this fix. The tests add a
message referencing the long message ID, then add the message with the
long message ID, then finally add another message referencing the long
ID. Each of these tests exercise different code paths where the
special handling is implemented.
A final test ensures that all three messages are stitched together
into a single thread---guaranteeing that the three code paths all act
consistently.
Previously we were using Xapian's add_document to allocate document ID
values for notmuch_message_t objects. This had the drawback of adding
a partially constructed mail document to the database. If notmuch was
subsequently interrupted before fully populating this document, then
later runs would be quite confused when seeing the partial documents.
There are reports from the wild of people hitting internal errors of
the form "Message ... has no thread ID" for example, (which is
currently an unrecoverable error).
We fix this by manually allocating document IDs without adding
documents. With this change, we never call Xapian's add_document
method, but only replace_document with either the current document ID
of a message or a new one that we have allocated.
Admittedly, an author name ending in ',' guarantees this is spam, and
indeed this was triggered by a spam email, but that doesn't mean we
shouldn't handle this case correctly.
We now check that there is actually a component of the name (presumably
the first name) after the comma in the author name.
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Just before releasing 0.3 we received reports of crashes that were
bisected to the commit adding thread-author moving. Sure enough,
valgrind pointed to buffer overruns in _thread_move_matched_author.
Rather than trying to make sense of all the by strncpy, strchr, +1,
and +2 of that code, I reimplemented thread-author ordering with a
pair of hash tables and an array.
Valgrind is at least happy now on the test cases it was complaining
about previously.
With this patch the Received: header becomes special in the way
we treat headers - this is the only header for which we concatenate
all the instances we find (instead of just returning the first one).
This will be used in the From guessing code for replies as we need to
be able to walk ALL of the Received: headers in a message to have a
good chance to guess which mailbox this email was delivered to.
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
This patch only addresses the typical Outlook/Exchange case
where we have "Last, First" <first.last@company.com> or
"Last, First MI" <first.mi.last@company.com>.
In the future we should be more fexible as to the formats
we recognize, but for now we address this one as it is the
Exchange default setting and therefore the most common one.
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
When displaying threads as result of a search it makes sense to list those
authors first who match the search. The matching authors are separated from the
non-matching ones with a '|' instead of a ','
Imagine the default "+inbox" query. Those mails in the thread that
match the query are actually "new" (whatever that means). And some
people seem to think that it would be much better to see those author
names first. For example, imagine a long and drawn out thread that once
was started by me; you have long read the older part of the thread and
removed the inbox tag. Whenever a new email comes in on this thread,
prior to this patch the author column in the search display will first show
"Dirk Hohndel" - I think it should first show the actual author(s) of the new
mail(s).
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
message->authors contains the author's name (as we want to print it)
get / set methods are declared in notmuch-private.h
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Our current approach is for top-level entry poitns in the library
to have try/catch blocks that catch any Xapian exception and print
a message. Add a few missing blocks and fix up the documentation.
Sebastian Spaeth reported [*] a segfault within libnotmuch when
running notmuch operations while an asyncronous offlineimap job had
removed some files from the mail store. Avoid this by handling all
cases where notmuch_message_get_header could return NULL.
[*] See message id:87d3xqti3o.fsf@SSpaeth.de on notmuch@notmuchmail.org
Previously, we always sorted the returned results by some string value,
(newest-to-oldest by default), however in some cases (as when applying
tags to a search result) we are not interested in any special order.
This introduces a NOTMUCH_SORT_UNSORTED value that does just that. It is
not used at the moment anywhere in the code.
Signed-off-by: Sebastian Spaeth <Sebastian@SSpaeth.de>
The thread-naming feature depends on the matched messages being passed
down in a precise order, (the order of the top-level search). We fix
the feature by passing that sort order down.
We know that matched messages are always added in order, so we can
always just grab the subject from the first message. This is the same
approach that was used previously in _thread_add_message. That is, the
recent feature of renaming a thread based on the subject of the
"first" matched message is as simple as moving the subject assignment
from _thread_add_message to _thread_add_matched_message.
At the moment all threads are named based on the name of the first message
in the thread. However, this can cause problems if people either start
new threads by replying-all (as unfortunately, many out there do) or
change the subject of their mails to reflect a shift in a thread on a
list.
This patch names threads based on (a) matches for the query, and (b) the
search order. If the search order is oldest-first (as in the default
inbox) it chooses the oldest matching message as the subject. If the
search order is newest-first it chooses the newest one.
Reply prefixes ("Re: ", "Aw: ", "Sv: ", "Vs: ") are ignored
(case-insensitively) so a Re: won't change the subject.
Note that this adds a "sort" argument to _notmuch_thread_create and
_thread_add_matched_message, so that when constructing the thread we can
be aware of the sort order.
Signed-off-by: Jesse Rosenthal <jrosenthal@jhu.edu>
When constructing a thread, we usually run a nested query to find all
messages in the thread that match the original search string. However,
we need to have special-case handling of an original search string of
"*" now that that is a supported means of specifying all messages.
The special-case ends up bein quite simple---we do less work, (just
skipping the nested search since we know that all messages must
match). I had been wanting to write this identical code to more
efficiently handle "notmuch search thread:<foo>" which was previously
running two identical searches. So that case is now more efficient as
well.
This encodes the library version into the library, where the linking
binary can pick it up, and the linker can even enforce mismatches in
the minor release, (such as linking a binary against version 1.2 and
then attempting to run it against version 1.1).
I'm not sure which system Aaron used, but on the machine I have access
to, (Darwin 8.11.0), the -shared and -dylib_install_name options are
not recognized. Instead I use -dynamic_lib and -install_name as
documented here:
http://www.finkproject.org/doc/porting/shared.php
This patch adds a configure check for OS X (actually Darwin),
and sets up the Makefiles to build a proper shared library on
that platform.
Signed-off-by: Aaron Ecay <aaronecay@gmail.com>
notmuch previously unconditionally checked mime parts for various
properties, but not for NULL, which is the case if libgmime encounters
an empty mime part.
Upon encounter of an empty mime part, the following is printed to
stderr (the second line due to my patch):
(process:17197): gmime-CRITICAL **: g_mime_message_get_mime_part: assertion `GMIME_IS_MESSAGE (message)' failed
Warning: Not indexing empty mime part.
This is probably a bug that should get addressed in libgmime, but for
not, my patch is an acceptable workaround.
Signed-off-by: martin f. krafft <madduck@madduck.net>
Reviewed-by: Carl Worth <cworth@cworth.org>:
The original proposal for having different open modes used the name
WRITABLE. I didn't like that name, (easy to misspell as WRITEABLE even
for native English speakers). So we renamed it to READ_WRITE
immediately, but apparently some of the documentation held the old
name for a while.
Previously, we were only adding the reference terms for cases where
the referenced message did not yet exist in the database. For thread
presentation, it's useful to have the connection information provided
by the references, even when the messages are present. So add this
term unconditionally.
This function was recently modified, (to include a metadata lookup for
a message's thread ID before looking for parent/child thread IDs), but
the documentation wasn't updated. Fix that.
There are two primary cases in this function, (the message exists in
the database or it does not). Previously the code for these two cases
was split and intermingled with goto-spaghetti connections.
This allows us to thread messages even when we receive them out of
order, or never receive the root.
The thread ids for messages that aren't present but are referred to are
stored as metadata in the database and then retrieved if we ever get
that message.
When determining the thread id for a message we also check for this
metadata so that we can thread descendants of a message together before
we receive it.
Edited by Carl Worth <cworth@cworth.org>: Split this portion of the
commit from the earlier-applied portion adding test cases.
This seems like a generally useful thing to support, (but the previous
support through an empty string was not convenient for some users,
(such as the command-line client).
fix notmuch_message_file_get_header to always return the first instance
of the header you are looking for
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
With the original quiet function, there's an actual purpose (hiding
excessively long compiler command lines so that warnings and errors
from the compiler can be seen).
But with things like quiet_symlink there's nothing quieter. In fact
"SYMLINK" is longer than "ln -sf". So all this is doing is hiding the
actual command from the user for no real benefit.
The only actual reason we implemented the quiet_* functions was to be
able to neatly right-align the command name and left-align the arguments.
Let's give up on that, and just left-align everything, simplifying the
Makefiles considerably. Now, the only instances of a captialized command
name in the output is if there's some actually shortening of the command
itself.
We add a magic line to the beginning of each Makefile.local file to
help the editor know that it should use makefile mode for editing the
file, (even though the filename isn't exactly "Makefile").
Edited-by: Carl Worth <cworth@cworth.org>: Expand treatment from
emacs/Makefile.local to each instance of Makefile.local.
The idea here is to allow a new user of notmuch to be able to run
notmuch immediately after compiling, (without having to install
the shared library first). This also ensures that the test suite
tests the locally compiled library, and not whatever installled
version of the library the dynamic linker happens to find.
When I wanted to create a debian package from the current master, make
install failed because of non-existent include directory. This patch
fixes this minor issue.
We had a fairly ugly violation of modularity with the top-level
Makefile.local isntalling everything, (even when the build commands
for the library were down in lib/Makefile.local).