This reduces thread search's 1+2t Xapian queries (where t is the
number of matched threads) to 1+t queries and constructs exactly one
notmuch_message_t for each message instead of 2 to 3.
notmuch_query_search_threads eagerly fetches the docids of all
messages matching the user query instead of lazily constructing
message objects and fetching thread ID's from term lists.
_notmuch_thread_create takes a seed docid and the set of all matched
docids and uses a single Xapian query to expand this docid to its
containing thread, using the matched docid set to determine which
messages in the thread match the user query instead of using a second
Xapian query.
This reduces the amount of time required to load my inbox from 4.523
seconds to 3.025 seconds (1.5X faster).
We really want to change the thread subject at the same time we set
the date, (if the sort order indicates this is necessary). The
previous code for setting the thread subject was sensitive on the
query sort when adding matching messages. An independent bug fix is
about to change that query sort order, so we remove the dependency on
it here.
These various functions and data are all used only locally, so should
be marked static. Ensuring we get these right will avoid us accidentally
leaking unintended symbols through the library interface.
Admittedly, an author name ending in ',' guarantees this is spam, and
indeed this was triggered by a spam email, but that doesn't mean we
shouldn't handle this case correctly.
We now check that there is actually a component of the name (presumably
the first name) after the comma in the author name.
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Just before releasing 0.3 we received reports of crashes that were
bisected to the commit adding thread-author moving. Sure enough,
valgrind pointed to buffer overruns in _thread_move_matched_author.
Rather than trying to make sense of all the by strncpy, strchr, +1,
and +2 of that code, I reimplemented thread-author ordering with a
pair of hash tables and an array.
Valgrind is at least happy now on the test cases it was complaining
about previously.
This patch only addresses the typical Outlook/Exchange case
where we have "Last, First" <first.last@company.com> or
"Last, First MI" <first.mi.last@company.com>.
In the future we should be more fexible as to the formats
we recognize, but for now we address this one as it is the
Exchange default setting and therefore the most common one.
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
When displaying threads as result of a search it makes sense to list those
authors first who match the search. The matching authors are separated from the
non-matching ones with a '|' instead of a ','
Imagine the default "+inbox" query. Those mails in the thread that
match the query are actually "new" (whatever that means). And some
people seem to think that it would be much better to see those author
names first. For example, imagine a long and drawn out thread that once
was started by me; you have long read the older part of the thread and
removed the inbox tag. Whenever a new email comes in on this thread,
prior to this patch the author column in the search display will first show
"Dirk Hohndel" - I think it should first show the actual author(s) of the new
mail(s).
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Sebastian Spaeth reported [*] a segfault within libnotmuch when
running notmuch operations while an asyncronous offlineimap job had
removed some files from the mail store. Avoid this by handling all
cases where notmuch_message_get_header could return NULL.
[*] See message id:87d3xqti3o.fsf@SSpaeth.de on notmuch@notmuchmail.org
The thread-naming feature depends on the matched messages being passed
down in a precise order, (the order of the top-level search). We fix
the feature by passing that sort order down.
We know that matched messages are always added in order, so we can
always just grab the subject from the first message. This is the same
approach that was used previously in _thread_add_message. That is, the
recent feature of renaming a thread based on the subject of the
"first" matched message is as simple as moving the subject assignment
from _thread_add_message to _thread_add_matched_message.
At the moment all threads are named based on the name of the first message
in the thread. However, this can cause problems if people either start
new threads by replying-all (as unfortunately, many out there do) or
change the subject of their mails to reflect a shift in a thread on a
list.
This patch names threads based on (a) matches for the query, and (b) the
search order. If the search order is oldest-first (as in the default
inbox) it chooses the oldest matching message as the subject. If the
search order is newest-first it chooses the newest one.
Reply prefixes ("Re: ", "Aw: ", "Sv: ", "Vs: ") are ignored
(case-insensitively) so a Re: won't change the subject.
Note that this adds a "sort" argument to _notmuch_thread_create and
_thread_add_matched_message, so that when constructing the thread we can
be aware of the sort order.
Signed-off-by: Jesse Rosenthal <jrosenthal@jhu.edu>
When constructing a thread, we usually run a nested query to find all
messages in the thread that match the original search string. However,
we need to have special-case handling of an original search string of
"*" now that that is a supported means of specifying all messages.
The special-case ends up bein quite simple---we do less work, (just
skipping the nested search since we know that all messages must
match). I had been wanting to write this identical code to more
efficiently handle "notmuch search thread:<foo>" which was previously
running two identical searches. So that case is now more efficient as
well.
We rename 'has_more' to 'valid' so that it can function whether
iterating in a forward or reverse direction. We also rename
'advance' to 'move_to_next' to setup parallel naming with
the proposed functions 'move_to_first', 'move_to_last', and
'move_to_previous'.
As per Carl's request, this patch corrects the only value defined under
the notmuch_message_flag_t enum typedef to match the name of the type.
Signed-off-by: Bart Trojanowski <bart@jukie.net>
When _notmuch_thread_create() is given a query string, it can return more
messages than just those matching the query. To distinguish those that
matched the query expression, the MATCHING_SEARCH flag is set
appropriately.
Signed-off-by: Bart Trojanowski <bart@jukie.net>
This is what most people want for a _search_ command. It's often
different for actually reading mail in an inbox, (where it makes more
sense to have results displayed in chronological order), but in such a
case, ther user is likely using an interface that can simply pass the
--sort=oldest-first option to "notmuch search".
Here we're also change the sort enum from NOTMUCH_SORT_DATE and
NOTMUCH_SORT_DATE_REVERSE to NOTMUCH_SORT_OLDEST_FIRST and
NOTMUCH_SORT_NEWEST_FIRST. Similarly we replace the --reverse option
to "notmuch search" with two options: --sort=oldest-first and
--sort=newest-first.
Finally, these changes are all tracked in the emacs interface, (which
has no change in its behavior).
The message file header parsing code parses only enough of the file to
find the desired header fields, then it leaves the file open until the
next header parsing call or when the message is no longer in use. If a
large number of messages end up being active, this will quickly run
out of file descriptors.
Here, we add support to explicitly close the message file within a
message, (_notmuch_message_close) and call that from thread
construction code.
Signed-off-by: Keith Packard <keithp@keithp.com>
Edited-by: Carl Worth <cworth@cworth.org>:
Many portions of Keith's original patch have since been solved other
ways, (such as the code that changed the handling of the In-Reply-To
header). So the final version is clean enough that I think even Keith
would be happy to have his name on it.
This reduces our reliance on open message_file objects, (so is a step
toward fixing the "too many open files" bug), but more importantly, it
means we don't load a self-referencing in-reply-to header, (since we
weed those out before adding any replyto terms to the database).
We now properly analyze the in-reply-to headers to create a proper
tree representing the actual thread and present the messages in this
correct thread order. Also, there's a new "depth:" value added to the
"message{" header so that clients can format the thread as desired,
(such as by indenting replies).
We were properly sorting the threads based only on matched messages,
but we were displaying the date based on the total messages in the
thread, which led to inconsistent and very confusing results.
Note that the difference between thread results in date order and
thread results in reverse-date order is not simply a matter of
reversing the final results. When sorting in date order, the threads
are sorted by the oldest message in the thread. When sorting in
reverse-date order, the threads are sorted by the newest message in
the thread.
This difference means that we might want an explicit option in the
interface to reverse the order, (even though the default will be to
display the inbox in date order and global searches in reverse-date
order).
Note that we don't print the number of *unread* messages, but instead
the number of messages that matched the search terms. This is in
keeping with our philosophy that the inbox is nothing more than a
search view. If we search for messages with an inbox tag, then that's
what we'll get a count of. (And if somebody does want to see unread
counts, then they can search for the "unread" tag.)
Getting the number of matched messages is really nice when doing
historical searches. For example in a search like:
notmuch search tag:sent
(where the "sent" tag has been applied to all messages originating
from the user's email address)---here it's really nice to be able to
see a thread where the user just mentioned one point [1/13] vs. really
getting involved in the discussion [10/29].
We add a hash to the thread object so that we can detect author names
that have already been added to the list, and avoid adding them
redundantly. This avoids the giant chain of "bugzilla-daemon,
bugzilla-daemon, bugzilla-daemon, bugzilla-daemon, ..." author lists
that we would get otherwise, for example.
We've now expanded the notmuch_thread_create function to fire off a
secondary database query to find all the messages that belong to this
particular thread. This allows us to now have the complete authors'
list for the thread, and will also make it trivial to print accurate
message counts for threads in the future.
It's important to have the names present for determining whether a
thread is worth reading or not. We may want to think about
abbreviating the list somehow if it is excessively long (or redundant
as in bugzilla-daemon, bugzilla-daemon, bugzilla-daemon, etc.).