Since Xapian has a limit on the maximum length of a term, we have
to check for that before trying to add the message ID as a term.
This fixes the bug reported by Mike Hommey here:
<20091120132625.GA19246@glandium.org>
I've also constructed 20 files with a range of message ID lengths
centered around the Xapian term-length limit which I'll use to seed a
new test suite soon.
If an earlier exception occurred, then it's not unexpected for the
flush to fail as well. So in that case, we'll silently catch the
exception. Otherwise, make some noise about things going wrong at the
time of flush.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and like whilst updating the index.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
In my script containing a series of queries to be run on new mail for
setting up tags, it's nice to see which query I typed wrong.
Signed-off-by: Eric Anholt <eric@anholt.net>
More fallout from _get_header now returning "" for missing headers.
The bug here is that we would no longer detect that a file is not an
email message and give up on it like we should.
And this time, I actually audited all callers to
notmuch_message_get_header, so hopefully we're done fixing this
bug over and over.
There's been a fair amount of fallout from when we changed
message_file_get_header from returning NULL to returning "" for
missing headers. This is yet more fallout from that, (where we were
accepting an empty message-ID rather than generating one like we want
to).
This eliminates a crash when a message (either corrupted or a non-mail
file that wasn't properly detected as not being mail) has no In-Reply-To
header, (and so few terms that trying to skip to the prefix of the
In-Reply-To terms actually brings us to the end of the termlist).
As suggested by Keith in FLAG_PURE_NOT allows for expressions like:
notmuch search NOT tag:inbox
Note that this way a search like:
notmuch search foobar NOT tag:inbox
should not be written instead:
notmuch search foobar AND NOT tag:inbox
In my opinion, the latter feels more natural and is somewhat more explicit.
It gives a better clue of what the search is about instead of assuming that
an implicit AND operator is there.
This was recently introduced in commit:
64c03ae97f
which was adding extra checks to avoid adding a self-referencing
message.
How many times am I going to fix a dumb regression like this and say
"we really need a test suite" before I actually sit down and write the
test suite?
This is what most people want for a _search_ command. It's often
different for actually reading mail in an inbox, (where it makes more
sense to have results displayed in chronological order), but in such a
case, ther user is likely using an interface that can simply pass the
--sort=oldest-first option to "notmuch search".
Here we're also change the sort enum from NOTMUCH_SORT_DATE and
NOTMUCH_SORT_DATE_REVERSE to NOTMUCH_SORT_OLDEST_FIRST and
NOTMUCH_SORT_NEWEST_FIRST. Similarly we replace the --reverse option
to "notmuch search" with two options: --sort=oldest-first and
--sort=newest-first.
Finally, these changes are all tracked in the emacs interface, (which
has no change in its behavior).
We had exposed this to the internal implementation for a short time,
(only while we had the silly code fetching In-Reply-To values from
message files instead of from the database). Make this private again
as it should be.
Maybe ths lack of this documentation is why I forgot we were actually
storing this and wrote the ugly code to fetch In-Reply-To from message
files rather than from the database.
Which is more consistent with the XREFERENCE prefix used in the terms
in the database. Also remove some stale documentation describing the
removal of resolved references from the database (we no longer do
this).
Calling continue here worked only because we set a flag before the
continue, and, check the flag at the beginning of the loop, and *then*
break. It's much more clear to just break in the first place.
The message file header parsing code parses only enough of the file to
find the desired header fields, then it leaves the file open until the
next header parsing call or when the message is no longer in use. If a
large number of messages end up being active, this will quickly run
out of file descriptors.
Here, we add support to explicitly close the message file within a
message, (_notmuch_message_close) and call that from thread
construction code.
Signed-off-by: Keith Packard <keithp@keithp.com>
Edited-by: Carl Worth <cworth@cworth.org>:
Many portions of Keith's original patch have since been solved other
ways, (such as the code that changed the handling of the In-Reply-To
header). So the final version is clean enough that I think even Keith
would be happy to have his name on it.
In our scheme it's illegal for any message to refer to itself, (nor
would it be useful for anything anyway). Cut these self-references off
at the source, before they trip up any internal errors.
This case was happening when a message had its own message ID in its
In-Reply-To header. The thread-resolution code would find the
partially constructed message, (with no thread ID yet), get garbage
from this function, and then march right along with that garbage.
With this commit, a self-cyclic message like this will now trigger an
internal error rather than marching along silienty. (And a subsequent
commit will remove the call to this function in this case.)
This function has only one caller, and that one caller was passing the
same value for both talloc_owner and the notmuch database. Dropping
the redundant argument simplifies the documentation of this function
considerably.
This reduces our reliance on open message_file objects, (so is a step
toward fixing the "too many open files" bug), but more importantly, it
means we don't load a self-referencing in-reply-to header, (since we
weed those out before adding any replyto terms to the database).
When this function was originally written, the 'message' object was
always destroyed locally, so I thought it would be good to use a NULL
talloc context to make it more obvious if there was any leak.
Since then, however, this function has been changed to optionally
return the added message, and in that case we *don't* free the message
locally, so let's let the database be the talloc context.
We now properly analyze the in-reply-to headers to create a proper
tree representing the actual thread and present the messages in this
correct thread order. Also, there's a new "depth:" value added to the
"message{" header so that clients can format the thread as desired,
(such as by indenting replies).
The existing notmuch_message_get_header is *almost* good enough for
this, except that we also need to remove the '<' and '>'
delimiters. We'll probably want to implement this function with
database storage in the future rather than loading the email message.
This prototype has been sitting around for a while with no function
implementing it. I wonder if there's a compiler warning I could turn
on to catch these things.
Previously, the notmuch_messages_t object was a linked list built on
top of a linked-list node with the odd name of notmuch_message_list_t.
Now, we've got much more sane naming with notmuch_message_list_t being
a list built on a linked-list node named notmuch_message_node_t. And
now the public notmuch_messages_t object is a separate iterator based
on notmuch_message_node_t. This means the interfaces for the new
notmuch_message_list_t object are now made available to the library
internals.
The new object is simply a linked-list of notmuch_message_t objects,
(unlike the old object which contained a couple of Xapian iterators).
This works now by the query code immediately iterator over all results
and creating notmuch_message_t objects for them, (rather than waiting
to create the objects until the notmuch_messages_get call as we did
earlier).
The point of this change is to allow other instances of lists of
messages, (such as in notmuch_thread_t), that are not directly related
to Xapian search results.
Previously, an excess call would have caused a crash. Now it simply
does nothing. Also, make notmuch_tags_get use a similar, consistent
early return for a NULL iterator.
We were properly sorting the threads based only on matched messages,
but we were displaying the date based on the total messages in the
thread, which led to inconsistent and very confusing results.
Note that the difference between thread results in date order and
thread results in reverse-date order is not simply a matter of
reversing the final results. When sorting in date order, the threads
are sorted by the oldest message in the thread. When sorting in
reverse-date order, the threads are sorted by the newest message in
the thread.
This difference means that we might want an explicit option in the
interface to reverse the order, (even though the default will be to
display the inbox in date order and global searches in reverse-date
order).
Note that we don't print the number of *unread* messages, but instead
the number of messages that matched the search terms. This is in
keeping with our philosophy that the inbox is nothing more than a
search view. If we search for messages with an inbox tag, then that's
what we'll get a count of. (And if somebody does want to see unread
counts, then they can search for the "unread" tag.)
Getting the number of matched messages is really nice when doing
historical searches. For example in a search like:
notmuch search tag:sent
(where the "sent" tag has been applied to all messages originating
from the user's email address)---here it's really nice to be able to
see a thread where the user just mentioned one point [1/13] vs. really
getting involved in the discussion [10/29].
We add a hash to the thread object so that we can detect author names
that have already been added to the list, and avoid adding them
redundantly. This avoids the giant chain of "bugzilla-daemon,
bugzilla-daemon, bugzilla-daemon, bugzilla-daemon, ..." author lists
that we would get otherwise, for example.
We've now expanded the notmuch_thread_create function to fire off a
secondary database query to find all the messages that belong to this
particular thread. This allows us to now have the complete authors'
list for the thread, and will also make it trivial to print accurate
message counts for threads in the future.
I thought it would be safe enough to return a few extra threads,
(since we happened to already get the relevant messages out of the
database). The problem is that then requires the caller to carefully
read the number of threads returned and adjust its next "first" value
accordingly. The interface is much simpler to use if we simply return
exactly what is asked for and no more.
This serves me right for committing untested code. The
notmuch_query_search_threads was totally broken, (it didn't properly
treat -1 as being unlimited and instead returned no threads in that
case).
The library interface now allows the caller to do incremental searches,
(such as one page of results at a time). Next we'll just need to hook
this up to "notmuch search" and the emacs interface.
It's important to have the names present for determining whether a
thread is worth reading or not. We may want to think about
abbreviating the list somehow if it is excessively long (or redundant
as in bugzilla-daemon, bugzilla-daemon, bugzilla-daemon, etc.).