Our old notmuch-index-message.cc code had this, but I originally
left it out when adding indexing back in. I was concerned primarily
with mistakenly detecting signature markers and omitting important
text, (for example, I often do long lines of "----" as section
separators).
But now I see that there's a performance benefit to skippint the
quotations, (about 120 files/sec. instead of 95 files/sec.). I mitigated
the bogus signature checking by recognizing nothing other than the
all-time classic "-- ".
This avoids us wasting a bunch of time doing an expensive SHA-1 over a large
file only to discover later that it doesn't even *look* like an email message.
We put these is as a separate term so that they can be extracted.
We don't actually need this for searching, since typing an email
address in as a search term will already trigger a phrase search
that does exactly what's wanted.
This is based on the old notmuch-index-message.cc from early in
the history of notmuch, but considerably cleaned up now that
we have some experience with Xapian and know just what we want
to index, (rather than just blindly trying to index exactly
what sup does).
This does slow down notmuch_database_add_message a *lot*, but I've
got some ideas for getting some time back.
The original documentation of implicit AND is what we want, but
Xapian doesn't actually let us get that today. So be honest about
what the user can actually expect. And let's hope the Xapian
wizards give us the feature we want soon:
http://trac.xapian.org/ticket/402
"notmuch tag" is implemented now and seems to work great (and fast).
As for the race condition, as noted in the description we're removing
it's not exposed directly in the API, but only in a client that
allows for looping over search results and removing the inbox tag
from all of them. But then, that's exactly what the "notmuch tag"
command does. So, as discussed, we've now documented that command
to highlight the issue. Problem resolved, (as well as we can).
Putting all of our documentation into a single help message was getting
a bit unwieldy. Now, the simple output of "notmuch help" is a reasonable
reminder and a quick reference. Then we now support a new syntax of:
"notmuch help <command>" for the more detailed help messages.
This gives us freedom to put more detailed caveats, etc. into some
sub-commands without worrying about the usage statement getting too
long.
We were nicely reporting the lock-aquisition failure, but then marching
along trying to use the database object and just crashing badly.
So don't do that.
It's convenient to be able to do things like:
notmuch tag -inbox thread:<thread-id>
(even though this can run into a race condition as noted in TODO--the fix
for the race is simply to not run "notmuch new" between reading a thread
with the (not yet existent) "notmuch show" and removing its inbox tag
with a command like the above). So we now allow such a thing.
This uses the same search functionality as "notmuch search" so
it should be quite powerful. And this global search might be
quick enough to be used for "automatic" adding of tags to new
messages.
Of course, this will all be a lot more useful when we can search
for actual text of messages and not just tags.
The recent, disastrous failure of "notmuch new" would have been
avoided with this change. The new_command function was basically
assuming that it would only get a message object on success so
wasn't destroying the message in the other cases.
This would have helped with the recent bug causing "notmuch new"
to not record any results in the database. I'm not sure why
the explicit flush would be required, (shouldn't the destructor
always ensure that things flush?), but perhaps some outstanding
references from the leak prevented that.
In any case, an explicit flush on close() seems to make sense.
I'm trying to stick to a habit of fixing previously-introduced bugs
on side branches off of the commit that introduced the bug. The
idea here is to make it easy to find the commits to cherry pick
if bisecting in the future lands on one of the broken commits.
We were incorrectly only destroying messages in the case of
successful addition to the database, and not in other cases,
(such as failure due to FILE_NOT_EMAIL).
I'm still not entirely sure why this was performing abysmally, (as in
making an operation that should take a small fraction of a second take
10 seconds), nor why it was causing the database to entirely fail to
get new results.
But fortunately, this all seems to work now.
The recent addition of support for automatically adding tags to
new messages for "notmuch new" caused "notmuch setup" to segfault.
The fix is simple, (just need to move a destroy function to inside
a nearby if block).
Did I mention recently we need to add a test suite?
Interstingly, it's our simple "notmuch" client that's going to be the
most difficult to fix. There's just not as much information preserved
in the textual representation from "notmuch search" as there is in the
objects returned from notmuch_query_search_threads.
The archive-thread race condition doesn't even exist now because there's
no command for modifying tags at the level of a thread (just individual
messages).
This means that the restore operation will now properly pick up the
removal of tags indicated by the tag just not being present in the
dump file.
We added a few new public functions in order to support this:
notmuch_message_freeze
notmuch_message_remove_all_tags
notmuch_message_thaw
Somehow this naming with an underscore crept in, (but only in the
private header, so notmuch.c was compiling with no prototype). Fix
to be the notmuch_thread_get_subject originally intended.
The previous functions were always called together, so we might as
well just have one function for this. Also, the reset() name was
poor, and prepare_iterator() is much more descriptive.
We want to be able to iterate over tags stored in various ways, so
the previous TermIterator-based tags object just wasn't general
enough. The new interface is nice and simple, and involves only
C datatypes.
We will soon be wanting multiple different implementations of
notmuch_tags_t iterators, so we need to keep the actual structure
as an implementation detail inside of tags.cc.
We want to start using this from both message.cc and thread.cc so we
need it in a place we can share the code. This also requires a new
notmuch-private-cxx.h header file for interfaces that include
C++-specific datatypes (such as Xapian::Document).
We had documented both notmuch_thread_results_get and
notmuch_message_results_get to return NULL if (! has_more)
but we hadn't actually implemented that. Fix.
We've now got a new notmuch_query_search_threads and a
notmuch_threads_result_t iterator. The thread object itself
doesn't do much yet, (just allows one to get the thread_id),
but that's at least enough to see that "notmuch search" is
actually doing something now, (since it has been converted
to print thread IDs instead of message IDs).
And maybe that's all we need. Getting the messages belonging
to a thread is as simple as a notmuch_query_search_messages
with a string of "thread:<thread-id>".
Though it would be convenient to add notmuch_thread_get_messages
which could use the existing notmuch_message_results_t iterator.
Now we just need an implementation of "notmuch show" and we'll
have something somewhat usable.
Along with renaming notmuch_results_t to notmuch_message_results_t.
The new type is quite a mouthful, but I don't expect it to be
used much other than the for-loop idiom in the documentation,
(which does at least fit nicely within 80 columns).
This is all in preparation for the addition of a new
notmuch_query_search_threads of course.
Even with the recent warnings work, gcc didn't tell me about a static
function that I'm not calling? Apparently I get "defined but not
used" in C files, but not C++ files. That's bogus, and yet one more
reason for me to push the C++ to a minimal lower layer.
I didn't notice this because `xapian-config -cxxflags` gives empty
output on my system. But for someone with the xapian library
installed in some non-standard location this would be important.
I didn't end up adding any of the warnings options that aren't allowed
for C++, (such as -Wold-style-definition, -Wnested-externs,
-Werror-implicit-function-declaration, -Wstrict-prototypes,
-Wmissing-prototypes, or -Wbad-function-cast). So for now we can
drop the separate C and C++ variables for warnings.
Having to enumerate all the enum values at every switch is annoying,
but this warning actually found a bug, (missing support for
NOTMUCH_STATUS_OUT_OF_MEMORY in notmuch_status_to_string).
When adding -Wextra we also add -Wno-ununsed-parameters since that
function means well enough, but is really annoying in practice.
So the warnings we fix here are basically all comparsions between
signed and unsigned values.
Instead of supporting multiple thread IDs, we now merge together
thread IDs if one message is ever found to belong to more than one
thread. This allows for constructing complete threads when, for
example, a child message doesn't include a complete list of References
headers back to the beginning of the thread.
It also simplifies dealing with mapping a message ID to a thread ID
which is now a simple get_thread_id just like get_message_id, (and no
longer an iterator-based thing like get_tags).