With the recent addition of full-text indexing, printing only once per
1000 files just isn't often enough. The new timer-based approach will
be reliable regardless of the speed of adding message.
The original documentation of implicit AND is what we want, but
Xapian doesn't actually let us get that today. So be honest about
what the user can actually expect. And let's hope the Xapian
wizards give us the feature we want soon:
http://trac.xapian.org/ticket/402
Putting all of our documentation into a single help message was getting
a bit unwieldy. Now, the simple output of "notmuch help" is a reasonable
reminder and a quick reference. Then we now support a new syntax of:
"notmuch help <command>" for the more detailed help messages.
This gives us freedom to put more detailed caveats, etc. into some
sub-commands without worrying about the usage statement getting too
long.
This uses the same search functionality as "notmuch search" so
it should be quite powerful. And this global search might be
quick enough to be used for "automatic" adding of tags to new
messages.
Of course, this will all be a lot more useful when we can search
for actual text of messages and not just tags.
I'm trying to stick to a habit of fixing previously-introduced bugs
on side branches off of the commit that introduced the bug. The
idea here is to make it easy to find the commits to cherry pick
if bisecting in the future lands on one of the broken commits.
We were incorrectly only destroying messages in the case of
successful addition to the database, and not in other cases,
(such as failure due to FILE_NOT_EMAIL).
I'm still not entirely sure why this was performing abysmally, (as in
making an operation that should take a small fraction of a second take
10 seconds), nor why it was causing the database to entirely fail to
get new results.
But fortunately, this all seems to work now.
The recent addition of support for automatically adding tags to
new messages for "notmuch new" caused "notmuch setup" to segfault.
The fix is simple, (just need to move a destroy function to inside
a nearby if block).
Did I mention recently we need to add a test suite?
This means that the restore operation will now properly pick up the
removal of tags indicated by the tag just not being present in the
dump file.
We added a few new public functions in order to support this:
notmuch_message_freeze
notmuch_message_remove_all_tags
notmuch_message_thaw
Somehow this naming with an underscore crept in, (but only in the
private header, so notmuch.c was compiling with no prototype). Fix
to be the notmuch_thread_get_subject originally intended.
We want to be able to iterate over tags stored in various ways, so
the previous TermIterator-based tags object just wasn't general
enough. The new interface is nice and simple, and involves only
C datatypes.
We've now got a new notmuch_query_search_threads and a
notmuch_threads_result_t iterator. The thread object itself
doesn't do much yet, (just allows one to get the thread_id),
but that's at least enough to see that "notmuch search" is
actually doing something now, (since it has been converted
to print thread IDs instead of message IDs).
And maybe that's all we need. Getting the messages belonging
to a thread is as simple as a notmuch_query_search_messages
with a string of "thread:<thread-id>".
Though it would be convenient to add notmuch_thread_get_messages
which could use the existing notmuch_message_results_t iterator.
Now we just need an implementation of "notmuch show" and we'll
have something somewhat usable.
Along with renaming notmuch_results_t to notmuch_message_results_t.
The new type is quite a mouthful, but I don't expect it to be
used much other than the for-loop idiom in the documentation,
(which does at least fit nicely within 80 columns).
This is all in preparation for the addition of a new
notmuch_query_search_threads of course.
Having to enumerate all the enum values at every switch is annoying,
but this warning actually found a bug, (missing support for
NOTMUCH_STATUS_OUT_OF_MEMORY in notmuch_status_to_string).
When adding -Wextra we also add -Wno-ununsed-parameters since that
function means well enough, but is really annoying in practice.
So the warnings we fix here are basically all comparsions between
signed and unsigned values.
We were previously just doing fprintf;exit at each point, but I
wanted to add file and line-number details to all messages, so it
makes sense to use a single macro for that.
The "notmuch setup" output was getting overwhelmingly verbose.
Also, some people might not have a lot of mail, so might never need
this optimization. It's much better to move the hint to the time
when the user could actually benefit from it, (it's easy to detect
that "notmuch new" took more than 1 second, and we know if there
are any read-only directories there or not).
This isn't behaving at all like it's documented yet, (for example,
it's returning message IDs not thread IDs[*]). In fact, the output
code is just a copy of the body of "notmuch dump", so all you
get for now is message ID and tags.
But this should at least be enough to start exercising the query
functionality, (which is currently very buggy).
[*] I'll want to convert the databse to store thread documents
before fixing that.
With some recent testing, the timestamp was failing, (overflowing
the term limit), and reporting an error, but the top-level notmuch
command was still returning a success return value.
I think it's high time to add a test suite, (and the code base is
small enough that if we add it now it shouldn't be *too* hard to
shoot for a very high coverage percentage).
With this, "notmuch new" is now plenty fast even with large archives
spanning many sub-directories. Document this both in "notmuch help"
and also in the output of notmuch setup.
Finally, I can get new messages into my notmuch database without
having to run a complete "notmuch setup" again. This takes
advantage of the recent timestamp capabilities in the database
to avoid looking into directories that haven't changed since the
last time "notmuch new" was run.
Get rid of a useless leading 0 on the seconds value, and make a
distinction between "files" and "messages", (we process many
files, but not all of them are recongized as messages). Finally,
add a summary line at the end saying how many unique messages
were added to the database. Since this comes right after the
total number of files, it gives the user at least a hint as
to how many messages were encountered with duplicate message IDs.
Some people might argue for more initializers to be "safer",
but I actually prefer to leave things this way. It saves
typing, but the real benefit is that the things that do
require initialization stand out so we know to watch them
carefully. And with valgrind, we actually get to catch
errors earlier if we *don't* initialize them. So that can
be "safer" ironically enough.
This helps the user gauge the severity of the error.
For example, when restoring my sup tags I see a bunch of tags missing
for message IDs of the form "sup-faked-...". That's not surprising
since I know that sup generates these with the md5sum of the message
header while notmuch uses the sha-1 of the entire message. But how
much will this hurt?
Well, now that I can see that most of the missing tags are just
"attachment", then I'm not concerned, (I'll be automatically creating
that tag in the future based on the message contents). But if a
missing tag is "inbox" then that's more concerning because that's data
that I can't easily regenerate outside of sup.
It's pretty easy to do with all the right infrastructure in place.
Now that I can get my tags from sup to notmuch, maybe I'll be able
to start reading mail again.
This is to help keep the report looking clean when a new report
is shorter than a previous reports, (say, when crossing the
boundary from over one minute remaining to less than one minute
remaining).
This used to be here, but I must have accidentally dropped it
when reformatting the progress report recently.
Using the address of a static char* was clever, but really
unnecessary. An empty string is much less magic, and even
easier to understand as the way to query everything from
the database.