For very large mail boxes, it is desirable to know which files are being
processed e.g. when a crash occurs to know which one was the cause. Also,
it may be interesting to have a better idea of how the operation is
progressing when processing mailboxes with big messages.
This patch adds support for printing messages as they are processed by
"notmuch new":
* The "new" command now supports a "--verbose" flag.
* When running in verbose mode, the file path of the message about to be
processed is printed in the following format:
current/total: /path/to/message/file
Where "current" is the number of messages processed so far and "total" is
the total count of files to be processed.
The status line is erased using an ANSI sequence "\033[K" (erase current
line from the cursor to the end of line) each time it is refreshed. This
should not pose a problem because nearly every terminal supports it.
* The signal handler for SIGALRM and the timer are not enabled when running
in verbose mode, because we are already printing progress with each file,
periodical reports are not neccessary.
We take the recently created text from the notmuch manual page and
update the "notmuch help" command to use similar text. In particular,
we add a new "notmuch help search-terms" for documenting the search
syntax that is common to several commands.
If this is run first, it will run "notmuch setup" directly. After that
is successful, it will look for a databae and tell the user to run
"notmuch new" if the database doesn't exist yet. Finally, if the
database is present, it will provide some example "notmuch search"
commands for the user to try.
It's quite possible for someone to read the documentation and run
"notmuch setup" rather than just "notmuch". In that case, we don't
want to be any less welcoming.
This will allow for things like the database path to be specified
without any cheesy NOTMUCH_BASE environment variable. It also will
allow "notmuch reply" to recognize the user's email address when
constructing a reply in order to do the right thing, (that is, to use
the user's address to which mail was sent as From:, and not to reply
to the user's own addresses).
With this change, the "notmuch setup" command is now strictly for
changing the configuration of notmuch. It no longer creates the
database, but instead instructs the user to call "notmuch new" to do
that.
The advantage here is that we actually get the necessary folding of
long headers, (particularly the References header, but also things
like Subject). This also gives us parsed recipient addresses so that
we can easily elide the sender's address(es) from the recipient list
(just as soon as we have a configured value for the recipient's
address(es)).
Reviewed-by: Carl Worth <cworth@cworth.org>
Keith wrote all the code here against notmuch before notmuch.c was
split up into multiple files. So I've pushed the code around in
various ways to match the new code structure, but have generally tried
to avoid making any changes to the behavior of the code.
I did fix one bug---a missing call to g_mime_stream_file_set_owner in
show_part which would cause "notmuch show" to go off into the weeds
when trying to show multiple messages, (since the first stream would
fclose stdout).
Now that the client sources are alone here in their own directory,
(with all the library sources down inside the lib directory), we can
break the client up into multiple files without mixing the files up.
The hope is that these smaller files will be easier to manage and
maintain.
I recently added a print of the subject line for use as part of a
two-line summary in the emacs client. But of course, the subject was
already being printed on the next line. So I didn't really need to add
anything, I could have just stopped hiding what was already
printed. Anyway, we now avoid printing it twice in a row.
The more general command is more consistent, and more useful.
We also fix "notmuch search" to output copy-and-pasteable search terms
for the thread with "thread:" prepended already. Similarly, the
message-ID in the output of "notmuch show" is also now printed as a
valid search term, ("id:<message-id>" rather than "ID: <message-id>").
Naturally, the emacs code is also changed to track these changes.
We were inadvertently calling g_object_unref on a wild pointer leading
to the following error message:
GLib-GObject-CRITICAL **: g_object_unref: assertion
`G_IS_OBJECT (object)' failed
Now, why glib doesn't abort on critical errors, I'll never understand.
I previously had a hack that special-cased the "unread" tag and
printed it on the same line as the message ID. But now that we are
printing all tags at the end of the one-line summary we don't need
this anymore. Get rid of it, and just read "unread" from the list of
tags just like any other tag.
Also hide all markers.
From here, all we really need for legibility is the following:
* Hide away citations and signatures
* Call out the one-line summary some way, (larger font size?)
* Add nesting for replies
We were previously using things like "%message{" which were not
guaranteed to never appear in an email message. Using a control
character (^L or '\f' instead of '%') gives us better assurance that
our delimiter doesn't show up in an original email message.
This still isn't entirely safe since we're decoding encoded text in
the body of the email message so almost all bets are off really.
I had noticed several times earlier that having a talloc context
passed in would make things more convenient. I'm not exercising
that convenience yet, but the context is there now, (and there's
one fewer item on our TODO list).
We were aware of this bug when we wrote the function, (that a date
six days in the past would be treated as the "Friday" or as the
"Oct. 23" case depending on whether its time was before or after
the current time today). We thought it wouldn't be a problem, but
in practice it is. In scanning search results with this output,
the transition between formats makes it look like a day boundary,
(so it would be easy to mistakenly think "Oct. 23" is Thursday).
Fix this to avoid confusion, (still being careful to never print
"Thursday" for a date 7 days in the past when today is Thursday).
We're using a delimiter syntax that Keith is optimistic about
being able to easily parse in emacs. Note: We're not escaping
any occurrence of the delimiters in the message yet, so we'll
need to fix that.
With the recent addition of full-text indexing, printing only once per
1000 files just isn't often enough. The new timer-based approach will
be reliable regardless of the speed of adding message.
The original documentation of implicit AND is what we want, but
Xapian doesn't actually let us get that today. So be honest about
what the user can actually expect. And let's hope the Xapian
wizards give us the feature we want soon:
http://trac.xapian.org/ticket/402
Putting all of our documentation into a single help message was getting
a bit unwieldy. Now, the simple output of "notmuch help" is a reasonable
reminder and a quick reference. Then we now support a new syntax of:
"notmuch help <command>" for the more detailed help messages.
This gives us freedom to put more detailed caveats, etc. into some
sub-commands without worrying about the usage statement getting too
long.
This uses the same search functionality as "notmuch search" so
it should be quite powerful. And this global search might be
quick enough to be used for "automatic" adding of tags to new
messages.
Of course, this will all be a lot more useful when we can search
for actual text of messages and not just tags.
I'm trying to stick to a habit of fixing previously-introduced bugs
on side branches off of the commit that introduced the bug. The
idea here is to make it easy to find the commits to cherry pick
if bisecting in the future lands on one of the broken commits.
We were incorrectly only destroying messages in the case of
successful addition to the database, and not in other cases,
(such as failure due to FILE_NOT_EMAIL).
I'm still not entirely sure why this was performing abysmally, (as in
making an operation that should take a small fraction of a second take
10 seconds), nor why it was causing the database to entirely fail to
get new results.
But fortunately, this all seems to work now.
The recent addition of support for automatically adding tags to
new messages for "notmuch new" caused "notmuch setup" to segfault.
The fix is simple, (just need to move a destroy function to inside
a nearby if block).
Did I mention recently we need to add a test suite?
This means that the restore operation will now properly pick up the
removal of tags indicated by the tag just not being present in the
dump file.
We added a few new public functions in order to support this:
notmuch_message_freeze
notmuch_message_remove_all_tags
notmuch_message_thaw
Somehow this naming with an underscore crept in, (but only in the
private header, so notmuch.c was compiling with no prototype). Fix
to be the notmuch_thread_get_subject originally intended.
We want to be able to iterate over tags stored in various ways, so
the previous TermIterator-based tags object just wasn't general
enough. The new interface is nice and simple, and involves only
C datatypes.
We've now got a new notmuch_query_search_threads and a
notmuch_threads_result_t iterator. The thread object itself
doesn't do much yet, (just allows one to get the thread_id),
but that's at least enough to see that "notmuch search" is
actually doing something now, (since it has been converted
to print thread IDs instead of message IDs).
And maybe that's all we need. Getting the messages belonging
to a thread is as simple as a notmuch_query_search_messages
with a string of "thread:<thread-id>".
Though it would be convenient to add notmuch_thread_get_messages
which could use the existing notmuch_message_results_t iterator.
Now we just need an implementation of "notmuch show" and we'll
have something somewhat usable.
Along with renaming notmuch_results_t to notmuch_message_results_t.
The new type is quite a mouthful, but I don't expect it to be
used much other than the for-loop idiom in the documentation,
(which does at least fit nicely within 80 columns).
This is all in preparation for the addition of a new
notmuch_query_search_threads of course.
Having to enumerate all the enum values at every switch is annoying,
but this warning actually found a bug, (missing support for
NOTMUCH_STATUS_OUT_OF_MEMORY in notmuch_status_to_string).
When adding -Wextra we also add -Wno-ununsed-parameters since that
function means well enough, but is really annoying in practice.
So the warnings we fix here are basically all comparsions between
signed and unsigned values.
We were previously just doing fprintf;exit at each point, but I
wanted to add file and line-number details to all messages, so it
makes sense to use a single macro for that.
The "notmuch setup" output was getting overwhelmingly verbose.
Also, some people might not have a lot of mail, so might never need
this optimization. It's much better to move the hint to the time
when the user could actually benefit from it, (it's easy to detect
that "notmuch new" took more than 1 second, and we know if there
are any read-only directories there or not).
This isn't behaving at all like it's documented yet, (for example,
it's returning message IDs not thread IDs[*]). In fact, the output
code is just a copy of the body of "notmuch dump", so all you
get for now is message ID and tags.
But this should at least be enough to start exercising the query
functionality, (which is currently very buggy).
[*] I'll want to convert the databse to store thread documents
before fixing that.
With some recent testing, the timestamp was failing, (overflowing
the term limit), and reporting an error, but the top-level notmuch
command was still returning a success return value.
I think it's high time to add a test suite, (and the code base is
small enough that if we add it now it shouldn't be *too* hard to
shoot for a very high coverage percentage).
With this, "notmuch new" is now plenty fast even with large archives
spanning many sub-directories. Document this both in "notmuch help"
and also in the output of notmuch setup.
Finally, I can get new messages into my notmuch database without
having to run a complete "notmuch setup" again. This takes
advantage of the recent timestamp capabilities in the database
to avoid looking into directories that haven't changed since the
last time "notmuch new" was run.
Get rid of a useless leading 0 on the seconds value, and make a
distinction between "files" and "messages", (we process many
files, but not all of them are recongized as messages). Finally,
add a summary line at the end saying how many unique messages
were added to the database. Since this comes right after the
total number of files, it gives the user at least a hint as
to how many messages were encountered with duplicate message IDs.
Some people might argue for more initializers to be "safer",
but I actually prefer to leave things this way. It saves
typing, but the real benefit is that the things that do
require initialization stand out so we know to watch them
carefully. And with valgrind, we actually get to catch
errors earlier if we *don't* initialize them. So that can
be "safer" ironically enough.
This helps the user gauge the severity of the error.
For example, when restoring my sup tags I see a bunch of tags missing
for message IDs of the form "sup-faked-...". That's not surprising
since I know that sup generates these with the md5sum of the message
header while notmuch uses the sha-1 of the entire message. But how
much will this hurt?
Well, now that I can see that most of the missing tags are just
"attachment", then I'm not concerned, (I'll be automatically creating
that tag in the future based on the message contents). But if a
missing tag is "inbox" then that's more concerning because that's data
that I can't easily regenerate outside of sup.
It's pretty easy to do with all the right infrastructure in place.
Now that I can get my tags from sup to notmuch, maybe I'll be able
to start reading mail again.
This is to help keep the report looking clean when a new report
is shorter than a previous reports, (say, when crossing the
boundary from over one minute remaining to less than one minute
remaining).
This used to be here, but I must have accidentally dropped it
when reformatting the progress report recently.
Using the address of a static char* was clever, but really
unnecessary. An empty string is much less magic, and even
easier to understand as the way to query everything from
the database.
Previously we were leaking[*] memory in that the memory footprint of
a "notmuch dump" run would continue to grow until the output was
complete, and then finally all the memory would be freed.
Now, the memory footprint is small and constant, O(1) rather than
O(n) in the number of messages.
[*] Not leaking in a valgrind sense---every byte was still carefully
being accounted for and freed eventually.
This is a fairly big milestone for notmuch. It's our first command
to do anything besides building the index, so it proves we can
actually read valid results out from the index.
It also puts in place almost all of the API and infrastructure we
will need to allow searching of the database.
Finally, with this change we are now using talloc inside of notmuch
which is truly a delight to use. And now that I figured out how
to use C++ objects with talloc allocation, (it requires grotty
parts of C++ such as "placement new" and "explicit destructors"),
we are valgrind-clean for "notmuch dump", (as in "no leaks are
possible").
The recent change from GIOChannel to getline, (with a semantic
change of the newline terminator now being included in the
result that setup_command sees), broke this.
Since we allow the user to enter a custom directory, we need to
let the user know how to make this persistent. Of course, a better
answer would be to take what the user entered and shove it into
a ~/.notmuch-config file or so, but for now this will have to do.
When documenting these functions I described support for a
NOTMUCH_BASE environment variable to be consulted in the case
of a NULL path. Only, I had forgotten to actually write the
code.
This code exists now, with a new, exported function:
notmuch_database_default_path
The big update here is the addition of the dump and restore commands
which are next on my list. Also, I've now come up with a syntax for
documenting the arguments of sub-commands.
This is helpful for things like indexes that other mail programs
may have left around. It also means we can make the initial
instructions much easier, (the user need not worry about moving
away auxiliary files from some other email program).
I noticed this style during a recent Debian install and I liked
how much less busy it is compared to what we had before, (while
still telling the user everything she might want).
Looks like we can copy in a hash-table implementation, (from cairo,
say), and then a few _ascii_ functions from glib, (we'll need to
switch a few current uses if things like isspace, etc. to locale-
independent versions as well). So not too hard to free ourselves
of glib for now, (until we add GMime back in later, of course).
This is the beginning of the notmuch library as well, with its
interface in notmuch.h. So far we've got create, open, close, and
add_message (all with a notmuch_database prefix).
The current add_message function has already been whittled down from
what we have in notmuch-index-message to add only references,
message-id, and thread-id to the index, (that is---just enough to do
thread-linkage but nothing for full-text searching).
The concept here is to do something quickly so that the user can get
some data into notmuch and start using it. (The most interesting stuff
is then thread-linkage and labels like inbox and unread.) We can
defer the full-text indexing of the body of the messages for later,
(such as in the background while the user is reading mail).
The initial thread-stitching step is still slower than I would like.
We may have to stop using libgmime for this step as its overhead is
not worth it for the simple case of just parsing the message-id,
references, and in-reply-to headers.