To properly support sorting in notmuch_query we now use an
Enquire object. We also throw in a QueryParser too, so we're
really close to being able to support arbitrary full-text
searches.
I took a look at the supported QueryParser syntax and chose
a set of flags for everything I like, (such as supporting
Boolean operators in either case ("AND" or "and"), supporting
phrase searching, supporting + and - to include/preclude terms,
and supporting a trailing * on any term as a wildcard).
This is to help keep the report looking clean when a new report
is shorter than a previous report, (say, when crossing the
boundary from over one minute remaining to less than one minute
remaining).
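A minimal sketch of one way to keep a shrinking report clean: pad each
report with spaces out to the length of the previous one, so that
reprinting over the same line leaves no stale characters behind. The
helper name format_report and the padding approach are assumptions for
illustration, not necessarily what this change does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `text` into `buf` and pad it with spaces
 * up to `prev_len` characters, so that printing it after a '\r'
 * overwrites every character of a longer previous report.
 * Returns the unpadded length, to be passed as prev_len next time. */
static size_t
format_report (char *buf, size_t size, const char *text, size_t prev_len)
{
    size_t len = strlen (text);
    size_t i;

    assert (buf != NULL && size > 0);

    /* Truncate rather than overflow if the report does not fit. */
    if (len >= size)
        len = size - 1;
    memcpy (buf, text, len);

    /* Blank out whatever the previous, longer report left behind. */
    for (i = len; i < prev_len && i < size - 1; i++)
        buf[i] = ' ';
    buf[i] = '\0';

    return len;
}
```

A caller would keep the returned length between reports and print each
buffer preceded by a carriage return.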
This used to be here, but I must have accidentally dropped it
when reformatting the progress report recently.
Using the address of a static char* was clever, but really
unnecessary. An empty string is much less magic, and even
easier to understand as the way to query everything from
the database.
Previously we were leaking[*] memory in that the memory footprint of
a "notmuch dump" run would continue to grow until the output was
complete, and then finally all the memory would be freed.
Now, the memory footprint is small and constant, O(1) rather than
O(n) in the number of messages.
[*] Not leaking in a valgrind sense---every byte was still carefully
being accounted for and freed eventually.
None of these are strictly necessary, (everything was leak-free
without them), but notmuch_message_destroy can actually be useful
when one query has many message results, but only one needs
to be live at a time.
The destroy functions for results and tags are fairly gratuitous, as
there's unlikely to be any benefit from calling them. But they're all
easy to add, (all of these functions are just wrappers for talloc_free),
and we do so for consistency and completeness.
This is a fairly big milestone for notmuch. It's our first command
to do anything besides building the index, so it proves we can
actually read valid results out from the index.
It also puts in place almost all of the API and infrastructure we
will need to allow searching of the database.
Finally, with this change we are now using talloc inside of notmuch
which is truly a delight to use. And now that I figured out how
to use C++ objects with talloc allocation, (it requires grotty
parts of C++ such as "placement new" and "explicit destructors"),
we are valgrind-clean for "notmuch dump", (as in "no leaks are
possible").
This is in preparation for a new, public notmuch_message_t.
Eventually, the public notmuch_message_t is going to grow enough
features to need to be file-backed and will likely need everything
that's now in message-file.c. So we may fold these back into one
object/implementation in the future.
The recent change from GIOChannel to getline, (with a semantic
change of the newline terminator now being included in the
result that setup_command sees), broke this.
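As a sketch of the kind of fix involved, a line read with getline can
have its trailing newline stripped before it is used. The helper name
chomp_newline is hypothetical and is not taken from the notmuch source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: strip one trailing newline, as left in place
 * by getline(), and return the new length of the line. */
static size_t
chomp_newline (char *line, size_t len)
{
    assert (line != NULL);

    /* Check len first so an empty line never reads line[-1]. */
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';

    return len;
}
```

The length passed in would be the count returned by getline, so no
extra strlen call is needed.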
I'm trying to chase down 3 still-reachable pointers to glib hash
tables.
This change didn't help with that, but I think destroy might be a
better semantic match for what I actually want. (It shouldn't matter
though since I never take any additional references.)
We were properly freeing this memory when the thread-ids list was not
empty, but leaking it when it was.
Thanks, of course, to valgrind along with the G_SLICE=always-malloc
environment variable which makes leak checking with glib almost
bearable.
We were careful to free this memory when we finished parsing the
headers, but we missed it for the case of closing the message
without ever parsing all of the headers.
I was incorrectly using the return value of stat (-1) instead of
errno (ENOENT) to try to construct the error message here.
Also, while we're here, reword the error message to not have
"stat" in it, which in spite of what a Unix programmer will
tell you, is not actually a word.
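The distinction can be sketched as follows: stat only returns -1 on
failure, and the actual reason lives in errno. The helper name
describe_path_error and the message wording are assumptions for
illustration, not the text used in notmuch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: write an error message for `path` into `buf`
 * if the path cannot be examined. Returns 0 on success, or the errno
 * value (not stat's -1 return value) on failure. */
static int
describe_path_error (const char *path, char *buf, size_t size)
{
    struct stat st;
    int err;

    assert (path != NULL && buf != NULL && size > 0);

    if (stat (path, &st) == 0) {
        buf[0] = '\0';
        return 0;
    }

    /* Save errno right away: later library calls may overwrite it. */
    err = errno;
    snprintf (buf, size, "Error reading %s: %s", path, strerror (err));

    return err;
}
```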
Since we allow the user to enter a custom directory, we need to
let the user know how to make this persistent. Of course, a better
answer would be to take what the user entered and shove it into
a ~/.notmuch-config file or so, but for now this will have to do.
When documenting these functions I described support for a
NOTMUCH_BASE environment variable to be consulted in the case
of a NULL path. Only, I had forgotten to actually write the
code.
This code exists now, with a new, exported function:
notmuch_database_default_path
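A rough sketch of the lookup described above, assuming only what the
message states: a NULL path means "consult NOTMUCH_BASE". The function
name resolve_path and its fallback parameter are hypothetical, and what
the real function does when the variable is unset is not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch only: pick the database path to use.
 * `fallback` is a hypothetical parameter standing in for whatever
 * default applies when NOTMUCH_BASE is not set. */
static const char *
resolve_path (const char *path, const char *fallback)
{
    const char *env;

    assert (fallback != NULL);

    /* An explicit path always wins. */
    if (path != NULL)
        return path;

    /* Assumption: an empty NOTMUCH_BASE is treated like an unset one. */
    env = getenv ("NOTMUCH_BASE");
    if (env != NULL && *env != '\0')
        return env;

    return fallback;
}
```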
A simple bug meant that the correct value was being inserted into
the hash table, but a NULL value would be returned in some cases.
(If the value was already in the hash table at the beginning of
the call then the correct value would be returned, but if the
function had to parse to reach it then it would return NULL.)
This was tripping up the recently-added code to ignore messages
with NULL From:, Subject:, and To: headers, (which is fortunate
since otherwise the broken parsing might have stayed hidden for
longer).
The big update here is the addition of the dump and restore commands
which are next on my list. Also, I've now come up with a syntax for
documenting the arguments of sub-commands.
This is helpful for things like indexes that other mail programs
may have left around. It also means we can make the initial
instructions much easier, (the user need not worry about moving
away auxiliary files from some other email program).
These were just little tests while getting comfortable with
GMime and xapian. I'll likely use pieces of these as notmuch
continues, but for now let's not distract anyone looking
at notmuch with these.
And the code will live on in the history if I need to look
at it.
I noticed this style during a recent Debian install and I liked
how much less busy it is compared to what we had before, (while
still telling the user everything she might want).
The line-based parsing can be a bit awkward when wanting to peek
ahead, (say, for folded header values), but being able to trust
that a string terminator exists on every line is convenient enough
that it cleans up the code considerably.
Looks like we can copy in a hash-table implementation, (from cairo,
say), and then a few _ascii_ functions from glib, (we'll need to
switch a few current uses of things like isspace, etc. to
locale-independent versions as well). So not too hard to free ourselves
of glib for now, (until we add GMime back in later, of course).
Now completing the process of making this function "our own".
The documentation is deleted here, because we already have
the documentation we want in notmuch-private.h.
The original code expected this to be set by running configure.
We'll just manually set it here for now. This isn't as portable
as if we were doing some compile-time examination of the current
system, but I don't need portability now.
When someone comes along that wants to port notmuch to another
system, they will already have all the #ifdefs in place and
will simply need to add the appropriate machinery to set the
defines.
This change is gratuitous. For now, notmuch is still linking
against glib, so I don't have any requirement to remove this,
(unlike the last few changes where good taste really did
require the changes).
The motivation here is two-fold:
1. I'm considering switching away from all glib-based allocation
soon so that I can more easily verify that the memory management
is solid. I want valgrind to say "no leaks are possible" not
"there is tons of memory still allocated, but probably reachable
so who knows if there are leaks or not?". And glib seems to make
that impossible.
2. I don't think there's anything performance-sensitive about the
allocation here. (In fact, if there is, then the right answer
would be to do this parsing without any allocation whatsoever.)
While this is surely one of the most innocent typedefs, it still
annoys me to have basic types like 'int' re-defined like this.
It just makes it harder to copy the code between projects, with
very little benefit in readability.
For readability, predicate functions and variables should be
obviously Boolean-natured by their actual *names*.
That's got to be one of the hardest macro names to read, ever,
(it's phrased with an implicit negative in the condition,
rather than something simple like "assert").
Plus, it's evil, since it's a macro with a return in it.
And finally, it's actually *longer* than just typing "if"
and "return". So what's the point of this ugly idiom?
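To illustrate the complaint, here is the same check written both ways.
The macro RETURN_VAL_IF_FAIL is a hypothetical stand-in defined just
for this sketch, not the macro actually being removed, and both
function names are made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro in the style being removed: it hides a return,
 * and the caller states the condition that must hold rather than
 * the condition that triggers the return. */
#define RETURN_VAL_IF_FAIL(cond, val) \
    do { if (!(cond)) return (val); } while (0)

static size_t
length_with_macro (const char *s)
{
    RETURN_VAL_IF_FAIL (s != NULL, 0);
    return strlen (s);
}

/* The same check spelled out: the control flow is visible. */
static size_t
length_with_if (const char *s)
{
    if (s == NULL)
        return 0;
    return strlen (s);
}
```

Both functions behave identically; the second simply keeps the early
return where a reader can see it.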
We can't rely on any gmime-internal headers, (and fortunately we
don't need to). We also aren't burdened with any autoconf machinery
so don't reference any of that.
We're sucking in one gmime implementation file just to get the
piece that parses an RFC 822 date, because I don't want to go
through the pain of replicating that.
Since we're currently just trying to stitch together In-Reply-To
and References headers we don't need that much sophistication.
It's when we later add full-text searching that GMime will be
useful.
So for now, even though my own code here is surely very buggy
compared to GMime it's also a lot faster. And speed is what
we're after for the initial index creation.
This is the beginning of the notmuch library as well, with its
interface in notmuch.h. So far we've got create, open, close, and
add_message (all with a notmuch_database prefix).
The current add_message function has already been whittled down from
what we have in notmuch-index-message to add only references,
message-id, and thread-id to the index, (that is---just enough to do
thread-linkage but nothing for full-text searching).
The concept here is to do something quickly so that the user can get
some data into notmuch and start using it. (The most interesting stuff
is then thread-linkage and labels like inbox and unread.) We can
defer the full-text indexing of the body of the messages for later,
(such as in the background while the user is reading mail).
The initial thread-stitching step is still slower than I would like.
We may have to stop using libgmime for this step as its overhead is
not worth it for the simple case of just parsing the message-id,
references, and in-reply-to headers.
This was for some timing tests, (to see how fast xapian could be
if we were strictly adding documents and not doing any other IO
or computation). The answer is that xapian is quite fast, (on
the order of 1000 documents per second).
Of course, there's not much that this program does yet. It's got
some structure for some sub-commands that don't do anything. And
it has a main command that prints some explanatory text and then
counts all the regular files in your mail archive.
These were more significant than the previous leak because these were
in the loop and leaking memory for every message being parsed. It
turns out that g_hash_table_new should probably be named
g_hash_table_new_and_leak_memory_please. The actually useful function
is g_hash_table_new_full which lets us pass a free function, (to free
keys when inserting duplicates into the hash table). And after all,
weeding out duplicates is the only reason we are using this hash table
in the first place.
It almost goes without saying, valgrind found these leaks.
This was a single object in main outside any loops, so there was
no impact on performance or anything, but obviously we still want
to patch this.
Of course, valgrind gets the credit for seeing this.
When looking for a trailing ':' to introduce a quotation we peek at
the last character before a newline. But for blank lines, that's not
where we want to look. And when the first line in our buffer is a
blank line, we're underrunning our buffer. The fix is easy---just
bail early on blank lines since they have no terms anyway.
Thanks to valgrind for pointing out this error.
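The early bail-out can be sketched like this. The function name
line_introduces_quote and its offset-based signature are assumptions
for illustration; the real code is organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the line that starts at
 * buf[line_start] and ends at the newline buf[newline_pos] has ':'
 * as its last character, 0 otherwise. */
static int
line_introduces_quote (const char *buf, size_t line_start, size_t newline_pos)
{
    assert (buf != NULL && line_start <= newline_pos);

    /* Blank line: no character before the newline belongs to this
     * line, and when line_start is 0 reading buf[newline_pos - 1]
     * would underrun the buffer. Bail early. */
    if (newline_pos == line_start)
        return 0;

    return buf[newline_pos - 1] == ':';
}
```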