Commit graph

50 commits

Author SHA1 Message Date
Carl Worth
6b20dbff86 add_message: Pull the thread-stitching portion out into new _notmuch_database_link_message
The function was getting too long-winded before. And since I'm about
to change how we handle the thread linking, it's convenient to have
it in an isolated function.
2009-10-25 11:03:55 -07:00
Carl Worth
7b227a6bf7 Add an INTERNAL_ERROR macro and use it for all internal errors.
We were previously just doing fprintf;exit at each point, but I
wanted to add file and line-number details to all messages, so it
makes sense to use a single macro for that.
2009-10-25 10:54:49 -07:00
Carl Worth
3b8e3ab666 add_message: Propagate error status from notmuch_message_create_for_message_id
What a great feeling to remove an XXX comment.
2009-10-25 10:54:43 -07:00
Carl Worth
32ecfe72a1 Add comment documenting our current database schema.
I've got schemes to change this schema somewhat dramatically, so I
want a place to be able to record and review those changes.
2009-10-25 08:57:09 -07:00
Carl Worth
1c2bac747e Drop the storage of thread ID(s) in a value.
Now that we are iterating over the thread terms instead, we can
drop this redundant storage (which should shrink our database a
tiny bit).
2009-10-25 00:31:20 -07:00
Carl Worth
9ec68aa9c4 Shuffle the value numbers around in the database.
First, it's nice that for now we don't have any users yet, so we
can make incompatible changes to the database layout like this
without causing trouble. ;-)

There are a few reasons for this change. First, we now use value 0
uniformly as a timestamp for both mail and timestamp documents, (which
lets us clean up an ugly and fragile bare 0 in the add_value and
get_value calls in the timestamp code).

Second, I want to drop the thread value entirely, so putting it at the
end of the list means we can drop it as a compatible change in the
future. (I almost want to drop the message-ID value too, but it's nice
to be able to sort on it to get diff-able output from "notmuch dump".)

But the thread value we never use as a value, (we would never sort on
it, for example). And it's totally redundant with the thread terms we
store already. So expect it to disappear soon.
2009-10-24 23:05:08 -07:00
Carl Worth
65a272832e Invent our own prefix values.
We're now dropping all pretense of keeping the database directly
compatible with sup's current xapian backend. (But perhaps someone
might write a new notmuch backend for sup in the future.)

In coming up with the prefix values here, I tried to follow the
conventions of http://xapian.org/docs/omega/termprefixes.html as
closely as makes sense, (with some domain translation from "web"
to "email archive").
2009-10-24 22:57:47 -07:00
Carl Worth
0aa355cc8f Split BOOLEAN_PREFIX into INTERNAL and EXTERNAL subsets.
The idea here is that only some of the prefix names (such as "id" and
"tag") actually make sense in external user-supplied query
strings. Other things like "type" are internal implementation details
of how we store things in the database. So internal machinery will add
those terms to the database and we don't need to support them in the
string itself.

With this, we can now simply loop over the external prefix values to
let the query parser know about them. So as we add prefixes in the
future, we'll only need to add them to this list.
2009-10-24 22:38:43 -07:00
Carl Worth
2a9b4fce7c Change all occurrences of "msgid" to "id".
What's good for the user is good for the internals.
2009-10-24 22:29:49 -07:00
Carl Worth
aa46a683a8 Add the magic to allow searches such as "tag:inbox".
The key for this is to call add_boolean_prefix on the QueryParser
object. That tells the query parser to take something like "tag:inbox"
and transform it into the "Linbox" term and do what it needs to do to
make this term a requirement of the search. We're starting to have a
real system here.

Also, I didn't want to expose the ugly name of "msgid" to the user, so
we add a prefix name of simply "id" instead.
2009-10-24 22:23:58 -07:00
Carl Worth
0bc73af96c Fix timestamp generation to avoid overflowing the term limit
The previous code was only correct as long as the timestamp prefix
was only a single character. But with the recent change to a
multi-character prefix, this broke. So fix it now.
2009-10-24 22:10:03 -07:00
Carl Worth
f281f4b677 Trim down prefix list to things we are actually using.
I've decided not to try for sup compatibility at the level of the
xapian database. There's just too much about sup's usage of the
database that I don't like, (beyond the embedded ruby data structures
there is redundant storage of message IDs, thread IDs, and dates (in
both terms and values)).

I'm going to fix that up in the database of notmuch, with some other
changes as well. (I plan to drop "reference" terms once linkage to a
thread ID through the reference is established.  I also plan to add
actual documents to represent threads.)

So with all that incompatibility, I might as well make my own prefix
values. And while doing that, I should try to be as compatible as
possible with the conventions described here:

http://xapian.org/docs/omega/termprefixes.html
2009-10-24 22:04:59 -07:00
Carl Worth
e37b7cc2da Move the prefix-string arrays back into database.cc from message.cc
Yes, I'm being wishy-washy here, moving code back and forth. But
this is where these really do belong.
2009-10-24 21:52:48 -07:00
Carl Worth
b3cbcea8fd Add NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID
And document that notmuch_database_add_message can return this
value. This pushes the hard decision of what to do with duplicate
messages out to the user, but that's OK. (We weren't really doing
anything with these ourselves, and this way the user is at least
informed of the issue, rather than it just getting papered over
internally.)
2009-10-23 14:40:33 -07:00
Carl Worth
5ebb21600e Clarify documentation and error string for NOTMUCH_STATUS_TAG_TOO_LONG
It's helpful to point out NOTMUCH_STATUS_TAG_MAX for users.
2009-10-23 14:36:38 -07:00
Carl Worth
68a10091d6 Add notmuch_database_set_timestamp and notmuch_database_get_timestamp
These will be very helpful to implement an efficient "notmuch new"
command which imports new mail messages that have appeared.
2009-10-23 14:31:01 -07:00
Carl Worth
668f20bdfb database: Add private find_unique_doc_id and find_unique_document functions
These are a generalization of the unique-ness testing of
notmuch_database_find_message. More preparation for
directory timestamps.
2009-10-23 14:24:07 -07:00
Carl Worth
edbf7f645c database: Similarly rename find_message_by_docid to find_document_for_doc_id
Again preferring notmuch_database_t* over Xapian::Database*.

Also, we're standardizing on "doc_id" rather than "docid" locally, (as
an analogue to "message_id"), in spite of the "Xapian::docid" name,
(which, fortunately, we can ignore and just use "unsigned int" instead).
2009-10-23 14:12:06 -07:00
Carl Worth
9fc4a365d6 database: Rename internal find_messages_by_term to find_doc_ids
This name is a more accurate description of what it does, and
the more general naming will make sense as we start storing
non-message documents in the database (such as directory
timestamps).

Also, don't pass around a Xapian::Database where it's more our
style to pass a notmuch_database_t*.
2009-10-23 14:06:24 -07:00
Carl Worth
6ccdffcd87 add_message: Fix to not add multiple documents with the same message ID
Here's the second big fix to message-ID handling, (the first was to
generate message IDs when an email contained none). Now, with no
document missing a message ID, and no two documents having the same
message ID, we have a nice consistent database where the message ID
can be used as a unique key.
2009-10-23 06:00:10 -07:00
Carl Worth
31044d10ed add_message: Re-order the code a bit (find message-id first).
We're preparing for being able to deal with files with duplicate
message IDs here. The plan is to create a notmuch_message_t object in
add_message that may or may not reference a document that exists in
the database. So to do this, we have to find the message ID before we
do any manipulation of the doc.
2009-10-23 05:30:37 -07:00
Carl Worth
c78358fa8a Move thread_id generation code from database.cc to message.cc
It's really up to the message to decide how to generate these.
2009-10-23 05:25:58 -07:00
Carl Worth
1ecdef59f5 add_message: Rename message to message_file
I still don't like the name message_file at all, but we're about
to start using a notmuch_message_t in this function so we need
to do something to keep the identifiers separate for now.

Eventually, it probably makes sense to push the message-parsing
code from database.cc to message.cc.
2009-10-23 05:13:42 -07:00
Carl Worth
77f9d3ee0e Don't forget the "to" header when restricting parsing to certain headers
We recently started discarding files as "not email" if they have none
of Subject, From, nor To. Apparently, my mail collection contains a
number of messages that I sent, that are saved without Subject and
From, (perhaps these were drafts?).

Anyway, it's fortunate I had those since they alerted me to this bug,
where we were not parsing the "To" header in some cases.
2009-10-22 15:34:47 -07:00
Carl Worth
90f93fc9c7 Fix missing error check.
The notmuch_message_file_open function is perfectly capable of
returning NULL. So check for it.
2009-10-22 15:33:56 -07:00
Carl Worth
6a4992bc61 Generate message ID (using SHA1) when a mail message contains none.
This is important as we're using the message ID as the unique key
in our database. So previously, all messages with no message ID
would be treated as the same message---not good at all.
2009-10-22 15:31:56 -07:00
Carl Worth
84480738a5 Merge branch from fixing up bugs after bisecting.
I'm glad that when I implemented "notmuch restore" I went through the
extra effort to split the code I had written in one sitting into over a
dozen commits. Sure enough, I hadn't tested well enough and had
totally broken "notmuch setup", (segfaults and bogus thread_id
values).

With the little commits I had made, git bisect saved the day, and I
went back to make the fixes right on top of the commits that
introduced the bugs. So now we octopus merge those in.
2009-10-21 23:23:44 -07:00
Carl Worth
c58ee818b5 Bring back the insert_thread_id function.
We deleted this in favor of our fancy new thread_ids iterator
from the message object. But one of the previous callers of
insert_thread_id isn't using notmuch_message_t yet. I made
the mistake of thinking I could just call g_hash_table_insert
directly, but the problem was that nobody was splitting
up the thread_id string at its commas.

So with this, we were inserting bogus comma-separated IDs
into the hash table, so thread_id values were ballooning
out of control. Should be much better now.
2009-10-21 23:21:12 -07:00
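
The bug described in c58ee818b5 comes down to treating a comma-separated thread_id string as if it were a single ID. A minimal sketch of splitting such a string at its commas follows. The function name next_thread_id and its cursor-style interface are hypothetical, chosen for this illustration; they are not the code from the commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not notmuch's insert_thread_id): return a
 * pointer to the next thread ID inside the comma-separated string at
 * *cursor, store its length in *len, and advance *cursor past it.
 * The returned ID is NOT NUL-terminated; use *len.
 * Returns NULL once no IDs remain. */
const char *
next_thread_id (const char **cursor, size_t *len)
{
    const char *start = *cursor;

    /* Skip separators, including empty fields such as "a,,b". */
    while (*start == ',')
        start++;

    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }

    /* The ID runs up to the next comma or the end of the string. */
    *len = strcspn (start, ",");
    *cursor = start + *len;

    return start;
}
```

A caller would loop until the function returns NULL and insert each (pointer, length) piece into its table separately, rather than inserting the whole string once.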
Carl Worth
302d54834d Add notmuch_status_to_string function.
Be kind and let the user print error messages, not just error
codes.
2009-10-21 16:12:53 -07:00
Carl Worth
defd216487 Add notmuch_message_add_tag and notmuch_message_remove_tag
With these two added, we now have enough functionality in the
library to implement "notmuch restore".
2009-10-21 15:56:33 -07:00
Carl Worth
6c5054ebee database: Add new notmuch_database_find_message
With this function, and the recently added support for
notmuch_message_get_thread_ids, we now recode the find_thread_ids
function to work just the way we expect a user of the public
notmuch API to work. Not too bad really.
2009-10-21 15:40:20 -07:00
Carl Worth
22b2265cac Rename NOTMUCH_MAX_TERM to NOTMUCH_TERM_MAX
Just better consistency with our naming schemes.
2009-10-21 14:10:00 -07:00
Carl Worth
6142216132 Move find_prefix function from database.cc to message.cc
It's definitely a better fit there for now, (and can likely
eventually be made static as add_term moves from database
to message as well).
2009-10-21 14:07:40 -07:00
Carl Worth
17b3c214ea Convert notmuch_database_t to start using talloc.
This will be handy as we can hang future talloc allocations off
of the database now.
2009-10-21 14:00:37 -07:00
Carl Worth
d29a6ec791 notmuch setup: Collapse internal whitespace within message-id
I'm too lazy to see what the RFC says, but I know that having
whitespace inside a message-ID is sure to confuse things. And
besides, this makes things more compatible with sup so that
I have some hope of importing sup labels.
2009-10-21 10:07:34 -07:00
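
As an illustration of the whitespace handling in d29a6ec791, here is a minimal sketch that assumes "collapse" means dropping whitespace characters from the message ID entirely. The helper name strip_message_id_whitespace is hypothetical and the commit's actual code may differ.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not notmuch's actual code: remove every
 * whitespace character from message_id, in place.
 * Returns the length of the resulting string. */
size_t
strip_message_id_whitespace (char *message_id)
{
    char *in;
    char *out = message_id;

    for (in = message_id; *in != '\0'; in++) {
        /* Cast to unsigned char: passing a negative value to
         * isspace is undefined behavior. */
        if (! isspace ((unsigned char) *in))
            *out++ = *in;
    }
    *out = '\0';

    return (size_t) (out - message_id);
}
```

Working in place avoids an allocation, which is safe here because the output can never be longer than the input.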
Carl Worth
65baa4f4e7 notmuch dump: Fix the sorting of results.
To properly support sorting in notmuch_query we now use an
Enquire object. We also throw in a QueryParser too, so we're
really close to being able to support arbitrary full-text
searches.

I took a look at the supported QueryParser syntax and chose
a set of flags for everything I like, (such as supporting
Boolean operators in either case ("AND" or "and"), supporting
phrase searching, supporting + and - to include/preclude terms,
and supporting a trailing * on any term as a wildcard).
2009-10-21 00:35:56 -07:00
Carl Worth
6a3b68edef add_message: Add a type:mail ("Kmail") term to all documents.
This gives us an easy way to specify "all mail messages" in a search
query. We simply look for this term.
2009-10-21 00:34:36 -07:00
Carl Worth
50144fb354 database: Remove two little bits of dead code.
2009-10-20 23:12:53 -07:00
Carl Worth
466a7bbf62 Implement 'notmuch dump'.
This is a fairly big milestone for notmuch. It's our first command
to do anything besides building the index, so it proves we can
actually read valid results out from the index.

It also puts in place almost all of the API and infrastructure we
will need to allow searching of the database.

Finally, with this change we are now using talloc inside of notmuch
which is truly a delight to use. And now that I figured out how
to use C++ objects with talloc allocation, (it requires grotty
parts of C++ such as "placement new" and "explicit destructors"),
we are valgrind-clean for "notmuch dump", (as in "no leaks are
possible").
2009-10-20 21:21:39 -07:00
Carl Worth
cd4a8734d3 Rename private notmuch_message_t to notmuch_message_file_t
This is in preparation for a new, public notmuch_message_t.

Eventually, the public notmuch_message_t is going to grow enough
features to need to be file-backed and will likely need everything
that's now in message-file.c. So we may fold these back into one
object/implementation in the future.
2009-10-20 15:09:51 -07:00
Carl Worth
5a84df0f15 add_message: Fix memory leak of thread_ids GPtrArray.
We were properly freeing this memory when the thread-ids list was not
empty, but leaking it when it was.

Thanks, of course, to valgrind along with the G_SLICE=always-malloc
environment variable which makes leak checking with glib almost
bearable.
2009-10-20 13:05:45 -07:00
Carl Worth
e6236b88fd database.cc: Document better pieces of glib that we're using.
2009-10-20 12:49:32 -07:00
Carl Worth
968feafbad notmuch_database_open: Fix error message for file-not-found.
I was incorrectly using the return value of stat (-1) instead of
errno (ENOENT) to try to construct the error message here.

Also, while we're here, reword the error message to not have
"stat" in it, which in spite of what a Unix programmer will
tell you, is not actually a word.
2009-10-20 10:14:00 -07:00
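
The mistake fixed in 968feafbad is a common one: stat only ever returns -1 on failure, and the actual reason is reported through errno. A minimal sketch of the corrected pattern follows. The helper name check_path_exists and the message wording are assumptions made for this example, not the text notmuch prints.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: check that path exists. On failure, write a
 * message built from errno into msg and return -1. Returns 0 on
 * success, leaving msg untouched. */
int
check_path_exists (const char *path, char *msg, size_t msg_size)
{
    struct stat st;

    if (stat (path, &st) != 0) {
        /* Save errno right away, before any other call can change it.
         * For a missing file it is ENOENT. */
        int err = errno;

        snprintf (msg, msg_size, "Error opening %s: %s",
                  path, strerror (err));
        return -1;
    }

    return 0;
}
```

Passing the return value of stat to strerror instead would describe error number -1, which is not a meaningful errno value.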
Carl Worth
55c8ee9a86 notmuch_database_create/open: Fix to handle NULL as documented.
When documenting these functions I described support for a
NOTMUCH_BASE environment variable to be consulted in the case
of a NULL path. Only, I had forgotten to actually write the
code.

This code exists now, with a new, exported function:

     notmuch_database_default_path
2009-10-20 09:58:40 -07:00
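
A rough sketch of the lookup 55c8ee9a86 describes: consult the NOTMUCH_BASE environment variable when the caller passes a NULL path. Only NOTMUCH_BASE comes from the commit message. The function name example_default_path and the "$HOME/mail" fallback are assumptions made for this illustration, and the real notmuch_database_default_path may behave differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only, not the real notmuch_database_default_path.
 * Returns a malloc'ed string that the caller must free, or NULL if no
 * default can be determined or allocation fails. */
char *
example_default_path (void)
{
    const char *base = getenv ("NOTMUCH_BASE");
    const char *home;
    char *path;
    size_t size;

    /* NOTMUCH_BASE, when set, wins. */
    if (base != NULL) {
        path = malloc (strlen (base) + 1);
        if (path != NULL)
            strcpy (path, base);
        return path;
    }

    /* ASSUMPTION: fall back to "$HOME/mail" when NOTMUCH_BASE is unset. */
    home = getenv ("HOME");
    if (home == NULL)
        return NULL;

    size = strlen (home) + strlen ("/mail") + 1;
    path = malloc (size);
    if (path != NULL)
        snprintf (path, size, "%s/mail", home);

    return path;
}
```

Returning a freshly allocated copy in both branches keeps ownership uniform: the caller always frees the result and never holds a pointer into the environment.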
Carl Worth
ad784f38ce notmuch: Ignore files that don't look like email messages.
This is helpful for things like indexes that other mail programs
may have left around. It also means we can make the initial
instructions much easier, (the user need not worry about moving
away auxiliary files from some other email program).
2009-10-19 23:16:05 -07:00
Carl Worth
45f0d7bcab Don't hash headers we won't end up using.
Just saving a little work here.
2009-10-19 13:48:13 -07:00
Carl Worth
c5eea2b77e Document which pieces of glib we're still using.
Looks like we can copy in a hash-table implementation, (from cairo,
say), and then a few _ascii_ functions from glib, (we'll need to
switch a few current uses of things like isspace, etc. to locale-
independent versions as well). So not too hard to free ourselves
of glib for now, (until we add GMime back in later, of course).
2009-10-19 13:40:56 -07:00
Carl Worth
fa562fa22b Hook up our fancy new notmuch_parse_date function.
With all the de-glib-ification out of the way, we can now use it
to allow for date-based sorting of Xapian search results.
2009-10-19 13:35:29 -07:00
Carl Worth
0e777a8f80 notmuch: Switch from gmime to custom, ad-hoc parsing of headers.
Since we're currently just trying to stitch together In-Reply-To
and References headers we don't need that much sophistication.
It's when we later add full-text searching that GMime will be
useful.

So for now, even though my own code here is surely very buggy
compared to GMime it's also a lot faster. And speed is what
we're after for the initial index creation.
2009-10-19 13:00:43 -07:00
Carl Worth
10c176ba0e notmuch: Start actually adding messages to the index.
This is the beginning of the notmuch library as well, with its
interface in notmuch.h. So far we've got create, open, close, and
add_message (all with a notmuch_database prefix).

The current add_message function has already been whittled down from
what we have in notmuch-index-message to add only references,
message-id, and thread-id to the index, (that is---just enough to do
thread-linkage but nothing for full-text searching).

The concept here is to do something quickly so that the user can get
some data into notmuch and start using it. (The most interesting stuff
is then thread-linkage and labels like inbox and unread.)  We can
defer the full-text indexing of the body of the messages for later,
(such as in the background while the user is reading mail).

The initial thread-stitching step is still slower than I would like.
We may have to stop using libgmime for this step as its overhead is
not worth it for the simple case of just parsing the message-id,
references, and in-reply-to headers.
2009-10-18 20:56:30 -07:00