Commit graph

68 commits

Author SHA1 Message Date
Carl Worth
a1135f0b7e Fix add_message and get_filename to strip/re-add the database path.
We now store only a relative path inside the database so the database
is now nicely relocatable.
2009-10-28 16:51:56 -07:00
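Here is a minimal C sketch of the strip/re-add idea. It assumes the database path is stored without a trailing slash. The helpers `relative_to_db` and `absolute_from_db` are hypothetical names, not notmuch's actual functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: if `path` lies under `db_path`, return a pointer to the
 * relative part after the separating '/'; otherwise return `path` as-is.
 * Assumes `db_path` has no trailing slash. */
const char *
relative_to_db (const char *db_path, const char *path)
{
    size_t len = strlen (db_path);

    if (strncmp (path, db_path, len) == 0 && path[len] == '/')
        return path + len + 1;
    return path;
}

/* Hypothetical: re-add the database path to a stored relative path.
 * Absolute paths are copied unchanged. Returns a malloc'd string that the
 * caller frees, or NULL on allocation failure. */
char *
absolute_from_db (const char *db_path, const char *stored)
{
    size_t db_len = strlen (db_path), st_len = strlen (stored);
    char *result;

    if (stored[0] == '/') {
        result = malloc (st_len + 1);
        if (result != NULL)
            memcpy (result, stored, st_len + 1);
        return result;
    }

    result = malloc (db_len + 1 + st_len + 1);
    if (result == NULL)
        return NULL;
    memcpy (result, db_path, db_len);
    result[db_len] = '/';
    memcpy (result + db_len + 1, stored, st_len + 1);   /* includes the NUL */
    return result;
}
```

With only the relative part stored, moving the mail directory means passing in a different database path. The stored filenames stay valid.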
Carl Worth
cfa228a3d4 notmuch_database_add_message: Sanity check the file as the first thing
This avoids us wasting a bunch of time doing an expensive SHA-1 over a large
file only to discover later that it doesn't even *look* like an email message.
2009-10-28 13:35:10 -07:00
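The sketch below illustrates the "cheap test first" ordering. It assumes the file has already been read into a NUL-terminated buffer with LF line endings. `looks_like_email` is a hypothetical helper. It accepts the text only if the header block (everything before the first empty line) contains a Subject, From, or To header, which is the criterion described in commit 77f9d3ee0e in this log. Only text that passes would go on to the expensive SHA-1 step.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive check that `line` starts with `name` followed by ':'.
 * A line shorter than `name` fails at its NUL before any overrun. */
static int
header_is (const char *line, const char *name)
{
    size_t i;

    for (i = 0; name[i]; i++)
        if (tolower ((unsigned char) line[i]) != tolower ((unsigned char) name[i]))
            return 0;
    return line[i] == ':';
}

/* Hypothetical cheap pre-check: scan only the header block, stopping at
 * the first empty line, and accept the text if it has a Subject, From or
 * To header. */
int
looks_like_email (const char *text)
{
    const char *line = text;

    while (*line != '\0' && *line != '\n') {
        if (header_is (line, "subject") || header_is (line, "from") ||
            header_is (line, "to"))
            return 1;
        line = strchr (line, '\n');
        if (line == NULL)
            break;
        line++;
    }
    return 0;
}
```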
Carl Worth
81861514c9 Tweak formatting of internal error messages.
Was neglecting to print the phrase "Internal error: " before, and for
the duplicate message-ID error it's nice to actually see the duplicate
IDs.
2009-10-28 13:13:23 -07:00
Carl Worth
3a91df21ca index: Store "Full Name <user@example.com>" addresses in the database
We put these in as a separate term so that they can be extracted.
We don't actually need this for searching, since typing an email
address in as a search term will already trigger a phrase search
that does exactly what's wanted.
2009-10-28 13:09:08 -07:00
Carl Worth
f9bbd7baa0 Add full-text indexing using the GMime library for parsing.
This is based on the old notmuch-index-message.cc from early in
the history of notmuch, but considerably cleaned up now that
we have some experience with Xapian and know just what we want
to index, (rather than just blindly trying to index exactly
what sup does).

This does slow down notmuch_database_add_message a *lot*, but I've
got some ideas for getting some time back.
2009-10-28 12:50:10 -07:00
Carl Worth
07aa759b68 Fix segfault in case of the database lock not being available.
We were nicely reporting the lock-acquisition failure, but then marching
along trying to use the database object and just crashing badly.
So don't do that.
2009-10-27 23:57:37 -07:00
Carl Worth
5eaec1e316 Update prefix so that "thread:" can be used in search strings.
It's convenient to be able to do things like:

     notmuch tag -inbox thread:<thread-id>

(even though this can run into a race condition as noted in TODO--the fix
for the race is simply to not run "notmuch new" between reading a thread
with the (not yet existent) "notmuch show" and removing its inbox tag
with a command like the above). So we now allow such a thing.
2009-10-27 23:55:08 -07:00
Carl Worth
203a717d64 notmuch_database_add_message: Do not return a message on failure.
The recent, disastrous failure of "notmuch new" would have been
avoided with this change. The new_command function was basically
assuming that it would only get a message object on success so
wasn't destroying the message in the other cases.
2009-10-27 16:19:20 -07:00
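This is a minimal sketch, with hypothetical types, of the contract described here and in ae0bd3f503. The out-parameter is optional and is cleared up front. It only receives an object on success, so a caller can never end up holding, and leaking, a message after a failure. The `is_email` flag stands in for real file validation.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum {
    STATUS_SUCCESS,
    STATUS_OUT_OF_MEMORY,
    STATUS_FILE_NOT_EMAIL
} status_t;

typedef struct {
    const char *filename;
} message_t;

/* `message_ret` is optional (may be NULL). On any failure it is set to
 * NULL; on success the caller owns the returned message and frees it. */
status_t
add_message (const char *filename, int is_email, message_t **message_ret)
{
    message_t *message;

    if (message_ret)
        *message_ret = NULL;

    if (!is_email)                  /* stands in for real validation */
        return STATUS_FILE_NOT_EMAIL;

    message = malloc (sizeof *message);
    if (message == NULL)
        return STATUS_OUT_OF_MEMORY;
    message->filename = filename;

    if (message_ret)
        *message_ret = message;     /* ownership passes to the caller */
    else
        free (message);             /* caller did not ask for the message */
    return STATUS_SUCCESS;
}
```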
Carl Worth
854f82fb91 notmuch_database_close: Explicitly flush the Xapian database.
This would have helped with the recent bug causing "notmuch new"
to not record any results in the database. I'm not sure why
the explicit flush would be required, (shouldn't the destructor
always ensure that things flush?), but perhaps some outstanding
references from the leak prevented that.

In any case, an explicit flush on close() seems to make sense.
2009-10-27 16:17:22 -07:00
Carl Worth
31db02a8c1 notmuch restore: Fix to remove all tags before adding tags.
This means that the restore operation will now properly pick up the
removal of tags indicated by the tag just not being present in the
dump file.

We added a few new public functions in order to support this:

	notmuch_message_freeze
	notmuch_message_remove_all_tags
	notmuch_message_thaw
2009-10-26 22:53:39 -07:00
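The toy model below shows why freeze/thaw matters for restore. It assumes nothing about notmuch's internals. Edits made while frozen stay in a working copy and become visible only when the outermost thaw runs. That way a reader never sees the message with all tags removed and the new ones not yet added. Every name here is hypothetical.

```c
#include <string.h>

#define MAX_TAGS 8
#define TAG_LEN 32

/* Toy message: `work` holds the current, possibly uncommitted tags;
 * `saved` plays the role of what other readers of the database see. */
typedef struct {
    char work[MAX_TAGS][TAG_LEN];
    int n_work;
    char saved[MAX_TAGS][TAG_LEN];
    int n_saved;
    int frozen;                 /* freeze calls nest, so this is a depth */
} toy_message_t;

static void
sync_if_thawed (toy_message_t *m)
{
    if (m->frozen == 0) {
        memcpy (m->saved, m->work, sizeof m->work);
        m->n_saved = m->n_work;
    }
}

void
toy_freeze (toy_message_t *m)
{
    m->frozen++;
}

void
toy_thaw (toy_message_t *m)
{
    if (m->frozen > 0)
        m->frozen--;
    sync_if_thawed (m);         /* only the outermost thaw publishes */
}

void
toy_remove_all_tags (toy_message_t *m)
{
    m->n_work = 0;
    sync_if_thawed (m);
}

int
toy_add_tag (toy_message_t *m, const char *tag)
{
    if (m->n_work == MAX_TAGS || strlen (tag) >= TAG_LEN)
        return -1;
    strcpy (m->work[m->n_work++], tag);
    sync_if_thawed (m);
    return 0;
}
```

A restore of one message in this model is freeze, remove all tags, add each tag from the dump file, then thaw.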
Carl Worth
ae0bd3f503 add_message: Add an optional parameter for getting the just-added message.
We use this to implement the addition of "inbox" and "unread" tags
for all messages added by "notmuch new".
2009-10-26 21:44:05 -07:00
Carl Worth
8e96a87fff Remove all calls to g_strdup_printf
Replacing them with calls to talloc_asprintf if possible, otherwise
to asprintf (with its painful error-handling leaving the pointer
undefined).
2009-10-26 15:17:10 -07:00
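For comparison, here is a hypothetical `checked_asprintf` built only on C99 `vsnprintf`. It has the contract this message is asking for: it returns a newly allocated string or NULL. Callers then test one pointer, instead of checking asprintf's return value and then having to ignore an undefined pointer. This is a sketch, not talloc_asprintf's implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd formatted string, or NULL on any failure; the
 * caller never sees an undefined pointer. */
char *
checked_asprintf (const char *format, ...)
{
    va_list ap;
    int len;
    char *result;

    va_start (ap, format);
    len = vsnprintf (NULL, 0, format, ap);     /* measure only */
    va_end (ap);
    if (len < 0)
        return NULL;

    result = malloc ((size_t) len + 1);
    if (result == NULL)
        return NULL;

    va_start (ap, format);
    if (vsnprintf (result, (size_t) len + 1, format, ap) != len) {
        free (result);
        result = NULL;
    }
    va_end (ap);
    return result;
}
```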
Carl Worth
70f9d0ad42 Drop dead function add_term.
Even with the recent warnings work, gcc didn't tell me about a static
function that I'm not calling? Apparently I get "defined but not
used" in C files, but not C++ files. That's bogus, and yet one more
reason for me to push the C++ to a minimal lower layer.
2009-10-25 16:14:07 -07:00
Carl Worth
3bd4a2eaaa Add -Wswitch-enum and fix warnings.
Having to enumerate all the enum values at every switch is annoying,
but this warning actually found a bug, (missing support for
NOTMUCH_STATUS_OUT_OF_MEMORY in notmuch_status_to_string).
2009-10-25 16:03:45 -07:00
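This trimmed-down sketch shows the pattern the warning enforces. It uses a hypothetical four-member enum, not the real notmuch_status_t. Every enumerator, including the sentinel, gets an explicit case. With -Wswitch-enum, gcc warns whenever an enumerator is missing from such a switch, even if a default label is present. That is how a forgotten value like NOTMUCH_STATUS_OUT_OF_MEMORY gets caught.

```c
/* Hypothetical, trimmed-down status enum for illustration. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_OUT_OF_MEMORY,
    STATUS_DUPLICATE_MESSAGE_ID,
    STATUS_LAST_STATUS
} status_t;

/* Each enumerator is listed explicitly, so compiling with -Wswitch-enum
 * flags any new status that is added to the enum but not handled here. */
const char *
status_to_string (status_t status)
{
    switch (status) {
    case STATUS_SUCCESS:
        return "No error occurred";
    case STATUS_OUT_OF_MEMORY:
        return "Out of memory";
    case STATUS_DUPLICATE_MESSAGE_ID:
        return "Message ID is already in the database";
    case STATUS_LAST_STATUS:
        break;
    }
    /* Reached for the sentinel and for out-of-range values. */
    return "Unknown error status value";
}
```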
Carl Worth
c7482b4dce Add -Wmissing-declarations and fix warnings.
Wow, lots of missing 'static' on internal functions.
2009-10-25 15:58:05 -07:00
Carl Worth
be9e3ee313 _notmuch_database_link_message: Fix error-status propagation.
The _notmuch_database_link_message_to_parents function was void
in an earlier draft. Now, ensure that we don't miss any error
return value from it.
2009-10-25 15:01:20 -07:00
Carl Worth
a360670c03 Change database to store only a single thread ID per message.
Instead of supporting multiple thread IDs, we now merge together
thread IDs if one message is ever found to belong to more than one
thread. This allows for constructing complete threads when, for
example, a child message doesn't include a complete list of References
headers back to the beginning of the thread.

It also simplifies dealing with mapping a message ID to a thread ID
which is now a simple get_thread_id just like get_message_id, (and no
longer an iterator-based thing like get_tags).
2009-10-25 14:54:13 -07:00
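One way to picture "merging thread IDs" is a union-find structure over thread IDs, sketched here with small integer IDs. This is a swapped-in technique for illustration only. The commit does not say how notmuch picks the surviving ID (this sketch keeps the smaller one), and it does not cover rewriting the stored documents.

```c
#define MAX_THREADS 64

/* parent[i] == i means thread i is the canonical ID of its group. */
static unsigned int parent[MAX_THREADS];

void
threads_init (void)
{
    unsigned int i;

    for (i = 0; i < MAX_THREADS; i++)
        parent[i] = i;
}

/* Follow parent links to the canonical thread ID (with path halving). */
unsigned int
thread_find (unsigned int t)
{
    while (parent[t] != t) {
        parent[t] = parent[parent[t]];
        t = parent[t];
    }
    return t;
}

/* A message was found in threads a and b: merge them so both report the
 * same (here: the smaller) canonical thread ID from now on. */
void
thread_merge (unsigned int a, unsigned int b)
{
    unsigned int ra = thread_find (a), rb = thread_find (b);

    if (ra < rb)
        parent[rb] = ra;
    else if (rb < ra)
        parent[ra] = rb;
}
```

After any number of merges, a message's single thread ID is whatever `thread_find` returns for it. That matches the "one thread ID per message" model this commit moves to.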
Carl Worth
ec77f6b50c link_message: Remove dead code.
We dropped the THREAD_ID value from the database a while back, but here
is code that's carefully computing that value and then never doing
anything with it. Delete, delete, delete.
2009-10-25 11:05:16 -07:00
Carl Worth
6b20dbff86 add_message: Pull the thread-stitching portion out into new _notmuch_database_link_message
The function was getting too long-winded before. And since I'm about
to change how we handle the thread linking, it's convenient to have
it in an isolated function.
2009-10-25 11:03:55 -07:00
Carl Worth
7b227a6bf7 Add an INTERNAL_ERROR macro and use it for all internal errors.
We were previously just doing fprintf;exit at each point, but I
wanted to add file and line-number details to all messages, so it
makes sense to use a single macro for that.
2009-10-25 10:54:49 -07:00
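Here is a sketch of such a macro using C99 `__VA_ARGS__`, `__FILE__`, and `__LINE__`. The helper `format_internal_error` and the exact "Internal error: ... (file:line)" layout are assumptions. The formatting is kept separate from exit(1) so that it can be exercised without terminating the program.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format "Internal error: <message> (file:line)"
 * into buf. Returns what snprintf returns, i.e. the length the full text
 * needs; a result >= size means it was truncated. */
int
format_internal_error (char *buf, size_t size, const char *file, int line,
                       const char *format, ...)
{
    char body[256];
    va_list ap;

    va_start (ap, format);
    vsnprintf (body, sizeof body, format, ap);   /* long bodies are cut short */
    va_end (ap);

    return snprintf (buf, size, "Internal error: %s (%s:%d)", body, file, line);
}

/* The first macro argument is a printf-style format. Each call site
 * records its own file and line, prints to stderr, and exits. */
#define INTERNAL_ERROR(...)                                             \
    do {                                                                \
        char internal_error_msg_[512];                                  \
        format_internal_error (internal_error_msg_,                     \
                               sizeof internal_error_msg_,              \
                               __FILE__, __LINE__, __VA_ARGS__);        \
        fprintf (stderr, "%s\n", internal_error_msg_);                  \
        exit (1);                                                       \
    } while (0)
```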
Carl Worth
3b8e3ab666 add_message: Propagate error status from notmuch_message_create_for_message_id
What a great feeling to remove an XXX comment.
2009-10-25 10:54:43 -07:00
Carl Worth
32ecfe72a1 Add comment documenting our current database schema.
I've got schemes to change this schema somewhat dramatically, so I
want a place to be able to record and review those changes.
2009-10-25 08:57:09 -07:00
Carl Worth
1c2bac747e Drop the storage of thread ID(s) in a value.
Now that we are iterating over the thread terms instead, we can
drop this redundant storage (which should shrink our database a
tiny bit).
2009-10-25 00:31:20 -07:00
Carl Worth
9ec68aa9c4 Shuffle the value numbers around in the database.
First, it's nice that for now we don't have any users yet, so we
can make incompatible changes to the database layout like this
without causing trouble. ;-)

There are a few reasons for this change. First, we now use value 0
uniformly as a timestamp for both mail and timestamp documents, (which
lets us clean up an ugly and fragile bare 0 in the add_value and
get_value calls in the timestamp code).

Second, I want to drop the thread value entirely, so putting it at the
end of the list means we can drop it as a compatible change in the
future. (I almost want to drop the message-ID value too, but it's nice
to be able to sort on it to get diff-able output from "notmuch dump".)

But the thread value we never use as a value, (we would never sort on
it, for example). And it's totally redundant with the thread terms we
store already. So expect it to disappear soon.
2009-10-24 23:05:08 -07:00
Carl Worth
65a272832e Invent our own prefix values.
We're now dropping all pretense of keeping the database directly
compatible with sup's current xapian backend. (But perhaps someone
might write a new notmuch backend for sup in the future.)

In coming up with the prefix values here, I tried to follow the
conventions of http://xapian.org/docs/omega/termprefixes.html as
closely as makes sense, (with some domain translation from "web"
to "email archive").
2009-10-24 22:57:47 -07:00
Carl Worth
0aa355cc8f Split BOOLEAN_PREFIX into INTERNAL and EXTERNAL subsets.
The idea here is that only some of the prefix names (such as "id" and
"tag") actually make sense in external user-supplied query
strings. Other things like "type" are internal implementation details
of how we store things in the database. So internal machinery will add
those terms to the database and we don't need to support them in the
string itself.

With this, we can now simply loop over the external prefix values to
let the query parser know about them. So as we add prefixes in the
future, we'll only need to add them to this list.
2009-10-24 22:38:43 -07:00
Carl Worth
2a9b4fce7c Change all occurrences of "msgid" to "id".
What's good for the user is good for the internals.
2009-10-24 22:29:49 -07:00
Carl Worth
aa46a683a8 Add the magic to allow searches such as "tag:inbox".
The key for this is to call add_boolean_prefix on the QueryParser
object. That tells the query parser to take something like "tag:inbox"
and transform it into the "Linbox" term and do what it needs to do to
make this term a requirement of the search. We're starting to have a
real system here.

Also, I didn't want to expose the ugly name of "msgid" to the user, so
we add a prefix name of simply "id" instead.
2009-10-24 22:23:58 -07:00
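Once add_boolean_prefix has been called (the C++ method takes a field name and a term prefix), Xapian's QueryParser does this translation itself. The C sketch below only mimics the mapping. It makes two steps visible: the "tag:inbox" to "Linbox" translation, and the loop over external-only prefixes from commit 0aa355cc8f. The table and `term_for_token` are hypothetical. The "id" to "Q" entry is an assumption borrowed from the Omega conventions.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;     /* what the user types before the colon */
    const char *prefix;   /* what is stored in front of the term */
} prefix_t;

/* Hypothetical external (user-visible) prefixes. Internal ones such as
 * "type" are deliberately absent, so users cannot query them. */
static const prefix_t external_prefixes[] = {
    { "tag", "L" },
    { "id",  "Q" },
};

#define N_EXTERNAL (sizeof external_prefixes / sizeof external_prefixes[0])

/* Turn "name:value" into prefix+value, e.g. "tag:inbox" -> "Linbox".
 * Returns a malloc'd term, or NULL if the token has no colon, names an
 * unknown prefix, or memory runs out. */
char *
term_for_token (const char *token)
{
    const char *colon = strchr (token, ':');
    size_t i, name_len;

    if (colon == NULL)
        return NULL;
    name_len = (size_t) (colon - token);

    for (i = 0; i < N_EXTERNAL; i++) {
        const prefix_t *p = &external_prefixes[i];
        size_t prefix_len, value_len;
        char *term;

        if (strlen (p->name) != name_len ||
            strncmp (token, p->name, name_len) != 0)
            continue;

        prefix_len = strlen (p->prefix);
        value_len = strlen (colon + 1);
        term = malloc (prefix_len + value_len + 1);
        if (term != NULL) {
            memcpy (term, p->prefix, prefix_len);
            memcpy (term + prefix_len, colon + 1, value_len + 1);
        }
        return term;
    }
    return NULL;
}
```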
Carl Worth
0bc73af96c Fix timestamp generation to avoid overflowing the term limit
The previous code was only correct as long as the timestamp prefix
was only a single character. But with the recent change to a
multi-character prefix, this broke. So fix it now.
2009-10-24 22:10:03 -07:00
Carl Worth
f281f4b677 Trim down prefix list to things we are actually using.
I've decided not to try for sup compatibility at the level of the
xapian database. There's just too much about sup's usage of the
database that I don't like, (beyond the embedded ruby data structures
there is redundant storage of message IDs, thread IDs, and dates (in
both terms and values)).

I'm going to fix that up in the database of notmuch, with some other
changes as well. (I plan to drop "reference" terms once linkage to a
thread ID through the reference is established.  I also plan to add
actual documents to represent threads.)

So with all that incompatibility, I might as well make my own prefix
values. And while doing that, I should try to be as compatible as
possible with the conventions described here:

http://xapian.org/docs/omega/termprefixes.html
2009-10-24 22:04:59 -07:00
Carl Worth
e37b7cc2da Move the prefix-string arrays back into database.cc from message.cc
Yes, I'm being wishy-washy here, moving code back and forth. But
this is where these really do belong.
2009-10-24 21:52:48 -07:00
Carl Worth
b3cbcea8fd Add NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID
And document that notmuch_database_add_message can return this
value. This pushes the hard decision of what to do with duplicate
messages out to the user, but that's OK. (We weren't really doing
anything with these ourselves, and this way the user is at least
informed of the issue, rather than it just getting papered over
internally.)
2009-10-23 14:40:33 -07:00
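This toy illustration shows the message ID acting as a unique key. It uses a hypothetical fixed-size in-memory store instead of the Xapian database. A second file with an already-known ID is reported back to the caller with a duplicate status instead of being stored as another document. `STATUS_TOY_LIMIT` exists only because the toy store is bounded.

```c
#include <string.h>

typedef enum {
    STATUS_SUCCESS,
    STATUS_DUPLICATE_MESSAGE_ID,
    STATUS_TOY_LIMIT          /* hypothetical: toy store full or ID too long */
} status_t;

#define MAX_MESSAGES 32
#define ID_MAX 128

static char known_ids[MAX_MESSAGES][ID_MAX];
static int n_known;

status_t
add_message_id (const char *message_id)
{
    int i;

    for (i = 0; i < n_known; i++)
        if (strcmp (known_ids[i], message_id) == 0)
            return STATUS_DUPLICATE_MESSAGE_ID;   /* caller decides what to do */

    if (n_known == MAX_MESSAGES || strlen (message_id) >= ID_MAX)
        return STATUS_TOY_LIMIT;

    strcpy (known_ids[n_known++], message_id);
    return STATUS_SUCCESS;
}
```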
Carl Worth
5ebb21600e Clarify documentation and error string for NOTMUCH_STATUS_TAG_TOO_LONG
It's helpful to point out NOTMUCH_STATUS_TAG_MAX for users.
2009-10-23 14:36:38 -07:00
Carl Worth
68a10091d6 Add notmuch_database_set_timestamp and notmuch_database_get_timestamp
These will be very helpful to implement an efficient "notmuch new"
command which imports new mail messages that have appeared.
2009-10-23 14:31:01 -07:00
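The sketch below shows how a pair of timestamp calls could make "notmuch new" cheap. It assumes the caller records each directory's modification time after scanning it. The in-memory store and the names `set_timestamp`, `get_timestamp`, and `directory_needs_scan` are hypothetical stand-ins for the database-backed functions above. Their signatures are not the real ones.

```c
#include <string.h>
#include <time.h>

#define MAX_KEYS 16
#define KEY_MAX 256

/* Hypothetical in-memory stand-in for timestamps stored in the database. */
static struct {
    char key[KEY_MAX];
    time_t value;
} store[MAX_KEYS];
static int store_len;

/* Record (or overwrite) the timestamp for `key`; -1 if it doesn't fit. */
int
set_timestamp (const char *key, time_t value)
{
    int i;

    if (strlen (key) >= KEY_MAX)
        return -1;
    for (i = 0; i < store_len; i++) {
        if (strcmp (store[i].key, key) == 0) {
            store[i].value = value;
            return 0;
        }
    }
    if (store_len == MAX_KEYS)
        return -1;
    strcpy (store[store_len].key, key);
    store[store_len].value = value;
    store_len++;
    return 0;
}

/* Return the stored timestamp, or 0 if `key` was never recorded. */
time_t
get_timestamp (const char *key)
{
    int i;

    for (i = 0; i < store_len; i++)
        if (strcmp (store[i].key, key) == 0)
            return store[i].value;
    return 0;
}

/* A directory needs rescanning only if it was modified after the time
 * recorded for it at the end of the previous scan. */
int
directory_needs_scan (const char *dir, time_t dir_mtime)
{
    return dir_mtime > get_timestamp (dir);
}
```

Directories whose mtime hasn't advanced can be skipped without reading any of their files. Because the times here are whole seconds, this naive comparison would miss a change made in the same second as the recorded timestamp.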
Carl Worth
668f20bdfb database: Add private find_unique_doc_id and find_unique_document functions
These are a generalization of the uniqueness testing of
notmuch_database_find_message. More preparation for
directory timestamps.
2009-10-23 14:24:07 -07:00
Carl Worth
edbf7f645c database: Similarly rename find_message_by_docid to find_document_for_doc_id
Again preferring notmuch_database_t* over Xapian::Database*.

Also, we're standardizing on "doc_id" rather than "docid" locally, (as
an analogue to "message_id"), in spite of the "Xapian::docid" name,
(which, fortunately, we can ignore and just use "unsigned int" instead).
2009-10-23 14:12:06 -07:00
Carl Worth
9fc4a365d6 database: Rename internal find_messages_by_term to find_doc_ids
This name is a more accurate description of what it does, and
the more general naming will make sense as we start storing
non-message documents in the database (such as directory
timestamps).

Also, don't pass around a Xapian::Database where it's more our
style to pass a notmuch_database_t*.
2009-10-23 14:06:24 -07:00
Carl Worth
6ccdffcd87 add_message: Fix to not add multiple documents with the same message ID
Here's the second big fix to message-ID handling, (the first was to
generate message IDs when an email contained none). Now, with no
document missing a message ID, and no two documents having the same
message ID, we have a nice consistent database where the message ID
can be used as a unique key.
2009-10-23 06:00:10 -07:00
Carl Worth
31044d10ed add_message: Re-order the code a bit (find message-id first).
We're preparing for being able to deal with files with duplicate
message IDs here. The plan is to create a notmuch_message_t object in
add_message that may or may not reference a document that exists in
the database. So to do this, we have to find the message ID before we
do any manipulation of the doc.
2009-10-23 05:30:37 -07:00
Carl Worth
c78358fa8a Move thread_id generation code from database.cc to message.cc
It's really up to the message to decide how to generate these.
2009-10-23 05:25:58 -07:00
Carl Worth
1ecdef59f5 add_message: Rename message to message_file
I still don't like the name message_file at all, but we're about
to start using a notmuch_message_t in this function so we need
to do something to keep the identifiers separate for now.

Eventually, it probably makes sense to push the message-parsing
code from database.cc to message.cc.
2009-10-23 05:13:42 -07:00
Carl Worth
77f9d3ee0e Don't forget the "to" header when restricting parsing to certain headers
We recently started discarding files as "not email" if they have none
of Subject, From, or To. Apparently, my mail collection contains a
number of messages that I sent, that are saved without Subject and
From, (perhaps these were drafts?).

Anyway, it's fortunate I had those since they alerted me to this bug,
where we were not parsing the "To" header in some cases.
2009-10-22 15:34:47 -07:00
Carl Worth
90f93fc9c7 Fix missing error check.
The notmuch_message_file_open function is perfectly capable of
returning NULL. So check for it.
2009-10-22 15:33:56 -07:00
Carl Worth
6a4992bc61 Generate message ID (using SHA1) when a mail message contains none.
This is important as we're using the message ID as the unique key
in our database. So previously, all messages with no message ID
would be treated as the same message---not good at all.
2009-10-22 15:31:56 -07:00
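Here is a self-contained sketch of the fallback, including a compact SHA-1 so the example compiles on its own. Two details are assumptions made for illustration: hashing the raw file contents, and the "notmuch-sha1-" prefix. The point is only that each file now gets a deterministic ID. Messages lacking a Message-ID no longer collapse into one shared key.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Process one 64-byte block into the five-word SHA-1 state h. */
static void
sha1_block (uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80], a, b, c, d, e, f, k, t;
    int i;

    for (i = 0; i < 16; i++)
        w[i] = (uint32_t) p[4 * i] << 24 | (uint32_t) p[4 * i + 1] << 16 |
               (uint32_t) p[4 * i + 2] << 8 | (uint32_t) p[4 * i + 3];
    for (i = 16; i < 80; i++)
        w[i] = ROL (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = ROL (a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROL (b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* SHA-1 of `len` bytes at `data`, written as 40 lowercase hex digits. */
void
sha1_hex (const void *data, size_t len, char out[41])
{
    uint64_t bits = (uint64_t) len * 8;
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    const unsigned char *p = data;
    unsigned char tail[128];
    size_t rest, tail_len, i;

    for (; len >= 64; len -= 64, p += 64)
        sha1_block (h, p);

    /* Pad: 0x80, zeros, then the message length in bits, big-endian. */
    rest = len;
    memcpy (tail, p, rest);
    tail[rest] = 0x80;
    tail_len = rest + 1 <= 56 ? 64 : 128;
    memset (tail + rest + 1, 0, tail_len - rest - 1);
    for (i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (unsigned char) (bits >> (8 * i));
    for (i = 0; i < tail_len; i += 64)
        sha1_block (h, tail + i);

    for (i = 0; i < 5; i++)
        snprintf (out + 8 * i, 9, "%08lx", (unsigned long) h[i]);
}

/* Hypothetical fallback ID for a message that has no Message-ID header. */
void
fallback_message_id (const char *contents, size_t len, char out[54])
{
    char hex[41];

    sha1_hex (contents, len, hex);
    snprintf (out, 54, "notmuch-sha1-%s", hex);
}
```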
Carl Worth
84480738a5 Merge branch from fixing up bugs after bisecting.
I'm glad that when I implemented "notmuch restore" I went through the
extra effort to split the code I had written in one sitting into over a
dozen commits. Sure enough, I hadn't tested well enough and had
totally broken "notmuch setup", (segfaults and bogus thread_id
values).

With the little commits I had made, git bisect saved the day, and I
went back to make the fixes right on top of the commits that
introduced the bugs. So now we octopus merge those in.
2009-10-21 23:23:44 -07:00
Carl Worth
c58ee818b5 Bring back the insert_thread_id function.
We deleted this in favor of our fancy new thread_ids iterator
from the message object. But one of the previous callers of
insert_thread_id isn't using notmuch_message_t yet. I made
the mistake of thinking I could just call g_hash_table_insert
directly, but the problem was that nobody was splitting
up the thread_id string at its commas.

So with this, we were inserting bogus comma-separated IDs
into the hash table, so thread_id values were ballooning
out of control. Should be much better now.
2009-10-21 23:21:12 -07:00
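This sketch shows the missing step. A small array stands in for the GLib hash table, so the example has no dependencies. The comma-separated string is split and each ID is inserted on its own, rather than the whole string being inserted as one bogus key. `id_set_t` and `split_and_insert_thread_ids` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 32

/* Hypothetical stand-in for the hash table: a small set of strings. */
typedef struct {
    char *ids[MAX_IDS];
    int count;
} id_set_t;

/* Insert the first `len` bytes of `id` unless already present. Returns 0
 * on success (including "already present"), -1 if full or out of memory. */
static int
id_set_insert_n (id_set_t *set, const char *id, size_t len)
{
    int i;
    char *copy;

    for (i = 0; i < set->count; i++)
        if (strlen (set->ids[i]) == len && strncmp (set->ids[i], id, len) == 0)
            return 0;
    if (set->count == MAX_IDS || (copy = malloc (len + 1)) == NULL)
        return -1;
    memcpy (copy, id, len);
    copy[len] = '\0';
    set->ids[set->count++] = copy;
    return 0;
}

/* Split "id1,id2,..." at its commas and insert each ID separately. */
int
split_and_insert_thread_ids (id_set_t *set, const char *thread_ids)
{
    const char *start = thread_ids;

    for (;;) {
        const char *comma = strchr (start, ',');
        size_t len = comma ? (size_t) (comma - start) : strlen (start);

        if (len > 0 && id_set_insert_n (set, start, len) < 0)
            return -1;
        if (comma == NULL)
            return 0;
        start = comma + 1;
    }
}
```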
Carl Worth
302d54834d Add notmuch_status_to_string function.
Be kind and let the user print error messages, not just error
codes.
2009-10-21 16:12:53 -07:00
Carl Worth
defd216487 Add notmuch_message_add_tag and notmuch_message_remove_tag
With these two added, we now have enough functionality in the
library to implement "notmuch restore".
2009-10-21 15:56:33 -07:00
Carl Worth
6c5054ebee database: Add new notmuch_database_find_message
With this function, and the recently added support for
notmuch_message_get_thread_ids, we now recode the find_thread_ids
function to work just the way we expect a user of the public
notmuch API to work. Not too bad really.
2009-10-21 15:40:20 -07:00
Carl Worth
22b2265cac Rename NOTMUCH_MAX_TERM to NOTMUCH_TERM_MAX
Just better consistency with our naming schemes.
2009-10-21 14:10:00 -07:00