notmuch

mirror of https://git.notmuchmail.org/git/notmuch synced 2024-11-22 19:08:09 +01:00

Author	SHA1	Message	Date
Carl Worth	32ecfe72a1	Add comment documenting our current database schema. I've got schemes to change this schema somewhat dramatically, so I want a place to be able to record and review those changes.	2009-10-25 08:57:09 -07:00
Carl Worth	1c2bac747e	Drop the storage of thread ID(s) in a value. Now that we are iterating over the thread terms instead, we can drop this redundant storage (which should shrink our database a tiny bit).	2009-10-25 00:31:20 -07:00
Carl Worth	9ec68aa9c4	Shuffle the value numbers around in the database. First, it's nice that for now we don't have any users yet, so we can make incompatible changes to the database layout like this without causing trouble. ;-) There are a few reasons for this change. First, we now use value 0 uniformly as a timestamp for both mail and timestamp documents, (which lets us cleanup an ugly and fragile bare 0 in the add_value and get_value calls in the timestamp code). Second, I want to drop the thread value entirely, so putting it at the end of the list means we can drop it as compatible change in the future. (I almost want to drop the message-ID value too, but it's nice to be able to sort on it to get diff-able output from "notmuch dump".) But the thread value we never use as a value, (we would never sort on it, for example). And it's totally redundant with the thread terms we store already. So expect it to disappear soon.	2009-10-24 23:05:08 -07:00
Carl Worth	65a272832e	Invent our own prefix values. We're now dropping all pretense of keeping the database directly compatible with sup's current xapian backend. (But perhaps someone might write a new notmuch backend for sup in the future.) In coming up with the prefix values here, I tried to follow the conventions of http://xapian.org/docs/omega/termprefixes.html as closely as makes sense, (with some domain translation from "web" to "email archive").	2009-10-24 22:57:47 -07:00
Carl Worth	0aa355cc8f	Split BOOLEAN_PREFIX into INTERNAL and EXTERNAL subsets. The idea here is that only some of the prefix names (such as "id" and "tag") actually make sense in external user-supplied query strings. Other things like "type" are internal implementation details of how we store things in the database. So internal machinery will add those terms to the database and we don't need to support them in the string itself. With this, we can now simply loop over the external prefix values to let the query parser know about them. So as we add prefixes in the future, we'll only need to add them to this list.	2009-10-24 22:38:43 -07:00
Carl Worth	2a9b4fce7c	Change all occurrences of "msgid" to "id". What's good for the user is good for the internals.	2009-10-24 22:29:49 -07:00
Carl Worth	aa46a683a8	Add the magic to allow searches such as "tag:inbox". The key for this is to call add_boolean_prefix on the QueryParser object. That tells the query parser to take something like "tag:inbox" and transform it into the "Linbox" term and do what it needs to do to make this term a requirement of the search. We're starting to have a real system here. Also, I didn't want to expose the ugly name of "msgid" to the user, so we add a prefix name of simply "id" instead.	2009-10-24 22:23:58 -07:00
Carl Worth	0bc73af96c	Fix timestamp generation to avoid overflowing the term limit The previous code was only correct as long as the timestamp prefix was only a single character. But with the recent change to a multi-character prefix, this broke. So fix it now.	2009-10-24 22:10:03 -07:00
Carl Worth	f281f4b677	Trim down prefix list to things we are actually using. I've decided not to try for sup compatibility at the level of the xapian database. There's just too much about sup's usage of the database that I don't like, (beyond the embedded ruby data structures there is redundant storage of message IDs, thread IDs, and dates (in both terms and values)). I'm going to fix that up in the database of notmuch, with some other changes as well. (I plan to drop "reference" terms once linkage to a thread ID through the reference is established. I also plan to add actual documents to represent threads.) So with all that incompatibility, I might as well make my own prefix values. And while doing that, I should try to be as compatible as possible with the conventions described here: http://xapian.org/docs/omega/termprefixes.html	2009-10-24 22:04:59 -07:00
Carl Worth	e37b7cc2da	Move the prefix-string arrays back into database.cc from message.cc Yes, I'm being wishy-washy here, moving code back and forth. But this is where these really do belong.	2009-10-24 21:52:48 -07:00
Carl Worth	b3cbcea8fd	Add NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID And document that notmuch_database_add_message can return this value. This pushes the hard decision of what to do with duplicate messages out to the user, but that's OK. (We weren't really doing anything with these ourselves, and this way the user is at least informed of the issue, rather than it just getting papered over internally.)	2009-10-23 14:40:33 -07:00
Carl Worth	5ebb21600e	Clarify documentation and error string for NOTMUCH_STATUS_TAG_TOO_LONG It's helpful to point out NOTMUCH_STATUS_TAG_MAX for users.	2009-10-23 14:36:38 -07:00
Carl Worth	68a10091d6	Add notmuch_database_set_timestamp and notmuch_database_get_timestamp These will be very helpful to implement an efficient "notmuch new" command which imports new mail messages that have appeared.	2009-10-23 14:31:01 -07:00
Carl Worth	668f20bdfb	database: Add private find_unique_doc_id and find_unique_document functions These are a generalization of the unique-ness testing of notmuch_database_find_message. More preparation for directory timestamps.	2009-10-23 14:24:07 -07:00
Carl Worth	edbf7f645c	database: Similarly rename find_message_by_docid to find_document_for_doc_id Again preferring notmuch_database_t* over Xapian::Database*. Also, we're standardizing on "doc_id" rather than "docid" locally, (as an analogue to "message_id"), in spite of the "Xapian::docid" name, (which, fortunately, we can ignore and just use "unsigned int" instead).	2009-10-23 14:12:06 -07:00
Carl Worth	9fc4a365d6	database: Rename internal find_messages_by_term to find_doc_ids This name is a more accurate description of what it does, and the more general naming will make sense as we start storing non-message documents in the database (such as directory timestamps). Also, don't pass around a Xapian::Database where it's more our style to pass a notmuch_database_t*.	2009-10-23 14:06:24 -07:00
Carl Worth	6ccdffcd87	add_message: Fix to not add multiple documents with the same message ID Here's the second big fix to message-ID handling, (the first was to generate message IDs when an email contained none). Now, with no document missing a message ID, and no two documents having the same message ID, we have a nice consistent database where the message ID can be used as a unique key.	2009-10-23 06:00:10 -07:00
Carl Worth	31044d10ed	add_message: Re-order the code a bit (find message-id first). We're preparing for being able to deal with files with duplicate message IDs here. The plan is to create a notmuch_message_t object in add_message that may or may not reference a document that exists in the database. So to do this, we have to find the message ID before we do any manipulation of the doc.	2009-10-23 05:30:37 -07:00
Carl Worth	c78358fa8a	Move thread_id generation code from database.cc to message.cc It's really up to the message to decide how to generate these.	2009-10-23 05:25:58 -07:00
Carl Worth	1ecdef59f5	add_message: Rename message to message_file I still don't like the name message_file at all, but we're about to start using a notmuch_message_t in this function so we need to do something to keep the identifiers separate for now. Eventually, it probably makes sense to push the message-parsing code from database.cc to message.cc.	2009-10-23 05:13:42 -07:00
Carl Worth	77f9d3ee0e	Don't forget the "to" header when restricting parsing to certain headers We recently started discarding files as "not email" if they have none of Subject, From, nor To. Apparently, my mail collection contains a number of messages that I sent, that are saved without Subject and From, (perhaps these were drafts?). Anyway, it's fortunate I had those since they alerted me to this bug, where we were not parsing the "To" header in some cases.	2009-10-22 15:34:47 -07:00
Carl Worth	90f93fc9c7	Fix missing error check. The notmuch_message_file_open function is perfectly capable of returning NULL. So check for it.	2009-10-22 15:33:56 -07:00
Carl Worth	6a4992bc61	Generate message ID (using SHA1) when a mail message contains none. This is important as we're using the message ID as the unique key in our database. So previously, all messages with no message ID would be treated as the same message---not good at all.	2009-10-22 15:31:56 -07:00
Carl Worth	84480738a5	Merge branch from fixing up bugs after bisecting. I'm glad that when I implemented "notmuch restore" I went through the extra effort to split the code I had written in one sitting into over a dozen commits. Sure enough, I hadn't tested well enough and had totally broken "notmuch setup", (segfaults and bogus thread_id values). With the little commits I had made, git bisect saved the day, and I went back to make the fixes right on top of the commits that introduced the bugs. So now we octopus merge those in.	2009-10-21 23:23:44 -07:00
Carl Worth	c58ee818b5	Bring back the insert_thread_id function. We deleted this in favor of our fancy new thread_ids iterator from the message object. But one of the previous callers of insert_thread_id isn't using notmuch_message_t yet. I made the mistake of thinking I could just call g_hash_table_insert directly, but the problem was that nobody was splitting up the thread_id string at its commas. So with this, we were inserting bogus comma-separated IDs into the hash table, so thread_id values were ballooning out of control. Should be much better now.	2009-10-21 23:21:12 -07:00
Carl Worth	302d54834d	Add notmuch_status_to_string function. Be kind and let the user print error messages, not just error codes.	2009-10-21 16:12:53 -07:00
Carl Worth	defd216487	Add notmuch_message_add_tag and notmuch_message_remove_tag With these two added, we now have enough functionality in the library to implement "notmuch restore".	2009-10-21 15:56:33 -07:00
Carl Worth	6c5054ebee	database: Add new notmuch_database_find_message With this function, and the recently added support for notmuch_message_get_thread_ids, we now recode the find_thread_ids function to work just the way we expect a user of the public notmuch API to work. Not too bad really.	2009-10-21 15:40:20 -07:00
Carl Worth	22b2265cac	Rename NOTMUCH_MAX_TERM to NOTMUCH_TERM_MAX Just better consistency with our naming schemes.	2009-10-21 14:10:00 -07:00
Carl Worth	6142216132	Move find_prefix function from database.cc to message.cc It's definitely a better fit there for now, (and can likely eventually be made static as add_term moves from database to message as well).	2009-10-21 14:07:40 -07:00
Carl Worth	17b3c214ea	Convert notmuch_database_t to start using talloc. This will be handy as we can hang future talloc allocations off of the database now.	2009-10-21 14:00:37 -07:00
Carl Worth	d29a6ec791	notmuch setup: Collapse internal whitespace within message-id I'm too lazy to see what the RFC says, but I know that having whitespace inside a message-ID is sure to confuse things. And besides, this makes things more compatible with sup so that I have some hope of importing sup labels.	2009-10-21 10:07:34 -07:00
Carl Worth	65baa4f4e7	notmuch dump: Fix the sorting of results. To properly support sorting in notmuch_query we now use an Enquire object. We also throw in a QueryParser too, so we're really close to being able to support arbitrary full-text searches. I took a look at the supported QueryParser syntax and chose a set of flags for everything I like, (such as supporting Boolean operators in either case ("AND" or "and"), supporting phrase searching, supporting + and - to include/preclude terms, and supporting a trailing * on any term as a wildcard).	2009-10-21 00:35:56 -07:00
Carl Worth	6a3b68edef	add_message: Add a type:mail ("Kmail") term to all documents. This gives us an easy way to specify "all mail messages" in a search query. We simply look for this term.	2009-10-21 00:34:36 -07:00
Carl Worth	50144fb354	database: Remove two little bits of dead code.	2009-10-20 23:12:53 -07:00
Carl Worth	466a7bbf62	Implement 'notmuch dump'. This is a fairly big milestone for notmuch. It's our first command to do anything besides building the index, so it proves we can actually read valid results out from the index. It also puts in place almost all of the API and infrastructure we will need to allow searching of the database. Finally, with this change we are now using talloc inside of notmuch which is truly a delight to use. And now that I figured out how to use C++ objects with talloc allocation, (it requires grotty parts of C++ such as "placement new" and "explicit destructors"), we are valgrind-clean for "notmuch dump", (as in "no leaks are possible").	2009-10-20 21:21:39 -07:00
Carl Worth	cd4a8734d3	Rename private notmuch_message_t to notmuch_message_file_t This is in preparation for a new, public notmuch_message_t. Eventually, the public notmuch_message_t is going to grow enough features to need to be file-backed and will likely need everything that's now in message-file.c. So we may fold these back into one object/implementation in the future.	2009-10-20 15:09:51 -07:00
Carl Worth	5a84df0f15	add_message: Fix memory leak of thread_ids GPtrArray. We were properly freeing this memory when the thread-ids list was not empty, but leaking it when it was. Thanks, of course, to valgrind along with the G_SLICE=always-malloc environment variable which makes leak checking with glib almost bearable.	2009-10-20 13:05:45 -07:00
Carl Worth	e6236b88fd	database.cc: Document better pieces of glib that we're using.	2009-10-20 12:49:32 -07:00
Carl Worth	968feafbad	notmuch_database_open: Fix error message for file-not-found. I was incorrectly using the return value of stat (-1) instead of errno (ENOENT) to try to construct the error message here. Also, while we're here, reword the error message to not have "stat" in it, which in spite of what a Unix programmer will tell you, is not actually a word.	2009-10-20 10:14:00 -07:00
Carl Worth	55c8ee9a86	notmuch_database_create/open: Fix to handle NULL as documented. When documenting these functions I described support for a NOTMUCH_BASE environment variable to be consulted in the case of a NULL path. Only, I had forgotten to actually write the code. This code exists now, with a new, exported function: notmuch_database_default_path	2009-10-20 09:58:40 -07:00
Carl Worth	ad784f38ce	notmuch: Ignore files that don't look like email messages. This is helpful for things like indexes that other mail programs may have left around. It also means we can make the initial instructions much easier, (the user need not worry about moving away auxiliary files from some other email program).	2009-10-19 23:16:05 -07:00
Carl Worth	45f0d7bcab	Don't hash headers we won't end up using. Just saving a little work here.	2009-10-19 13:48:13 -07:00
Carl Worth	c5eea2b77e	Document which pieces of glib we're still using. Looks like we can copy in a hash-table implementation, (from cairo, say), and then a few _ascii_ functions from glib, (we'll need to switch a few current uses of things like isspace, etc. to locale-independent versions as well). So not too hard to free ourselves of glib for now, (until we add GMime back in later, of course).	2009-10-19 13:40:56 -07:00
Carl Worth	fa562fa22b	Hook up our fancy new notmuch_parse_date function. With all the de-glib-ification out of the way, we can now use it to allow for date-based sorting of Xapian search results.	2009-10-19 13:35:29 -07:00
Carl Worth	0e777a8f80	notmuch: Switch from gmime to custom, ad-hoc parsing of headers. Since we're currently just trying to stitch together In-Reply-To and References headers we don't need that much sophistication. It's when we later add full-text searching that GMime will be useful. So for now, even though my own code here is surely very buggy compared to GMime it's also a lot faster. And speed is what we're after for the initial index creation.	2009-10-19 13:00:43 -07:00
Carl Worth	10c176ba0e	notmuch: Start actually adding messages to the index. This is the beginning of the notmuch library as well, with its interface in notmuch.h. So far we've got create, open, close, and add_message (all with a notmuch_database prefix). The current add_message function has already been whittled down from what we have in notmuch-index-message to add only references, message-id, and thread-id to the index, (that is---just enough to do thread-linkage but nothing for full-text searching). The concept here is to do something quickly so that the user can get some data into notmuch and start using it. (The most interesting stuff is then thread-linkage and labels like inbox and unread.) We can defer the full-text indexing of the body of the messages for later, (such as in the background while the user is reading mail). The initial thread-stitching step is still slower than I would like. We may have to stop using libgmime for this step as its overhead is not worth it for the simple case of just parsing the message-id, references, and in-reply-to headers.	2009-10-18 20:56:30 -07:00

47 commits