Commit graph

94 commits

Author SHA1 Message Date
Carl Worth
99cfa27030 Add support for folder-based searching.
A new "folder:" prefix in the query string can now be used to match
the directories in which mail files are stored.

The addition of this feature causes the recently added
search-by-folder tests to now pass.
2011-01-15 15:37:43 -08:00
Austin Clements
b3caef1f06 Optimize thread search using matched docid sets.
This reduces thread search's 1+2t Xapian queries (where t is the
number of matched threads) to 1+t queries and constructs exactly one
notmuch_message_t for each message instead of 2 to 3.
notmuch_query_search_threads eagerly fetches the docids of all
messages matching the user query instead of lazily constructing
message objects and fetching thread ID's from term lists.
_notmuch_thread_create takes a seed docid and the set of all matched
docids and uses a single Xapian query to expand this docid to its
containing thread, using the matched docid set to determine which
messages in the thread match the user query instead of using a second
Xapian query.

This reduces the amount of time required to load my inbox from 4.523
seconds to 3.025 seconds (1.5X faster).
2010-12-07 16:40:05 -08:00
Carl Worth
95dd5fe5d7 notmuch_message_tags_to_maildir_flags: Do nothing outside of "new" and "cur"
Some people use notmuch with non-maildir files, (for example, email
messages in MH format, or else cool things like using sluk[*] to suck
down feeds into a format that notmuch can index).

To better support uses like that, don't do any renaming for files that
are not in a directory named either "new" or "cur".

[*] https://github.com/krl/sluk/
2010-11-11 14:32:17 -08:00
Carl Worth
1d02dd64af lib: Add new, public notmuch_message_get_filenames
This augments the existing notmuch_message_get_filename by allowing
the caller access to all filenames in the case of multiple files for a
single message.

To support this, we split the iterator (notmuch_filenames_t) away from
the list storage (notmuch_filename_list_t) where previously these were
a single object (notmuch_filenames_t). Then, whenever the user asks
for a file or filename, the message object lazily creates a complete
notmuch_filename_list_t and then:

	For notmuch_message_get_filename, returns the first filename
	in the list.

	For notmuch_message_get_filenames, creates and returns a new
	iterator for the filename list.
2010-11-11 03:40:19 -08:00
Carl Worth
d87db88432 lib: Add new implementation of notmuch_filenames_t
The new implementation is simply a talloc-based list of strings. The
former support (a list of database terms with a common prefix) is
implemented by simply pre-iterating over the terms and populating the
list. This should provide no performance disadvantage as callers of
thigns like notmuch_directory_get_child_files are very likely to
always iterate over all filenames anyway.

This new implementation of notmuch_filenames_t is in preparation for
adding API to query all of the filenames for a single message.
2010-11-11 03:40:19 -08:00
Carl Worth
bb74e9dff8 lib: Rework interface for maildir_flags synchronization
Instead of having an API for setting a library-wide flag for
synchronization (notmuch_database_set_maildir_sync) we instead
implement maildir synchronization with two new library functions:

	notmuch_message_maildir_flags_to_tags
  and   notmuch_message_tags_to_maildir_flags

These functions are nicely documented here, (though the implementation
does not quite match the documentation yet---as plainly evidenced by
the current results of the test suite).
2010-11-11 03:40:19 -08:00
Michal Sojka
088801a14a Maildir synchronization
This patch allows bi-directional synchronization between maildir
flags and certain tags. The flag-to-tag mapping is defined by flag2tag
array.

The synchronization works this way:

1) Whenever notmuch new is executed, the following happens:
   o New messages are tagged with configured new_tags.
   o For new or renamed messages with maildir info present in the file
     name, the tags defined in flag2tag are either added or removed
     depending on the flags from the file name.

2) Whenever notmuch tag (or notmuch restore) is executed, a new set of
   flags based on the tags is constructed for every message and a new
   file name is prepared based on the old file name but with the new
   flags. If the flags differs and the old message was in 'new'
   directory then this is replaced with 'cur' in the new file name. If
   the new and old file names differ, the file is renamed and notmuch
   database is updated accordingly.

   The rename happens before the database is updated. In case of crash
   between rename and database update, the next run of notmuch new
   brings the database in sync with the mail store again.
2010-11-10 13:09:31 -08:00
Carl Worth
c81cecf620 lib: Add GCC visibility(hidden) pragmas to private header files.
This prevents any of the private functions from being leaked out
through the library interface (at least when compiling with a
recent-enough gcc to support the visibility pragma).
2010-11-01 22:35:48 -07:00
Carl Worth
7b78eb4af6 Add support (and tests) for messages with really long message IDs.
Scott Henson reported an internal error that occurred when he tried to
add a message that referenced another message with a message ID well
over 300 characters in length. The bug here was running into a Xapian
limit for the length of metadata key names, (which is even more
restrictive than the Xapian limit for the length of terms).

We fix this by noticing long message ID values and instead using a
message ID of the form "notmuch-sha1-<sha1_sum_of_message_id>". That
is, we use SHA1 to generate a compressed, (but still unique), version
of the message ID.

We add support to the test suite to exercise this fix. The tests add a
message referencing the long message ID, then add the message with the
long message ID, then finally add another message referencing the long
ID. Each of these tests exercise different code paths where the
special handling is implemented.

A final test ensures that all three messages are stitched together
into a single thread---guaranteeing that the three code paths all act
consistently.
2010-06-04 13:35:07 -07:00
Carl Worth
98845fdbb2 Avoid database corruption by not adding partially-constructed mail documents.
Previously we were using Xapian's add_document to allocate document ID
values for notmuch_message_t objects.  This had the drawback of adding
a partially constructed mail document to the database. If notmuch was
subsequently interrupted before fully populating this document, then
later runs would be quite confused when seeing the partial documents.

There are reports from the wild of people hitting internal errors of
the form "Message ... has no thread ID" for example, (which is
currently an unrecoverable error).

We fix this by manually allocating document IDs without adding
documents. With this change, we never call Xapian's add_document
method, but only replace_document with either the current document ID
of a message or a new one that we have allocated.
2010-06-04 10:16:53 -07:00
Dirk Hohndel
5b8b0377cb Make Received: header special in notmuch_message_file_get_header
With this patch the Received: header becomes special in the way
we treat headers - this is the only header for which we concatenate
all the instances we find (instead of just returning the first one).

This will be used in the From guessing code for replies as we need to
be able to walk ALL of the Received: headers in a message to have a
good chance to guess which mailbox this email was delivered to.

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
2010-04-26 14:44:06 -07:00
Dirk Hohndel
57561414d7 Add authors member to message
message->authors contains the author's name (as we want to print it)
get / set methods are declared in notmuch-private.h

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
2010-04-26 11:44:49 -07:00
Jesse Rosenthal
4971b85641 Name thread based on matching msgs instead of first msg.
At the moment all threads are named based on the name of the first message
in the thread. However, this can cause problems if people either start
new threads by replying-all (as unfortunately, many out there do) or
change the subject of their mails to reflect a shift in a thread on a
list.

This patch names threads based on (a) matches for the query, and (b) the
search order. If the search order is oldest-first (as in the default
inbox) it chooses the oldest matching message as the subject. If the
search order is newest-first it chooses the newest one.

Reply prefixes ("Re: ", "Aw: ", "Sv: ", "Vs: ") are ignored
(case-insensitively) so a Re: won't change the subject.

Note that this adds a "sort" argument to _notmuch_thread_create and
_thread_add_matched_message, so that when constructing the thread we can
be aware of the sort order.

Signed-off-by: Jesse Rosenthal <jrosenthal@jhu.edu>
2010-04-21 14:56:53 -07:00
Carl Worth
4e5d2f22db lib: Rename iterator functions to prepare for reverse iteration.
We rename 'has_more' to 'valid' so that it can function whether
iterating in a forward or reverse direction. We also rename
'advance' to 'move_to_next' to setup parallel naming with
the proposed functions 'move_to_first', 'move_to_last', and
'move_to_previous'.
2010-03-09 09:22:29 -08:00
Carl Worth
d12801c8b4 lib: Split the database upgrade into two phases for safer operation.
The first phase copies data from the old format to the new format
without deleting anything. This allows an old notmuch to still use the
database if the upgrade process gets interrupted. The second phase
performs the deletion (after updating the database version number). If
the second phase is interrupted, there will be some unused data in the
database, but it shouldn't cause any actual harm.
2010-01-09 11:13:12 -08:00
Carl Worth
909f52bd8c lib: Implement versioning in the database and provide upgrade function.
The recent support for renames in the database is our first time
(since notmuch has had more than a single user) that we have a
database format change. To support smooth upgrades we now encode a
database format version number in the Xapian metadata.

Going forward notmuch will emit a warning if used to read from a
database with a newer version than it natively supports, and will
refuse to write to a database with a newer version.

The library also provides functions to query the database format
version:

	notmuch_database_get_version

to ask if notmuch wants a newer version than that:

	notmuch_database_needs_upgrade

and a function to actually perform that upgrade:

	notmuch_database_upgrade
2010-01-07 18:26:31 -08:00
Carl Worth
807aef93d3 Prefer READ_ONLY consistently over READONLY.
Previously we had NOTMUCH_DATABASE_MODE_READ_ONLY but
NOTMUCH_STATUS_READONLY_DATABASE which was ugly and confusing. Rename
the latter to NOTMUCH_STATUS_READ_ONLY_DATABASE for consistency.
2010-01-07 10:29:05 -08:00
Carl Worth
f93b7218c3 lib: Consolidate checks for read-only database.
Previously, many checks were deep in the library just before a cast
operation. These have now been replaced with internal errors and new
checks have instead been added at the beginning of all top-levelentry
points requiring a read-write database.

The new checks now also use a single function for checking and
printing the error message. This will give us a convenient location to
extend the check, (such as based on database version as well).
2010-01-07 10:19:44 -08:00
Carl Worth
d807e28f43 lib: Implement new notmuch_directory_t API.
This new directory ojbect provides all the infrastructure needed to
detect when files or directories are deleted or renamed. There's still
code needed on top of this (within "notmuch new") to actually do that
detection.
2010-01-06 10:32:06 -08:00
Carl Worth
498edff503 database: Abstract _filename_to_direntry from _add_message
The code to map a filename to a direntry is something that we're going
to want in a future _remove_message function, so put it in a new
function _notmuch_database_filename_to_direntry .
2010-01-06 10:32:05 -08:00
Carl Worth
1376a90db6 database: Allowing storing multiple filenames for a single message ID.
The library interface is unchanged so far, (still just
notmuch_database_add_message), but internally, the old
_set_filename function is now _add_filename instead.
2010-01-06 10:32:05 -08:00
Carl Worth
6ca6c089e9 database: Store mail filename as a new 'direntry' term, not as 'data'.
Instead of storing the complete message filename in the data portion
of a mail document we now store a 'direntry' term that contains the
document ID of a directory document and also the basename of the
message filename within that directory. This will allow us to easily
store multple filenames for a single message, and will also allow us
to find mail documents for files that previously existed in a
directory but that have since been deleted.
2010-01-06 10:32:05 -08:00
Carl Worth
84742d86ab database: Split _find_parent_id into _split_path and _find_directory_id
Some pending commits want the _split_path functionality separate from
mapping a directory to a document ID. The split_path function now
returns the basename as well as the directory name.
2010-01-06 10:32:05 -08:00
Carl Worth
406ec4b15d database: Export _notmuch_database_find_parent_id for internal use.
We'll soon have mail documents referring to their parent directory's
directory documents, so we'll need access to _find_parent_id in files
such as message.cc.
2010-01-06 10:32:05 -08:00
Carl Worth
ba12bf1f26 lib: Abstract the extraction of a relative path from set_filename
We'll soon be having multiple entry points that accept a filename
path, so we want common code for getting a relative path from a
potentially absolute path.
2010-01-06 10:32:05 -08:00
Carl Worth
8c6b7d311c lib: Add missing value to notmuch_private_status_t enum.
And fix the initialization such that the private enum will always have
distinct values from the public enum even if we similarly miss the
addition of a new public value in the future.
2010-01-06 10:32:05 -08:00
Fernando Carrijo
db68eea013 Nuke the remainings of _notmuch_message_add_thread_id.
The function _notmuch_message_add_thread_id has been removed
from the private interface of notmuch. There's no reason for
one to keep a declaration of its prototype in the code base.
Also, lets update a commentary that referenced that function
and escaped from previous scrutiny.

Signed-off-by: Fernando Carrijo <fcarrijo@yahoo.com.br>
2009-12-09 12:09:55 -08:00
Jeffrey C. Ollie
95f97540a0 Remove unused notmuch_parse_date function prototype.
notmuch_parse_date is not implemented, so remove the unused function
prototype.

Signed-off-by: Jeffrey C. Ollie <jeff@ocjtech.us>
2009-12-03 17:07:22 -08:00
Carl Worth
880b21a097 Makefile: Incorporate getline implementation into the build.
It's unconditional for a very short time. We expect to soon be
building it only if necessary.
2009-12-01 16:33:17 -08:00
Carl Worth
70962fabf9 lib/messages.c: Make message searches stream as well.
Xapian provides an interator-based interface to all search results.
So it was natural to make notmuch_messages_t be iterator-based as
well. Which we did originally.

But we ran into a problem when we added two APIs, (_get_replies and
_get_toplevel_messages), that want to return a messages iterator
that's *not* based on a Xapian search result. My original compromise
was to use notmuch_message_list_t as the basis for all returned
messages iterators in the public interface.

This had the problem of introducing extra latency at the beginning
of a search for messages, (the call would block while iterating over
all results from Xapian, converting to a message list).

In this commit, we remove that initial conversion and instead provide
two alternate implementations of notmuch_messages_t (one on top of a
Xapian iterator and one on top of a message list).

With this change, I tested a "notmuch search" returning *many* results
as previously taking about 7 seconds before results started appearing,
and now taking only 2 seconds.
2009-11-24 11:33:09 -08:00
Chris Wilson
f379aa5284 Permit opening the notmuch database in read-only mode.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and like whilst updating the index.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
2009-11-21 22:04:49 +01:00
Ingmar Vanhassel
2ce25b93a7 Typsos 2009-11-18 03:21:36 -08:00
Carl Worth
0da0131096 database: Make _parse_message_id static once again.
We had exposed this to the internal implementation for a short time,
(only while we had the silly code fetching In-Reply-To values from
message files instead of from the database). Make this private again
as it should be.
2009-11-17 18:50:13 -08:00
Keith Packard
d025e89ac7 Fix "too many open files" bug by closing message files when done with them.
The message file header parsing code parses only enough of the file to
find the desired header fields, then it leaves the file open until the
next header parsing call or when the message is no longer in use. If a
large number of messages end up being active, this will quickly run
out of file descriptors.

Here, we add support to explicitly close the message file within a
message, (_notmuch_message_close) and call that from thread
construction code.

Signed-off-by: Keith Packard <keithp@keithp.com>

Edited-by: Carl Worth <cworth@cworth.org>:

Many portions of Keith's original patch have since been solved other
ways, (such as the code that changed the handling of the In-Reply-To
header). So the final version is clean enough that I think even Keith
would be happy to have his name on it.
2009-11-17 18:37:13 -08:00
Carl Worth
24a25ffba9 Remove the talloc_owner argument from create_for_message_id.
This function has only one caller, and that one caller was passing the
same value for both talloc_owner and the notmuch database. Dropping
the redundant argument simplifies the documentation of this function
considerably.
2009-11-17 17:42:32 -08:00
Carl Worth
933caf814f notmuch show: Implement proper thread ordering/nesting of messages.
We now properly analyze the in-reply-to headers to create a proper
tree representing the actual thread and present the messages in this
correct thread order. Also, there's a new "depth:" value added to the
"message{" header so that clients can format the thread as desired,
(such as by indenting replies).
2009-11-15 20:41:45 -08:00
Carl Worth
d136a1e2cf Add _notmuch_message_get_in_reply_to.
The existing notmuch_message_get_header is *almost* good enough for
this, except that we also need to remove the '<' and '>'
delimiters. We'll probably want to implement this function with
database storage in the future rather than loading the email message.
2009-11-15 20:36:51 -08:00
Carl Worth
b97756926f Remove obsolete notmuch_message_get_subject prototype.
This prototype has been sitting around for a while with no function
implementing it. I wonder if there's a compiler warning I could turn
on to catch these things.
2009-11-15 20:34:24 -08:00
Carl Worth
f970d8078c lib/messages: Add new notmuch_message_list_t to internal interface.
Previously, the notmuch_messages_t object was a linked list built on
top of a linked-list node with the odd name of notmuch_message_list_t.

Now, we've got much more sane naming with notmuch_message_list_t being
a list built on a linked-list node named notmuch_message_node_t. And
now the public notmuch_messages_t object is a separate iterator based
on notmuch_message_node_t. This means the interfaces for the new
notmuch_message_list_t object are now made available to the library
internals.
2009-11-15 20:31:30 -08:00
Carl Worth
9b1c6c250b Export _parse_message_id to the library implementation.
Not exported through the public interface, but the thread code is
going to want to be able to parse In-Reply-To headers so needs access
to this code.
2009-11-15 20:21:43 -08:00
Carl Worth
d3349358c6 lib: Move notmuch_messages_t code from query.cc to new messages.c
The new object is simply a linked-list of notmuch_message_t objects,
(unlike the old object which contained a couple of Xapian iterators).
This works now by the query code immediately iterator over all results
and creating notmuch_message_t objects for them, (rather than waiting
to create the objects until the notmuch_messages_get call as we did
earlier).

The point of this change is to allow other instances of lists of
messages, (such as in notmuch_thread_t), that are not directly related
to Xapian search results.
2009-11-14 23:05:17 -08:00
Carl Worth
c168e24174 notmuch search: Print the number of matched/total messages for each thread.
Note that we don't print the number of *unread* messages, but instead
the number of messages that matched the search terms. This is in
keeping with our philosophy that the inbox is nothing more than a
search view. If we search for messages with an inbox tag, then that's
what we'll get a count of. (And if somebody does want to see unread
counts, then they can search for the "unread" tag.)

Getting the number of matched messages is really nice when doing
historical searches. For example in a search like:

	notmuch search tag:sent

(where the "sent" tag has been applied to all messages originating
from the user's email address)---here it's really nice to be able to
see a thread where the user just mentioned one point [1/13] vs. really
getting involved in the discussion [10/29].
2009-11-12 22:01:44 -08:00
Carl Worth
ec6d3506db notmuch search: Print all authors contributing to a thread.
We've now expanded the notmuch_thread_create function to fire off a
secondary database query to find all the messages that belong to this
particular thread. This allows us to now have the complete authors'
list for the thread, and will also make it trivial to print accurate
message counts for threads in the future.
2009-11-12 21:09:54 -08:00
Carl Worth
1465493210 libify: Move library sources down into lib directory.
A "make" invocation still works from the top-level, but not from
down inside the lib directory yet.
2009-11-09 16:24:03 -08:00
Renamed from notmuch-private.h (Browse further)