Commit graph

130 commits

Author SHA1 Message Date
Carl Worth
ccf2e0cc42 lib: Add non-content terms with a WDF value of 0.
The WDF is the "within-document frequency" value for a particular
term. It's intended to provide an indication of how frequent a term is
within a document, (for use in computing relevance). Xapian's term
generator already computes WDF values when we use that, (which we do
for indexing all mail content).

We don't use the term generator when adding single terms for things
that don't actually appear in the mail document, (such as tags, the
filename, etc.). In this case, the WDF value for these terms doesn't
matter much.

But Xapian's flint backend can be more efficient with changes to terms
that don't affect the document "length". So there's a performance
advantage for manipulating tags (with the flint backend) if the WDF of
these terms is 0.
2010-01-09 11:18:27 -08:00
Carl Worth
d12801c8b4 lib: Split the database upgrade into two phases for safer operation.
The first phase copies data from the old format to the new format
without deleting anything. This allows an old notmuch to still use the
database if the upgrade process gets interrupted. The second phase
performs the deletion (after updating the database version number). If
the second phase is interrupted, there will be some unused data in the
database, but it shouldn't cause any actual harm.
2010-01-09 11:13:12 -08:00
Carl Worth
909f52bd8c lib: Implement versioning in the database and provide upgrade function.
The recent support for renames in the database is our first time
(since notmuch has had more than a single user) that we have a
database format change. To support smooth upgrades we now encode a
database format version number in the Xapian metadata.

Going forward notmuch will emit a warning if used to read from a
database with a newer version than it natively supports, and will
refuse to write to a database with a newer version.

The library also provides functions to query the database format
version:

	notmuch_database_get_version

to ask if notmuch wants a newer version than that:

	notmuch_database_needs_upgrade

and a function to actually perform that upgrade:

	notmuch_database_upgrade
2010-01-07 18:26:31 -08:00
Carl Worth
f93b7218c3 lib: Consolidate checks for read-only database.
Previously, many checks were deep in the library just before a cast
operation. These have now been replaced with internal errors and new
checks have instead been added at the beginning of all top-levelentry
points requiring a read-write database.

The new checks now also use a single function for checking and
printing the error message. This will give us a convenient location to
extend the check, (such as based on database version as well).
2010-01-07 10:19:44 -08:00
Carl Worth
a274848f95 notmuch_message_get_filename: Support old-style filename storage.
When a notmuch database is upgraded to the new database format, (to
support file rename and deletion), any message documents corresponding
to deleted files will not currently be upgraded. This means that a
search matching these documents will find no filenames in the expected
place.

Go ahead and return the filename as originally stored, (rather than
aborting with an internal error), in this case.
2010-01-07 09:22:34 -08:00
Carl Worth
d807e28f43 lib: Implement new notmuch_directory_t API.
This new directory ojbect provides all the infrastructure needed to
detect when files or directories are deleted or renamed. There's still
code needed on top of this (within "notmuch new") to actually do that
detection.
2010-01-06 10:32:06 -08:00
Carl Worth
498edff503 database: Abstract _filename_to_direntry from _add_message
The code to map a filename to a direntry is something that we're going
to want in a future _remove_message function, so put it in a new
function _notmuch_database_filename_to_direntry .
2010-01-06 10:32:05 -08:00
Carl Worth
1376a90db6 database: Allowing storing multiple filenames for a single message ID.
The library interface is unchanged so far, (still just
notmuch_database_add_message), but internally, the old
_set_filename function is now _add_filename instead.
2010-01-06 10:32:05 -08:00
Carl Worth
6ca6c089e9 database: Store mail filename as a new 'direntry' term, not as 'data'.
Instead of storing the complete message filename in the data portion
of a mail document we now store a 'direntry' term that contains the
document ID of a directory document and also the basename of the
message filename within that directory. This will allow us to easily
store multple filenames for a single message, and will also allow us
to find mail documents for files that previously existed in a
directory but that have since been deleted.
2010-01-06 10:32:05 -08:00
Carl Worth
ba12bf1f26 lib: Abstract the extraction of a relative path from set_filename
We'll soon be having multiple entry points that accept a filename
path, so we want common code for getting a relative path from a
potentially absolute path.
2010-01-06 10:32:05 -08:00
Carl Worth
64c8d6227a Avoid bogus internal error reporting duplicate In-Reply-To IDs.
This error was tirggered with a debugging build via:

	make CXXFLAGS="-DDEBUG"

and reported by David Bremner. The actual error is that I'm an
idiot that doesn't know how to use strcmp's return value. Of
course, the strcmp interface scores a negative 7 on Rusty Russell
ranking of bad interfaces:

http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
2009-11-28 10:01:22 -08:00
David Bremner
12c91e8050 add missing comma in debugging code 2009-11-27 19:51:53 -08:00
Bart Trojanowski
c57a0b4f8b message: add flags to notmuch_message_t
This patch allows for different flags, internal to notmuch, to be set on a
message object.  The patch does not define any such flags, just the
facilities to manage these flags.

Signed-off-by: Bart Trojanowski <bart@jukie.net>
2009-11-27 17:06:50 -08:00
Jan Janak
c3c52e464b notmuch: New function to retrieve all tags from the database.
This patch adds a new function called notmuch_database_get_all_tags
which can be used to obtain a list of all tags from the database
(in other words, the list contains all tags from all messages). The
function produces an alphabetically sorted list.

To add support for the new function, we rip the guts off of
notmuch_message_get_tags and put them in a new generic function
called _notmuch_convert_tags. The generic function takes a
Xapian::TermIterator as argument and uses the iterator to find tags.
This makes the function usable with different Xapian objects.

Function notmuch_message_get_tags is then reimplemented to call the
generic function with message->doc.termlist_begin() as argument.

Similarly, we implement notmuch_message_database_get_all_tags, the
function calls the generic function with db->xapian_db->allterms_begin()
as argument.

Finally, notmuch_database_get_all_tags is exported through
lib/notmuch.h

Signed-off-by: Jan Janak <jan@ryngle.com>
2009-11-26 07:01:52 -08:00
Bart Trojanowski
ceee152fca fix notmuch-new bug when database path ends with a trailing /
I configured my database.path with a trailing /, and after running notmuch
new every notmuch search would fail with error messages like this:

  Error opening /inbox/cur/1258565257.000211.mbox:2,S: No such file or directory

The actual bug was in the filename normalization for storage in the
database.  The database.path was removed from the full filename, but if
the database.path from the config file contained a trailing /, the
relative file name would retain an extra leading /... which made it look
like an absolute path after it was read out from the DB.

Signed-off-by: Bart Trojanowski <bart@jukie.net>
2009-11-23 04:37:01 +01:00
Carl Worth
e2341cbc09 Catch and optionally print about exception at database->flush.
If an earlier exception occurred, then it's not unexpected for the
flush to fail as well. So in that case, we'll silently catch the
exception. Otherwise, make some noise about things going wrong at the
time of flush.
2009-11-22 03:54:20 +01:00
Carl Worth
717279fbcf Add a missing print after catching an exception.
Without this, trying to debug this exception was *really* confusing.
2009-11-22 03:52:55 +01:00
Carl Worth
637f99d8f3 Rename NOTMUCH_DATABASE_MODE_WRITABLE to NOTMUCH_DATABASE_MODE_READ_WRITE
And correspondingly, READONLY to READ_ONLY.
2009-11-21 22:10:18 +01:00
Chris Wilson
f379aa5284 Permit opening the notmuch database in read-only mode.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and like whilst updating the index.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
2009-11-21 22:04:49 +01:00
Carl Worth
3ae12b1e28 add_message: Re-fix handling of non-mail files.
More fallout from _get_header now returning "" for missing headers.

The bug here is that we would no longer detect that a file is not an
email message and give up on it like we should.

And this time, I actually audited all callers to
notmuch_message_get_header, so hopefully we're done fixing this
bug over and over.
2009-11-20 21:46:37 +01:00
Carl Worth
31b54bc787 Avoid access of a Xapian iterator's object when there's nothing there.
This eliminates a crash when a message (either corrupted or a non-mail
file that wasn't properly detected as not being mail) has no In-Reply-To
header, (and so few terms that trying to skip to the prefix of the
In-Reply-To terms actually brings us to the end of the termlist).
2009-11-20 12:06:11 +01:00
Ingmar Vanhassel
2ce25b93a7 Typsos 2009-11-18 03:21:36 -08:00
Keith Packard
d025e89ac7 Fix "too many open files" bug by closing message files when done with them.
The message file header parsing code parses only enough of the file to
find the desired header fields, then it leaves the file open until the
next header parsing call or when the message is no longer in use. If a
large number of messages end up being active, this will quickly run
out of file descriptors.

Here, we add support to explicitly close the message file within a
message, (_notmuch_message_close) and call that from thread
construction code.

Signed-off-by: Keith Packard <keithp@keithp.com>

Edited-by: Carl Worth <cworth@cworth.org>:

Many portions of Keith's original patch have since been solved other
ways, (such as the code that changed the handling of the In-Reply-To
header). So the final version is clean enough that I think even Keith
would be happy to have his name on it.
2009-11-17 18:37:13 -08:00
Carl Worth
f7eaeff242 message_get_thread_id: Generate internal error if message has no thread ID.
This case was happening when a message had its own message ID in its
In-Reply-To header. The thread-resolution code would find the
partially constructed message, (with no thread ID yet), get garbage
from this function, and then march right along with that garbage.

With this commit, a self-cyclic message like this will now trigger an
internal error rather than marching along silienty. (And a subsequent
commit will remove the call to this function in this case.)
2009-11-17 17:42:32 -08:00
Carl Worth
24a25ffba9 Remove the talloc_owner argument from create_for_message_id.
This function has only one caller, and that one caller was passing the
same value for both talloc_owner and the notmuch database. Dropping
the redundant argument simplifies the documentation of this function
considerably.
2009-11-17 17:42:32 -08:00
Carl Worth
387828c435 get_in_reply_to: Implement via the database, not by opening mail file.
This reduces our reliance on open message_file objects, (so is a step
toward fixing the "too many open files" bug), but more importantly, it
means we don't load a self-referencing in-reply-to header, (since we
weed those out before adding any replyto terms to the database).
2009-11-17 17:40:19 -08:00
Mikhail Gusarov
469ea9ebc6 Include <stdint.h> to get uint32_t in C++ file with gcc 4.4
Signed-off-by: Mikhail Gusarov <dottedmag@dottedmag.net>
2009-11-17 08:53:19 -08:00
Carl Worth
933caf814f notmuch show: Implement proper thread ordering/nesting of messages.
We now properly analyze the in-reply-to headers to create a proper
tree representing the actual thread and present the messages in this
correct thread order. Also, there's a new "depth:" value added to the
"message{" header so that clients can format the thread as desired,
(such as by indenting replies).
2009-11-15 20:41:45 -08:00
Carl Worth
d136a1e2cf Add _notmuch_message_get_in_reply_to.
The existing notmuch_message_get_header is *almost* good enough for
this, except that we also need to remove the '<' and '>'
delimiters. We'll probably want to implement this function with
database storage in the future rather than loading the email message.
2009-11-15 20:36:51 -08:00
Carl Worth
1465493210 libify: Move library sources down into lib directory.
A "make" invocation still works from the top-level, but not from
down inside the lib directory yet.
2009-11-09 16:24:03 -08:00
Renamed from message.cc (Browse further)