This new directory ojbect provides all the infrastructure needed to
detect when files or directories are deleted or renamed. There's still
code needed on top of this (within "notmuch new") to actually do that
detection.
This commit contains my changes to the API proposed by Keith. Nothing
is dramatically different. There are minor things like changing
notmuch_files_t to notmuch_filenames_t and then various things needed
for completeness as noticed while implementing this, (such as
notmuch_directory_destroy and notmuch_directory_set_mtime).
This will allow applications to support the removal of messages, (such
as when a file is deleted from the mail store). No removal support is
provided yet in commands such as "notmuch new".
The existing find_doc_ids function is convenient when the caller
doesn't want to be bothered constructing a term. But when the caller
*does* have the term already, that interface is just wasteful. So we
export a lower-level interface that maps a pre-constructed term to a
document-ID iterators.
The code to map a filename to a direntry is something that we're going
to want in a future _remove_message function, so put it in a new
function _notmuch_database_filename_to_direntry .
The library interface is unchanged so far, (still just
notmuch_database_add_message), but internally, the old
_set_filename function is now _add_filename instead.
Instead of storing the complete message filename in the data portion
of a mail document we now store a 'direntry' term that contains the
document ID of a directory document and also the basename of the
message filename within that directory. This will allow us to easily
store multple filenames for a single message, and will also allow us
to find mail documents for files that previously existed in a
directory but that have since been deleted.
Some pending commits want the _split_path functionality separate from
mapping a directory to a document ID. The split_path function now
returns the basename as well as the directory name.
We're planning to have mail documents refer to directory documents for
the path of the containing directory. To support this, we need the
path in the data, (since the path in the 'directory' term can be
irretrievable as it will be the SHA1 sum of the path for a very long
path).
We'll soon have mail documents referring to their parent directory's
directory documents, so we'll need access to _find_parent_id in files
such as message.cc.
Storing the document ID of the parent of each directory document will
allow us to find all child-directory documents for a given directory
document. We will need this in order to detect directories that have
been removed from the mail store, (though we aren't yet doing this).
The recent change from storing absolute paths to relative paths means
that new directory documents will already be created, (and the old
ones will just linger stale in the database). Given that, we might as
well put a clean name on the term in the new documents, (and no real
flag day is needed).
We were already storing relative mail filenames, so this is consistent
with that. Additionally, it means that directory documents remain
valid even if the database is relocated within its containing
filesystem.
We'll soon be having multiple entry points that accept a filename
path, so we want common code for getting a relative path from a
potentially absolute path.
And fix the initialization such that the private enum will always have
distinct values from the public enum even if we similarly miss the
addition of a new public value in the future.
The function _notmuch_message_add_thread_id has been removed
from the private interface of notmuch. There's no reason for
one to keep a declaration of its prototype in the code base.
Also, lets update a commentary that referenced that function
and escaped from previous scrutiny.
Signed-off-by: Fernando Carrijo <fcarrijo@yahoo.com.br>
Since we need to do this for portability, (some systems don't have a
strndup function), we might as well do it unconditionally. There's
almost no disadvantage to doing so, and this has the advantages of not
requiring a configure-time check nor having two different
implementations, one of which would often be less tested.
We carefully noted the fact that we had locally allocated the string
here, but then we neglected to free it. Switch to talloc instead
which makes it easier to get the behavior we want. It's simpler since
we can just call talloc_free unconditionally, without having to track
the state of whether we allocated the storage for name or not.
This error was tirggered with a debugging build via:
make CXXFLAGS="-DDEBUG"
and reported by David Bremner. The actual error is that I'm an
idiot that doesn't know how to use strcmp's return value. Of
course, the strcmp interface scores a negative 7 on Rusty Russell
ranking of bad interfaces:
http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
As per Carl's request, this patch corrects the only value defined under
the notmuch_message_flag_t enum typedef to match the name of the type.
Signed-off-by: Bart Trojanowski <bart@jukie.net>
If Xapian threw an exception on notmuch_query_count_messages the count
variable could be used uninitialized. Initialize count to solve the
problem.
Signed-off-by: Jeffrey C. Ollie <jeff@ocjtech.us>
When _notmuch_thread_create() is given a query string, it can return more
messages than just those matching the query. To distinguish those that
matched the query expression, the MATCHING_SEARCH flag is set
appropriately.
Signed-off-by: Bart Trojanowski <bart@jukie.net>
This patch allows for different flags, internal to notmuch, to be set on a
message object. The patch does not define any such flags, just the
facilities to manage these flags.
Signed-off-by: Bart Trojanowski <bart@jukie.net>
This patch adds a new function that can be used to collect a list of
unique tags from a list of messages. 'notmuch search-tags' uses the
function to get a list of tags from messages matching a search-term,
but it has the potential to be used elsewhere so we put it in the lib.
Signed-off-by: Jan Janak <jan@ryngle.com>
This patch adds a new function called notmuch_database_get_all_tags
which can be used to obtain a list of all tags from the database
(in other words, the list contains all tags from all messages). The
function produces an alphabetically sorted list.
To add support for the new function, we rip the guts off of
notmuch_message_get_tags and put them in a new generic function
called _notmuch_convert_tags. The generic function takes a
Xapian::TermIterator as argument and uses the iterator to find tags.
This makes the function usable with different Xapian objects.
Function notmuch_message_get_tags is then reimplemented to call the
generic function with message->doc.termlist_begin() as argument.
Similarly, we implement notmuch_message_database_get_all_tags, the
function calls the generic function with db->xapian_db->allterms_begin()
as argument.
Finally, notmuch_database_get_all_tags is exported through
lib/notmuch.h
Signed-off-by: Jan Janak <jan@ryngle.com>
Xapian provides an interator-based interface to all search results.
So it was natural to make notmuch_messages_t be iterator-based as
well. Which we did originally.
But we ran into a problem when we added two APIs, (_get_replies and
_get_toplevel_messages), that want to return a messages iterator
that's *not* based on a Xapian search result. My original compromise
was to use notmuch_message_list_t as the basis for all returned
messages iterators in the public interface.
This had the problem of introducing extra latency at the beginning
of a search for messages, (the call would block while iterating over
all results from Xapian, converting to a message list).
In this commit, we remove that initial conversion and instead provide
two alternate implementations of notmuch_messages_t (one on top of a
Xapian iterator and one on top of a message list).
With this change, I tested a "notmuch search" returning *many* results
as previously taking about 7 seconds before results started appearing,
and now taking only 2 seconds.
Previously, notmuch_query_search_threads would do all the work, so the
caller would block until all results were processed. Now, we do the
work as we go, as the caller iterates with notmuch_threads_next. This
means that once results start coming back from "notmuch search" they
just keep continually streaming.
There's still some initial blocking before the first results appear
because the notmuch_messages_t object has the same bug (for now).
This was a poor workaround around the fact that the existing
notmuch_threads_t object is implemented poorly. It's got a fine
iterartor-based interface, but the implementation does all of the
work up-front in _create rather than doing the work incrementally
while iterating.
So to start fixing this, first get rid of all the hacks we had working
around this. This drops the --first and --max-threads options from the
search command, (but hopefully nobody was using them
anyway---notmuch.el certainly wasn't).
The rudimentary aspect here is that the date ranges are specified with
UNIX timestamp values (number of seconds since 1970-01-01 UTC). One
thing that can help here is using the date program to determins
timestamps, such as:
$(date +%s -d 2009-10-01)..$(date +%s)
Long-term, we'll probably need to do our own query parsing to be able
to support directly-specified dates and also relative expressions like
"since:'2 months ago'".
Getting the count of matching threads or messages is a fairly
expensive operation. Xapian provides a very efficient mechanism that
returns an approximate value, so use that for this new command.
This returns the number of matching messages, not threads, as that is
cheap to compute.
Signed-off-by: Keith Packard <keithp@keithp.com>
I configured my database.path with a trailing /, and after running notmuch
new every notmuch search would fail with error messages like this:
Error opening /inbox/cur/1258565257.000211.mbox:2,S: No such file or directory
The actual bug was in the filename normalization for storage in the
database. The database.path was removed from the full filename, but if
the database.path from the config file contained a trailing /, the
relative file name would retain an extra leading /... which made it look
like an absolute path after it was read out from the DB.
Signed-off-by: Bart Trojanowski <bart@jukie.net>
Use the facilities of GNU make to create a magic function that will
on the first invocation print a description of how to enable verbose
compile lines and then print the quiet rule.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Carl Worth <cworth@cworth.org>
Cc: Mikhail Gusarov <dottedmag@dottedmag.net>
[ickle: Rebased, and duplicate command string eliminated.]
[ickle: Fixed verbose bug pointed out by Mikhail]
Since Xapian has a limit on the maximum length of a term, we have
to check for that before trying to add the message ID as a term.
This fixes the bug reported by Mike Hommey here:
<20091120132625.GA19246@glandium.org>
I've also constructed 20 files with a range of message ID lengths
centered around the Xapian term-length limit which I'll use to seed a
new test suite soon.
If an earlier exception occurred, then it's not unexpected for the
flush to fail as well. So in that case, we'll silently catch the
exception. Otherwise, make some noise about things going wrong at the
time of flush.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and like whilst updating the index.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
In my script containing a series of queries to be run on new mail for
setting up tags, it's nice to see which query I typed wrong.
Signed-off-by: Eric Anholt <eric@anholt.net>
More fallout from _get_header now returning "" for missing headers.
The bug here is that we would no longer detect that a file is not an
email message and give up on it like we should.
And this time, I actually audited all callers to
notmuch_message_get_header, so hopefully we're done fixing this
bug over and over.
There's been a fair amount of fallout from when we changed
message_file_get_header from returning NULL to returning "" for
missing headers. This is yet more fallout from that, (where we were
accepting an empty message-ID rather than generating one like we want
to).
This eliminates a crash when a message (either corrupted or a non-mail
file that wasn't properly detected as not being mail) has no In-Reply-To
header, (and so few terms that trying to skip to the prefix of the
In-Reply-To terms actually brings us to the end of the termlist).
As suggested by Keith in FLAG_PURE_NOT allows for expressions like:
notmuch search NOT tag:inbox
Note that this way a search like:
notmuch search foobar NOT tag:inbox
should not be written instead:
notmuch search foobar AND NOT tag:inbox
In my opinion, the latter feels more natural and is somewhat more explicit.
It gives a better clue of what the search is about instead of assuming that
an implicit AND operator is there.
This was recently introduced in commit:
64c03ae97f
which was adding extra checks to avoid adding a self-referencing
message.
How many times am I going to fix a dumb regression like this and say
"we really need a test suite" before I actually sit down and write the
test suite?
This is what most people want for a _search_ command. It's often
different for actually reading mail in an inbox, (where it makes more
sense to have results displayed in chronological order), but in such a
case, ther user is likely using an interface that can simply pass the
--sort=oldest-first option to "notmuch search".
Here we're also change the sort enum from NOTMUCH_SORT_DATE and
NOTMUCH_SORT_DATE_REVERSE to NOTMUCH_SORT_OLDEST_FIRST and
NOTMUCH_SORT_NEWEST_FIRST. Similarly we replace the --reverse option
to "notmuch search" with two options: --sort=oldest-first and
--sort=newest-first.
Finally, these changes are all tracked in the emacs interface, (which
has no change in its behavior).
We had exposed this to the internal implementation for a short time,
(only while we had the silly code fetching In-Reply-To values from
message files instead of from the database). Make this private again
as it should be.
Maybe ths lack of this documentation is why I forgot we were actually
storing this and wrote the ugly code to fetch In-Reply-To from message
files rather than from the database.
Which is more consistent with the XREFERENCE prefix used in the terms
in the database. Also remove some stale documentation describing the
removal of resolved references from the database (we no longer do
this).
Calling continue here worked only because we set a flag before the
continue, and, check the flag at the beginning of the loop, and *then*
break. It's much more clear to just break in the first place.
The message file header parsing code parses only enough of the file to
find the desired header fields, then it leaves the file open until the
next header parsing call or when the message is no longer in use. If a
large number of messages end up being active, this will quickly run
out of file descriptors.
Here, we add support to explicitly close the message file within a
message, (_notmuch_message_close) and call that from thread
construction code.
Signed-off-by: Keith Packard <keithp@keithp.com>
Edited-by: Carl Worth <cworth@cworth.org>:
Many portions of Keith's original patch have since been solved other
ways, (such as the code that changed the handling of the In-Reply-To
header). So the final version is clean enough that I think even Keith
would be happy to have his name on it.
In our scheme it's illegal for any message to refer to itself, (nor
would it be useful for anything anyway). Cut these self-references off
at the source, before they trip up any internal errors.
This case was happening when a message had its own message ID in its
In-Reply-To header. The thread-resolution code would find the
partially constructed message, (with no thread ID yet), get garbage
from this function, and then march right along with that garbage.
With this commit, a self-cyclic message like this will now trigger an
internal error rather than marching along silienty. (And a subsequent
commit will remove the call to this function in this case.)
This function has only one caller, and that one caller was passing the
same value for both talloc_owner and the notmuch database. Dropping
the redundant argument simplifies the documentation of this function
considerably.
This reduces our reliance on open message_file objects, (so is a step
toward fixing the "too many open files" bug), but more importantly, it
means we don't load a self-referencing in-reply-to header, (since we
weed those out before adding any replyto terms to the database).
When this function was originally written, the 'message' object was
always destroyed locally, so I thought it would be good to use a NULL
talloc context to make it more obvious if there was any leak.
Since then, however, this function has been changed to optionally
return the added message, and in that case we *don't* free the message
locally, so let's let the database be the talloc context.
We now properly analyze the in-reply-to headers to create a proper
tree representing the actual thread and present the messages in this
correct thread order. Also, there's a new "depth:" value added to the
"message{" header so that clients can format the thread as desired,
(such as by indenting replies).
The existing notmuch_message_get_header is *almost* good enough for
this, except that we also need to remove the '<' and '>'
delimiters. We'll probably want to implement this function with
database storage in the future rather than loading the email message.
This prototype has been sitting around for a while with no function
implementing it. I wonder if there's a compiler warning I could turn
on to catch these things.
Previously, the notmuch_messages_t object was a linked list built on
top of a linked-list node with the odd name of notmuch_message_list_t.
Now, we've got much more sane naming with notmuch_message_list_t being
a list built on a linked-list node named notmuch_message_node_t. And
now the public notmuch_messages_t object is a separate iterator based
on notmuch_message_node_t. This means the interfaces for the new
notmuch_message_list_t object are now made available to the library
internals.
The new object is simply a linked-list of notmuch_message_t objects,
(unlike the old object which contained a couple of Xapian iterators).
This works now by the query code immediately iterator over all results
and creating notmuch_message_t objects for them, (rather than waiting
to create the objects until the notmuch_messages_get call as we did
earlier).
The point of this change is to allow other instances of lists of
messages, (such as in notmuch_thread_t), that are not directly related
to Xapian search results.
Previously, an excess call would have caused a crash. Now it simply
does nothing. Also, make notmuch_tags_get use a similar, consistent
early return for a NULL iterator.
We were properly sorting the threads based only on matched messages,
but we were displaying the date based on the total messages in the
thread, which led to inconsistent and very confusing results.
Note that the difference between thread results in date order and
thread results in reverse-date order is not simply a matter of
reversing the final results. When sorting in date order, the threads
are sorted by the oldest message in the thread. When sorting in
reverse-date order, the threads are sorted by the newest message in
the thread.
This difference means that we might want an explicit option in the
interface to reverse the order, (even though the default will be to
display the inbox in date order and global searches in reverse-date
order).
Note that we don't print the number of *unread* messages, but instead
the number of messages that matched the search terms. This is in
keeping with our philosophy that the inbox is nothing more than a
search view. If we search for messages with an inbox tag, then that's
what we'll get a count of. (And if somebody does want to see unread
counts, then they can search for the "unread" tag.)
Getting the number of matched messages is really nice when doing
historical searches. For example in a search like:
notmuch search tag:sent
(where the "sent" tag has been applied to all messages originating
from the user's email address)---here it's really nice to be able to
see a thread where the user just mentioned one point [1/13] vs. really
getting involved in the discussion [10/29].
We add a hash to the thread object so that we can detect author names
that have already been added to the list, and avoid adding them
redundantly. This avoids the giant chain of "bugzilla-daemon,
bugzilla-daemon, bugzilla-daemon, bugzilla-daemon, ..." author lists
that we would get otherwise, for example.
We've now expanded the notmuch_thread_create function to fire off a
secondary database query to find all the messages that belong to this
particular thread. This allows us to now have the complete authors'
list for the thread, and will also make it trivial to print accurate
message counts for threads in the future.
I thought it would be safe enough to return a few extra threads,
(since we happened to already get the relevant messages out of the
database). The problem is that then requires the caller to carefully
read the number of threads returned and adjust its next "first" value
accordingly. The interface is much simpler to use if we simply return
exactly what is asked for and no more.
This serves me right for committing untested code. The
notmuch_query_search_threads was totally broken, (it didn't properly
treat -1 as being unlimited and instead returned no threads in that
case).
The library interface now allows the caller to do incremental searches,
(such as one page of results at a time). Next we'll just need to hook
this up to "notmuch search" and the emacs interface.
It's important to have the names present for determining whether a
thread is worth reading or not. We may want to think about
abbreviating the list somehow if it is excessively long (or redundant
as in bugzilla-daemon, bugzilla-daemon, bugzilla-daemon, etc.).
We never did export any interface to get at these, and when I went to
use these, I found them inadequate, (because I wanted to distinguish
address found in from: from those found in To:). Meanwhile, it was
easy enough to extract addresses with a search like:
notmuch show tag:sent | grep ^To:
so the storage of contact terms was just wasting space. Stop that.
This will allow for things like the database path to be specified
without any cheesy NOTMUCH_BASE environment variable. It also will
allow "notmuch reply" to recognize the user's email address when
constructing a reply in order to do the right thing, (that is, to use
the user's address to which mail was sent as From:, and not to reply
to the user's own addresses).
With this change, the "notmuch setup" command is now strictly for
changing the configuration of notmuch. It no longer creates the
database, but instead instructs the user to call "notmuch new" to do
that.
Previously, the top-level Makefile was explicitly adding -I./lib to
the compiler flags. However, that's something that's much better done
from within the Makefile.local fragment within the lib directory
itself.
I saw this recommendation in the implementation notes for "Recursive
Make Considered Harmful" and then the further recommendation for
implementing the idea in the GNU make manual.
The idea is that if any of the files change then we need to regenerate
the dependency file before we regenerate any targets.
The approach from the GNU make manual is simpler in that it just uses
a sed script to fix up the output of an extra invocation of the
compiler, (as opposed to the approach in the implementation notes from
the paper's author which use a wrapper script for the compiler that's
always invoked rather than the compiler itself).
The idea here is that every Makefile at each lower level will be an
identical, tiny file that simply defers to a top-level make.
Meanwhile, the Makefile.local file at each level is a Makefile snippet
to be included at the top-level into a large, flat Makefile. As such,
it needs to define its rules with the entire relative directory to
each file, (typically in $(dir)). The local files can also append to
variables such as SRCS and CLEAN for files to be analyzed for
dependencies and to be cleaned.