Commit graph

79 commits

Author SHA1 Message Date
Austin Clements
5394924e6c lib: Separate list of all messages from top-level messages
Previously, thread.cc built up a list of all messages, then
proceeded to tear it apart to transform it into a list of
top-level messages.  Now we simply build a new list of top-level
messages.

This simplifies the interface to _notmuch_message_add_reply,
eliminates the pointer acrobatics from
_resolve_thread_relationships, and will enable us to do things
with the list of all messages in the following patches.
2013-02-18 20:20:24 -04:00
Jani Nikula
5505d55515 lib: fix warnings when building with clang
Building notmuch with CC=clang and CXX=clang++ produces the warnings:

CC -O2 lib/tags.o
lib/tags.c:43:5: warning: expression result unused [-Wunused-value]
    talloc_steal (tags, list);
    ^~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/talloc.h:345:143: note: expanded from:
  ...__location__); __talloc_steal_ret; })
                    ^~~~~~~~~~~~~~~~~~
1 warning generated.

CXX -O2 lib/message.o
lib/message.cc:791:5: warning: expression result unused [-Wunused-value]
    talloc_reference (message, message->tag_list);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/talloc.h:932:36: note: expanded from:
  ...(_TALLOC_TYPEOF(ptr))_talloc_reference_loc((ctx),(ptr), __location__)
     ^                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

Check talloc_reference() return value, and explicitly ignore
talloc_steal() return value as it has no failure modes, to silence the
warnings.
2012-12-01 08:10:32 -04:00
Austin Clements
b88030bda6 lib: Treat messages in new/ as maildir messages with no flags set
Previously, notmuch new only synchronized maildir flags to tags for
files with a maildir "info" part.  Since messages in new/ don't have
an info part, notmuch would ignore them for flag-to-tag
synchronization.

This patch makes notmuch consider messages in new/ to be legitimate
maildir messages that simply have no maildir flags set.  The most
visible effect of this is that such messages now automatically get the
unread tag.
2012-06-10 20:14:56 -03:00
Austin Clements
750231bae8 lib: Only synchronize maildir flags for messages in maildirs
Previously, we synchronized flags to tags for any message that looked
like it had maildir flags in its file name, regardless of whether it
was in a maildir-like directory structure.  This was asymmetric with
tag-to-flag synchronization, which only applied to messages in
directories named new/ and cur/ (introduced by 95dd5fe5).

This change makes our interpretation stricter and addresses this
asymmetry by only synchronizing flags to tags for messages in
directories named new/ or cur/.  It also prepares us to treat messages
in new/ as maildir messages, even though they lack maildir flags.
2012-06-10 20:13:58 -03:00
Austin Clements
93ab4c7d11 lib: Move _filename_is_in_maildir
This way notmuch_message_maildir_flags_to_tags can call it.  It makes
more sense for this to be just above all of the maildir
synchronization code rather than mixed in the middle.
2012-06-10 20:13:45 -03:00
Austin Clements
d9f61c26a1 lib: Don't needlessly create directory docs in _notmuch_message_remove_filename
Previously, if passed a filename with a directory that did not exist
in the database, _notmuch_message_remove_filename would needlessly
create that directory document.  Fix it so that doesn't happen.
2012-05-23 22:32:12 -03:00
Austin Clements
67ae2377a9 lib: Perform the same transformation to _notmuch_database_filename_to_direntry
Now _notmuch_database_filename_to_direntry takes a flags argument and
can indicate if the necessary directory documents do not exist.
Again, callers have been updated, but retain their original behavior.
2012-05-23 22:30:43 -03:00
Louis Rilling
b9360be2bd tags_to_maildir_flags: Cleanup double assignement
The for loop right after already does the job.

Signed-off-by: Louis Rilling <l.rilling@av7.net>
2011-11-21 20:32:32 -04:00
Louis Rilling
21b13c3932 lib: Kill last usage of C++ type bool
Signed-off-by: Louis Rilling <l.rilling@av7.net>
2011-11-21 20:32:07 -04:00
Austin Clements
567bcbc294 Store "from" and "subject" headers in the database.
This is a rebase and cleanup of Istvan Marko's patch from
id:m3pqnj2j7a.fsf@zsu.kismala.com

Search retrieves these headers for every message in the search
results.  Previously, this required opening and parsing every message
file.  Storing them directly in the database significantly reduces IO
and computation, speeding up search by between 50% and 10X.

Taking full advantage of this requires a database rebuild, but it will
fall back to the old behavior for messages that do not have headers
stored in the database.
2011-11-14 17:10:58 -04:00
Ali Polatel
02a3076711 lib: make find_message{,by_filename) report errors
Previously, the functions notmuch_database_find_message() and
notmuch_database_find_message_by_filename() functions did not properly
report error condition to the library user.

For more information, read the thread on the notmuch mailing list
starting with my mail "id:871uv2unfd.fsf@gmail.com"

Make these functions accept a pointer to 'notmuch_message_t' as argument
and return notmuch_status_t which may be used to check for any error
condition.

restore: Modify for the new notmuch_database_find_message()
new: Modify for the new notmuch_database_find_message_by_filename()
2011-10-04 07:55:29 +03:00
Austin Clements
bfe4555325 lib: Remove message document directly after removing the last file name.
Previously, notmuch_database_remove_message would remove the message
file name, sync the change to the message document, re-find the
message document, and then delete it if there were no more file names.
An interruption after sync'ing would result in a file-name-less,
permanently un-removable zombie message that would produce errors and
odd results in searches.  We could wrap this in an atomic section, but
it's much simpler to eliminate the round-about approach and just
delete the message document instead of sync'ing it if we removed the
last filename.
2011-09-23 21:50:39 -04:00
Austin Clements
e4379c43e2 lib: Indicate if there are more filenames after removal.
Make _notmuch_message_remove_filename return
NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID if the message has more filenames
and fix callers to handle this.
2011-09-23 21:50:39 -04:00
Austin Clements
62445dd023 lib: Add missing status check in _notmuch_message_remove_filename.
Previously, this function would synchronize the folder list even if
removing the file name failed.  Now it returns immediately if removing
the file name fails.
2011-09-12 23:36:00 -03:00
Mark Anderson
8a856e5c38 Fix folder: coherence issue
Add removal of all ZXFOLDER terms to removal of all XFOLDER terms for
each message filename removal.

The existing filename-list reindexing will put all the needed terms
back in.  Test search-folder-coherence now passes.

Signed-off-by:Mark Anderson <ma.skies@gmail.com>
2011-06-29 14:13:16 -07:00
Pieter Praet
8bb6f7869c fix sum moar typos [comments in source code]
Various typo fixes in comments within the source code.

Signed-off-by: Pieter Praet <pieter@praet.org>

Edited-by: Carl Worth <cworth@cworth.org> Restricted to just
source-code comments, (and fixed fix of "descriptios" to "descriptors"
rather than "descriptions").
2011-06-23 15:58:39 -07:00
Carl Worth
d5523ead90 Mark some structures in the library interface with visibility=default attribute.
As of gcc 4.6, there are new warnings from -Wattributes along the lines of:

	warning: ‘_notmuch_messages’ declared with greater visibility
	than the type of its field ‘_notmuch_messages::iterator’
	[-Wattributes]

To squelch these, we decorate all such containing structs with
__attribute__((visibility("default"))). We take care to let only the
C++ compiler see this, (since the C compiler would otherwise warn
about ignored visibility attributes on types).
2011-05-11 13:27:15 -07:00
Carl Worth
2f3a76c569 Remove some variables which were set but not used.
gcc (at least as of version 4.6.0) is kind enough to point these out to us,
(when given -Wunused-but-set-variable explicitly or implicitly via -Wunused
or -Wall).

One of these cases was a legitimately unused variable. Two were simply
variables (named ignored) we were assigning only to squelch a warning about
unused function return values. I don't seem to be getting those warnings
even without setting the ignored variable. And the gcc docs. say that the
correct way to squelch that warning is with a cast to (void) anyway.
2011-05-11 13:27:14 -07:00
Austin Clements
d19c5de17a Add the tag list to the unified message metadata pass.
Now each caller of notmuch_message_get_tags only gets a new iterator,
instead of a whole new list.  In principle this could cause problems
with iterating while modifying tags, but through the magic of talloc
references, we keep the old tag list alive even after the cache in the
message object is invalidated.

This reduces my index search from the 3.102 seconds before the unified
metadata pass to 1.811 seconds (1.7X faster).  Combined with the
thread search optimization in b3caef1f06,
that makes this query 2.5X faster than when I started.
2011-03-21 02:45:18 -04:00
Austin Clements
f271071330 Add the file name list to the unified message metadata pass.
Even if the caller never uses the file names, there is little cost to
simply fetching the file name terms.  However, retrieving the full
paths requires additional database work, so the expansion from terms
to full paths is performed lazily.

This also simplifies clearing the filename cache, since that's now
handled by the generic metadata cache code.

This further reduces my inbox search from 3.102 seconds before the
unified metadata pass to 2.206 seconds (1.4X faster).
2011-03-21 02:45:18 -04:00
Austin Clements
206938ec9b Add a generic function to get a list of terms with some prefix.
Replace _notmuch_convert_tags with this and simplify
_create_filenames_for_terms_with_prefix.  This will also come in handy
shortly to get the message file name list.
2011-03-21 02:45:18 -04:00
Austin Clements
f3c1eebfaf Implement an internal generic string list and use it.
This replaces the guts of the filename list and tag list, making those
interfaces simple iterators over the generic string list.  The
directory, message filename, and tags-related code now build generic
string lists and then wraps them in specific iterators.  The real wins
come in later patches, when we use these for even more generic
functionality.

As a nice side-effect, this also eliminates the annoying dependency on
GList in the tag list.
2011-03-21 02:45:18 -04:00
Austin Clements
d9b0ae918f Use a single unified pass to fetch scalar message metadata.
This performs a single pass over a message's term list to fetch the
thread ID, message ID, and reply-to, rather than requiring a pass for
each.  Xapian decompresses the term list anew for each iteration, so
this reduces the amount of time spent decompressing message metadata.

This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X
faster).
2011-03-21 02:45:18 -04:00
Carl Worth
db70f3f0c4 lib: Save and restore term position in message while indexing.
This fixes the recently addead search-position-overlap bug as
demonstrated in the test of the same name.
2011-01-26 15:59:19 +10:00
Carl Worth
99cfa27030 Add support for folder-based searching.
A new "folder:" prefix in the query string can now be used to match
the directories in which mail files are stored.

The addition of this feature causes the recently added
search-by-folder tests to now pass.
2011-01-15 15:37:43 -08:00
Carl Worth
36161181df Correct some minor typos in a comment
Nothing too important here. Just some misspellings I noticed while reading
nearby code.
2011-01-15 15:37:43 -08:00
Austin Clements
b3caef1f06 Optimize thread search using matched docid sets.
This reduces thread search's 1+2t Xapian queries (where t is the
number of matched threads) to 1+t queries and constructs exactly one
notmuch_message_t for each message instead of 2 to 3.
notmuch_query_search_threads eagerly fetches the docids of all
messages matching the user query instead of lazily constructing
message objects and fetching thread ID's from term lists.
_notmuch_thread_create takes a seed docid and the set of all matched
docids and uses a single Xapian query to expand this docid to its
containing thread, using the matched docid set to determine which
messages in the thread match the user query instead of using a second
Xapian query.

This reduces the amount of time required to load my inbox from 4.523
seconds to 3.025 seconds (1.5X faster).
2010-12-07 16:40:05 -08:00
Carl Worth
7278383005 lib: Fix missing initialization of status field.
This could have been a problematic bug. Fortuinately "gcc -O2" warns
about it.
2010-11-11 20:54:41 -08:00
Carl Worth
fe8eeaf4a5 lib: Add two missing static qualifiers
The debian packaging is nice enough to notice when we accidentally
leak private symbols to the public interface.
2010-11-11 20:53:21 -08:00
Carl Worth
96d99c3837 tags_to_maildir_flags: Fix to preserve existing, unsupported flags
This is to prevent notmuch from destroying any information the user
has encoded as flags in the maildir filename. Tests are also added to
the test suite to verify the documented behavior.
2010-11-11 16:36:02 -08:00
Carl Worth
95dd5fe5d7 notmuch_message_tags_to_maildir_flags: Do nothing outside of "new" and "cur"
Some people use notmuch with non-maildir files, (for example, email
messages in MH format, or else cool things like using sluk[*] to suck
down feeds into a format that notmuch can index).

To better support uses like that, don't do any renaming for files that
are not in a directory named either "new" or "cur".

[*] https://github.com/krl/sluk/
2010-11-11 14:32:17 -08:00
Carl Worth
37a8096fdc notmuch_message_tags_to_maildir_flags: Don't exit on failure to rename.
It is totally legitimate for a non-maildir directory to be named "new"
(and not have a directory next to it named "cur"). To support this
case at least, be silent about any rename failure.
2010-11-11 03:50:42 -08:00
Carl Worth
71a3201885 notmuch_message_tags_to_maildir_flags: Fix to rename multiple files
This function was documented as modifying every filename associated
with the message. Fix it to actually do that.
2010-11-11 03:47:11 -08:00
Carl Worth
404db1de90 maildir_flags_to_tags: Avoid interpreting "no info" as "no flags set".
If a filename has no maildir info at all, (that is, it does not
contain the sequence ":2,"), we consider this distinct from a filename
with an empty maildir info, (the ":2," separator is present, but no
flags characters follow).

Specifically, we regard a missing info field as providing no
information, so tags will remain unchanged. On the other hand, an info
field that is present but has no flags set will cause various tags to
be cleared, (or in the case of "unread", added).

This fixes the "remove info" case of the maildir-sync tests in the
test suite.
2010-11-11 03:40:19 -08:00
Carl Worth
81cbaafc0f Fix notmuch_message_tags_to_maildir_flags to effect rename immediately
We have tests to ensure that when the notmuch library renames a file
that that rename takes place immediately in the database, (without
requiring something like "notmuch new" to notice the change).

This was working when the code was first added, but recently broke in
the reworking of the maildir-synchronization interface since the
tags_to_maildir_flags function can no longer assume that it is being
called as part of _notmuch_message_sync.

Fortunately, the fix is as simple as adding an explicit call to
_notmuch_message_sync.
2010-11-11 03:40:19 -08:00
Carl Worth
4b6063397f Fix notmuch_message_maildir_flags_to_tags to iterate over filenames
As documented, this function now iterates over all filenames for the
message, computing a logical OR of the flags set on the filenames,
then uses the final result to set tags on the message.

This change fixes 3 of the 10 maildir-sync tests that have been
failing since being added.
2010-11-11 03:40:19 -08:00
Carl Worth
1d02dd64af lib: Add new, public notmuch_message_get_filenames
This augments the existing notmuch_message_get_filename by allowing
the caller access to all filenames in the case of multiple files for a
single message.

To support this, we split the iterator (notmuch_filenames_t) away from
the list storage (notmuch_filename_list_t) where previously these were
a single object (notmuch_filenames_t). Then, whenever the user asks
for a file or filename, the message object lazily creates a complete
notmuch_filename_list_t and then:

	For notmuch_message_get_filename, returns the first filename
	in the list.

	For notmuch_message_get_filenames, creates and returns a new
	iterator for the filename list.
2010-11-11 03:40:19 -08:00
Carl Worth
d422dcf0a2 lib: Remove the notion of TAGS_INVALID
This rather ugly hack was recently obviated by the removal of the
notmuch_database_set_maildir_sync function. Now, clients must make
explicit calls to do any syncrhonization between maildir flags and
tags. So the library no longer needs to worry about doing inconsistent
synchronization while a message is only partially added.
2010-11-11 03:40:19 -08:00
Carl Worth
bb74e9dff8 lib: Rework interface for maildir_flags synchronization
Instead of having an API for setting a library-wide flag for
synchronization (notmuch_database_set_maildir_sync) we instead
implement maildir synchronization with two new library functions:

	notmuch_message_maildir_flags_to_tags
  and   notmuch_message_tags_to_maildir_flags

These functions are nicely documented here, (though the implementation
does not quite match the documentation yet---as plainly evidenced by
the current results of the test suite).
2010-11-11 03:40:19 -08:00
Carl Worth
2c262042ac lib: Remove the synchronization of 'T' flag with "deleted" tag.
Tags in a notmuch database affect all messages with the identical
message-ID. But maildir tags affect individual files. And since
multiple files can contain the identical message-ID, there is not a
one-to-one correspondence between messages affected by tags and flags.

This is particularly dangerous with the 'T' (== "trashed") maildir
flag and the corresponding "deleted" tag in the notmuch
database. Since these flags/tags are often used to trigger
irreversible deletion operations, the lack of one-to-one
correspondence can be potentially dangerous.

For example, consider the following sequence:

  1. A third-party application is used to identify duplicate messages
     in the mail store, and mark all-but-one of each duplicate with
     the 'T' flag for subsequent deletion.

  2. A "notmuch new" operation reads that 'T' flag, adding the
     "deleted" flag to the corresponding messages within the notmuch
     database.

  3. A subsequent notmuch operation, (such as a "notmuch dump; notmuch
     restore" cycle) synchronized the "deleted" tag back to the mail
     store, applying the 'T' flag to all(!) filenames with duplicate
     message IDs.

  4. A third-party application reads the 'T' flags and irreversibly
     deletes all mail messages which had any duplicates(!).

In order to avoid this scenario, we simply refuse to synchronize the
'T' flag with the "deleted" tag. Instead, applications can set 'T' and
act on it to delete files, or can set "deleted" and act on it to
delete files. But in either case the semantics are clear and there is
never dangerous propagation through the one-to-many mapping of notmuch
message objects to files.
2010-11-11 02:35:03 -08:00
Michal Sojka
d9d3d3e6f0 Make maildir synchronization configurable
This adds group [maildir] and key 'synchronize_flags' to the
configuration file. Its value enables (true) or diables (false) the
synchronization between notmuch tags and maildir flags. By default,
the synchronization is disabled.
2010-11-10 13:09:32 -08:00
Michal Sojka
088801a14a Maildir synchronization
This patch allows bi-directional synchronization between maildir
flags and certain tags. The flag-to-tag mapping is defined by flag2tag
array.

The synchronization works this way:

1) Whenever notmuch new is executed, the following happens:
   o New messages are tagged with configured new_tags.
   o For new or renamed messages with maildir info present in the file
     name, the tags defined in flag2tag are either added or removed
     depending on the flags from the file name.

2) Whenever notmuch tag (or notmuch restore) is executed, a new set of
   flags based on the tags is constructed for every message and a new
   file name is prepared based on the old file name but with the new
   flags. If the flags differs and the old message was in 'new'
   directory then this is replaced with 'cur' in the new file name. If
   the new and old file names differ, the file is renamed and notmuch
   database is updated accordingly.

   The rename happens before the database is updated. In case of crash
   between rename and database update, the next run of notmuch new
   brings the database in sync with the mail store again.
2010-11-10 13:09:31 -08:00
Carl Worth
d064bd696c lib: Eliminate some redundant includes of xapian.h
Most files including this already include database-private.h which
includes xapian.h already.
2010-11-01 23:24:40 -07:00
Carl Worth
98845fdbb2 Avoid database corruption by not adding partially-constructed mail documents.
Previously we were using Xapian's add_document to allocate document ID
values for notmuch_message_t objects.  This had the drawback of adding
a partially constructed mail document to the database. If notmuch was
subsequently interrupted before fully populating this document, then
later runs would be quite confused when seeing the partial documents.

There are reports from the wild of people hitting internal errors of
the form "Message ... has no thread ID" for example, (which is
currently an unrecoverable error).

We fix this by manually allocating document IDs without adding
documents. With this change, we never call Xapian's add_document
method, but only replace_document with either the current document ID
of a message or a new one that we have allocated.
2010-06-04 10:16:53 -07:00
Carl Worth
361b9d4bd9 Fix misnamed function in internal documentation.
The documentation for several functions mentioned
_notmuch_message_set_sync which doesn't exist. Fix these to reference
_notmuch_message_sync instead.
2010-06-04 09:54:46 -07:00
Dirk Hohndel
57561414d7 Add authors member to message
message->authors contains the author's name (as we want to print it)
get / set methods are declared in notmuch-private.h

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
2010-04-26 11:44:49 -07:00
Carl Worth
c446f22dee lib: Silence a compiler warning.
The original code was harmless, but apparently some compilers aren't
able to think deep enough to catch that.
2010-03-09 12:07:26 -08:00
Carl Worth
4e5d2f22db lib: Rename iterator functions to prepare for reverse iteration.
We rename 'has_more' to 'valid' so that it can function whether
iterating in a forward or reverse direction. We also rename
'advance' to 'move_to_next' to setup parallel naming with
the proposed functions 'move_to_first', 'move_to_last', and
'move_to_previous'.
2010-03-09 09:22:29 -08:00
Carl Worth
9439b217c3 Switch from random to sequential thread identifiers.
The sequential identifiers have the advantage of being guaranteed to
be unique (until we overflow a 64-bit unsigned integer), and also take
up half as much space in the "notmuch search" output (16 columns
rather than 32).

This change also has the side effect of fixing a bug where notmuch
could block on /dev/random at startup (waiting for some entropy to
appear). This bug was hit hard by the test suite, (which could easily
exhaust the available entropy on common systems---resulting in large
delays of the test suite).
2010-02-09 11:14:11 -08:00
Carl Worth
ccf2e0cc42 lib: Add non-content terms with a WDF value of 0.
The WDF is the "within-document frequency" value for a particular
term. It's intended to provide an indication of how frequent a term is
within a document, (for use in computing relevance). Xapian's term
generator already computes WDF values when we use that, (which we do
for indexing all mail content).

We don't use the term generator when adding single terms for things
that don't actually appear in the mail document, (such as tags, the
filename, etc.). In this case, the WDF value for these terms doesn't
matter much.

But Xapian's flint backend can be more efficient with changes to terms
that don't affect the document "length". So there's a performance
advantage for manipulating tags (with the flint backend) if the WDF of
these terms is 0.
2010-01-09 11:18:27 -08:00