This is a rebase and cleanup of Istvan Marko's patch from
id:m3pqnj2j7a.fsf@zsu.kismala.com
Search retrieves these headers for every message in the search
results. Previously, this required opening and parsing every message
file. Storing them directly in the database significantly reduces IO
and computation, speeding up search by between 50% and 10X.
Taking full advantage of this requires a database rebuild, but it will
fall back to the old behavior for messages that do not have headers
stored in the database.
Apparently the method was renamed in Xapian 1.1.0 but the old method
name will stay around for a while. It seems better to stick with the
old name to make notmuch compile with older versions of Xapian, at
least for now.
We keep the lib/xutil.c version. As a consequence, also factor out
_internal_error and associated macros. It might be overkill to make a
new file error_util.c for this, but _internal_error does not really
belong in database.cc.
Based on discussions with amdragon, tschwinge, and others on IRC, I concluded that
1) symbol versioning was probably overkill for libnotmuch,
2) it was also probably GNU ld specific, and
3) most importantly, nobody could tell me on short notice how exactly it works.
So since the change to notmuch_database_find_message breaks the
previous ABI, we need to bump the SONAME.
Previously, the functions notmuch_database_find_message() and
notmuch_database_find_message_by_filename() did not properly
report error conditions to the library user.
For more information, read the thread on the notmuch mailing list
starting with my mail "id:871uv2unfd.fsf@gmail.com"
Make these functions accept a pointer to 'notmuch_message_t' as argument
and return notmuch_status_t which may be used to check for any error
condition.
restore: Modify for the new notmuch_database_find_message()
new: Modify for the new notmuch_database_find_message_by_filename()
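The new calling convention can be sketched roughly as below. The types and
the toy lookup table are stand-ins invented for illustration, not
libnotmuch's real definitions; only the shape (a status return value, with
the message handed back through a pointer argument) comes from the
description above. Treating "not found" as success with a NULL result is an
assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for notmuch_status_t and notmuch_message_t. */
typedef enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_NULL_POINTER,
    SKETCH_STATUS_DATABASE_ERROR
} sketch_status_t;

typedef struct {
    const char *message_id;
} sketch_message_t;

/* Toy "database": a fixed table, plus a flag to simulate a failure. */
typedef struct {
    const sketch_message_t *messages;
    size_t count;
    int broken;
} sketch_database_t;

/* Look up a message by ID. The return value is only a status; the
 * result is delivered through *message_ret. In this sketch "not found"
 * is success with *message_ret == NULL, so the caller can tell it
 * apart from a real error. */
sketch_status_t
sketch_find_message (const sketch_database_t *db,
                     const char *message_id,
                     const sketch_message_t **message_ret)
{
    size_t i;

    if (message_ret == NULL)
        return SKETCH_STATUS_NULL_POINTER;
    *message_ret = NULL;

    if (db == NULL || message_id == NULL)
        return SKETCH_STATUS_NULL_POINTER;
    if (db->broken)
        return SKETCH_STATUS_DATABASE_ERROR;

    for (i = 0; i < db->count; i++) {
        if (strcmp (db->messages[i].message_id, message_id) == 0) {
            *message_ret = &db->messages[i];
            break;
        }
    }
    return SKETCH_STATUS_SUCCESS;
}
```

A caller checks the status first and only then looks at the returned
pointer, which is what the old pointer-returning interface could not
express.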
State up front that these functions may add a filename to an existing
message or remove only a filename (and not the message), respectively.
Previously, this key information was buried in return value
documentation or in "notes", which made it seem secondary to these
functions' semantics.
Adding a message may involve changes to multiple database documents,
and thus needs to be done in a transaction. This makes add_message
(and, I think, the whole library) atomicity-safe: library callers only
need to use atomic sections if they need atomicity across multiple
library calls.
notmuch_database_find_message_by_filename is mostly stolen from
notmuch_database_remove_message, so this patch also vastly simplifies
the latter using the former.
This API is also useful in its own right and will be used in a later
patch for eager maildir flag synchronization.
Previously, notmuch_database_remove_message would remove the message
file name, sync the change to the message document, re-find the
message document, and then delete it if there were no more file names.
An interruption after sync'ing would result in a file-name-less,
permanently un-removable zombie message that would produce errors and
odd results in searches. We could wrap this in an atomic section, but
it's much simpler to eliminate the round-about approach and just
delete the message document instead of sync'ing it if we removed the
last filename.
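The control flow described above might look like the following sketch. The
document structure and the "deleted"/"synced" flags are hypothetical
stand-ins for the real database operations; the point is only that the
write-back and the delete are alternatives, so no file-name-less document
is ever stored.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_FILENAMES 4

/* Hypothetical in-memory stand-in for a message document. */
typedef struct {
    const char *filenames[SKETCH_MAX_FILENAMES];
    size_t n_filenames;
    int deleted;   /* set when the whole document is deleted */
    int synced;    /* set when a modified document is written back */
} sketch_doc_t;

/* Remove one filename from the document. If it was the last one,
 * delete the document instead of writing back the modified copy. */
void
sketch_remove_filename (sketch_doc_t *doc, const char *filename)
{
    size_t i;
    int found = 0;

    for (i = 0; i < doc->n_filenames; i++) {
        if (strcmp (doc->filenames[i], filename) == 0) {
            /* Fill the hole with the last entry and shrink the list. */
            doc->filenames[i] = doc->filenames[doc->n_filenames - 1];
            doc->n_filenames--;
            found = 1;
            break;
        }
    }

    if (! found)
        return;

    if (doc->n_filenames == 0)
        doc->deleted = 1;   /* last filename gone: delete, do not sync */
    else
        doc->synced = 1;    /* other filenames remain: write back */
}
```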
notmuch_database_t now keeps a nesting count and we only start a
transaction or commit for the outermost atomic section.
Introduces a new error, NOTMUCH_STATUS_UNBALANCED_ATOMIC.
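A minimal sketch of the nesting count, assuming a hypothetical database
struct whose counters stand in for the real begin/commit calls on the
backend. The names here are illustrative and are not the library's actual
fields or functions.

```c
/* Hypothetical stand-ins; not libnotmuch's real types. */
typedef enum {
    SKETCH_ATOMIC_SUCCESS = 0,
    SKETCH_ATOMIC_UNBALANCED
} sketch_atomic_status_t;

typedef struct {
    int atomic_nesting;           /* depth of currently open atomic sections */
    int transactions_begun;       /* simulated backend "begin" calls */
    int transactions_committed;   /* simulated backend "commit" calls */
} sketch_atomic_db_t;

/* Enter an atomic section; only the outermost one starts a transaction. */
sketch_atomic_status_t
sketch_begin_atomic (sketch_atomic_db_t *db)
{
    if (db->atomic_nesting == 0)
        db->transactions_begun++;
    db->atomic_nesting++;
    return SKETCH_ATOMIC_SUCCESS;
}

/* Leave an atomic section; only the outermost one commits. Ending a
 * section that was never begun is reported as an error. */
sketch_atomic_status_t
sketch_end_atomic (sketch_atomic_db_t *db)
{
    if (db->atomic_nesting == 0)
        return SKETCH_ATOMIC_UNBALANCED;
    db->atomic_nesting--;
    if (db->atomic_nesting == 0)
        db->transactions_committed++;
    return SKETCH_ATOMIC_SUCCESS;
}
```

With this shape, a library function can wrap its own work in an atomic
section without caring whether its caller already opened one.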
Previously, this function would synchronize the folder list even if
removing the file name failed. Now it returns immediately if removing
the file name fails.
If the configure script detects missing getline and/or getdelim
symbols, then notmuch will use its own versions. This patch, based on
id:"87k49v12i5.fsf@pc44es141.cs.uni-magdeburg.de" by Matthias
Guedemann, adds the symbols to notmuch.sym as well so they are
properly exported from the library.
OpenBSD nm apparently doesn't support --defined.
The awk condition is based on the assumption that all defined symbols
have some hex number in the first column.
Thanks to Matthias Guedemann for reporting the problem and for an earlier
version of this patch.
Unfortunately Robin Green's patch 52e4dedf9a was lost when I created
gen-version-script.sh. This merges his changes manually into that
script. It turns out tabs do not seem to be needed in version script
files, so I simplified a bit and removed the printf.
Thanks to Alexander Botero-Lowry for help and testing.
If the notmuch.sym target does not explicitly depend on $(libnotmuch_modules),
gen-version-script.sh may be run before all the .o files are created, for
example when doing a parallel build on a machine with many cores.
Conflicts:
lib/Makefile.local
The conflicts are from three kinds of commits that were never merged
into the release branch:
- typo fixes
- removal of debug output
- fix for CLEAN rule
The lack of such exporting seems to cause problems catching
exceptions, as suggested by
http://gcc.gnu.org/wiki/Visibility
This manifested in the symbol-hiding test failing when notmuch was
compiled with gcc 4.4.5. On i386, this further manifested as notmuch
new failing to run (crashing with an uncaught exception on first run).
Add removal of all ZXFOLDER terms to removal of all XFOLDER terms for
each message filename removal.
The existing filename-list reindexing will put all the needed terms
back in. Test search-folder-coherence now passes.
Signed-off-by: Mark Anderson <ma.skies@gmail.com>
(cherry picked from commit 8a856e5c38)
Carl reports "gcc -aux-info notmuch.aux lib/notmuch.h" does not
generate notmuch.aux for him with Debian gcc 4.6.0-8. A small
modification of the original sed regular expression allows us to work
directly from lib/notmuch.h, rather than preprocessing with gcc.
As with most such simple regex based "parsing", this is quite
sensitive to the input format, and requires that each symbol to be
exported from libnotmuch
- start with "notmuch_"
- be the first non-whitespace token on the line
- be followed by an open parenthesis.
(Cherry-picked from 51b7ab6968, with conflicts resolved by db)
- c0961e6 introduced a missing slash between $(dir)$(LIBNAME) and missing
$(dir) in front of libnotmuch.a
- cdf1c70a created a file $(dir)/notmuch.h.gch and neglected to
add it to CLEAN
Various typo fixes in comments within the source code.
Signed-off-by: Pieter Praet <pieter@praet.org>
Edited-by: Carl Worth <cworth@cworth.org> Restricted to just
source-code comments, (and fixed fix of "descriptios" to "descriptors"
rather than "descriptions").
Various typo fixes in comments within the Makefile and other build scripts.
Signed-off-by: Pieter Praet <pieter@praet.org>
Edited-by: Carl Worth <cworth@cworth.org> Restricted to just build files.
This is closely tied to gcc and particularly GNU ld, but I guess the
shared library linking code would need to be adjusted to work on a
non-GNU linker anyway.
I had to make a few not-obviously related changes to the
lib/Makefile.local to make this work: libnotmuch_modules is defined
with := and used in place of $^
(cherry picked from commit 014bf85b1c06ff49be2bde5a26433d2cf376cf70)
We're not properly concatenating the Received headers if we parse them
while requesting a header that isn't Received.
This fixes notmuch-reply address detection in a bunch of situations.
This patch adds the tag "signed" to messages with any multipart/signed
parts, and the tag "encrypted" to messages with any
multipart/encrypted parts. This only occurs when messages are indexed
during notmuch new, so a database rebuild is required to have old
messages tagged.
As of gcc 4.6, there are new warnings from -Wattributes along the lines of:
warning: ‘_notmuch_messages’ declared with greater visibility
than the type of its field ‘_notmuch_messages::iterator’
[-Wattributes]
To squelch these, we decorate all such containing structs with
__attribute__((visibility("default"))). We take care to let only the
C++ compiler see this, (since the C compiler would otherwise warn
about ignored visibility attributes on types).
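One way to let only the C++ compiler see the attribute is a macro that
expands to nothing in C. The macro name and the struct below are
hypothetical and chosen for illustration; this is a sketch of the idea, not
necessarily the exact spelling used in the tree.

```c
/* Expands to the visibility attribute only in C++ translation units.
 * In C it expands to nothing, so the C compiler never sees (and never
 * warns about) a visibility attribute on a type. */
#ifdef __cplusplus
# define SKETCH_STRUCT_VISIBLE __attribute__((visibility("default")))
#else
# define SKETCH_STRUCT_VISIBLE
#endif

/* A struct shared between C and C++ code, decorated with the macro. */
struct SKETCH_STRUCT_VISIBLE sketch_messages {
    int is_of_list_type;
    void *iterator;
};
```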
gcc (at least as of version 4.6.0) is kind enough to point these out to us,
(when given -Wunused-but-set-variable explicitly or implicitly via -Wunused
or -Wall).
One of these cases was a legitimately unused variable. Two were simply
variables (named ignored) we were assigning only to squelch a warning about
unused function return values. I don't seem to be getting those warnings
even without setting the ignored variable. And the gcc docs say that the
correct way to squelch that warning is with a cast to (void) anyway.
Now each caller of notmuch_message_get_tags only gets a new iterator,
instead of a whole new list. In principle this could cause problems
with iterating while modifying tags, but through the magic of talloc
references, we keep the old tag list alive even after the cache in the
message object is invalidated.
This reduces my inbox search from the 3.102 seconds before the unified
metadata pass to 1.811 seconds (1.7X faster). Combined with the
thread search optimization in b3caef1f06,
that makes this query 2.5X faster than when I started.
Even if the caller never uses the file names, there is little cost to
simply fetching the file name terms. However, retrieving the full
paths requires additional database work, so the expansion from terms
to full paths is performed lazily.
This also simplifies clearing the filename cache, since that's now
handled by the generic metadata cache code.
This further reduces my inbox search from 3.102 seconds before the
unified metadata pass to 2.206 seconds (1.4X faster).
Replace _notmuch_convert_tags with this and simplify
_create_filenames_for_terms_with_prefix. This will also come in handy
shortly to get the message file name list.
This replaces the guts of the filename list and tag list, making those
interfaces simple iterators over the generic string list. The
directory, message filename, and tags-related code now build generic
string lists and then wrap them in specific iterators. The real wins
come in later patches, when we use these for even more generic
functionality.
As a nice side-effect, this also eliminates the annoying dependency on
GList in the tag list.
This performs a single pass over a message's term list to fetch the
thread ID, message ID, and reply-to, rather than requiring a pass for
each. Xapian decompresses the term list anew for each iteration, so
this reduces the amount of time spent decompressing message metadata.
This reduces my inbox search from 3.102 seconds to 2.555 seconds (1.2X
faster).
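The single-pass idea can be sketched as below, over a plain array of term
strings rather than a real Xapian term list. The prefixes ("G" for thread
ID, "Q" for message ID, "XREPLYTO" for reply-to) and all names are
assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Metadata collected in one pass; pointers alias the term strings. */
typedef struct {
    const char *thread_id;
    const char *message_id;
    const char *in_reply_to;
} sketch_metadata_t;

/* Return the value part of term if it starts with prefix, else NULL. */
static const char *
sketch_strip_prefix (const char *term, const char *prefix)
{
    size_t len = strlen (prefix);

    return strncmp (term, prefix, len) == 0 ? term + len : NULL;
}

/* Walk the term list once and pick out all three fields, instead of
 * walking it once per field. */
void
sketch_fetch_metadata (const char *const *terms, size_t n_terms,
                       sketch_metadata_t *out)
{
    size_t i;
    const char *value;

    out->thread_id = out->message_id = out->in_reply_to = NULL;

    for (i = 0; i < n_terms; i++) {
        if ((value = sketch_strip_prefix (terms[i], "XREPLYTO")) != NULL)
            out->in_reply_to = value;
        else if ((value = sketch_strip_prefix (terms[i], "G")) != NULL)
            out->thread_id = value;
        else if ((value = sketch_strip_prefix (terms[i], "Q")) != NULL)
            out->message_id = value;
    }
}
```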
Such as:
mkdir build
cd build
../configure
make
This is implemented by having the configure script set a srcdir
variable in Makefile.config, and then sprinkling $(srcdir) into
various make rules. We also use vpath directives to convince GNU make
to find the source files from the original source directory.
Don't require the caller of _notmuch_doc_id_set_init to pass in a
correct bound; instead compute it from the array. This simplifies the
caller and makes this interface easier to use correctly.
Remove the repeated "sizeof (doc_ids->bitmap[0])" that bothered cworth
by instead defining macros to compute the word and bit offset of a
given bit in the doc ID set bitmap.
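Both ideas (computing the bound from the array, and macros for the word
and bit offsets) can be sketched together as follows. The struct layout,
macro names, and function names are hypothetical, and the sketch assumes
no ID equals UINT_MAX, since the bound is stored as max + 1.

```c
#include <limits.h>
#include <stdlib.h>

/* Word index and bit mask for a given bit in the set's bitmap. */
#define SKETCH_WORD(set, bit) \
    ((bit) / (sizeof ((set)->bitmap[0]) * CHAR_BIT))
#define SKETCH_BIT(set, bit) \
    (1u << ((bit) % (sizeof ((set)->bitmap[0]) * CHAR_BIT)))

typedef struct {
    unsigned int *bitmap;
    unsigned int bound;   /* one past the largest ID the set can hold */
} sketch_doc_id_set_t;

/* Build the set from an array of IDs. The bound is computed here, so
 * the caller cannot pass a wrong one. Returns 1 on success, 0 if the
 * allocation fails. */
int
sketch_doc_id_set_init (sketch_doc_id_set_t *set,
                        const unsigned int *ids, size_t n_ids)
{
    unsigned int max = 0;
    size_t i, n_words;

    for (i = 0; i < n_ids; i++)
        if (ids[i] > max)
            max = ids[i];

    n_words = SKETCH_WORD (set, max) + 1;
    set->bitmap = calloc (n_words, sizeof (set->bitmap[0]));
    if (set->bitmap == NULL)
        return 0;
    set->bound = max + 1;

    for (i = 0; i < n_ids; i++)
        set->bitmap[SKETCH_WORD (set, ids[i])] |= SKETCH_BIT (set, ids[i]);
    return 1;
}

/* Test membership; IDs at or beyond the bound are never in the set. */
int
sketch_doc_id_set_contains (const sketch_doc_id_set_t *set, unsigned int id)
{
    if (id >= set->bound)
        return 0;
    return (set->bitmap[SKETCH_WORD (set, id)] & SKETCH_BIT (set, id)) != 0;
}
```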