This adds a 100 termpos gap between all phrases indexed by
_notmuch_message_gen_terms. This fixes a bug where terms from the end
of one header and the beginning of another header could match together
in a single phrase and a separate bug where term positions of
un-prefixed terms overlapped.
This fix only affects newly indexed messages. Messages that are
already indexed won't benefit from this fix without re-indexing, but
the fix won't make things any worse for existing messages.
This adds two known-broken tests and one working test related to the
term positions assigned to terms from different headers or MIME parts.
The first test fails because we don't create a termpos gap between
different headers. The second test fails because we don't adjust
termpos at all when indexing multiple parts.
Previously, we indexed the name and address parts of from/to headers
with two calls to _notmuch_message_gen_terms. In general, this
indicates that these parts are separate phrases. However, because of
an implementation quirk, the two calls to _notmuch_message_gen_terms
generated adjacent term positions for the prefixed terms, which
happens to be the right thing to do in this case, but the wrong thing
to do for all other calls. Furthermore, _notmuch_message_gen_terms
produced potentially overlapping term positions for the un-prefixed
copies of the terms, which is simply wrong.
This change indexes both the name and address in a single call to
_notmuch_message_gen_terms, indicating that they should be part of a
single phrase. This masks the problem with the un-prefixed terms
(fixing the two known-broken tests) and puts us in a position to fix
the unintentionally phrases generated by other calls to
_notmuch_message_gen_terms.
Two of these are currently known-broken. We index the name and
address parts in two separate calls to _notmuch_message_gen_terms.
Currently this has the effect of placing the term positions of the
prefixed terms from the second call right after those of the first
call, but screws up the term positions of the non-prefixed terms.
Two of the search tests for "from" and "to" queries were clearly
trying to search for prefixed phrases, but forgot to shell quote the
phrases. Fix this by quoting them correctly.
This is effectively a revert of
commit 6812136bf5
Author: Jani Nikula <jani@nikula.org>
Date: Mon Mar 31 00:21:48 2014 +0300
lib: drop support for single-message mbox files
The intention was to drop support for indexing new single-message mbox
files (and whether that was a good idea in the first place is
arguable). However this inadvertently broke support for reading
headers from previously indexed single-message mbox files, which is
far worse.
Distinguishing between the two cases would require more code than
simply bringing back support for single-message mbox files.
At least in emacs24, this removes the "site-lisp" directories from the
load path in addition to enforcing --no-site-lisp --no-init-file.
This works around a slightly mysterious bug on Debian that causes
test-lib.el not to load when there is cl-lib.el(c) in some site-lisp
directory. It should be harmless in general since we really don't
want to load any files from addon packages to emacs.
It turns out to be inconvenient to delete the downloaded datafiles with
distclean, so I propose a new target which does that instead.
The closest conventional target is 'maintainer-clean'; the difference
here is that having the original source tarball is not enough to
reconstruct these files.
The linking to talloc is hard-coded in the testing Makefile. This patch
causes the linking to talloc to be done according to how TALLOC_LDFLAGS
was configured.
Signed-off-by: Charles Celerier <cceleri@cs.stanford.edu>
- The old test was quite impossible to debug; the new one shows the difference
between the two directories, if any.
- "repository" doesn't make sense for out of tree builds. Or tarball
builds, for that matter.
All we do here is calculate the backup filename, and call the existing
dump routine.
Also take the opportunity to add a message about being safe to
interrupt.
The main goal is to support gzipped output for future internal
calls (e.g. from notmuch-new) to notmuch_database_dump.
The additional dependency is not very heavy since xapian already pulls
in zlib.
We want the dump to be "atomic", in the sense that after running the
dump file is either present and complete, or not present. This avoids
certain classes of mishaps involving overwriting a good backup with a
bad or partial one.
We've supported mbox files containing a single message for historical
reasons, but the support has been deprecated, with a warning message
while indexing, since Notmuch 0.15. Finally drop the support, and
consider all mbox files non-email.
Previously, the term escaper used a blacklist of characters that
needed escaping. This blacklist turned out to be somewhat incomplete;
for example, it did not contain non-whitespace ASCII control
characters or Unicode "fancy quotes", both of which do require the
term to be escaped.
Switch to a whitelist of characters that are definitely safe to leave
unquoted. This fixes the broken test introduced by the previous
patch.
The current term escaper gets most of these right, but fails to escape
things containing Unicode "fancy quotes" or things containing
non-whitespace control characters.
I still have one machine with old enough Xapian to not have compaction
support. Make the tests check for unsupported compact operation when
compact is not available.
This allows (and requires) the original-tags to be passed along with
the current-tags to be passed to notmuch-tag-format-tags. This allows
the tag formatting to show added and deleted tags.By default a removed
tag is displayed with strike-through in red (if strike-through is not
available, eg on a terminal, inverse video is used instead) and an
added tag is displayed underlined in green.
If the caller does not wish to use the new feature it can pass
current-tags for both arguments and, at this point, we do exactly that
in the three callers of this function.
Note, we cannot tidily allow original-tags to be optional because we would
need to distinguish nil meaning "we are not specifying original-tags"
from nil meaning there were no original-tags (an empty list).
We use this in subsequent patches to make it clear when a message was
unread when you first loaded a show buffer (previously the unread tag
could be removed before a user realised that it had been unread).
The code adds into the existing tag formatting code. The user can
specify exactly how a tag should be displayed normally, when deleted,
or when added.
Since the formatting code matches regexps a user can match all deleted
tags with a ".*" in notmuch-tag-deleted-formats. For example setting
notmuch-tag-deleted-formats to '((".*" nil)) tells notmuch not to show
deleted tags at all.
All the variables are customizable; however, more complicated cases
like changing the face depending on the type of display will require
custom lisp.
Currently this overrides notmuch-tag-deleted-formats for the tests
setting it to '((".*" nil)) so that they get removed from the display
and, thus, all tests still pass.
It turns out there was a reason the old man pages were stored in a man
compatible hierarchy, namely so that we could run man on them before
installing.
Hardcode doc build location into test suite. This isn't ideal, but
let's unbreak the test suite for now.
The checksum file is used by the test infrastructure to verify the downloaded
test database is the one we had in mind. Note that this test is
rather strict, and the the checksum file needs to be recommitted when
the database is regenerated.
add a pattern .gitignore to ignore the actual databases
Test the upgrade from probabilistic to boolean folder: terms, and
addition of path: terms.
The test depends on the pre-built test corpus and database tarball and
checksum file being in place. If it's not, the test is skipped. The
mechanism to fetch the test database will be added later.
At the time of writing, a working test database and checksum file is
available at
http://notmuchmail.org/releases/test-databases/
It has been noted that some non-GNU environments make lack
sha256sum. We leave this portability issue for a followup patch.
In xapian terms, convert folder: prefix from probabilistic to boolean
prefix, matching the paths, relative from the maildir root, of the
message files, ignoring the maildir new and cur leaf directories.
folder:foo matches all message files in foo, foo/new, and foo/cur.
folder:foo/new does *not* match message files in foo/new.
folder:"" matches all message files in the top level maildir and its
new and cur subdirectories.
This change constitutes a database change: bump the database version
and add database upgrade support for folder: terms. The upgrade also
adds path: terms.
Finally, fix the folder search test for literal folder: search, as
some of the folder: matching capabilities are lost in the
probabilistic to boolean prefix change.
We will need this for improved folder search tests, but having some
folders should exercise our code paths better anyway.
Modify the relevant test accordingly to make it pass.
This reorganization triggers a bug in the test suite, namely that it
expects the output of --output=files to be in a certain order. So we
add the fix for that into the same commit.
This mainly involves sorting, although the case --duplicate=$n
requires more subtlety.
Sanitize tabs and newlines to spaces rather than question marks in
--output=summary --format=text output.
This will also hide any difference in unfolding a header that has been
folded with a tab. Our own header parser replaces tabs with spaces,
while gmime would retain the tab.
The printf builtin "%(fmt)T" specifier (which allows time values
to use strftime-like formatting) is introduced in bash 4.2.
Trying to execute this in pre-4.2 bash will fail -- and if this
happens execute the fallback piece of perl code to do the same thing.
The test names assigned to NOTMUCH_SKIP_TESTS variable can now be given
with or without the Tddd- prefix for tester convenience:
The test name without Tddd -prefix stays constant even when test filenames
are renumbered.
The test name with Tddd -prefix is printed out when tests run.
According the semantics of make, the expansion of $(dir) in recipes
uses dynamic scope, i.e. the value at the time the recipe is run. This
means if test/Makefile.local is not the last sub-makefile included,
all heck breaks loose.
Previously, we stripped the "Tnnn-" part from the test name when
printing its description at the beginning of each test. However, this
makes it difficult to find the source script for a test (e.g., when a
test fails). Put this prefix back.
Searching by Message-Id no longer works via the old mail-archive.com
API, though I have contacted them in hopes that they restore it to
prevent dead links. Anyway, the new API is cleaner.
Acked-by: Austin Clements <amdragon@MIT.EDU>
Script `notmuch-test` expects the results file have T\d\d\d- part
intact so the results files (and some test output files) are now
name as such.
Without this change `notmuch-test` will exit in case the test
script it was executing exited with nonzero value.
The T\d\d\d- part is dropped in new variable $this_test_bare which is
used in progress informational messages and when loading .el files in
emacs tests (whenever $this_test_bare.el exists).
If there is a syntax error in the emacs test library, it causes other
tests to hang or crash without a useful error message.
This test could be eliminated if the error reporting for emacs tests
was somehow improved.