I still don't know everything about how I want search order to be
customizable, but I do like the current defaults, (namely, performing
a new search gives results newest first, but performing a saved search
like "tag:inbox" gives results as oldest first).
Until we come up with a better plan for people to select what *they*
want, (rather than just getting what I want), let's codify the current
results in the test suite.
After any emacs test failure, the tmp.emacs directory will have this
run_emacs script in it which the user can use to run emacs within the
test suite environment, (pointing at the test suite's notmuch
database, using the local notmuch command-line program, and the local
notmuch emacs lisp code).
My scripts expect that empty search result is actually empty. Since
commit 6dcb7592, even empty search prints a newline character and this
breaks my scripts.
This patch adds a test for this bug. In the test I cannot use
test_expect_equal function as $() operator suppresses the final
newline and this kind of difference is not detected.
test/search | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
The bash code in the test suite is using associative arrays which were
only added to bash as of release 4.0.
If the test suite is run with an older bash, we now immediately error
out and explain the situation, (instead of emitting confusing error
messages and failing dozens of tests, which is what happened before
this change).
Some recently-added tests used hard-coded thread ID values in search
specifications. This is unreliable since the thread IDs depend on the
order in which "notmuch new" encounters new files, (which in turn can
depend on inode ordering within the filesystem).
Fix these by using the new "notmuch search --output=threads" to find the
correct thread IDs given a hard-coded (but reliable) message ID.
The reply is primarily taken care of by "notmuch reply" which is already
thoroughly tested. But a recent bug is inserting a duplicate From header
in the emacs-based reply. So exercise that bug here.
Update the tests so that they no longer expect the Bcc header in the
output of "notmuch reply" now that it has been removed.
Edited-by Carl Worth: Simply applying the change to our newly
modularized test suite.
We test that the message we sent via (fake) SMTP is included in the mail
index after a "notmuch new". This verifies that the FCC setting indeed
successfully saved the sent message within the notmuch mail store.
Simply setting an explicit date is cleaner than letting the current,
(arbitrary), date get generated for the email message and then constantly
filtering that date out of search results.
Now that the FCC code is fixed to use the notmuch database path, we can
actually enable this by default, which should be highly useful for all
new users of notmuch.
Rather than *reall* sending mail here, we instead have a new test
program, smtp-dummy which implements (a small piece of) the
server-side SMTP protocol and saves a mail message to the filename
provided. This gives us reasonable test coverage of a large chunk of
the notmuch+emacs code base (down to talking to an SMTP server with
the final mail contents).
We set the HOME environment variable to the test directory to avoid
the tests relying on any configuration files from the test author's
own home directory, (such as ${HOME}/.emacs or similar).
If Xapian sees unquoted ".." as in id:123..456 then it thinks that's a
range specification. We avoid this problem by instead passing
id:"123..456" to Xapian.
We simulate the act of selecting the "inbox" saved search from
notmuch-hello and the act of selecting a desired thread from the
notmuch-search results.
The test for the navigation of notmuch-hello is currently marked as
BROKEN since its output is in the opposite order compared to the
'(notmuch-search "tag:inbox")' test. This question of ordering is a
currently open issue on the notmuch mailing list, so we'll let the
test suite reflect that for now.
Finally, this commit also abstracts some common emacs lisp code,
(waiting for the current buffer's process to complete), into a new
notmuch-test-wait function that is made available to anything calling
test_emacs.
This should be quite handy for doing automated testing of the
emacs-based functionality in notmuch. This function invokes emacs with
the necessary command-line arguments, (to run in batch mode with no
local initialization, to load the notmuch code from the source
directory, and to ensure an 80-column width).
While adding the documentation here for add_email_corpus I noticed
that the other email-adding functions in test-lib.sh were not yet
documented here, so add all of that documentation.
When the NOTMUCH variable was originally invented it was used as an
explicit path to the notmuch binary being tested. Today, the test
suite sets the PATH variable instead, so the NOTMUCH variable always
has a value of simply "notmuch".
We simplifying that by using the constant value rather than the
continual variable reference.
A bug in the results-aggregation code was causing the test suite to report
"all tests passed" even when there were failures, (as long as there were
also no "broken" tests). Fix this.
Now that we can usefully pass section names via the NOTMUCH_SKIP_TESTS
environment variable, it's useful to actually print those names out
for the user. Then, since we're now printing these names, let's use
nicer names, (not excessively long but also not using abbreviations
like "msg").
In order for --valgrind to be useful, we drop noisy additional output of
all of the commands being executed in verbose mode. This makes --verbose
alone quite useless, so we don't document it any more.
Also, add a zlib valgrind suppression that was showing up frequently in the
test suite.
This file was obviously describing the git test suite previously, and
would have been very hard to understand in the context of the notmuch
test suite. HOpefully it's easier to follow now.
By scanning test-lib.sh for occurrences of "git" or "GIT", I found
that most of those are internal things, (like the GIT_TEST_TEE_STARTED
variable). But GIT_SKIP_TESTS is part of the user-interface to the
test suite, so we rename it to reference notmuch rather than git.
Also, the GIT_TRACE warning is git-specific, so we drop that as well.
Since we are now using an explicit list of tests to run in
notmuch-test we need to be careful that we don't add a new file of
tests and then forget to add it to the list.
The numbers were meaningless, and they made it hard to find a file of interest.
Instead, we get the ordering we want by adding an explicit list of
tests to run to the notmuch-test script.
These were interfering with the aggregate statistics reported at the
end of the test-suite run. (Always reporting 1 broken, 1 fixed, and 1
skipped). The correct way to test the test-suite itself would be to
run the test suite externally for these cases, capture the expected
result, and then report that as a PASS test.
But, really, there's almost no value in these tests anyway. It's
almost to the level of testing that 'if false; exit 1; fi' returns
1. That is, there are so many ways that the test suite could be broken
internally, that these minor tests don't really help.
The original git test suite works by concatenating many commands into
a very long string (each separated by &&). This is painful to work
with since it prevents the editor from helping by parsing the shell
script, indenting, colorizing, etc.
Instead, we switch this back to something like the original notmuch
test suite, and add two new functions to test-lib.sh
(test_begin_subtest and test_expect_equal) to support these.
This also fixes the test suite to once again display the diff when a
test fails to generate the expected input.
This makes the new, git-derived test suite report results in a manner
similar to the original notmuch test suite.
Notable changes include:
* No more initial '*' on every line
* Only colorize a single word
* Don't print useless test numbers
* Use "PASS" in place of "ok"
* Begin sentences with a capital letter
* Print test descriptions for each block
* Separate each block of tests with a blank line
* Don't summarize counts between each block
This avoids "make test" emitting messages from three (3!) recursive
invocations of make. We change the invocations of the tests themselves
to occur directly from the shell script rather than having the shell
script invoke make again and using wildcards in the Makefile.
In order to have repeatable test suite, all times in messages are set
to UTC time zone to match the time zone (TZ variable) set in
test-lib.sh.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
The changes are:
- The notmuch-test was split into several files (t000?-*.sh).
- Removed helper functions which were moved to test-lib.sh
- Replaced every printf with test_expect_success.
- Test commands chained with && (test-lib.sh doesn't use "set -e" in
order to complete the test suite even if something fails)
- Many variables such as ${MAIL_DIR} were properly quoted as they
contain spaces.
- Changed quoting patterns in add_message and generate_message (single
quotes are already used by the test framework).
- ${TEST_DIR} replaced by ${PWD}
QUICK HOWTO:
To run the whole test suite
make
To run only a single test
./t0001-new.sh
To stop on the first error
./t0001-new.sh -i
then mail store and database can be inspected in
"trash directory.t0001-new"
To see the output of tests
./t0001-new.sh -v
To not remove trash directory at the end:
./t0001-new.sh -d
To run all tests verbosely:
make GIT_TEST_OPTS="-v"
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Modify the helper functions to work with git-based test suite i.e.
1) Quote arguments where it is necessary.
2) Do not use $NOTMUCH. It is equal to "notmuch" since $PATH is set to
the build tree.
3) Modify pass_if_equal to fit into the git-based test suite.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
This removes Git specific things from the test-lib.sh and adds helper
functions for notmuch taken from Carl's notmuch-test script. README is
also slightly modified to reflect the current state.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Git uses a simple and yet powerful test framework, written in shell.
The framework is easy to use for both users and developers so I think
it would help if it is used in notmuch as well.
This is a copy of Git's test framework from commit
b6b0afdc30e066788592ca07c9a6c6936c68cc11 in git repository.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Micah Anderson reported an issue where a message failed to display in
the emacs interface, (it instead gave an error, "json-read-string: Bad
string format").
Micah tracked this down to the json output from "notmuch show" being
interrupted by a GMime error message:
gmime-CRITICAL **: g_mime_stream_filter_add: assertion
`GMIME_IS_FILTER (filter)
I tracked this down further to notmuch passing a NULL value to
g_mime_stream_filter_add. And this was due to calling
g_mime_filter_charset_new with a value of "unknown-8bit".
So we add a test message withe a Conten-Type of "text/plain;
charset=unknown-8bit" from Micah's message. Then we fix "notmuch show"
to test for NULL before calling g_mime_stream_filter_add. Bug fixed.
Scott Henson reported an internal error that occurred when he tried to
add a message that referenced another message with a message ID well
over 300 characters in length. The bug here was running into a Xapian
limit for the length of metadata key names, (which is even more
restrictive than the Xapian limit for the length of terms).
We fix this by noticing long message ID values and instead using a
message ID of the form "notmuch-sha1-<sha1_sum_of_message_id>". That
is, we use SHA1 to generate a compressed, (but still unique), version
of the message ID.
We add support to the test suite to exercise this fix. The tests add a
message referencing the long message ID, then add the message with the
long message ID, then finally add another message referencing the long
ID. Each of these tests exercise different code paths where the
special handling is implemented.
A final test ensures that all three messages are stitched together
into a single thread---guaranteeing that the three code paths all act
consistently.
We're about to add a test with an excessively long message-id, (512
characters or so). This exceeds filename length limits, so just always
the simple counter to generate the filenames, (which we were doing for
messages with non-custom IDs anyway).
The commit said it fixed a problem with headers >200 characters
long. But examination of the code suggests that it was a header of
exactly 200 characters long that caused the problem. So we add a test
case for that here.
Before the fix in the previous commit, valgrind would detect many
errors when replying to the message created with this test case. After
that commit, those errors are gone.
Immediately after releasing 0.3 we learned that the magic-from-guessing
code could hang in an infinite loop in some cases. The bug occurred
only when the user had configured only a primary email addresss and no
other email addresses.
The test suite wasn't previously covering this case, so address this
shortcoming.
this test actually tests behavior that I consider as broken.
The Bcc should be to the same address as used in the From line,
otherwise we are creating a potential information leak as email
that is related to one email account (say, work) is copied to
a different account
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
These tests don't actually pass yet, since the feature being tested
has not been merged. But gettting these tests in first will let us
more easily test that the feature actually works, (and will help us
ensure we don't forget the feature before the next release).
right now these are not trying to be overly fancy
simply one test per strategy that we apply to figure out the best
from address - including the fallback if there's nothing to go on
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
When the test suite is run in a different time zone that where Carl
lives, some tests may fail depending on the time when the test suite is
run. For example, just now I get:
Search for all messages ("*"):... FAIL
--- test-031.expected 2010-04-23 09:33:47.898634822 +0200
+++ test-031.output 2010-04-23 09:33:47.898634822 +0200
@@ -1,5 +1,5 @@
-thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; Test message #6 (inbox unread)
-thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; Test message #14 (inbox unread)
+thread:XXX 2001-01-06 [1/1] Notmuch Test Suite; Test message #6 (inbox unread)
+thread:XXX 2001-01-06 [1/1] Notmuch Test Suite; Test message #14 (inbox unread)
thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; body search (inbox unread)
thread:XXX 2000-01-01 [1/1] searchbyfrom; search by from (inbox unread)
thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (inbox unread)
By setting a fixed time zone in the test script, these problems should
be eliminated.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
The test suite doesn't yet cover --format=json output nor UTF-8 in
subject or body.
This patch starts with test cases for 'search --format=json' and
'show --format=json'.
Furthermore, it has test cases for a search for a UTF-8 string in a mail
body for a UTF-8 string in a mail subject.
Finally, it has a test case for --format=json with UTF-8 messages,
demonstrating the fix in 1267697893-sup-4538@sam.mediasupervision.de.
Reviewed-by: Carl Worth <cworth@cworth.org>
Updated tests to current implementation of the test suite.
These tests demonstrate a bug in the current implementation
of "notmuch show --format=json", (timestamp output is changed
depending on current timezone).
If future updates to the test suite add more messages to the database
before this "notmuch show" test, then the message-ID numbers in the
expected output will all change. But we can at least compute the
numbers so that this test will continue to pass.
In the recent change to rename threads based on changing subject
lines, I broke message ordering within "notmuch show" output. But our
test suite didn't catch that regressions, because we didn't have any
tests of "notmuch show".
This adds one "notmuch show" test along with the thread-naming
tests. It's not a whole suite of "notmuch show" testing, but it does
catch this regression at least.
We're starting to get test output that's fairly long, so it's much
kinder to just show a diff rather than displaying the complete
expected and actual output. To allow the user to investigate things
after the fact, we save the expected and actual output to files named
test-${test_number}.expected and test-${test_number}.output .
We recently added a feature to name threads based on the messages that
actually matched the search, (as opposed to simply the oldest or
newest message in the thread whether it matched or not). So add tests
for that, and (surprise, surprise!) the feature does not entirely
work.
Hurrah---no more manual verification of that PASS column.
This means that "make test" can actually be a useful part of the
release process now, (since it will exit with non-zero status if there
are any failures).
Previously some tests (dump/restore) were doing ad-hoc verification of
values and their own printing of PASS/FAIL, etc. This made it
impossible to count test pass/fail rates in a single place.
The only reason these tests were written that way was because the old
execute_expecting function only worked if one could directly test the
stdout output of a notmuch command. The recent switch to pass_if_equal
means that all tests can use it.
This feature was added recently and should have gotten a new test at
the time.
As this test demonstrates, the code is broken, ("notmuch search '*'
returns bogus dates of the Unix epoch for any threads where the
term "and" does not appear in any messages).
Using a date in the current year makes the test suite fragile since
the search output will include a date of "January 05" for now, but
will start doing "2010-01-05" in the future.
The filenames aren't predictable (including the current directory) nor
stable from one run to the next (including the PID). This makes it
hard to predict the output from a search command that returns such a
message (such as "*").
The original goal was simply to ensure that each generated message was
distinguishable somehow. So just use the message counter instead.
The old execute_expecting function was doing far too much for its own
good. One of the worst aspects of this was that it introduced
shell-quoting challengers where the caller could not easily control
the precise invocation of the command to be executed.
I personally couldn't find a way to test "notmuch search '*'" without
the shell expanding * against files in the current directory, or
having bogus quotation marks appearing in the search string,
(defeating the recognition of "*" as a special search term).
Hopefully this aspect of the test suite will be much easier to maintain now.
The recent fix to properly decode encoded headers made the expected
output of "notmuch reply" differ by a single space, (previously, there
were two spaces before the References: value and now there is just
one).
Fix the test suite so that these are all noted as correct results
again.
These new tests demonstrate a bug as follows:
Multiple messages are added to the database
All of these message references a common parent
The parent message does not exist in the databas
In this scenario, the messages will not be recognized as belonging to
the same thread. We consider this a bug, and the new tests treat this
as a failure.
Edited by Carl Worth <cworth@cworth.org>: Split these tests into their
own commit (before the fix of the bug). This lets me see the actual
failure in the test suite, before the fix is applied. Also fix the
alignment of new messages from test suite, (so that the PASS portions
all line up---which is important while we're still manually verifying
test-suite results).
These results have all the same terms as the target phrase, but
not in the expected order. They are designed to ensure that we
actually test phrase searches.
And as it turns out, we're not currently quoting the search terms
properly, so the phrase-search tests now fail with this commit.
The sequential identifiers have the advantage of being guaranteed to
be unique (until we overflow a 64-bit unsigned integer), and also take
up half as much space in the "notmuch search" output (16 columns
rather than 32).
This change also has the side effect of fixing a bug where notmuch
could block on /dev/random at startup (waiting for some entropy to
appear). This bug was hit hard by the test suite, (which could easily
exhaust the available entropy on common systems---resulting in large
delays of the test suite).
These tests were surprisingly simple to write---not much code at all
and most of them worked the first time even with hand-prepared
versions of the expected output.
The previous generate_message function is what's needed when testing
"notmuch new". But after that, we never want to generate a message
without also adding it to the index. So create a new add_message
function with this convenience.
This is a test for the recently added feature where we detect that the
reply-to address already exists in the To: or Cc: header so will
already be replied to. In this case we want to include the From:
address in our reply, (where, otherwise we would use the Reply-To
address *instead* of the address in the From header).
The feature tested here is that we reply to both the sender and to
others addresses on the To: line of the original message, but that we
don't reply to our own address.
This is the standard support of reply-to, (replying to that address
rather than the from address). It has nothing to do with the proposed
feature for extra-clever handling of a mail from a mailing-list that
has munged the reply-to header.
When reply to a message addresses to an address configured in the
other_email setting in the configuration file, the reply should use
that address in the From header. Test this.
We were sleeping merely to ensure that our updates to the mail store
would result in the mtime of the appropriate directories being
updated. We make the test suite run faster by not sleeping, but
instead explicitly updating the mtime of the directory to a future
time with touch.
We're careful to ensure that the time is not merely in the future
compared to the current time, but also later than any previous update
to the same directory mtime.
This makes the test suite bash-specific, but that's not much of
an issue for me, (if somebody else would prefer some other language
then they can rewrite the test suite and maintain it).
The advantage here is that we'll now be able to easily generate
custom messages for testing operations that depend on the message
content, (such as "notmuch reply", etc.).
This notmuch-test script simply runs a few different notmuch operations,
(things that I found were useful while testing the rename-support code).
It's not useful as a test suite yet, since it doesn't actually check
the results of any operation, (the user of the suite has to know what
the results should be and must manually verify them. So there's no
integration with the build system yet, (no "make test" target).
But I didn't want to lose what I had so far, so here it is.