notmuch/test/T680-html-indexing.sh
David Bremner 77c9ec1fdd test: add known broken test for indexing html
'quite' on IRC reported that notmuch new was grinding to a halt during
initial indexing, and we eventually narrowed the problem down to some
html parts with large embedded images. These cause the number of terms
added to the Xapian database to explode (the first 400 messages
generated 4.6M unique terms), and of course the resulting terms are
not much use for searching.
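
As a rough illustration of why embedded images inflate the term count, the
sketch below counts distinct word-like tokens in a plain HTML fragment and in
one carrying an inline base64 image. It is only an approximation:
`count_unique_terms` is a hypothetical helper, not part of notmuch, the sample
strings are made up, and Xapian's real term generator tokenizes differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of notmuch or its test suite): count the
# distinct alphanumeric tokens on stdin, case-folded. This is a crude
# stand-in for the terms an indexer might generate from the same text.
count_unique_terms() {
    tr -cs '[:alnum:]' '\n' |      # every run of non-alphanumerics becomes a newline
        tr '[:upper:]' '[:lower:]' |
        sed '/^$/d' |              # drop the empty line left by a leading delimiter
        sort -u |
        wc -l |
        tr -d ' '                  # some wc implementations pad the count
}

# Made-up sample inputs, for illustration only.
plain='<p>my password is hunter2</p>'
# Base64 uses '+' and '/', so the payload breaks into many random-looking
# chunks, each of which would become its own useless term.
embedded='<img src="data:image/png;base64,iVBORw0KGgo/AAAANSUhEUg+AAAAEAAAABA/CAYAAAAf8/9hAAAA">'

printf '%s\n' "$plain" | count_unique_terms
printf '%s\n' "$embedded" | count_unique_terms
```

Even this short payload yields more distinct tokens than the plain fragment;
a real embedded image is thousands of times longer.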

The second test is a sanity check for any "improved" indexing of HTML.
2017-04-20 06:59:40 -03:00

#!/usr/bin/env bash
test_description="indexing of html parts"
. ./test-lib.sh || exit 1

add_email_corpus html

# Known broken: a chunk of base64 image data currently ends up as a search term.
test_begin_subtest 'embedded images should not be indexed'
test_subtest_known_broken
notmuch search kwpza7svrgjzqwi8fhb2msggwtxtwgqcxp4wbqr4wjddstqmeqa7 > OUTPUT
# Comparing against /dev/null asserts that the search returns nothing.
test_expect_equal_file /dev/null OUTPUT

# Sanity check: ordinary text inside the HTML part must stay searchable.
test_begin_subtest 'non tag text should be indexed'
notmuch search hunter2 | notmuch_search_sanitize > OUTPUT
cat <<EOF > EXPECTED
thread:XXX 2009-11-17 [1/1] David Bremner; test html attachment (inbox unread)
EOF
test_expect_equal_file EXPECTED OUTPUT

test_done