These tests were surprisingly simple to write---not much code at all
and most of them worked the first time even with hand-prepared
versions of the expected output.
The previous generate_message function is what's needed when testing
"notmuch new". But after that, we never want to generate a message
without also adding it to the index. So create a new add_message
function with this convenience.
If we had external users of this filter then they might expect some of
these macros to exist. But since this is just internal, that's just
unneeded noise.
With modern MIME attachments, we're already avoiding indexing the
attachments. But for old-school uuencoded data in the mail, we have
been directly indexing the encoded data as terms, (which is not useful
at all---nobody will ever ytry to search based on the seemingly random
uuencoded data).
Additionally, indexing a modestly large uuencoded file seems to make
Xapian go insane, (consuming *lots* of memory).
We fix both problems by detecting uuencoded content and not performing
any indexing of it.
This function detects whether the address in the Reply-To header
already appears in either To or Cc. So give it a name that reflects
what it does (reply_to_header_is_redundant) rather than the old name
which described one possible use of the function, (as a simple
heuristic for detecting whether a mailing list had applied reply-to
munging).
Apparently, GMime doesn't want to create a valid address list object
for an empty string. That's annoying, but it's easy enough to test for
the empty string and avoid the problem.
This change was already recommended in a comment in the original
implementation of this patch. If someone really wants to support
un-munging in the case of To: and Reply-To: having the same address
but different case, then they can provide a portable approach for
that.
This is a test for the recently added feature where we detect that the
reply-to address already exists in the To: or Cc: header so will
already be replied to. In this case we want to include the From:
address in our reply, (where, otherwise we would use the Reply-To
address *instead* of the address in the From header).
Some mailing lists engage in the evil practice of changing the Reply-To
header so that replies from all mailers go to the list by default, at
the expense of not responding to the person who actually sent the
message. When this is detected, we reply to `From' and remove the
duplicate response to the mailing list. Consider a reply to the
following message.
From: Some User <some.user@example.com>
To: Sample users list <sample-users@sample.org>
Reply-To: Sample users list <sample-users@sample.org>
Prior to this patch, `notmuch reply' produces
To: Sample users list <sample-users@sample.org>,
Sample users list <sample-users@sample.org>
and after the patch,
To: Some User <some.user@example.com>,
Sample users list <sample-users@sample.org>
Signed-off-by: Jed Brown <jed@59A2.org>
This code was already duplicated. We move it to a new, shared
add_recipients_from_message function, in preparation for more
sophisticated mailing list logic.
Signed-off-by: Jed Brown <jed@59A2.org>
The feature tested here is that we reply to both the sender and to
others addresses on the To: line of the original message, but that we
don't reply to our own address.
This is the standard support of reply-to, (replying to that address
rather than the from address). It has nothing to do with the proposed
feature for extra-clever handling of a mail from a mailing-list that
has munged the reply-to header.
When reply to a message addresses to an address configured in the
other_email setting in the configuration file, the reply should use
that address in the From header. Test this.
We were sleeping merely to ensure that our updates to the mail store
would result in the mtime of the appropriate directories being
updated. We make the test suite run faster by not sleeping, but
instead explicitly updating the mtime of the directory to a future
time with touch.
We're careful to ensure that the time is not merely in the future
compared to the current time, but also later than any previous update
to the same directory mtime.
This makes the test suite bash-specific, but that's not much of
an issue for me, (if somebody else would prefer some other language
then they can rewrite the test suite and maintain it).
The advantage here is that we'll now be able to easily generate
custom messages for testing operations that depend on the message
content, (such as "notmuch reply", etc.).
This notmuch-test script simply runs a few different notmuch operations,
(things that I found were useful while testing the rename-support code).
It's not useful as a test suite yet, since it doesn't actually check
the results of any operation, (the user of the suite has to know what
the results should be and must manually verify them. So there's no
integration with the build system yet, (no "make test" target).
But I didn't want to lose what I had so far, so here it is.
Add an install target that uses desktop-file-install to install the
desktop file in the appropriate location. The location of the install
can be modified by changing the desktop_dir variable.
Signed-off-by: Jeffrey C. Ollie <jeff@ocjtech.us>
As Keith pointed out, (with a humorous citation from Mark Twain),
the two uses of "very" added nothing to the description. Also,
"large collection of email" was repeated uselessly.
Such as reiserfs or xfs. This has been broken since the merge of
support for rename and deletion of files from the mail store.
Here's the original justification for the patch:
A review of notmuch-new.c shows three uses of ->d_type:
Near line 153, in _entries_resemble_maildir() we can simply allow for
DT_UNKNOWN. This would fail if people have MH-style folders which have
three folders called "new" "cur" and "tmp", but that seems unlikely, in
which case the "tmp" folder would simply not be scanned.
Near line 273 in add_files_recursive() we have another check. If
DT_UNKNOWN, we fall through, then add_files_recursive() does a stat
almost immediately, returning with success if the path isn't a
directory.
Thus, the fallback is already written.
Finally, near line 343, in add_files_recursive() (a long function) we
have another check. Here we can simply treat DT_UNKNOWN as DT_LNK, since
the logic for the stat() results are the same.
According to the Debian zsh maintainer Clint Adams, this is the first
time that a package installs its own completer into zsh. Part of the
reason this is not usually done is because zsh does not provide a stable
API.
We agreed to try it, given that notmuch is expected to change quite
a bit initially. If there are problems or the completer goes stable,
we'll move it into the upstream zsh repository.
Signed-off-by: martin f. krafft <madduck@debian.org>
Previously we were printing a number of messages upgraded so far. The
original motivation for this was to accurately reflect the fact that
there are two passes, (so each message is processed twice and it's not
accurate to represent with a single count). But as it turns out, the
second pass takes zero time (relatively speaking) so we're still not
accounting for it.
If nothing else, the percentage-based reporting makes for a cleaner
API for the progress_notify function.
The WDF is the "within-document frequency" value for a particular
term. It's intended to provide an indication of how frequent a term is
within a document, (for use in computing relevance). Xapian's term
generator already computes WDF values when we use that, (which we do
for indexing all mail content).
We don't use the term generator when adding single terms for things
that don't actually appear in the mail document, (such as tags, the
filename, etc.). In this case, the WDF value for these terms doesn't
matter much.
But Xapian's flint backend can be more efficient with changes to terms
that don't affect the document "length". So there's a performance
advantage for manipulating tags (with the flint backend) if the WDF of
these terms is 0.
All notmuch searches currently sort by value (either date or message
ID) so it's just wasted effort for Xapian to compute relevance values
for each result. We now explicitly tell Xapian that we're uninterested
in the relevance values.
The first phase copies data from the old format to the new format
without deleting anything. This allows an old notmuch to still use the
database if the upgrade process gets interrupted. The second phase
performs the deletion (after updating the database version number). If
the second phase is interrupted, there will be some unused data in the
database, but it shouldn't cause any actual harm.
Our signal handler is designed to quickly flush out changes and then
exit. But if a database upgrade is in progress when the user
interrupts, then we just want to immediately abort. We could do
something fancy like add a return value to our progress_notify
function to allow it to tell the upgrade process to abort. But it's
actually much cleaner and robust to delay the installation of our
signal handler so that the default abort happens on SIGINT.
This takes advantage of the recently added library support to detect
if the database needs to be upgraded and then automatically performs
that upgrade, (with a nice progress report).
The recent support for renames in the database is our first time
(since notmuch has had more than a single user) that we have a
database format change. To support smooth upgrades we now encode a
database format version number in the Xapian metadata.
Going forward notmuch will emit a warning if used to read from a
database with a newer version than it natively supports, and will
refuse to write to a database with a newer version.
The library also provides functions to query the database format
version:
notmuch_database_get_version
to ask if notmuch wants a newer version than that:
notmuch_database_needs_upgrade
and a function to actually perform that upgrade:
notmuch_database_upgrade
Previously, when notmuch detected that a directory had been deleted it
was only removing files immediately in that directory. We now
correctly recurse to also remove any directories (and files, etc.)
within sub-directories, etc.
Previously we had NOTMUCH_DATABASE_MODE_READ_ONLY but
NOTMUCH_STATUS_READONLY_DATABASE which was ugly and confusing. Rename
the latter to NOTMUCH_STATUS_READ_ONLY_DATABASE for consistency.
Previously, many checks were deep in the library just before a cast
operation. These have now been replaced with internal errors and new
checks have instead been added at the beginning of all top-levelentry
points requiring a read-write database.
The new checks now also use a single function for checking and
printing the error message. This will give us a convenient location to
extend the check, (such as based on database version as well).
The original wording made it sound like this function was just doing
some string manipulation. But this function actually creates new
directory documents as a side effect. So make that explicit in its
documentation.
When a notmuch database is upgraded to the new database format, (to
support file rename and deletion), any message documents corresponding
to deleted files will not currently be upgraded. This means that a
search matching these documents will find no filenames in the expected
place.
Go ahead and return the filename as originally stored, (rather than
aborting with an internal error), in this case.
When we know that we are adding a new directory to the database, (and
we therefore are using inode rather than strcmp-based sorting of the
filenames), then we *never* want to see any names from the
database. If we get any names that could only make us inadvertently
remove files that we just added.
Since it's not obvious from the Xapian documentation whether new terms
being added as part of new documents will appear in the in-progress
all-terms iteration we are using, (and this might differ based on
Xapian backend and also might differ based on how many new directories
are added and whether a flush threshold is reached).
For all of these reasons, we play it safe and use NULL rather than a
real notmuch_filenames_t iterator in this case to avoid any problem.
The bug here was that we would see that the database did not know
anything about a directory so would get results from the filesystem in
inode rather than strcmp order.
However, we wouldn't actually ask for the list of files from the
database until after recursing into the sub-directories. So by the
time we traverse the filenames looking for deletions, the database
*does* have entries and we end up detecting erroneous deletions
because our filename list from the filesystem isn't in strcmp order.
So ask for the list of names from the database before doing any
additions to avoid this problem.
Previously we only scanned the list of filenames in the filesystem and
detected a deletion whenever that scan skipped a name that existed in
the database. That much was fine, but we *also* need to continue
walking the list of names from the database when the filesystem list
is exhausted.
Without this, removing the last file or directory within any
particular directory would go undetected.
As described in the previous commit message, we introduced multiple
symlink-based regressions in commit
3df737bc4addfce71c647792ee668725e5221a98
Here, we fix the case of symlinks to regular files by doing an extra
stat of any DT_LNK files to determine if they do, in fact, link to
regular files.
In commit 3df737bc4addfce71c647792ee668725e5221a98 we switched from
using stat() to using the d_type field in the result of scandir() to
determine whether a filename is a regular file or a directory. This
change introduced a regression in that the recursion would no longer
traverse through a symlink to a directory. (Since stat() would resolve
the symlink but with scandir() we see a distinct DT_LNK value in
d_type).
We fix this for directories by allowing both DT_DIR and DT_LNK values
to recurse, and then downgrading the existing not-a-directory check
within the recursion to not be an error. We also add a new
not-a-directory check outside the recursion that is an error.