156 commits

Author SHA1 Message Date
Pieter Praet
97216b3cb8 cli: notmuch new: optionally output debug information when ignoring files/directories
When running 'notmuch new' with the '--debug' option, output debug
information regarding explicitly ignored files and directories.
2012-10-20 17:28:19 -03:00
Pieter Praet
12d328a597 cli: add '--debug' option to 'notmuch new'
This will be used in later patches to test the 'new.ignore'
config option more thoroughly.
2012-10-20 17:27:58 -03:00
Austin Clements
4ca36441a8 new: Unify add_files and add_files_recursive
Since starting at the top of a directory tree and recursing within
that tree are now identical operations, there's no need for both
add_files and add_files_recursive.  This eliminates add_files (which
did nothing more than call add_files_recursive after the previous
patch) and renames add_files_recursive to add_files.
2012-05-24 21:53:38 -03:00
Austin Clements
da170ee657 new: Merge error checks from add_files and add_files_recursive
Previously, add_files_recursive could have been called on a symlink to
a non-directory.  Hence, calling it on a non-directory was not an
error, so a separate function, add_files, existed to fail loudly in
situations where the path had to be a directory.

With the new stat-ing logic, add_files_recursive is always called on
directories, so the separation of this logic is no longer necessary.
Hence, this patch moves the strict error checking previously done by
add_files into add_files_recursive.
2012-05-24 21:53:19 -03:00
Austin Clements
d99270c450 new: Centralize file type stat-ing logic
This moves our logic to get a file's type into one function.  This has
several benefits: we can support OSes and file systems that do not
provide dirent.d_type or always return DT_UNKNOWN, complex
symlink-handling logic has been replaced by a simple stat fall-through
in one place, and the error message for an un-stat-able file is more
accurate (previously, the error always mentioned directories, even
though a broken symlink is not a directory).
2012-05-24 21:53:08 -03:00
Austin Clements
3f3c446c40 new: Remove workaround for detecting newly created directory objects
Previously, notmuch_database_get_directory did not indicate whether or
not the returned directory object was newly created, which required a
workaround to distinguish newly created directory objects with no
child messages from directory objects that had no mtime set but did
have child messages.  Now that notmuch_database_get_directory
distinguishes whether or not the directory object exists in the
database, this workaround is no longer necessary.
2012-05-23 22:31:10 -03:00
Austin Clements
7199d22f43 lib/cli: Make notmuch_database_get_directory return a status code
Previously, notmuch_database_get_directory had no way to indicate how
it had failed.  This changes its prototype to return a status code and
set an out-argument to the retrieved directory, like similar functions
in the library API.  This does *not* change its currently broken
behavior of creating directory objects when they don't exist, but it
does document it and paves the way for fixing this.  Also, it can now
check for a read-only database and return
NOTMUCH_STATUS_READ_ONLY_DATABASE instead of crashing.

In the interest of atomicity, this also updates calls from the CLI so
that notmuch still compiles.
2012-05-15 08:56:33 -03:00
Austin Clements
ba57294218 lib/cli: Make notmuch_database_create return a status code
This is the notmuch_database_create equivalent of the previous change.

In this case, there were places where errors were not being propagated
correctly in notmuch_database_create or in calls to it.  These have
been fixed, using the new status value.
2012-05-05 10:12:26 -03:00
Austin Clements
5fddc07dc3 lib/cli: Make notmuch_database_open return a status code
It has been a long-standing issue that notmuch_database_open doesn't
return any indication of why it failed.  This patch changes its
prototype to return a notmuch_status_t and set an out-argument to the
database itself, like other functions that return both a status and an
object.

In the interest of atomicity, this also updates every use in the CLI
so that notmuch still compiles.  Since this patch does not update the
bindings, the Python bindings test fails.
2012-05-05 10:11:57 -03:00
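The "status code plus out-argument" shape described in 5fddc07dc3 can be illustrated with a toy example. Everything here (status_t, database_t, database_open, database_destroy) is a hypothetical stand-in invented for the sketch, not the notmuch library API; it only shows how a caller learns why an open failed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes and database type for this sketch only. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_NULL_POINTER,
    STATUS_OUT_OF_MEMORY,
    STATUS_FILE_ERROR
} status_t;

typedef struct {
    char *path;
} database_t;

/* Returns a status and stores the opened database in *database_out.
 * On any failure *database_out is set to NULL. */
static status_t
database_open (const char *path, database_t **database_out)
{
    database_t *db;

    if (database_out == NULL)
        return STATUS_NULL_POINTER;
    *database_out = NULL;

    if (path == NULL || path[0] == '\0')
        return STATUS_FILE_ERROR;

    db = malloc (sizeof (*db));
    if (db == NULL)
        return STATUS_OUT_OF_MEMORY;

    db->path = malloc (strlen (path) + 1);
    if (db->path == NULL) {
        free (db);
        return STATUS_OUT_OF_MEMORY;
    }
    strcpy (db->path, path);

    *database_out = db;
    return STATUS_SUCCESS;
}

static void
database_destroy (database_t *db)
{
    if (db != NULL) {
        free (db->path);
        free (db);
    }
}
```

A caller checks the returned status first and only touches the out-argument on success, which is what lets the CLI report a specific reason instead of a bare NULL.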
Justus Winter
6f7469f547 Use notmuch_database_destroy instead of notmuch_database_close
Adapt the notmuch binaries source to the notmuch_database_close split.

Signed-off-by: Justus Winter <4winter@informatik.uni-hamburg.de>
2012-04-28 09:27:33 -03:00
Austin Clements
2e7b649404 new: Fix missing end_atomic in remove_filename on error
Previously, if we failed to find the message by filename in
remove_filename, we would return immediately from the function without
ending its atomic block.  Now this code follows the usual goto DONE
idiom to perform cleanup.
2012-04-24 23:25:52 -03:00
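The goto DONE cleanup idiom mentioned in 2e7b649404 looks roughly like the sketch below. The types and functions (db_t, begin_atomic, end_atomic, remove_filename_sketch) are simplified placeholders assumed for illustration; the point is only that every exit path passes through the label that ends the atomic block.

```c
/* Hypothetical stand-in for a database handle that tracks whether an
 * atomic block is open. */
typedef struct {
    int atomic_depth;
} db_t;

static void
begin_atomic (db_t *db)
{
    db->atomic_depth++;
}

static void
end_atomic (db_t *db)
{
    db->atomic_depth--;
}

/* Returns 0 on success, -1 if the (simulated) message lookup fails.
 * Either way the atomic block is closed before returning. */
static int
remove_filename_sketch (db_t *db, int lookup_fails)
{
    int status = 0;

    begin_atomic (db);

    if (lookup_fails) {
        status = -1;
        goto DONE;      /* not "return": cleanup must still run */
    }

    /* ... the actual filename removal would go here ... */

  DONE:
    end_atomic (db);
    return status;
}
```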
Austin Clements
746fef0aea new: Print final fatal error message to stderr
This was going to stdout.  I removed the newline at the beginning of
printing the fatal error message because it wouldn't make sense if you
were only looking at the stderr stream (e.g., you had redirected
stdout to /dev/null).
2012-04-24 23:25:52 -03:00
Austin Clements
d3b5533123 new: Handle fatal errors in remove_filename and _remove_directory
Previously such errors were simply ignored.  Now they cause an
immediate cleanup and abort.
2012-04-24 23:25:51 -03:00
Austin Clements
e075ee37c9 new: Consistently treat fatal errors as fatal
Previously, fatal errors in add_files_recursive were not treated as
fatal by its callers (including itself!).  This makes
add_files_recursive errors consistently fatal and updates all callers
to treat them as fatal.
2012-04-24 23:25:51 -03:00
Tomi Ollila
ce1e720de6 add support for user-specified files & directories to ignore
A new configuration key 'new.ignore' is used to determine which
files and directories the user does not want scanned as new mail.

Mark the corresponding test as no longer broken.
This work merges my previous attempts and Andreas Amann's work
in id:"ylp7hi23mw8.fsf@tyndall.ie"
2012-02-17 08:04:34 -04:00
Ethan Glasser-Camp
5f39979a4a Free the results of scandir()
scandir() returns "strings allocated via malloc(3)" which are then
"collected in array namelist which is allocated via
malloc(3)". Currently we just free the array namelist. Instead, free
all the entries of namelist, and then free namelist.

entry only points to elements of namelist, so we don't free it
separately.
2012-02-14 23:44:30 -04:00
Austin Clements
a9a9e374e2 Silence buildbot warnings about unused results
This ignores the results of the two writes in sigint handlers even
harder than before.

While my libc lacks the declarations that trigger these warnings, this
can be tested by adding the following to notmuch.h:

__attribute__((warn_unused_result))
ssize_t write(int fd, const void *buf, size_t count);
2012-01-21 08:49:50 -04:00
David Edmondson
77ec8108a1 notmuch: Quiet buildbot warnings.
Cast away the result of various *write functions. Provide a default
value for some variables to avoid "use before set" warnings.
2011-12-21 07:32:16 -04:00
Jani Nikula
69bb7f35b6 cli: add support for pre and post notmuch new hooks
Run notmuch new pre and post hooks, named "pre-new" and "post-new", if
present in the notmuch hooks directory. The hooks will be run before and
after incorporating new messages to the database.

Typical use cases for pre-new and post-new hooks are fetching or delivering
new mail to the maildir, and custom tagging of the mail incorporated to the
database.

Also add command line option --no-hooks to notmuch new to bypass the hooks.

Signed-off-by: Jani Nikula <jani@nikula.org>
2011-12-11 13:58:15 -04:00
David Bremner
61f0a5b8ee cli: change argument parsing convention for subcommands
Previously we deleted the subcommand name from argv before passing to
the subcommand. In this version, the deletion is done in the actual
subcommands. Although this causes some duplication of code, it allows
us to be more flexible about how we parse command line arguments in
the subcommand, including possibly using off-the-shelf routines like
getopt_long that expect the name of the command in argv[0].
2011-10-22 19:42:54 -03:00
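To illustrate why keeping the subcommand name in argv[0] matters in 61f0a5b8ee, here is a small sketch of a subcommand parsing its own arguments with getopt_long. The function new_command_sketch and its return convention are assumptions for the example, and getopt_long is a GNU extension (also available on the BSDs and musl), not something the commit itself adds.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical subcommand entry point. argv[0] is the subcommand name
 * (e.g. "new"), which is what getopt_long expects. Returns 1 if
 * --debug was given, 0 if not, and -1 on an unknown option. */
static int
new_command_sketch (int argc, char *argv[])
{
    static const struct option options[] = {
        { "debug", no_argument, NULL, 'd' },
        { NULL, 0, NULL, 0 }
    };
    int debug = 0;
    int opt;

    optind = 1;         /* start scanning right after argv[0] */
    while ((opt = getopt_long (argc, argv, "", options, NULL)) != -1) {
        if (opt == 'd')
            debug = 1;
        else
            return -1;
    }

    return debug;
}
```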
Ali Polatel
02a3076711 lib: make find_message{,by_filename) report errors
Previously, the functions notmuch_database_find_message() and
notmuch_database_find_message_by_filename() did not properly
report error conditions to the library user.

For more information, read the thread on the notmuch mailing list
starting with my mail "id:871uv2unfd.fsf@gmail.com"

Make these functions accept a pointer to 'notmuch_message_t' as argument
and return notmuch_status_t which may be used to check for any error
condition.

restore: Modify for the new notmuch_database_find_message()
new: Modify for the new notmuch_database_find_message_by_filename()
2011-10-04 07:55:29 +03:00
Austin Clements
bff30540d8 new: Wrap adding and removing messages in atomic sections.
This addresses atomicity of tag synchronization, the last atomicity
problems in notmuch new.  Each message add or remove is wrapped in its
own atomic section, so interrupting notmuch new doesn't lose progress.
2011-09-24 20:00:29 -03:00
Austin Clements
8305f0aac7 new: Synchronize maildir flags eagerly.
Because flag synchronization is stateless, it can be performed at any
time as long as it's guaranteed to be performed after any change to a
message's filename list.  Take advantage of this to synchronize tags
immediately after a filename is added or removed.

This does not yet make adding or removing a message atomic, but it is
a big step toward atomicity because it reduces the window where the
database tags are inconsistent from nearly the entire notmuch-new to
just around when the message is added or removed.
2011-09-24 20:00:28 -03:00
Austin Clements
191c4ae693 new: Cleanup. De-duplicate file name removal code.
Previously, file name removal was implemented identically in two
places.  Now it's captured in one function.

This is important because file name removal is about to get slightly
more complicated with eager tag synchronization and correct removal
atomicity.
2011-09-24 20:00:28 -03:00
Austin Clements
1353dbe864 new: Cleanup. Put removed/renamed message count in add_files_state_t.
Previously, pointers to these variables were passed around
individually.  This was okay when only one function needed them, but
we're about to need them in a few more places.
2011-09-24 20:00:28 -03:00
Austin Clements
e59cc0031f lib: Add support for nested atomic sections.
notmuch_database_t now keeps a nesting count and we only start a
transaction or commit for the outermost atomic section.

Introduces a new error, NOTMUCH_STATUS_UNBALANCED_ATOMIC.
2011-09-23 21:50:38 -04:00
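The nesting count described in e59cc0031f can be sketched as follows. This is a simplified model under assumed names (db_t, begin_atomic, end_atomic, and the counters standing in for real transactions), not the library implementation.

```c
/* Hypothetical status codes for this sketch. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_UNBALANCED_ATOMIC
} status_t;

/* The counters stand in for real "begin transaction" and "commit"
 * calls so the behaviour can be observed. */
typedef struct {
    int atomic_nesting;
    int transactions_begun;
    int commits;
} db_t;

static status_t
begin_atomic (db_t *db)
{
    /* Only the outermost section starts a transaction. */
    if (db->atomic_nesting == 0)
        db->transactions_begun++;
    db->atomic_nesting++;
    return STATUS_SUCCESS;
}

static status_t
end_atomic (db_t *db)
{
    /* Ending a section that was never begun is an error. */
    if (db->atomic_nesting == 0)
        return STATUS_UNBALANCED_ATOMIC;

    db->atomic_nesting--;
    /* Only the outermost section commits. */
    if (db->atomic_nesting == 0)
        db->commits++;
    return STATUS_SUCCESS;
}
```

Two nested begin/end pairs therefore produce exactly one transaction and one commit, and a stray extra end is reported rather than silently ignored.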
Austin Clements
fcd433709e new: Defer updating directory mtimes until the end.
Previously, if notmuch new were interrupted between updating the
directory mtime and handling removals from that directory, a
subsequent notmuch new would not handle those removals until something
else changed in that directory.  This defers recording the updated
mtime until after removals are handled to eliminate this problem.
2011-09-23 21:50:38 -04:00
Austin Clements
bdaee77e1b new: Don't lose messages on SIGINT.
Previously, message removals were always performed, even after a
SIGINT.  As a result, when a message was moved from one folder to
another, a SIGINT between processing the directory the message was
removed from and processing the directory it was added to would result
in notmuch removing that message from the database.
2011-09-13 22:00:15 -03:00
Austin Clements
bb2b33fbb8 new: Improved workaround for mistaken new directories
Currently, notmuch new assumes any directory with a database mtime of
0 is new, but we don't set the mtime until after processing messages
and subdirectories in that directory.  Hence, anything that prevents
the mtime update (such as an interruption or the wall-clock logic
introduced in 8c39e8d6) will cause the next notmuch new to think the
directory is still new.

We work around this by setting the new directory's database mtime to
-1 before scanning anything in the new directory.  This also obviates
the need for the workaround used in 8c39e8d6.
2011-06-29 16:10:41 -07:00
Austin Clements
8c39e8d6fb new: Don't update DB mtime if FS mtime equals wall-clock time.
This fixes a race where multiple message deliveries in the same second
with an intervening notmuch new could result in messages being ignored
by notmuch (at least, until a later delivery forced a rescan).
Because mtimes only have second granularity, later deliveries in the
same second won't change the directory mtime, and hence won't trigger
notmuch new to rescan the directory.  This situation can only occur
when notmuch new is being run at the same second as the directory's
modification time, so simply don't update the saved mtime in this
case.

This very race happens all over the test suite, and is currently
compensated for with increment_mtime (and, occasionally, luck).  With
this change, increment_mtime becomes unnecessary.
2011-06-29 15:26:04 -07:00
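The rule from 8c39e8d6fb reduces to a single comparison. A minimal sketch, assuming a hypothetical helper maybe_record_mtime and a plain time_t standing in for the mtime saved in the database:

```c
#include <time.h>

/* Records fs_mtime as the saved mtime unless it equals the current
 * wall-clock second. In that case another delivery could still land
 * in the same second without changing the directory mtime, so the
 * saved value is left alone and the next run rescans the directory. */
static void
maybe_record_mtime (time_t *db_mtime, time_t fs_mtime, time_t now)
{
    if (fs_mtime != now)
        *db_mtime = fs_mtime;
}
```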
Pieter Praet
8bb6f7869c fix sum moar typos [comments in source code]
Various typo fixes in comments within the source code.

Signed-off-by: Pieter Praet <pieter@praet.org>

Edited-by: Carl Worth <cworth@cworth.org> Restricted to just
source-code comments, (and fixed fix of "descriptios" to "descriptors"
rather than "descriptions").
2011-06-23 15:58:39 -07:00
Carl Worth
2f3a76c569 Remove some variables which were set but not used.
gcc (at least as of version 4.6.0) is kind enough to point these out to us,
(when given -Wunused-but-set-variable explicitly or implicitly via -Wunused
or -Wall).

One of these cases was a legitimately unused variable. Two were simply
variables (named ignored) we were assigning only to squelch a warning about
unused function return values. I don't seem to be getting those warnings
even without setting the ignored variable. And the gcc docs say that the
correct way to squelch that warning is with a cast to (void) anyway.
2011-05-11 13:27:14 -07:00
Carl Worth
61d4d89572 new: Update comments for add_files_recursive
The most recent commit optimized the implementation of this
function. This commit simply updates the relevant comments to match
the new implementation.
2011-03-10 11:56:16 -08:00
Karel Zak
b0006b6ea2 new: read db_files and db_subdirs only if mtime changed
The db_files and db_subdirs are unnecessary for unchanged directories.

maildir with 10000 e-mails:

old version:
	$ time ./notmuch new
	No new mail.

	real    0m0.053s
	user    0m0.028s
	sys     0m0.026s

new version:
	$ time ./notmuch new
	No new mail.

	real    0m0.032s
	user    0m0.009s
	sys     0m0.023s

Signed-off-by: Karel Zak <kzak@redhat.com>

Reviewed-by:  Austin Clements <amdragon@mit.edu>

Looks good (faster than, but provably equivalent to the original code!
notmuch_directory_get_child_* are side-effect free,
db_files/db_subdirs aren't used between where they were set in the old
code and where they are set in the new code, and db_files/db_subdirs
are initialized to NULL when declared).

Another timing data point:
Old code: ./notmuch new  0.77s user 0.28s system 99% cpu 1.051 total
New code: ./notmuch new  0.09s user 0.27s system 98% cpu 0.368 total
2011-03-10 11:48:33 -08:00
Michal Sojka
c58523088a new: Print progress estimates only when we have sufficient information
Without this patch, the remaining time or processing rate could be
calculated just after start, when nothing had been processed yet.
This resulted in division by a very small number (or zero), and the
printed information was of little value.

Instead of printing nonsense, we print only that the operation is in
progress. The estimates are printed later, once there is enough data.
2011-01-26 23:47:51 +10:00
Michal Sojka
90a505373e new: Enhance progress reporting
notmuch new reports progress only during the "first" phase when the
files on disk are traversed and indexed. After this phase, other
operations like rename detection and maildir flags synchronization are
performed, but the user is not informed about them. Since these
operations can take significant time, we want to inform the user about
them.

This patch enhances the progress reporting facility that was already
present. The timer that triggers reporting is not stopped after the
first phase but continues to run until all operations are finished. The
rename detection and maildir flag synchronization are enhanced to report
their progress.
2011-01-26 22:10:11 +10:00
Michal Sojka
7c450905e4 new: Add all initial tags at once
If there are several tags applied to the new messages, it is beneficial
to store them to the database at once, because it saves some time,
especially when notmuch new is run for the first time.

This patch decreased the time for initial import from 1h 35m to 1h 14m.
2011-01-26 22:05:28 +10:00
Austin Clements
de2acbd49c Do not defer maildir flag synchronization for new messages
This is a simplified version of a patch originally by Michal Sojka
<sojkam1@fel.cvut.cz> which is designed to have the same performance
benefits. Michal said the following:

  When notmuch new is run for the first time, it is not necessary to
  defer maildir flags synchronization to later because we already know
  that no files will be removed.

  Performing the maildir flag synchronization immediately after the
  message is added to the database has the advantage that the message
  is likely hot in the disk cache so the synchronization is faster.
  Additionally, we also save one database query for each message,
  which must be performed when the operation is deferred.

  Without this patch, the first notmuch new of 200k messages (3 GB)
  took 1h and 46m out of which 20m was maildir flags
  synchronization. With this patch, the whole operation took only 1h
  and 36m.

Unlike Michal's patch, this version skips the deferral for any new
message, rather than doing so only on the first run of "notmuch new".
2011-01-26 21:52:54 +10:00
Carl Worth
73198f5c74 notmuch new: Scan directory whenever fs mtime is not equal to db mtime
Previously, we would only scan a directory if the filesystem
modification time was strictly newer than the database modification
time for the directory. This would cause a problem for systems with an
unstable clock, (if a new mail was added to the filesystem, then the
system clock rolled backward, "notmuch new" would not find the message
until the clock caught up and the directory was modified again).

Now, we always scan the directory if the modification time of the
directory is not exactly the same between the filesystem and the
database. This avoids the problem described above even with an
unstable system clock.
2010-12-05 01:40:16 -08:00
Carl Worth
38d82b07c4 notmuch new: Defer maildir_flags synchronization until after removals
When a file in the mailstore is renamed, this appears to "notmuch new"
as both an added file and a removed file (for the same message). We
want the synchronization of the maildir_flags to reflect the final
state, (after the rename is complete). Therefore, it's incorrect to
perform the synchronization immediately after adding a new
file. Instead we queue up these synchronizations (by message ID[*])
and perform them after the removals are complete.

With this change, the "dump/restore" case of the maildir-sync tests,
as well as the recent "remove 'S'" case both now pass where they were
failing before.

Interestingly, the "remove info" test was passing before, but now
fails. This is actually due to a separate bug, (and the bug just fixed
was masking it, by preventing the test from performing as desired).

[*] It's important to queue by message ID---queueing actual message
objects does not work since the message objects will retain stale data
such as the old filenames.
2010-11-11 03:40:19 -08:00
Carl Worth
bb74e9dff8 lib: Rework interface for maildir_flags synchronization
Instead of having an API for setting a library-wide flag for
synchronization (notmuch_database_set_maildir_sync) we instead
implement maildir synchronization with two new library functions:

	notmuch_message_maildir_flags_to_tags
  and   notmuch_message_tags_to_maildir_flags

These functions are nicely documented here, (though the implementation
does not quite match the documentation yet---as plainly evidenced by
the current results of the test suite).
2010-11-11 03:40:19 -08:00
Carl Worth
4cfb2a0277 Avoid abbreviation, preferring notmuch_config_get_maildir_synchronize_flags
Since the name of the configuration parameter here is:

	maildir.synchronize_flags

the convention is that the functions to get and set this parameter
should match it in name. Hence:

       notmuch_config_get_maildir_synchronize_flags

etc. (as opposed to notmuch_config_get_maildir_sync).
2010-11-11 03:40:19 -08:00
Michal Sojka
d9d3d3e6f0 Make maildir synchronization configurable
This adds group [maildir] and key 'synchronize_flags' to the
configuration file. Its value enables (true) or disables (false) the
synchronization between notmuch tags and maildir flags. By default,
the synchronization is disabled.
2010-11-10 13:09:32 -08:00
Michal Sojka
088801a14a Maildir synchronization
This patch allows bi-directional synchronization between maildir
flags and certain tags. The flag-to-tag mapping is defined by flag2tag
array.

The synchronization works this way:

1) Whenever notmuch new is executed, the following happens:
   o New messages are tagged with configured new_tags.
   o For new or renamed messages with maildir info present in the file
     name, the tags defined in flag2tag are either added or removed
     depending on the flags from the file name.

2) Whenever notmuch tag (or notmuch restore) is executed, a new set of
   flags based on the tags is constructed for every message and a new
   file name is prepared based on the old file name but with the new
   flags. If the flags differ and the old message was in the 'new'
   directory then this is replaced with 'cur' in the new file name. If
   the new and old file names differ, the file is renamed and the notmuch
   database is updated accordingly.

   The rename happens before the database is updated. In case of a crash
   between rename and database update, the next run of notmuch new
   brings the database in sync with the mail store again.
2010-11-10 13:09:31 -08:00
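The flag-to-tag direction described in 088801a14a can be sketched as a table lookup. The commit only says that the mapping lives in a flag2tag array; the table contents below (including treating 'S' as the inverse of "unread"), the function name tag_state_from_filename, and its return convention are assumptions made for illustration. The ":2," info marker comes from the maildir format.

```c
#include <string.h>

/* Assumed mapping for this sketch; "inverse" means the tag is set
 * when the flag is absent. */
struct flag_tag {
    char flag;
    const char *tag;
    int inverse;
};

static const struct flag_tag flag2tag[] = {
    { 'D', "draft",   0 },
    { 'F', "flagged", 0 },
    { 'P', "passed",  0 },
    { 'R', "replied", 0 },
    { 'S', "unread",  1 }
};

/* Returns 1 if the tag should be present for a message with this file
 * name, 0 if it should be absent, and -1 if the file name carries no
 * maildir info or the tag is not in the table. */
static int
tag_state_from_filename (const char *filename, const char *tag)
{
    const char *flags = strstr (filename, ":2,");
    size_t i;

    if (flags == NULL)
        return -1;
    flags += 3;         /* skip past ":2," to the flag characters */

    for (i = 0; i < sizeof (flag2tag) / sizeof (flag2tag[0]); i++) {
        if (strcmp (flag2tag[i].tag, tag) == 0) {
            int present = strchr (flags, flag2tag[i].flag) != NULL;
            return flag2tag[i].inverse ? ! present : present;
        }
    }

    return -1;
}
```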
Carl Worth
65d278afb1 Sprinkle some const-correctness around new_tags.
To eliminate a compiler warning.
2010-04-23 09:19:52 -07:00
Ben Gamari
143d436874 notmuch-config: make new message tags configurable
Add a new_tags option in the [messages] section of the configuration
file to allow the user to specify which tags should be added to new
messages by notmuch new.
2010-04-23 08:41:59 -07:00
Michal Sojka
4234215263 Prevent data loss caused by SIGINT during notmuch new
When Ctrl-C is pressed at the wrong time during notmuch new, it can lead
to removal of messages from the database even if the files were not
removed.

It happened at least once to me.

Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
2010-04-13 08:44:34 -07:00
Carl Worth
4e5d2f22db lib: Rename iterator functions to prepare for reverse iteration.
We rename 'has_more' to 'valid' so that it can function whether
iterating in a forward or reverse direction. We also rename
'advance' to 'move_to_next' to set up parallel naming with
the proposed functions 'move_to_first', 'move_to_last', and
'move_to_previous'.
2010-03-09 09:22:29 -08:00
Carl Worth
c25bc03dc6 Fix misspelling of DT_UNKNOWN.
How foolish of me to advertise the fact that I pushed a commit without
compiling it first...
2010-01-23 22:45:23 +13:00
Carl Worth
344c48a47d Add some comments to document the recently-fixed handling of d_type.
The fix was subtle, (requiring less code than originally expected), so
it behooves us to document it well.
2010-01-23 18:58:30 +13:00
Geo Carncross
c5416b6f1b notmuch new: Fix to work on filesystems returning DT_UNKNOWN
Such as reiserfs or xfs. This has been broken since the merge of
support for rename and deletion of files from the mail store.

Here's the original justification for the patch:

A review of notmuch-new.c shows three uses of ->d_type:

Near line 153, in _entries_resemble_maildir() we can simply allow for
DT_UNKNOWN. This would fail if people have MH-style folders which have
three folders called "new" "cur" and "tmp", but that seems unlikely, in
which case the "tmp" folder would simply not be scanned.

Near line 273 in add_files_recursive() we have another check. If
DT_UNKNOWN, we fall through, then add_files_recursive() does a stat
almost immediately, returning with success if the path isn't a
directory.

Thus, the fallback is already written.

Finally, near line 343, in add_files_recursive() (a long function) we
have another check. Here we can simply treat DT_UNKNOWN as DT_LNK, since
the logic for the stat() results is the same.
2010-01-23 18:52:30 +13:00
Carl Worth
c340c1bd11 notmuch new: Print upgrade progress report as a percentage.
Previously we were printing a number of messages upgraded so far. The
original motivation for this was to accurately reflect the fact that
there are two passes, (so each message is processed twice and it's not
accurate to represent with a single count). But as it turns out, the
second pass takes zero time (relatively speaking) so we're still not
accounting for it.

If nothing else, the percentage-based reporting makes for a cleaner
API for the progress_notify function.
2010-01-09 17:38:23 -08:00
Carl Worth
c485c51585 notmuch new: Don't prevent database upgrade from being interrupted.
Our signal handler is designed to quickly flush out changes and then
exit. But if a database upgrade is in progress when the user
interrupts, then we just want to immediately abort. We could do
something fancy like add a return value to our progress_notify
function to allow it to tell the upgrade process to abort. But it's
actually much cleaner and more robust to delay the installation of our
signal handler so that the default abort happens on SIGINT.
2010-01-08 08:45:16 -08:00
Carl Worth
e307e990c9 notmuch new: Automatically upgrade the database if necessary.
This takes advantage of the recently added library support to detect
if the database needs to be upgraded and then automatically performs
that upgrade, (with a nice progress report).
2010-01-07 18:30:32 -08:00
Carl Worth
21f8fd6967 notmuch new: Fix deletion support to recurse on removed directories.
Previously, when notmuch detected that a directory had been deleted it
was only removing files immediately in that directory. We now
correctly recurse to also remove any directories (and files, etc.)
within sub-directories, etc.
2010-01-07 18:20:28 -08:00
Carl Worth
807aef93d3 Prefer READ_ONLY consistently over READONLY.
Previously we had NOTMUCH_DATABASE_MODE_READ_ONLY but
NOTMUCH_STATUS_READONLY_DATABASE which was ugly and confusing. Rename
the latter to NOTMUCH_STATUS_READ_ONLY_DATABASE for consistency.
2010-01-07 10:29:05 -08:00
Carl Worth
1a38cb841c notmuch new: Never ask the database for any names from a new directory.
When we know that we are adding a new directory to the database, (and
we therefore are using inode rather than strcmp-based sorting of the
filenames), then we *never* want to see any names from the
database. If we get any names, that could only make us inadvertently
remove files that we just added.

It's also not obvious from the Xapian documentation whether new terms
being added as part of new documents will appear in the in-progress
all-terms iteration we are using (and this might differ based on
Xapian backend and also might differ based on how many new directories
are added and whether a flush threshold is reached).

For all of these reasons, we play it safe and use NULL rather than a
real notmuch_filenames_t iterator in this case to avoid any problem.
2010-01-06 14:35:56 -08:00
Carl Worth
7d8271dd9d notmuch new: Fix bug resulting in file removal on initial build of database.
The bug here was that we would see that the database did not know
anything about a directory so would get results from the filesystem in
inode rather than strcmp order.

However, we wouldn't actually ask for the list of files from the
database until after recursing into the sub-directories. So by the
time we traverse the filenames looking for deletions, the database
*does* have entries and we end up detecting erroneous deletions
because our filename list from the filesystem isn't in strcmp order.

So ask for the list of names from the database before doing any
additions to avoid this problem.
2010-01-06 13:54:39 -08:00
Carl Worth
59c09623c8 notmuch new: Fix to detect deletions of names at the end of the list.
Previously we only scanned the list of filenames in the filesystem and
detected a deletion whenever that scan skipped a name that existed in
the database. That much was fine, but we *also* need to continue
walking the list of names from the database when the filesystem list
is exhausted.

Without this, removing the last file or directory within any
particular directory would go undetected.
2010-01-06 13:26:47 -08:00
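The two-list walk described in 59c09623c8 can be sketched as a merge over two strcmp-sorted name arrays. The function count_deleted and its array-based interface are assumptions for the example; the real code walks scandir results and a database iterator instead.

```c
#include <string.h>

/* Counts names that are present in db[] but missing from fs[].
 * Both arrays must be sorted in strcmp order. */
static int
count_deleted (const char **fs, int num_fs, const char **db, int num_db)
{
    int i = 0, j = 0, deleted = 0;

    while (i < num_fs) {
        /* Database names that sort before the current file system
         * name no longer exist on disk. */
        while (j < num_db && strcmp (db[j], fs[i]) < 0) {
            deleted++;
            j++;
        }
        /* Same name on both sides: still present, move past it. */
        if (j < num_db && strcmp (db[j], fs[i]) == 0)
            j++;
        i++;
    }

    /* The file system list is exhausted; anything left in the
     * database list was deleted too. This is the part the fix adds. */
    deleted += num_db - j;

    return deleted;
}
```

Without the final step, a name that sorts after every remaining file system entry (such as the last file in a directory) would never be counted.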
Carl Worth
39e81ca431 notmuch new: Fix regression preventing addition of symlinked mail files.
As described in the previous commit message, we introduced multiple
symlink-based regressions in commit
3df737bc4addfce71c647792ee668725e5221a98

Here, we fix the case of symlinks to regular files by doing an extra
stat of any DT_LNK files to determine if they do, in fact, link to
regular files.
2010-01-06 10:48:43 -08:00
Carl Worth
49f09958df notmuch new: Fix regression preventing recursion through symlinks.
In commit 3df737bc4addfce71c647792ee668725e5221a98 we switched from
using stat() to using the d_type field in the result of scandir() to
determine whether a filename is a regular file or a directory. This
change introduced a regression in that the recursion would no longer
traverse through a symlink to a directory. (Since stat() would resolve
the symlink but with scandir() we see a distinct DT_LNK value in
d_type).

We fix this for directories by allowing both DT_DIR and DT_LNK values
to recurse, and then downgrading the existing not-a-directory check
within the recursion to not be an error. We also add a new
not-a-directory check outside the recursion that is an error.
2010-01-06 10:32:06 -08:00
Carl Worth
bd72d95bac Fix typo in comment.
The difference between "now" and "not" ends up being fairly dramatic.
2010-01-06 10:32:06 -08:00
Carl Worth
9d4d7963a1 notmuch new: Print counts of deleted and renamed messages.
It's nice to be able to see a report indicating that the recently
added support for detecting file rename and deletion is working.
2010-01-06 10:32:06 -08:00
Carl Worth
3fa2385f7c notmuch new: Proper support for renamed and deleted files.
The "notmuch new" command will now efficiently notice if any files or
directories have been removed from the mail store and will
appropriately update its database.

Any given mail message (as determined by the message ID) may have
multiple corresponding filenames, and notmuch will return one of
them. When a file is deleted, the corresponding filename will be
removed from the message in the database. When the last filename is
removed from a message, that message will be entirely removed from the
database.

All file additions are handled before any file removals so that rename
is supported properly.
2010-01-06 10:32:06 -08:00
Carl Worth
2e96464f97 notmuch new: Store detected removed filenames for later processing.
It is essential to defer the actual removal of any filenames from the
database until we are entirely done adding any new files. This is to
avoid any information loss from the database in the case of a renamed
file or directory.

Note that we're *still* not actually doing any removal---still just
printing messages indicating the filenames that were detected as
removed. But we're at least now printing those messages at a time when
we actually *can* do the actual removal.
2010-01-06 10:32:06 -08:00
Carl Worth
03d5175001 notmuch new: Detect deleted (renamed) files and directories.
This takes advantage of the notmuch_directory_t interfaces added
recently (with corresponding storage of directory documents in the
database) to detect when files or entire directories are deleted or
renamed within the mail store.

This also fixes the recent regression where *all* files would be
processed by every run of "notmuch new", (now only new files are
processed once again).

The deleted files and directories are only detected so far. They
aren't properly removed from the database.
2010-01-06 10:32:06 -08:00
Carl Worth
2a98b1d487 add_files_recursive: Make the maildir detection more efficient.
Previously, we were re-scanning the entire list of entries for every
directory entry. Instead, we can simply check if the entries look like
a maildir once, up-front.
2010-01-06 10:32:06 -08:00
Carl Worth
28ce73848d add_files_recursive: Separate scanning for directories and files for legibility.
We now do two scans over the entries returned from scandir. The first
scan is looking for directories (and making the recursive call). The
second scan is looking for new files to add to the database.

This is easier to read than the previous code which had a single loop
and some if statements with ridiculously long bodies. It also has the
advantage that once the directory scan is complete we can do a single
comparison of the filesystem and database mtimes and entirely skip the
second scan if it's not needed.
2010-01-06 10:32:06 -08:00
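The two-scan structure from commit 28ce73848d above can be sketched roughly as follows. This is an illustration, not the original function: `two_pass_scan` and `scan_counts_t` are hypothetical names, the entries are assumed to come from scandir(), counting stands in for the real recursion and database work, and the direction of the mtime comparison is an assumption.

```c
#define _DEFAULT_SOURCE /* exposes the DT_* constants on glibc */
#include <dirent.h>
#include <string.h>
#include <time.h>

typedef struct {
    int directories_visited;
    int files_considered;
} scan_counts_t;

scan_counts_t
two_pass_scan (struct dirent **fs_entries, int num_fs_entries,
               time_t fs_mtime, time_t db_mtime)
{
    scan_counts_t counts = { 0, 0 };
    int i;

    /* Pass 1: directories (and symlinks), regardless of any mtime. */
    for (i = 0; i < num_fs_entries; i++) {
        const struct dirent *entry = fs_entries[i];

        if (entry->d_type != DT_DIR && entry->d_type != DT_LNK)
            continue;
        if (strcmp (entry->d_name, ".") == 0 ||
            strcmp (entry->d_name, "..") == 0)
            continue;
        counts.directories_visited++; /* real code would recurse here */
    }

    /* One comparison decides whether pass 2 runs at all (assumed rule:
     * nothing to do if the directory is no newer than the database). */
    if (fs_mtime <= db_mtime)
        return counts;

    /* Pass 2: regular files that may need adding to the database. */
    for (i = 0; i < num_fs_entries; i++) {
        if (fs_entries[i]->d_type == DT_REG)
            counts.files_considered++;
    }

    return counts;
}
```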
Carl Worth
6f05dd8a8c add_files_recursive: Use consistent naming for array and count variables.
Previously we had an array named "namelist" and its count named
"num_entries". We now use an array name of "fs_entries" and a count
named "num_fs_entries" to try to preserve sanity.
2010-01-06 10:32:06 -08:00
Carl Worth
2c4555f1a5 notmuch new: Remove an unnecessary stat of every regular file in the mail store.
We were previously using the stat for two reasons. One was to obtain
the mtime of the file. This usage was removed in the previous commit,
(since the mtime is unreliable in the case of a file being moved into
the mail store).

The second reason was to identify regular and directory file
types. But this information is already available in the result we get
from scandir.

What's left is simply a stat for each directory in the mailstore,
(which we are still using to compare filesystem mtime with the mtime
stored in the database).
2010-01-06 10:32:06 -08:00
Carl Worth
dde214c768 notmuch new: Eliminate the check on the mtime of regular files before adding.
This check was buggy in that moving a pre-existing file into the mail
store, (where the file existed before the last run of "notmuch new"),
does not update the mtime of the file. So the message would never be
added to the database.

The fix here is not practical in the long run, (since it causes *all*
files in the mail store to be processed in every run of "notmuch new"
(!)). But this change will let us drop a stat() call that we don't
otherwise need and will help move us toward proper database-backed
detection of new files, (which will fix the bug without the
performance impact of the current fix).
2010-01-06 10:32:06 -08:00
Carl Worth
2ce46c31fe notmuch new: Fix internal documentation of add_files_recursive.
To make it more clear that the mtime of a directory does not affect
whether further sub-directories are examined, (they are examined
unconditionally).
2010-01-06 10:32:06 -08:00
Carl Worth
3fb7ee7754 notmuch new: Rename the various timestamp variables to be more clear.
The previous name of "path_mtime" was very ambiguous. The new names
are much more obvious (fs_mtime is the mtime from the filesystem and
db_mtime is the mtime from the database).
2010-01-06 10:32:06 -08:00
Carl Worth
29908b9f13 notmuch new: Avoid updating directory timestamp if interrupted.
This was a very dangerous bug. An interrupted "notmuch new" session
would still update the timestamp for the directory in the
database. This would result in mail files that were not processed due
to the original interruption *never* being picked up by future runs of
"notmuch new". Yikes!
2010-01-06 10:32:06 -08:00
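The fix in commit 29908b9f13 above amounts to guarding the timestamp update with the interrupt flag. A minimal sketch, assuming a signal handler that sets a flag; `directory_record_t` and `finish_directory` are hypothetical names standing in for the database's per-directory record and the update step.

```c
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Set from the signal handler, read by the scanning code. */
volatile sig_atomic_t interrupted = 0;

void
handle_sigint (int signal_number)
{
    (void) signal_number;
    interrupted = 1;
}

/* Hypothetical stand-in for the stored per-directory mtime. */
typedef struct {
    time_t mtime;
} directory_record_t;

/* Store the new mtime only if the directory was processed completely.
 * Returns true if the record was updated. */
bool
finish_directory (directory_record_t *record, time_t fs_mtime)
{
    if (interrupted)
        return false; /* leave the old mtime so a later run rescans */

    record->mtime = fs_mtime;
    return true;
}
```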
Carl Worth
999f4c895c notmuch-new: Remove dead add_files_callback code.
Always satisfying to delete code (even if tiny).
2010-01-06 10:32:06 -08:00
Carl Worth
63ef5cd073 Make the add_files function static within notmuch-new.c.
No other files need this function so we don't need it exported in
notmuch-client.h.
2010-01-06 10:32:06 -08:00
Carl Worth
d807e28f43 lib: Implement new notmuch_directory_t API.
This new directory object provides all the infrastructure needed to
detect when files or directories are deleted or renamed. There's still
code needed on top of this (within "notmuch new") to actually do that
detection.
2010-01-06 10:32:06 -08:00
Carl Worth
50ae83a17f lib: Rename set/get_timestamp to set/get_directory_mtime.
I've been suitably scolded by Keith for doing a premature
generalization that ended up just making the documentation more
convoluted. Fix that.
2010-01-06 10:32:05 -08:00
Carl Worth
3a9c3ec9e7 notmuch new: Remove hack to ignore read-only directories in mail store.
This was really the last thing making the initial run of "notmuch
new" different from all other runs. And I'm taking a fresh
look at the performance of "notmuch new" anyway, so I think we can
safely drop this optimization.
2010-01-06 10:32:05 -08:00
Carl Worth
e1669b155c notmuch new: Restrict the "not much" pun to the first run.
Several people complained that the humor wore thin very quickly.  The
most significant case of "not much mail" is when counting the user's
initial mail collection. We've promised on the web page that no matter
how much mail the user has, notmuch will consider it to be "not much"
so let's say so. (This message was in place very early on, but was
inadvertently dropped at some point.)
2010-01-06 10:32:05 -08:00
Dirk-Jan C. Binnema
5f0b2ece16 Avoid compiler warnings due to ignored write return values
Glibc (at least) provides the warn_unused_result attribute on write,
(if optimizing and _FORTIFY_SOURCE is defined). So we explicitly
ignore the return value in our signal handler, where we couldn't do
anything anyway.

Compile with:

	make CFLAGS="-O -D_FORTIFY_SOURCE"

before this commit to see the warning.
2009-12-01 07:50:35 -08:00
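One common way to ignore a write() result explicitly, as commit 5f0b2ece16 above describes, is sketched below. The function name `report_interrupt` and the message text are assumptions for illustration; the original handler is not reproduced here.

```c
#include <unistd.h>

/* Write a short notice to fd from a context (such as a signal handler)
 * where nothing useful can be done if the write fails. */
void
report_interrupt (int fd)
{
    static const char msg[] = "Stopping...\n";
    ssize_t ignored;

    /* Assigning the result satisfies warn_unused_result; the cast
     * below keeps the compiler from flagging the variable as unused. */
    ignored = write (fd, msg, sizeof (msg) - 1);
    (void) ignored;
}
```

Only write() is called here, which is on the POSIX list of async-signal-safe functions, so the sketch stays usable from a handler.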
Chris Wilson
de064f1772 notmuch-new: Check for non-fatal errors from stat()
Currently we assume that every error from stat()-ing a dname is fatal (but
continue anyway and report the error at the end). However, some errors
reported by stat(), such as a missing file or insufficient privilege,
can simply be ignored and the file skipped. The others, such as a fault
(unlikely!) or out-of-memory, we handle like the other fatal errors by
jumping to the end.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-27 21:36:35 -08:00
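The classification in commit de064f1772 above might look something like the sketch below. The helper name is hypothetical, and mapping "missing file" to ENOENT and "insufficient privilege" to EACCES is an assumption based on the commit message, not a copy of the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: given the errno left by a failed stat(), say
 * whether the entry can simply be skipped. Anything not listed (for
 * example EFAULT or ENOMEM) is treated as fatal by the caller. */
bool
stat_error_is_ignorable (int stat_errno)
{
    switch (stat_errno) {
    case ENOENT: /* file vanished between scandir() and stat() */
    case EACCES: /* no permission to look at it */
        return true;
    default:
        return false;
    }
}
```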
Carl Worth
fb1a3452da Fix up whitespace styling from previous commit.
Function names in definitions belong left-aligned. Body of if statement
cannot be on the same line as the "if".
2009-11-27 19:38:46 -08:00
Jan Janak
24ae7718b7 notmuch-new: Test if directory looks like Maildir before skipping tmp.
'notmuch new' skips directory entries with the name 'tmp'. This is to
prevent notmuch from processing possibly incomplete Maildir messages
stored in that directory.

This patch attempts to refine the feature. If "tmp" entry is found,
it first checks if the containing directory looks like a Maildir
directory. This is done by searching for other common Maildir
subdirectories. If they exist and if the entry "tmp" is a directory
then it is skipped.

Files and subdirectories with the name "tmp" that do not look like
Maildir will still be processed by 'notmuch new'.

Signed-off-by: Jan Janak <jan@ryngle.com>
2009-11-27 19:37:23 -08:00
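A rough sketch of the check described in commit 24ae7718b7 above. Both function names are hypothetical, and requiring all three of "cur", "new" and "tmp" is an assumption about what "looks like a Maildir" means; the commit message only says that other common Maildir subdirectories are searched for.

```c
#include <stdbool.h>
#include <string.h>

/* Does a directory listing contain the usual Maildir subdirectories? */
bool
looks_like_maildir (const char *const *names, int count)
{
    bool has_cur = false, has_new = false, has_tmp = false;
    int i;

    for (i = 0; i < count; i++) {
        if (strcmp (names[i], "cur") == 0)
            has_cur = true;
        else if (strcmp (names[i], "new") == 0)
            has_new = true;
        else if (strcmp (names[i], "tmp") == 0)
            has_tmp = true;
    }

    return has_cur && has_new && has_tmp;
}

/* Skip "tmp" only when it is a directory inside a Maildir. */
bool
should_skip_entry (const char *name, bool is_directory,
                   bool parent_is_maildir)
{
    return parent_is_maildir && is_directory && strcmp (name, "tmp") == 0;
}
```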
Aneesh Kumar K.V
5c7c6c0bae notmuch-new: Fix notmuch new to look at files within symbolic links
We look at the modified time of the database and the directory
to decide whether we need to look at only the subdirectories.
I.e., if the directory modified time is < the database modified time
then we have already looked at all the files within the
directory, so we just need to iterate through the subdirectories.

But with symlinks we need to make sure we follow them even if
the directory modified time is less than database modified time

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-11-27 17:29:10 -08:00
Jed Brown
f667bad7a5 Stay out of tmp to respect the Maildir spec.
2009-11-23 18:29:01 -08:00
Adrian Perez
d024ab4a04 ANSI escapes in "new" only when output is a tty
When running "notmuch new --verbose", ANSI escapes are used. This may not be
desirable when the output of the command is *not* being sent to a terminal
(e.g. when piping output into another command). In that case each file
processed is printed in a new line and ANSI escapes are not used at all.
2009-11-23 06:02:06 +01:00
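The behaviour in commit d024ab4a04 above can be sketched with isatty(). This is an illustration only: `print_progress` is a hypothetical name, and the leading "\r" is an assumption; the "\033[K" erase sequence and the "current/total: path" format are the ones mentioned in this log.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() under strict -std modes */
#include <stdio.h>
#include <unistd.h>

/* Print one progress line; returns the number of characters written. */
int
print_progress (FILE *out, int current, int total, const char *path)
{
    int written;

    if (isatty (fileno (out)))
        /* Terminal: go back to column 0 and erase the old status. */
        written = fprintf (out, "\r\033[K%d/%d: %s", current, total, path);
    else
        /* Pipe or file: plain text, one file per line, no escapes. */
        written = fprintf (out, "%d/%d: %s\n", current, total, path);

    fflush (out);
    return written;
}
```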
Adrian Perez
5fdce046a1 Support for printing file paths in new command
For very large mail boxes, it is desirable to know which files are being
processed e.g. when a crash occurs to know which one was the cause. Also,
it may be interesting to have a better idea of how the operation is
progressing when processing mailboxes with big messages.

This patch adds support for printing messages as they are processed by
"notmuch new":

* The "new" command now supports a "--verbose" flag.

* When running in verbose mode, the file path of the message about to be
  processed is printed in the following format:

    current/total: /path/to/message/file

  Where "current" is the number of messages processed so far and "total" is
  the total count of files to be processed.

  The status line is erased using an ANSI sequence "\033[K" (erase current
  line from the cursor to the end of line) each time it is refreshed. This
  should not pose a problem because nearly every terminal supports it.

* The signal handler for SIGALRM and the timer are not enabled when running
  in verbose mode: since we are already printing progress with each file,
  periodic reports are not necessary.
2009-11-23 01:07:02 +01:00
Chris Wilson
018ca890a3 notmuch-new: Only print the regular progress report when on a tty
Check that the stdout is connected to an interactive terminal with
isatty() before installing the periodic timer to print progress reports.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-22 05:36:39 +01:00
Chris Wilson
986f6c9824 notmuch-new: Only install SIGALRM if not running under gdb
I felt sorry for Carl trying to step through an exception from xapian
and suffering from the SIGALRMs.

We can detect if the user launched notmuch under a debugger by either
checking our cmdline for the presence of the gdb string or querying if
valgrind is controlling our process. For the latter we need to add a
compile time check for the valgrind development library, and so add the
initial support to build Makefile.config from configure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Carl Worth <cworth@cworth.org>
[ickle: And do not install the timer when under the debugger]
2009-11-22 05:36:36 +01:00
Chris Wilson
b5d7632000 notmuch new: Fix to actually open the database READ_WRITE.
Chris claims he must have been distracted when he wrote this.
2009-11-22 00:13:24 +01:00
Carl Worth
637f99d8f3 Rename NOTMUCH_DATABASE_MODE_WRITABLE to NOTMUCH_DATABASE_MODE_READ_WRITE
And correspondingly, READONLY to READ_ONLY.
2009-11-21 22:10:18 +01:00
Chris Wilson
f379aa5284 Permit opening the notmuch database in read-only mode.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and the like whilst updating the index.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
2009-11-21 22:04:49 +01:00
Carl Worth
5939490f64 Revert "notmuch: Add Maildir directory name as tag name for messages"
This reverts commit 9794f19017.

The feature makes a lot of sense for the initial import, but it's not
as clear whether it makes sense for ongoing "notmuch new" runs. We
might need to make this opt-in by configuration.
2009-11-21 21:21:58 +01:00
Aneesh Kumar K.V
9794f19017 notmuch: Add Maildir directory name as tag name for messages
This patch adds maildir directory name as the tag name for
messages. This helps in adding tags using filtering already
provided by procmail.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-11-21 13:28:24 +01:00
Carl Worth
0c0a401f70 notmuch new: Restore printout of total files counted.
This was more fallout from the recent re-shuffling of this code.
2009-11-19 00:32:21 +01:00
Carl Worth
3687472d45 notmuch new: Fix countdown timer on first run.
A recent shuffling of this code accidentally disabled the timer,
(making the time spent counting the files totally useless).
2009-11-19 00:29:52 +01:00
Stewart Smith
fca070f8ce count_files: sort directory in inode order before statting
Carl says: This has similar performance benefits as the previous
patch, and I fixed similar style issues here as well, (including
missing more of a commit message than the one-line summary).
2009-11-18 22:31:57 +01:00
Carl Worth
22759fb279 Minor style fixups for the previous fix.
Use consistent whitespace, a slightly less abbreviated identifier, and
avoid a C99 declaration after statement.
2009-11-18 22:31:50 +01:00
Stewart Smith
a45ff8c361 Read mail directory in inode number order
This gives a rather decent reduction in number of seeks required when
reading a Maildir that isn't in pagecache.

Most filesystems give some locality on disk based on inode numbers.
In ext[234] this is the inode tables, in XFS groups of sequential inode
numbers are together on disk and the most significant bits indicate
allocation group (i.e., inode 1,000,000 is always after inode 1,000).

With this patch, we read in the whole directory and sort it by inode
number before stat()ing the contents.

Ideally, the directory is sequential and then we make one scan through the
file system stat()ing.

Since the universe is not ideal, we'll probably seek during reading the
directory and a fair bit while reading the inodes themselves.

However... with readahead, and stat()ing in inode order, we should be
in the best place possible to hit the cache.

In a (not very good) benchmark of "how long does it take to find the first
15,000 messages in my Maildir after 'echo 3 > /proc/sys/vm/drop_caches'",
this patch consistently cut at least 8 seconds off the scan time.

Without patch: 50 seconds
With patch: 38-42 seconds.

(I did this in a previous maildir reading project and saw large improvements too)
2009-11-18 22:25:41 +01:00
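Reading a directory in inode order, as commit a45ff8c361 above describes, can be sketched by passing a comparison function to scandir(). The names `ino_cmp` and `read_directory_in_inode_order` are hypothetical and this is not the original patch; it assumes the POSIX.1-2008 scandir() prototype.

```c
#define _DEFAULT_SOURCE /* for scandir() on glibc */
#include <dirent.h>
#include <stddef.h>

/* Order two directory entries by inode number. Explicit comparisons
 * avoid the overflow that subtracting two ino_t values could cause. */
int
ino_cmp (const struct dirent **a, const struct dirent **b)
{
    if ((*a)->d_ino < (*b)->d_ino)
        return -1;
    if ((*a)->d_ino > (*b)->d_ino)
        return 1;
    return 0;
}

/* Read every entry of path, sorted by inode. Returns the entry count,
 * or -1 on error. The caller frees each entry and then the array. */
int
read_directory_in_inode_order (const char *path, struct dirent ***entries)
{
    return scandir (path, entries, NULL, ino_cmp);
}
```

The caller would then stat() the entries in the order returned, which is where the reduced seeking is expected to come from.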