140 commits

Author SHA1 Message Date
Carl Worth
2a98b1d487 add_files_recursive: Make the maildir detection more efficient.
Previously, we were re-scanning the entire list of entries for every
directory entry. Instead, we can simply check if the entries look like
a maildir once, up-front.
2010-01-06 10:32:06 -08:00
Carl Worth
28ce73848d add_files_recursive: Separate scanning for directories and files for legibility.
We now do two scans over the entries returned from scandir. The first
scan is looking for directories (and making the recursive call). The
second scan is looking for new files to add to the database.

This is easier to read than the previous code which had a single loop
and some if statements with ridiculously long bodies. It also has the
advantage that once the directory scan is complete we can do a single
comparison of the filesystem and database mtimes and entirely skip the
second scan if it's not needed.
2010-01-06 10:32:06 -08:00
Carl Worth
6f05dd8a8c add_files_recursive: Use consistent naming for array and count variables.
Previously we had an array named "namelist" and its count named
"num_entries". We now use an array name of "fs_entries" and a count
named "num_fs_entries" to try to preserve sanity.
2010-01-06 10:32:06 -08:00
Carl Worth
2c4555f1a5 notmuch new: Remove an unnecessary stat of every regular file in the mail store.
We were previously using the stat for two reasons. One was to obtain
the mtime of the file. This usage was removed in the previous commit,
(since the mtime is unreliable in the case of a file being moved into
the mail store).

The second reason was to identify regular and directory file
types. But this information is already available in the result we get
from scandir.

What's left is simply a stat for each directory in the mailstore,
(which we are still using to compare filesystem mtime with the mtime
stored in the database).
2010-01-06 10:32:06 -08:00
Carl Worth
dde214c768 notmuch new: Eliminate the check on the mtime of regular files before adding.
This check was buggy in that moving a pre-existing file into the mail
store, (where the file existed before the last run of "notmuch new"),
does not update the mtime of the file. So the message would never be
added to the database.

The fix here is not practical in the long run, (since it causes *all*
files in the mail store to be processed in every run of "notmuch new"
(!)). But this change will let us drop a stat() call that we don't
otherwise need and will help move us toward proper database-backed
detection of new files, (which will fix the bug without the
performance impact of the current fix).
2010-01-06 10:32:06 -08:00
Carl Worth
2ce46c31fe notmuch new: Fix internal documentation of add_files_recursive.
To make it more clear that the mtime of a directory does not affect
whether further sub-directories are examined, (they are examined
unconditionally).
2010-01-06 10:32:06 -08:00
Carl Worth
3fb7ee7754 notmuch new: Rename the various timestamp variables to be more clear.
The previous name of "path_mtime" was very ambiguous. The new names
are much more obvious (fs_mtime is the mtime from the filesystem and
db_mtime is the mtime from the database).
2010-01-06 10:32:06 -08:00
Carl Worth
29908b9f13 notmuch new: Avoid updating directory timestamp if interrupted.
This was a very dangerous bug. An interrupted "notmuch new" session
would still update the timestamp for the directory in the
database. This would result in mail files that were not processed due
to the original interruption *never* being picked up by future runs of
"notmuch new". Yikes!
2010-01-06 10:32:06 -08:00
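The ordering that commit 29908b9f13 describes (only record the directory timestamp once every file has been handled) can be sketched roughly as below. This is an illustration under assumptions, not the notmuch code: directory_record_t, process_directory and the interrupted flag are hypothetical names, and the flag is assumed to be set by a SIGINT handler installed elsewhere.

```c
#include <signal.h>
#include <time.h>

/* Assumed to be set to 1 by a SIGINT handler installed elsewhere. */
static volatile sig_atomic_t interrupted = 0;

/* Hypothetical stand-in for the per-directory record in the database. */
typedef struct {
    time_t db_mtime;
} directory_record_t;

/* Process num_files files, then record fs_mtime only if every file
 * was handled. Returns 0 on completion, -1 if interrupted. */
static int
process_directory (directory_record_t *record, time_t fs_mtime, int num_files)
{
    int i;

    for (i = 0; i < num_files; i++) {
        if (interrupted)
            break;
        /* ... add file i to the database here ... */
    }

    /* Leave the stored mtime untouched so the next run rescans. */
    if (interrupted)
        return -1;

    record->db_mtime = fs_mtime;
    return 0;
}
```

Because the stored mtime is written last, an interrupted run leaves the old value in place and the unprocessed files remain eligible on the next run.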
Carl Worth
999f4c895c notmuch-new: Remove dead add_files_callback code.
Always satisfying to delete code (even if tiny).
2010-01-06 10:32:06 -08:00
Carl Worth
63ef5cd073 Make the add_files function static within notmuch-new.c.
No other files need this function so we don't need it exported in
notmuch-client.h.
2010-01-06 10:32:06 -08:00
Carl Worth
d807e28f43 lib: Implement new notmuch_directory_t API.
This new directory object provides all the infrastructure needed to
detect when files or directories are deleted or renamed. There's still
code needed on top of this (within "notmuch new") to actually do that
detection.
2010-01-06 10:32:06 -08:00
Carl Worth
50ae83a17f lib: Rename set/get_timestamp to set/get_directory_mtime.
I've been suitably scolded by Keith for doing a premature
generalization that ended up just making the documentation more
convoluted. Fix that.
2010-01-06 10:32:05 -08:00
Carl Worth
3a9c3ec9e7 notmuch new: Remove hack to ignore read-only directories in mail store.
This was really the last thing keeping the initial run of "notmuch
new" being different from all other runs. And I'm taking a fresh
look at the performance of "notmuch new" anyway, so I think we can
safely drop this optimization.
2010-01-06 10:32:05 -08:00
Carl Worth
e1669b155c notmuch new: Restrict the "not much" pun to the first run.
Several people complained that the humor wore thin very quickly.  The
most significant case of "not much mail" is when counting the user's
initial mail collection. We've promised on the web page that no matter
how much mail the user has, notmuch will consider it to be "not much"
so let's say so. (This message was in place very early on, but was
inadvertently dropped at some point.)
2010-01-06 10:32:05 -08:00
Dirk-Jan C. Binnema
5f0b2ece16 Avoid compiler warnings due to ignored write return values
Glibc (at least) provides the warn_unused_result attribute on write,
(if optimizing and _FORTIFY_SOURCE is defined). So we explicitly
ignore the return value in our signal handler, where we couldn't do
anything anyway.

Compile with:

	make CFLAGS="-O -D_FORTIFY_SOURCE"

before this commit to see the warning.
2009-12-01 07:50:35 -08:00
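One common way to "explicitly ignore" the result of write, as commit 5f0b2ece16 describes, is to store it in a variable that is then discarded, since GCC is known to keep warning when the call is merely cast to void. The sketch below assumes that approach; write_ignoring_result and the message text are hypothetical and not taken from the notmuch source.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write a message and deliberately discard the
 * result. write() is async-signal-safe, so this may be called from a
 * signal handler. */
static void
write_ignoring_result (int fd, const char *msg, size_t len)
{
    ssize_t ignored;

    ignored = write (fd, msg, len);
    (void) ignored;
}

/* Example handler: nothing useful can be done if the write fails. */
static void
handle_sigint (int sig)
{
    static const char msg[] = "Stopping.\n";

    (void) sig;
    write_ignoring_result (STDERR_FILENO, msg, sizeof (msg) - 1);
}
```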
Chris Wilson
de064f1772 notmuch-new: Check for non-fatal errors from stat()
Currently we assume that every error from stat() on a dname is fatal (but
continue anyway and report the error at the end). However, some errors
reported by stat(), such as a missing file or insufficient privilege,
can simply be ignored and the file skipped. For the others, such as a fault
(unlikely!) or out-of-memory, we handle them like the other fatal errors by
jumping to the end.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-27 21:36:35 -08:00
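The split between skippable and fatal stat() errors in commit de064f1772 might look something like the following. The mapping of "missing file" to ENOENT and "insufficient privilege" to EACCES is an assumption drawn from the commit message; the exact set of errno values the commit checks is not shown here, and stat_outcome_t, classify_stat_errno and stat_entry are hypothetical names.

```c
#include <errno.h>
#include <sys/stat.h>

typedef enum { STAT_OK, STAT_SKIP, STAT_FATAL } stat_outcome_t;

/* Decide how to react to a failed stat(), given its errno. */
static stat_outcome_t
classify_stat_errno (int err)
{
    switch (err) {
    case ENOENT:    /* file disappeared: skip it */
    case EACCES:    /* insufficient privilege: skip it */
        return STAT_SKIP;
    default:        /* e.g. EFAULT, ENOMEM: treat as fatal */
        return STAT_FATAL;
    }
}

/* stat() a path and report whether to use, skip, or abort. */
static stat_outcome_t
stat_entry (const char *path, struct stat *st)
{
    if (stat (path, st) == 0)
        return STAT_OK;
    return classify_stat_errno (errno);
}
```

A caller would continue its loop on STAT_SKIP and jump to its cleanup path on STAT_FATAL.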
Carl Worth
fb1a3452da Fix up whitespace styling from previous commit.
Function names in definitions belong left-aligned. The body of an if
statement cannot be on the same line as the "if".
2009-11-27 19:38:46 -08:00
Jan Janak
24ae7718b7 notmuch-new: Test if directory looks like Maildir before skipping tmp.
'notmuch new' skips directory entries with the name 'tmp'. This is to
prevent notmuch from processing possibly incomplete Maildir messages
stored in that directory.

This patch attempts to refine the feature. If "tmp" entry is found,
it first checks if the containing directory looks like a Maildir
directory. This is done by searching for other common Maildir
subdirectories. If they exist and if the entry "tmp" is a directory
then it is skipped.

Files and subdirectories with the name "tmp" that do not look like
Maildir will still be processed by 'notmuch new'.

Signed-off-by: Jan Janak <jan@ryngle.com>
2009-11-27 19:37:23 -08:00
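The check described in commit 24ae7718b7 can be approximated as two small predicates: one that decides, once per directory, whether the listing looks like a Maildir, and one that decides whether a given entry should be skipped. This is a sketch under assumptions: it works on plain name arrays rather than scandir results, it takes "other common Maildir subdirectories" to mean "cur" and "new", and both function names are hypothetical.

```c
#include <string.h>

/* Returns 1 if the directory listing contains both "cur" and "new",
 * which is taken here as the sign of a Maildir. */
static int
names_look_like_maildir (const char *const *names, int count)
{
    int i, has_cur = 0, has_new = 0;

    for (i = 0; i < count; i++) {
        if (strcmp (names[i], "cur") == 0)
            has_cur = 1;
        else if (strcmp (names[i], "new") == 0)
            has_new = 1;
    }
    return has_cur && has_new;
}

/* Skip an entry only if it is a directory named "tmp" inside
 * something that looks like a Maildir. */
static int
should_skip_entry (const char *name, int is_directory, int is_maildir)
{
    return is_maildir && is_directory && strcmp (name, "tmp") == 0;
}
```

Computing is_maildir once and passing it in keeps the per-entry test cheap, which is also the point of the later efficiency commit 2a98b1d487.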
Aneesh Kumar K.V
5c7c6c0bae notmuch-new: Fix notmuch new to look at files within symbolic links
We look at the modified time of the database and the directory
to decide whether we need to look at only the subdirectories,
i.e., if the directory modified time is < database modified time
then we have already looked at all the files within the
directory. So we just need to iterate through the subdirectories.

But with symlinks we need to make sure we follow them even if
the directory modified time is less than database modified time.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-11-27 17:29:10 -08:00
Jed Brown
f667bad7a5 Stay out of tmp to respect the Maildir spec.
2009-11-23 18:29:01 -08:00
Adrian Perez
d024ab4a04 ANSI escapes in "new" only when output is a tty
When running "notmuch new --verbose", ANSI escapes are used. This may not be
desirable when the output of the command is *not* being sent to a terminal
(e.g. when piping output into another command). In that case each file
processed is printed in a new line and ANSI escapes are not used at all.
2009-11-23 06:02:06 +01:00
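The tty-dependent output of commit d024ab4a04 could be sketched as follows, reusing the "current/total: path" format and the "\033[K" erase sequence from the verbose-mode commit 5fdce046a1. The function names are hypothetical, and the leading carriage return in the terminal case is an assumption about how the status line gets overwritten.

```c
#include <stdio.h>
#include <unistd.h>

/* Format one progress line. With use_ansi set, the line is meant to
 * overwrite the previous one; otherwise it ends with a newline. */
static int
format_progress (char *buf, size_t size, int use_ansi,
                 int current, int total, const char *path)
{
    if (use_ansi)
        return snprintf (buf, size, "\r\033[K%d/%d: %s",
                         current, total, path);
    return snprintf (buf, size, "%d/%d: %s\n", current, total, path);
}

/* Print progress to stdout, using ANSI escapes only on a terminal. */
static void
print_progress (int current, int total, const char *path)
{
    char line[4096];

    format_progress (line, sizeof (line), isatty (STDOUT_FILENO),
                     current, total, path);
    fputs (line, stdout);
    fflush (stdout);
}
```

Keeping the formatting separate from the isatty() check makes both output styles easy to exercise without a real terminal.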
Adrian Perez
5fdce046a1 Support for printing file paths in new command
For very large mail boxes, it is desirable to know which files are being
processed e.g. when a crash occurs to know which one was the cause. Also,
it may be interesting to have a better idea of how the operation is
progressing when processing mailboxes with big messages.

This patch adds support for printing messages as they are processed by
"notmuch new":

* The "new" command now supports a "--verbose" flag.

* When running in verbose mode, the file path of the message about to be
  processed is printed in the following format:

    current/total: /path/to/message/file

  Where "current" is the number of messages processed so far and "total" is
  the total count of files to be processed.

  The status line is erased using an ANSI sequence "\033[K" (erase current
  line from the cursor to the end of line) each time it is refreshed. This
  should not pose a problem because nearly every terminal supports it.

* The signal handler for SIGALRM and the timer are not enabled when running
  in verbose mode: because we are already printing progress with each file,
  periodic reports are not necessary.
2009-11-23 01:07:02 +01:00
Chris Wilson
018ca890a3 notmuch-new: Only print the regular progress report when on a tty
Check that the stdout is connected to an interactive terminal with
isatty() before installing the periodic timer to print progress reports.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-11-22 05:36:39 +01:00
Chris Wilson
986f6c9824 notmuch-new: Only install SIGALRM if not running under gdb
I felt sorry for Carl trying to step through an exception from xapian
and suffering from the SIGALRMs.

We can detect if the user launched notmuch under a debugger by either
checking our cmdline for the presence of the gdb string or querying if
valgrind is controlling our process. For the latter we need to add a
compile time check for the valgrind development library, and so add the
initial support to build Makefile.config from configure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Carl Worth <cworth@cworth.org>
[ickle: And do not install the timer when under the debugger]
2009-11-22 05:36:36 +01:00
Chris Wilson
b5d7632000 notmuch new: Fix to actually open the database READ_WRITE.
Chris claims he must have been distracted when he wrote this.
2009-11-22 00:13:24 +01:00
Carl Worth
637f99d8f3 Rename NOTMUCH_DATABASE_MODE_WRITABLE to NOTMUCH_DATABASE_MODE_READ_WRITE
And correspondingly, READONLY to READ_ONLY.
2009-11-21 22:10:18 +01:00
Chris Wilson
f379aa5284 Permit opening the notmuch database in read-only mode.
We only rarely need to actually open the database for writing, but we
always create a Xapian::WritableDatabase. This has the effect of
preventing searches and the like while updating the index.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Carl Worth <cworth@cworth.org>
2009-11-21 22:04:49 +01:00
Carl Worth
5939490f64 Revert "notmuch: Add Maildir directory name as tag name for messages"
This reverts commit 9794f19017.

The feature makes a lot of sense for the initial import, but it's not
as clear whether it makes sense for ongoing "notmuch new" runs. We
might need to make this opt-in by configuration.
2009-11-21 21:21:58 +01:00
Aneesh Kumar K.V
9794f19017 notmuch: Add Maildir directory name as tag name for messages
This patch adds maildir directory name as the tag name for
messages. This helps in adding tags using filtering already
provided by procmail.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-11-21 13:28:24 +01:00
Carl Worth
0c0a401f70 notmuch new: Restore printout of total files counted.
This was more fallout from the recent re-shuffling of this code.
2009-11-19 00:32:21 +01:00
Carl Worth
3687472d45 notmuch new: Fix countdown timer on first run.
A recent shuffling of this code accidentally disabled the timer,
(making the time spent counting the files totally useless).
2009-11-19 00:29:52 +01:00
Stewart Smith
fca070f8ce count_files: sort directory in inode order before statting
Carl says: This has similar performance benefits as the previous
patch, and I fixed similar style issues here as well, (including
missing more of a commit message than the one-line summary).
2009-11-18 22:31:57 +01:00
Carl Worth
22759fb279 Minor style fixups for the previous fix.
Use consistent whitespace, a slightly less abbreviated identifier, and
avoid a C99 declaration after statement.
2009-11-18 22:31:50 +01:00
Stewart Smith
a45ff8c361 Read mail directory in inode number order
This gives a rather decent reduction in number of seeks required when
reading a Maildir that isn't in pagecache.

Most filesystems give some locality on disk based on inode numbers.
In ext[234] this is the inode tables, in XFS groups of sequential inode
numbers are together on disk and the most significant bits indicate
allocation group (i.e. inode 1,000,000 is always after inode 1,000).

With this patch, we read in the whole directory and sort it by inode
number before stat()ing the contents.

Ideally, the directory is sequential and then we make one scan through
the file system stat()ing.

Since the universe is not ideal, we'll probably seek during reading the
directory and a fair bit while reading the inodes themselves.

However... with readahead, and stat()ing in inode order, we should be
in the best place possible to hit the cache.

In a (not very good) benchmark of "how long does it take to find the first
15,000 messages in my Maildir after 'echo 3 > /proc/sys/vm/drop_caches'",
this patch consistently cut at least 8 seconds off the scan time.

Without patch: 50 seconds
With patch: 38-42 seconds.

(I did this in a previous maildir reading project and saw large improvements too)
2009-11-18 22:25:41 +01:00
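Reading a directory in inode order, as commit a45ff8c361 describes, can be done by handing scandir() a comparison function that orders entries by d_ino. The sketch below assumes the POSIX.1-2008 scandir() prototype (as found in glibc); compare_by_inode and scan_in_inode_order are hypothetical names and not necessarily what the patch uses.

```c
#include <dirent.h>
#include <stdlib.h>

/* Order directory entries by ascending inode number. */
static int
compare_by_inode (const struct dirent **a, const struct dirent **b)
{
    if ((*a)->d_ino < (*b)->d_ino)
        return -1;
    if ((*a)->d_ino > (*b)->d_ino)
        return 1;
    return 0;
}

/* Read the whole directory, sorted by inode. Returns the number of
 * entries, or -1 on error. The caller frees each entry and the array. */
static int
scan_in_inode_order (const char *path, struct dirent ***entries)
{
    return scandir (path, entries, NULL, compare_by_inode);
}
```

The caller would then stat() the entries in array order, so that successive inode reads tend to land close together on disk.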
Ingmar Vanhassel
2ce25b93a7 Typsos
2009-11-18 03:21:36 -08:00
Keith Packard
f4245aec94 notmuch new/tag: Flush all changes to database when interrupted.
By installing a signal handler for SIGINT we can ensure that no work
that is already complete will be lost if the user interrupts a
"notmuch new" run with Control-C.
2009-11-13 09:05:13 -08:00
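The SIGINT handling in commit f4245aec94 follows a familiar pattern: the handler only sets a flag, and the main loop checks the flag so that it can stop cleanly and flush what it has done. A minimal sketch of that pattern, using ISO C signal() for brevity (sigaction() would be the more robust choice) and a hypothetical process_until_interrupted loop:

```c
#include <signal.h>

/* Set by the handler, polled by the work loop. */
static volatile sig_atomic_t interrupted = 0;

static void
handle_sigint (int sig)
{
    (void) sig;
    interrupted = 1;
}

/* Process up to total items, stopping early once interrupted.
 * Returns the number of items completed. */
static int
process_until_interrupted (int total)
{
    int done = 0;

    while (done < total && ! interrupted) {
        /* ... index one message here ... */
        done++;
    }

    /* ... flush completed work to the database here ... */
    return done;
}
```

The handler would be installed with signal (SIGINT, handle_sigint) before the loop starts; since the loop exits normally, the flush runs whether or not the user pressed Control-C.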
Carl Worth
e70f09d900 notmuch new: Don't ignore files with mtime of 0.
I recently discovered that mb2md has the annoying bug of creating
files with mtime of 0, and notmuch then promptly ignored them,
(thinking that its timestamps initialized to 0 were just as new).

We fix notmuch to not exclude messages based on a database timestamp
of 0.
2009-11-12 07:02:13 -08:00
Keith Packard
5d614048b4 Initialize count of new files to zero.
Leaving this variable uninitialized caused notmuch to display a random
number while counting files for the new database.

Signed-off-by: Keith Packard <keithp@keithp.com>
2009-11-11 22:54:01 -08:00
Carl Worth
37bdd89870 notmuch new: Unbreak after the addition of notmuch-config.
Pull in the code from add-files.c now that notmuch_new_command is
the only user, (we no longer have notmuch_setup_command adding any
messages).
2009-11-11 19:50:15 -08:00
Carl Worth
50144f95ca notmuch: Break notmuch.c up into several smaller files.
Now that the client sources are alone here in their own directory,
(with all the library sources down inside the lib directory), we can
break the client up into multiple files without mixing the files up.
The hope is that these smaller files will be easier to manage and
maintain.
2009-11-10 12:03:05 -08:00