Commit graph

4194 commits

Author SHA1 Message Date
Carl Worth
6b228e4509 sha1: Add new notmuch_sha1_of_string function
We'll be using this for storing really long terms in the database
when we just need to look them up, (and never read back the
original data directly from the database). For example, storing
arbitrarily long directory paths in the database along with
mtime timestamps.

Note that if we did want to store arbitrarily long terms and also
be able to read them back, the Xapian folks recommend splitting
the term off with multiple prefixes. See the note near the end
of this page:

http://trac.xapian.org/wiki/FAQ/UniqueIds
2009-10-23 13:54:53 -07:00
Carl Worth
c9fbe6b58b notmuch restore: Print names of tags that cannot be applied
This helps the user gauge the severity of the error.

For example, when restoring my sup tags I see a bunch of tags missing
for message IDs of the form "sup-faked-...". That's not surprising
since I know that sup generates these with the md5sum of the message
header while notmuch uses the sha-1 of the entire message. But how
much will this hurt?

Well, now that I can see that most of the missing tags are just
"attachment", then I'm not concerned, (I'll be automatically creating
that tag in the future based on the message contents). But if a
missing tag is "inbox" then that's more concerning because that's data
that I can't easily regenerate outside of sup.
2009-10-23 06:08:22 -07:00
Carl Worth
db93109cfe notmuch_tags_has_more: Fix to use string.empty rather than string.size
I'm really interested in the length of the data here, not the size
of the storage.
2009-10-23 06:06:20 -07:00
Carl Worth
ce5d782962 Fix notmuch_message_get_message_id to never return NULL.
With the recent improvements to the handling of message IDs we
"know" that a NULL message ID is impossible, (so we simply
abort if the impossible happens).
2009-10-23 06:04:57 -07:00
Carl Worth
6ccdffcd87 add_message: Fix to not add multiple documents with the same message ID
Here's the second big fix to message-ID handling, (the first was to
generate message IDs when an email contained none). Now, with no
document missing a message ID, and no two documents having the same
message ID, we have a nice consistent database where the message ID
can be used as a unique key.
2009-10-23 06:00:10 -07:00
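The unique-key idea in commit 6ccdffcd87 can be illustrated with a small sketch. Everything here is hypothetical: `Database`, `Document`, and `add_message` are stand-ins built on `std::map`, not the notmuch or Xapian API, and recording the extra filename for a duplicate is an assumption about one reasonable policy.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stored document.
struct Document {
    std::vector<std::string> filenames;
};

// Hypothetical in-memory "database" keyed by message ID.
class Database {
public:
    // Returns true if a new document was created, false if an existing
    // document with the same message ID was reused.
    bool add_message(const std::string &message_id, const std::string &filename)
    {
        // insert() leaves an existing entry untouched and reports whether
        // the key was newly added.
        std::pair<std::map<std::string, Document>::iterator, bool> result =
            docs_.insert(std::make_pair(message_id, Document()));
        result.first->second.filenames.push_back(filename);
        return result.second;
    }

    std::size_t document_count() const { return docs_.size(); }

private:
    std::map<std::string, Document> docs_;
};
```

With this shape, adding two files that carry the same message ID yields one document rather than two, which is the invariant the commit message describes.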
Carl Worth
1b5d8984c6 Add _notmuch_message_create_for_message_id
This is the last piece needed for add_message to be able to properly
support a message with a duplicate message ID.  This function creates
a new notmuch_message_t object but one that may reference an existing
document in the database.
2009-10-23 05:53:52 -07:00
Carl Worth
69b25a75ec Fix _notmuch_message_create to catch Xapian DocNotFoundError.
This function is only supposed to be called with a doc_id that
was queried from the database already. So there's an internal
error if no document with that doc_id can be found in the database.

In that case, return NULL.
2009-10-23 05:48:52 -07:00
Carl Worth
17548e314a Add internal functions for manipulating a new notmuch_message_t
This will support the add_message function in incrementally creating
state in a new notmuch_message_t. The new functions are

      _notmuch_message_set_filename
      _notmuch_message_add_thread_id
      _notmuch_message_ensure_thread_id
      _notmuch_message_set_date
      _notmuch_message_sync
2009-10-23 05:48:52 -07:00
Carl Worth
868d3b3068 Add notmuch_message_get_filename
This is a new public function to find the filename of the original
email message for a message-object that was found in the database.

We may change this function in the future to support returning a
list of filenames, (for messages with duplicate message IDs).
2009-10-23 05:48:46 -07:00
Carl Worth
31044d10ed add_message: Re-order the code a bit (find message-id first).
We're preparing for being able to deal with files with duplicate
message IDs here. The plan is to create a notmuch_message_t object in
add_message that may or may not reference a document that exists in
the database. So to do this, we have to find the message ID before we
do any manipulation of the doc.
2009-10-23 05:30:37 -07:00
Carl Worth
c78358fa8a Move thread_id generation code from database.cc to message.cc
It's really up to the message to decide how to generate these.
2009-10-23 05:25:58 -07:00
Carl Worth
97775ef438 Move the _notmuch_message_sync from private to public interfaces
The idea here is to allow internal users to see a non-synced message
object, (for example, while parsing a message file and incrementally
adding terms, etc.). We're willing to take the care to get the
improved performance.

But for the public interface, keeping everything synced will be much
less confusing, (reference lots of sup bugs that happen due to
message state being altered by the user but not synced to the database).
2009-10-23 05:20:03 -07:00
Carl Worth
1ecdef59f5 add_message: Rename message to message_file
I still don't like the name message_file at all, but we're about
to start using a notmuch_message_t in this function so we need
to do something to keep the identifiers separate for now.

Eventually, it probably makes sense to push the message-parsing
code from database.cc to message.cc.
2009-10-23 05:13:42 -07:00
Carl Worth
1ae8c41cda Prevent that last bug from reoccurring.
It's easy enough to check if a "missing" header was accidentally
left off the list in the call to restrict_headers. (And it's
cheap since we only check in case no such header was found in the
message.)
2009-10-22 15:47:19 -07:00
Carl Worth
77f9d3ee0e Don't forget the "to" header when restrict parsing to certain headers
We recently started discarding files as "not email" if they have none
of Subject, From, nor To. Apparently, my mail collection contains a
number of messages that I sent, that are saved without Subject and
From, (perhaps these were drafts?).

Anyway, it's fortunate I had those since they alerted me to this bug,
where we were not parsing the "To" header in some cases.
2009-10-22 15:34:47 -07:00
Carl Worth
90f93fc9c7 Fix missing error check.
The notmuch_message_file_open function is perfectly capable of
returning NULL. So check for it.
2009-10-22 15:33:56 -07:00
Carl Worth
6a4992bc61 Generate message ID (using SHA1) when a mail message contains none.
This is important as we're using the message ID as the unique key
in our database. So previously, all messages with no message ID
would be treated as the same message---not good at all.
2009-10-22 15:31:56 -07:00
Carl Worth
5794496c6e Rename sha1.c to libsha1.c
This way both the .c and .h files have the same name, and all of the
code imported from the "libsha1" implementation is in filenames
matching libsha1.*.

This also gives me room to make my own notmuch_sha1 wrapper functions
in sha1.c.
2009-10-21 23:27:48 -07:00
Carl Worth
84480738a5 Merge branch from fixing up bugs after bisecting.
I'm glad that when I implemented "notmuch restore" I went through the
extra effort to split the code I had written in one sitting into over a
dozen commits. Sure enough, I hadn't tested well enough and had
totally broken "notmuch setup", (segfaults and bogus thread_id
values).

With the little commits I had made, git bisect saved the day, and I
went back to make the fixes right on top of the commits that
introduced the bugs. So now we octopus merge those in.
2009-10-21 23:23:44 -07:00
Carl Worth
c58ee818b5 Bring back the insert_thread_id function.
We deleted this in favor of our fancy new thread_ids iterator
from the message object. But one of the previous callers of
insert_thread_id isn't using notmuch_message_t yet. I made
the mistake of thinking I could just call g_hash_table_insert
directly, but the problem was that nobody was splitting
up the thread_id string at its commas.

So with this, we were inserting bogus comma-separated IDs
into the hash table, so thread_id values were ballooning
out of control. Should be much better now.
2009-10-21 23:21:12 -07:00
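The bug described in commit c58ee818b5 (inserting a whole comma-separated string as if it were one ID) can be sketched as follows. This is only an illustration: `std::set` stands in for the GLib hash table the message mentions, and `insert_thread_ids` is a hypothetical name rather than the real function.

```cpp
#include <set>
#include <string>

// Split `thread_ids` at its commas and insert each ID separately.
// Inserting the unsplit string would store "a,b" as one bogus ID.
void insert_thread_ids(std::set<std::string> &ids, const std::string &thread_ids)
{
    std::string::size_type start = 0;
    while (start <= thread_ids.size()) {
        std::string::size_type comma = thread_ids.find(',', start);
        if (comma == std::string::npos)
            comma = thread_ids.size();
        if (comma > start)  // skip empty pieces such as ",," or a trailing comma
            ids.insert(thread_ids.substr(start, comma - start));
        start = comma + 1;
    }
}
```

Because each ID is stored on its own, repeated IDs collapse into a single entry instead of producing ever longer combined strings.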
Carl Worth
2ce552b5f7 Fix lifetime-maintenance bug with std::string and c_str()
Here's more evidence that C++ is a nightmare to program---or that
I'm smart enough to realize that C++ is more clever than I will
ever be.

Most of my issues with C++ have to do with it hiding things from
me that I'd really like to and expect to be aware of as a C
programmer.

For example, the specific problem here is that there's a
short-lived std::string, from which I just want to copy
the C string. I try to do that on the next line, but before
I can, C++ has already called the destructor on the std::string.

Now, C++ isn't alone in doing garbage collecting like this.
But in a *real* garbage-collecting system, everything would
work that way. For example, here, I'm still holding a pointer
to the C string contents, so if the garbage collector were
aware of that reference, then it might clean up the std::string
container and leave the data I'm still using.

But that's not what we get with C++. Instead, some things are
reference counted and collected, (like the std::string), and
some things just aren't (like the C string it contains). The
end result is that it's very fragile. It forces me to be aware
of the timing of hidden functions. In a "real" system I wouldn't
have to be aware of that timing, and in C the function just
wouldn't be hidden.
2009-10-21 23:20:18 -07:00
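The lifetime problem described in commit 2ce552b5f7 can be sketched roughly as below. The helper names `make_term` and `copy_term` are hypothetical, and the buggy pattern is left in a comment so the block stays safe to compile and run.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for any function returning a std::string by value.
static std::string make_term(const std::string &prefix, const std::string &value)
{
    return prefix + value;
}

// BUGGY pattern (commented out on purpose): the temporary std::string is
// destroyed at the end of the first statement, so `dangling` refers to
// freed storage by the time the second statement runs.
//
//     const char *dangling = make_term("thread", "abc").c_str();
//     char *copy = strdup(dangling);   // undefined behaviour

// Safer pattern: keep the std::string alive in a named local while its
// contents are copied. The caller releases the result with free().
char *copy_term(const std::string &prefix, const std::string &value)
{
    std::string term = make_term(prefix, value);
    char *copy = static_cast<char *>(std::malloc(term.size() + 1));
    if (copy != nullptr)
        std::memcpy(copy, term.c_str(), term.size() + 1);
    return copy;
}
```

The fix is simply to give the temporary a name, so its destructor runs at the end of the enclosing scope rather than at the end of the statement that created it.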
Carl Worth
2745575b9b List a few more co-conspirators.
Keith's name already shows up in the git log, so it would be
wrong to not mention him. And Martin and Jamey have been
helpful in discussions about what an ideal mail system
would look like.
2009-10-21 21:33:08 -07:00
Carl Worth
5cc55df57b Add an AUTHORS file.
Now that I've copied in another source file from someone else, I
want to be sure I'm keeping a good list of everyone who has helped.
2009-10-21 21:33:08 -07:00
Mikhail Gusarov
96c0d1c1cb Add sha1.c and libsha1.h for doing SHA-1-based message-ID generation.
This code comes courtesy of Brian Gladman and Mikhail Gusarov.

Both files are available under the GPL and were downloaded as
version 0.2 of libsha1 from git://github.com/dottedmag/libsha1.git
with the following commit:

commit d0f0e7e0dc5ce2d58972cb5a492183c0d4e58433
Author: Mikhail Gusarov <dottedmag@dottedmag.net>
Date:   Mon Oct 20 22:38:47 2008 +0700

    Version bump.

    Signed-off-by: Mikhail Gusarov <dottedmag@dottedmag.net>
2009-10-21 21:33:02 -07:00
Carl Worth
16f2e43652 Add copy of GNU General Public License (version 3).
All the files were already advertising the license, but we didn't
actually have a copy of the license in the repository until now.
2009-10-21 16:25:08 -07:00
Carl Worth
302d54834d Add notmuch_status_to_string function.
Be kind and let the user print error messages, not just error
codes.
2009-10-21 16:12:53 -07:00
Carl Worth
f232f0a797 Implement "notmuch restore".
It's pretty easy to do with all the right infrastructure in place.
Now that I can get my tags from sup to notmuch, maybe I'll be able
to start reading mail again.
2009-10-21 16:03:03 -07:00
Carl Worth
f96f4fe427 Pull out a chomp_newline function from "notmuch setup"
We'll want this same thing with "notmuch restore", (and really
anything using getline).
2009-10-21 15:59:11 -07:00
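A helper of the kind commit f96f4fe427 describes might look like this minimal sketch. The signature is an assumption; only the name `chomp_newline` comes from the commit message.

```cpp
#include <cstddef>
#include <cstring>

// Remove a single trailing '\n' from a NUL-terminated buffer, in place.
// Buffers without a trailing newline are left unchanged.
void chomp_newline(char *str)
{
    if (str == nullptr)
        return;
    std::size_t len = std::strlen(str);
    if (len > 0 && str[len - 1] == '\n')
        str[len - 1] = '\0';
}
```

This is handy after `getline`, which keeps the newline in the buffer it returns.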
Carl Worth
defd216487 Add notmuch_message_add_tag and notmuch_message_remove_tag
With these two added, we now have enough functionality in the
library to implement "notmuch restore".
2009-10-21 15:56:33 -07:00
Carl Worth
0bbfa57014 notmuch-private.h: Move NOTMUCH_BEGIN_DECLS earlier
We actually need this before the include of xutil.h, but
it was previously stuck randomly among various system
includes. Instead, put it at the top, right after including
the notmuch.h header that defines it.
2009-10-21 15:51:13 -07:00
Carl Worth
a6b3f341dc notmuch_query_search: Clarify the documentation.
This is where we wanted to put the note to recommend the user
call notmuch_message_destroy if the lifetime of the message
is much shorter than the lifetime of the query. (Somehow this
had ended up in the documentation of notmuch_message_get_tags
before.)
2009-10-21 15:46:46 -07:00
Carl Worth
0383ae2a07 notmuch.h: Fix some copy-paste errors in the documentation.
In several places we had "results" where "tags" was intended.
It actually read fine in some cases, but this is still better.
2009-10-21 15:45:34 -07:00
Carl Worth
2afd95bfc2 notmuch_message_get_message_id: Fix to cache result
Previously, this would allocate new memory with every call. That
was with talloc, of course, so there wasn't any leaking (eventually).
But since we're now calling this internally we want to be a little
less wasteful. It's easy enough to just stash the result into the
message on the first call, and then just return that on subsequent
calls.
2009-10-21 15:42:54 -07:00
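The caching described in commit 2afd95bfc2 can be sketched as below. The `Message` struct and its fields are hypothetical stand-ins (the real `notmuch_message_t` is an opaque type), and the `lookups` counter exists only to make the effect visible.

```cpp
#include <string>

// Hypothetical message object used only for this illustration.
struct Message {
    std::string stored_id;       // stand-in for data held in the database
    std::string cached_id;       // filled in on the first lookup
    bool has_cached_id = false;
    int lookups = 0;             // counts how often the costly path runs
};

// Return the message ID, computing it once and reusing it afterwards.
const char *message_get_message_id(Message &m)
{
    if (!m.has_cached_id) {
        m.lookups++;
        m.cached_id = m.stored_id;   // pretend this step allocates and copies
        m.has_cached_id = true;
    }
    return m.cached_id.c_str();
}
```

Subsequent calls return the same pointer, so repeated internal callers pay for the lookup only once.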
Carl Worth
6c5054ebee database: Add new notmuch_database_find_message
With this function, and the recently added support for
notmuch_message_get_thread_ids, we now recode the find_thread_ids
function to work just the way we expect a user of the public
notmuch API to work. Not too bad really.
2009-10-21 15:40:20 -07:00
Carl Worth
8ad4350fef Add notmuch_message_get_thread_ids function
Along with all of the notmuch_thread_ids_t iterator functions.
Using a consistent idiom seems better here rather than returning
a comma-separated string and forcing the user to parse it.
2009-10-21 15:23:08 -07:00
Carl Worth
d008389a4a Add wrappers for regcomp and regexec to xutil.c.
These will be handy for some parsing.
2009-10-21 15:07:20 -07:00
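Wrappers of the kind commit d008389a4a mentions could look roughly like this. The POSIX calls (`regcomp`, `regerror`, `regexec`) are real; the wrapper signatures and the exit-on-error policy are assumptions made for this sketch.

```cpp
#include <regex.h>

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Compile `pattern`, or print a readable error and exit (assumed policy).
void xregcomp(regex_t *preg, const char *pattern, int cflags)
{
    int rerr = regcomp(preg, pattern, cflags);
    if (rerr != 0) {
        char msg[256];
        regerror(rerr, preg, msg, sizeof msg);
        std::fprintf(stderr, "Error compiling regex \"%s\": %s\n", pattern, msg);
        std::exit(1);
    }
}

// Return true when `text` matches, false otherwise.
bool xregexec(const regex_t *preg, const char *text,
              std::size_t nmatch, regmatch_t pmatch[])
{
    return regexec(preg, text, nmatch, pmatch, 0) == 0;
}
```

A caller compiles once with `xregcomp`, tests strings with `xregexec`, and releases the pattern with `regfree` when done.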
Carl Worth
22b2265cac Rename NOTMUCH_MAX_TERM to NOTMUCH_TERM_MAX
Just better consistency with our naming schemes.
2009-10-21 14:10:00 -07:00
Carl Worth
6142216132 Move find_prefix function from database.cc to message.cc
It's definitely a better fit there for now, (and can likely
eventually be made static as add_term moves from database
to message as well).
2009-10-21 14:07:40 -07:00
Carl Worth
baf1867cc4 notmuch dump: Fix to print spaces between tags.
Simple little bug here made all the tags run together.
2009-10-21 14:02:51 -07:00
Carl Worth
17b3c214ea Convert notmuch_database_t to start using talloc.
This will be handy as we can hang future talloc allocations off
of the database now.
2009-10-21 14:00:37 -07:00
Carl Worth
9ec5189a56 Move declarations for xutil.c from notmuch-private to new xutil.h.
The motivation here is that our top-level notmuch.c main program
wants to start using these, but we don't want it to see into
notmuch-private.h, (since our main program is a test vehicle
for the "public" notmuch interface in notmuch.h).
2009-10-21 13:57:02 -07:00
Carl Worth
0e914d9e96 notmuch dump: Fix buffer overrun in error message.
Just a little bug I noticed while editing nearby code.
2009-10-21 10:12:11 -07:00
Carl Worth
d29a6ec791 notmuch setup: Collapse internal whitespace within message-id
I'm too lazy to see what the RFC says, but I know that having
whitespace inside a message-ID is sure to confuse things. And
besides, this makes things more compatible with sup so that
I have some hope of importing sup labels.
2009-10-21 10:07:34 -07:00
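The whitespace handling from commit d29a6ec791 might be sketched as follows. This assumes that "collapse" means removing spaces and tabs inside the ID entirely; the function name `collapse_whitespace` is hypothetical.

```cpp
#include <string>

// Return a copy of `id` with every space and tab removed.
// Assumption: whitespace inside a message ID is dropped, not shortened
// to a single space.
std::string collapse_whitespace(const std::string &id)
{
    std::string out;
    out.reserve(id.size());
    for (char c : id) {
        if (c != ' ' && c != '\t')
            out.push_back(c);
    }
    return out;
}
```

Building a new string avoids the index bookkeeping needed to delete characters in place while iterating.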
Carl Worth
65baa4f4e7 notmuch dump: Fix the sorting of results.
To properly support sorting in notmuch_query we now use an
Enquire object. We also throw in a QueryParser too, so we're
really close to being able to support arbitrary full-text
searches.

I took a look at the supported QueryParser syntax and chose
a set of flags for everything I like, (such as supporting
Boolean operators in either case ("AND" or "and"), supporting
phrase searching, supporting + and - to include/preclude terms,
and supporting a trailing * on any term as a wildcard).
2009-10-21 00:35:56 -07:00
Carl Worth
6a3b68edef add_message: Add a type:mail ("Kmail") term to all documents.
This gives us an easy way to specify "all mail messages" in a search
query. We simply look for this term.
2009-10-21 00:34:36 -07:00
Carl Worth
af65f52acf notmuch setup: Print a few protecting spaces after progress reports.
This is to help keep the report looking clean when a new report
is shorter than a previous report, (say, when crossing the
boundary from over one minute remaining to less than one minute
remaining).

This used to be here, but I must have accidentally dropped it
when reformatting the progress report recently.
2009-10-21 00:32:30 -07:00
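The trailing-space trick from commit af65f52acf can be illustrated with a small sketch. The report wording and the function `format_progress` are invented here; only the idea of padding each report with a few spaces comes from the commit message.

```cpp
#include <cstdio>
#include <string>

// Build one progress line. The leading '\r' returns the cursor to the start
// of the line, and the trailing spaces overwrite leftovers from a longer
// previous report.
std::string format_progress(int processed, int total, int seconds_remaining)
{
    char buf[160];
    if (seconds_remaining >= 60) {
        std::snprintf(buf, sizeof buf,
                      "\rProcessed %d of %d files (%dm %ds remaining).      ",
                      processed, total,
                      seconds_remaining / 60, seconds_remaining % 60);
    } else {
        std::snprintf(buf, sizeof buf,
                      "\rProcessed %d of %d files (%ds remaining).      ",
                      processed, total, seconds_remaining);
    }
    return std::string(buf);
}
```

When the remaining time drops below one minute the visible text gets shorter, and the padding blanks out the characters the earlier, longer line left behind.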
Carl Worth
266c612a50 .gitignore: Ignore generated file Makefile.dep
Forgot to add this when I first added dependency checking to the
Makefile.
2009-10-20 23:13:28 -07:00
Carl Worth
50144fb354 database: Remove two little bits of dead code.
2009-10-20 23:12:53 -07:00
Carl Worth
6519aff957 query: Remove the magic NOTMUCH_QUERY_ALL
Using the address of a static char* was clever, but really
unnecessary. An empty string is much less magic, and even
easier to understand as the way to query everything from
the database.
2009-10-20 22:40:37 -07:00
Carl Worth
aad13c3ac9 notmuch dump: Free each message as it's used.
Previously we were leaking[*] memory in that the memory footprint of
a "notmuch dump" run would continue to grow until the output was
complete, and then finally all the memory would be freed.

Now, the memory footprint is small and constant, O(1) rather than
O(n) in the number of messages.

[*] Not leaking in a valgrind sense---every byte was still carefully
being accounted for and freed eventually.
2009-10-20 22:27:56 -07:00