Commit graph

7067 commits

Author SHA1 Message Date
Carl Worth
1479b99b50 notmuch-index-message: Don't index the "re:" prefix in subjects.
Getting closer to sup results all the time.
2009-10-13 21:14:55 -07:00
Carl Worth
9bf3cda34c notmuch-index-message: Don't index the line introducing a quote.
We identify it based on a trailing ':' on the line before a quote
begins.

At this point the database-dump diff between sup and notmuch is
getting very, very small, (at least for our one test message).
2009-10-13 21:14:50 -07:00
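The rule in this commit message, a trailing ':' on the line just before a quote begins, can be sketched as a two-line check. This sketch assumes that a quote "begins" when the next line starts with '>' and that trailing whitespace after the ':' is ignored. The function name is_quote_intro is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: a line introduces a quote when its last
 * non-whitespace character is ':' and the following line begins a
 * quote (assumed here to mean it starts with '>'). */
static bool
is_quote_intro (const char *line, const char *next_line)
{
    assert (line != NULL);
    size_t len = strlen (line);

    /* Ignore trailing whitespace, including a stray '\r' or '\n'. */
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                       line[len - 1] == '\r' || line[len - 1] == '\n'))
        len--;

    if (len == 0 || line[len - 1] != ':')
        return false;

    /* The last line of a body cannot introduce a quote. */
    return next_line != NULL && next_line[0] == '>';
}
```

Because the decision needs one line of lookahead, a line-by-line indexer would have to hold back each line until it has seen the next one.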
Carl Worth
048b8aec11 notmuch-index-message: Don't index quoted lines and signatures.
At this point, we're achieving a result that is *very* close to
what sup does. The only difference is that we are still indexing
the "excerpts from message ..." line, and we are not yet indexing
references.
2009-10-13 21:14:44 -07:00
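A per-line filter along the lines this commit describes might look like the sketch below. It assumes that quoted lines start with '>' and that the signature begins at the conventional "-- " separator line (also accepting "--"). sup's actual heuristics may differ. The name should_index_line is hypothetical, and lines are assumed to have their newline already stripped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-line filter. *in_signature carries state across
 * calls: once the signature separator has been seen, every remaining
 * line of the body is skipped. */
static bool
should_index_line (const char *line, bool *in_signature)
{
    assert (line != NULL && in_signature != NULL);

    if (*in_signature)
        return false;

    /* Assumed signature separator: "-- " (or "--") alone on a line. */
    if (strcmp (line, "-- ") == 0 || strcmp (line, "--") == 0) {
        *in_signature = true;
        return false;
    }

    /* Quoted text from earlier messages begins with '>'. */
    if (line[0] == '>')
        return false;

    return true;
}
```

The caller initializes the flag to false once per message body and then calls the filter for each line in order.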
Carl Worth
9dbb1facfb notmuch-index-message: Separate gen_terms_body into its own function
This one is complex enough to deserve its own treatment.
2009-10-13 21:14:33 -07:00
Carl Worth
f69215d41f notmuch-index-message: Add code to actually create a Xapian index
Most of this code is fairly clean and works well. One part is
fairly painful---namely extracting the body of an email message
from libgmime. Currently, I'm just extracting the offset to
the end of the headers, and then separately opening the message.
Surely there's a better way.

Anyway, with that the results are looking very similar to sup-sync
now, (as verified by xapian-dump). The only substantial difference
I'm seeing now is that sup does not seem to index quoted portions
of messages nor signatures. I'm not actually sure whether I want
to follow sup's lead in that or not.
2009-10-13 15:59:57 -07:00
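The workaround described here (take the end-of-headers offset from the parser, then open the message separately to get the body) can be sketched with plain stdio. This sketch assumes the offset is a byte position in the file as stored on disk. The helper name read_body_from_offset is hypothetical and no libgmime calls are shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from `offset` to end of file into
 * a newly malloc'd, NUL-terminated buffer. `offset` is assumed to be the
 * end-of-headers position obtained from the MIME parser.
 * Returns NULL on error; the caller frees the result. */
static char *
read_body_from_offset (FILE *file, long offset)
{
    assert (file != NULL && offset >= 0);

    /* Find the file size so the whole body can be read in one call. */
    if (fseek (file, 0, SEEK_END) != 0)
        return NULL;
    long end = ftell (file);
    if (end < 0 || offset > end)
        return NULL;

    size_t len = (size_t) (end - offset);
    char *body = malloc (len + 1);
    if (body == NULL)
        return NULL;

    if (fseek (file, offset, SEEK_SET) != 0 ||
        fread (body, 1, len, file) != len) {
        free (body);
        return NULL;
    }
    body[len] = '\0';
    return body;
}
```

This returns the raw body bytes with no MIME decoding. That limitation is part of why this approach feels painful compared with getting the decoded body from the parser directly.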
Carl Worth
c55c34f4a0 Rename g_mime_test to notmuch-index-message
In preparation for actually creating a Xapian index from the
message, (not that we're doing that quite yet).
2009-10-13 13:31:17 -07:00
Carl Worth
a68a023d47 xapian-dump: Add a little more indentation
Just to make it easier to visually identify where one document ends
and the next begins.
2009-10-13 13:21:47 -07:00
Carl Worth
1a6d88697b Include document data in the dump.
At the same time, I've started hacking up sup with a new NotmuchIndex
class in the place of the previous XapianIndex class. The new class
stores only the source_info field in the document data, (rather than
a serialized ruby hash with a bunch of data that can be found in the
original message).

Eventually, I plan to replace source_info with a relative filename for
the message, (or even a list of filenames for when multiple messages
in the database share a common message ID).
2009-10-13 13:18:32 -07:00
Carl Worth
ea96cb694f xapian-dump: Add support to unserialize values.
The interface for this is cheesy, (bare integer value numbers on the
command line indicating that unserialization is desired for those
value numbers). But this at least lets us print sup databases with
human-readable output for the date values.
2009-10-13 09:36:25 -07:00
Carl Worth
96a706383f Add .gitignore file to ignore compiled binaries.
2009-10-13 08:57:02 -07:00
Carl Worth
76e15cf673 xapian-dump: Add values to the dump as well.
2009-10-13 08:54:43 -07:00
Carl Worth
c8532ce25d xapian-dump: Fix to dump all terms for each document ID.
2009-10-13 08:54:35 -07:00
Carl Worth
26795d64e6 xapian-dump: Actually dump document IDs
It's not a complete tool yet, but it at least does something now.
2009-10-13 08:53:34 -07:00
Carl Worth
287ffc828d Remove unused variable.
Compiling with -Wall considered useful.
2009-10-13 08:53:28 -07:00
Carl Worth
11f99eb8ea Add the beginnings of a xapian-dump program.
This will (when it is finished) provide a much more reliable way to
ensure that notmuch's sync program behaves identically to sup-sync.
It doesn't actually do anything yet.
2009-10-13 08:53:14 -07:00
Carl Worth
5986cfe5e7 Add sup-compatible prefixes and achieve sup-compatible print output
What I've done here is to instrument sup-sync to print the text
and terms objects it constructs just before indexing a message.
Then I've made my g_mime_test program achieve (nearly) identical
output for an example email message, (just missing the body
text). Next we can start shoving this data into a Xapian index.
2009-10-13 08:52:34 -07:00
Carl Worth
7d0886352c Initial commit of a test program to form the basis of notmuch.
Basically just playing with some simple code using libgmime to parse
an email message.
2009-10-13 08:52:02 -07:00