Our old notmuch-index-message.cc code had this, but I originally
left it out when adding indexing back in. I was concerned primarily
with mistakenly detecting signature markers and omitting important
text, (for example, I often do long lines of "----" as section
separators).
But now I see that there's a performance benefit to skippint the
quotations, (about 120 files/sec. instead of 95 files/sec.). I mitigated
the bogus signature checking by recognizing nothing other than the
all-time classic "-- ".
We put these is as a separate term so that they can be extracted.
We don't actually need this for searching, since typing an email
address in as a search term will already trigger a phrase search
that does exactly what's wanted.
This is based on the old notmuch-index-message.cc from early in
the history of notmuch, but considerably cleaned up now that
we have some experience with Xapian and know just what we want
to index, (rather than just blindly trying to index exactly
what sup does).
This does slow down notmuch_database_add_message a *lot*, but I've
got some ideas for getting some time back.