Since Xapian does not preserve quotes when passing the subquery to a
field processor, we have to make a guess as to what the user
intended. Here the added assumption is that a string surrounded by
parens is not intended to be a phrase.
It is confusing to use two different names (sexp vs sexpr) when
compared with the command line option --query=sexp and (furthermore)
singular vs plural when compared with the man page title.
This document contains meaningful markup in the terms, which makeinfo
complains about. Replace the use of definition lists with regular
paragraphs containing quote blocks. This is accomplished by splitting
the "term" from the definition with a blank line.
Sphinx-doc already formats the terms appropriately for a given
backend (bold in html and man). `makeinfo` complains noisily about
formatting inside a @item if we add our own explicit formatting.
This change may change the formatting in the info output. On the other
hand, the existing use of quotes for bold is not that great anyway.
In some places blank lines were removed to preserve the logical
structure of a definition list.
Macros implement lazy evaluation and lexical scope. The former is
needed to make certain natural constructs work sensibly (e.g. (tag
,param)) but the latter is mainly future-proofing in case the DSL is
is extended to allow local bindings.
For technical background, see chapters 6 and 17 of [1] (or some other
intermediate programming languages textbook).
[1] http://cs.brown.edu/courses/cs173/2012/book/
This provides functionality analogous to query: in the Xapian
QueryParser based parser. Perhaps counterintuitively, the saved
queries currently have to be in the original query syntax (i.e. not
s-expressions).
One subtle aspect is the replacement of _find_prefix with
_notmuch_database_prefix, which understands user headers. Otherwise
the code mainly consists of creating a fake prefix record (since the
user prefixes are not in the prefix table) and error handling.
This is necessary so that programs can take infix syntax queries from
a user and use the sexp query syntax to construct e.g. a refinement of
that query.
At least to the degree that the Xapian QueryParser based parser
also supports them. Support short alias 'rx' as it seems to make more
complex queries nicer to read.
The many tests potentially overkill, but they could catch typos in the
prefixes table. As a simplifying assumption, for now we assume a
single argument to the wildcard operator, as this matches the Xapian
semantics. The name 'starts-with' is chosen to emphasize the supported
case of wildcards in currrent (1.4.x) Xapian.
We use "boolean" to describe fields that should generate terms
literally without stemming or phrase splitting. This terminology
might not be ideal but it is already enshrined in
notmuch-search-terms(7).
Anything that is quoted or not purely word characters is considered a
phrase. Phrases are not stemmed, because the stems do not have
positional information in the database. It is less efficient to scan
the term twice, but it avoids a second pass to add prefixes, so maybe
it balances out. In any case, it seems unlikely query parsing is very
often a bottleneck.
All operations and (Xapian) fields will eventually have an entry in
the prefixes table. The flags field is just a placeholder for now, but
will eventually distinguish between various kinds of prefixes.
There is not much of a parser here yet, but it already does some
useful error reporting. Most functionality sketched in the
documentation is not implemented yet; detailed documentation will
follow with the implementation.
Add internal hyperlink targets for man pages and cross-reference them
using the any role reference. There are a number of alternatives to
accomplish this, but this seems like the combination that retains the
man page section number and the same boldface style in the man pages.
As a bonus, we get sanity checking on the links; for example
notmuch-search-terms.rst had a reference to notmuch-properties(1)
i.e. the wrong section.
The obvious semantic follow-up change would be to only have meaningful
"see also" references instead of having them all everywhere.
Using manpage role references generates helpful links in html
documentation, while retaining the same boldface style in the man
pages.
The external man page site is configurable. The Debian manpage site
seems like a good fit for Notmuch.
The features that require field processor support, are now just
documented w/o mentioning **Xapian Field Processors**' is needed
for those.
Replaced "compact" and "field_processor" with "retry_lock" in
build_with config option, as it is currently the only one that
is optionally excluded. The former 2 are now documented as
features always included.
Dropped one 'we' "passive" in notmuch-search-terms.rst. It was the
only one, and inconsistent with rest of the documentation in that
file.
Dropped message about conditional open-ended ranges support, as
those are now always supported.
When encountering a message that has been mangled in the "mixed up"
way by an intermediate MTA, notmuch should instead repair it and index
the repaired form.
When it does this, it also associates the index.repaired=mixedup
property with the message. If a problem is found with this repair
process, or an improved repair process is proposed later, this should
make it easy for people to reindex the relevant message. The property
will also hopefully make it easier to diagnose this particular problem
in the future.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
When we notice a legacy-display part during indexing, it makes more
sense to avoid indexing it as part of the message body.
Given that the protected subject will already be indexed, there is no
need to index this part at all, so we skip over it.
If this happens during indexing, we set a property on the message:
index.repaired=skip-protected-headers-legacy-display
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
This adds no functionality directly, but is a useful starting point
for adding new repair functionality.
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch) terms allows matching terms that occur only in the
body.
Unprefixed query terms should continue to match anywhere (header or
body) in the message.
This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query time
expension of unprefixed query terms to the various prefixed
equivalent.
Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occur in
headers (demonstrated by the new tests in T530-upgrade.sh)
Many of the manpages didn't treat literal text as literal text. I've
tried to normalize some of the restructured text to make it a bit more
regular.
several of the synopsis lines are still untouched by this cleanup, but
i'm not sure what the right way to represent those is in .rst,
actually.
In particular find that if i rebuild the manpages, sometimes i end up
with some of the synopsis lines showing – (U+2013 EN DASH) where they
should have -- (2 × U+002D HYPHEN-MINUS) in the generated nroff
output, though i have not tracked down the source of this error yet.
I think we've diverged enough from the Xapian query parser
that we can't rely on that syntax description [1]. As far as I can
tell, [1] also only discusses quotes in the context of phrases.
[1]: https://xapian.org/docs/queryparser.html
Need to be clearer about specifying time ranges using timestamps.
Legacy syntax which predates the date prefix is still supported, but
timestamps used in conjunction with the date prefix require additional
syntax.
Make all parameter descriptions etc. use reStructuredText definition
lists with uniform style and indentation. Remove redundant indentation
from around the lists. Remove blank lines between term lines and
definition blocks. Use four spaces for indentation.
This is almost completely whitespace and paragraph reflow changes.
Having first a list of prefixes followed by detailed descriptions was
viable when we didn't have all that many prefixes. Now, arranging the
prefix descriptions in a definition list makes more sense.
While at it, include all the supported prefix forms, especially some
missing regex ones.
If you're going to store the cleartext index of an encrypted message,
in most situations you might just as well store the session key.
Doing this storage has efficiency and recoverability advantages.
Combined with a schedule of regular OpenPGP subkey rotation and
destruction, this can also offer security benefits, like "deletable
e-mail", which is the store-and-forward analog to "forward secrecy".
But wait, i hear you saying, i have a special need to store cleartext
indexes but it's really bad for me to store session keys! Maybe
(let's imagine) i get lots of e-mails with incriminating photos
attached, and i want to be able to search for them by the text in the
e-mail, but i don't want someone with access to the index to be
actually able to see the photos themselves.
Fret not, the next patch in this series will support your wacky
uncommon use case.
The stashed session keys are stored internally as notmuch properties.
So a user or developer who is reading about those properties might
want to understand how they fit into the bigger picture.
Note here that decrypting with a stored session key no longer needs
-decrypt for "notmuch show" and "notmuch reply".
the command-line interface for indexing (reindex, new, insert) used
--try-decrypt; and the configuration records used index.try_decrypt.
But by comparison with "show" and "reply", there doesn't seem to be
any reason for the "try" prefix.
This changeset adjusts the command-line interface and the
configuration interface.
For the moment, i've left indexopts_{set,get}_try_decrypt alone. The
subsequent changeset will address those.
When doing any decryption, if the notmuch database knows of any
session keys associated with the message in question, try them before
defaulting to using default symmetric crypto.
This changeset does the primary work in _notmuch_crypto_decrypt, which
grows some new parameters to handle it.
The primary advantage this patch offers is a significant speedup when
rendering large encrypted threads ("notmuch show") if session keys
happen to be cached.
Additionally, it permits message composition without access to
asymmetric secret keys ("notmuch reply"); and it permits recovering a
cleartext index when reindexing after a "notmuch restore" for those
messages that already have a session key stored.
Note that we may try multiple decryptions here (e.g. if there are
multiple session keys in the database), but we will ignore and throw
away all the GMime errors except for those that come from last
decryption attempt. Since we don't necessarily know at the time of
the decryption that this *is* the last decryption attempt, we'll ask
for the errors each time anyway.
This does nothing if no session keys are stashed in the database,
which is fine. Actually stashing session keys in the database will
come as a subsequent patch.
By default, notmuch won't try to decrypt on indexing. With this
patch, we make it possible to indicate a per-database preference using
the config variable "index.try_decrypt", which by default will be
false.
At indexing time, the database needs some way to know its internal
defaults for how to index encrypted parts. It shouldn't be contingent
on an external config file (since that can't be retrieved from the
database object itself), so we store it in the database.
This behaves similarly to the query.* configurations, which are also
stored in the database itself, so we're not introducing any new
dependencies by requiring that it be stored in the database.
If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.
If we can decrypt, we add the property index.decryption=success.
If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.
Note that a single message may have both values of the
"index.decryption" property: "success" and "failure". For example,
consider a message that includes multiple layers of encryption. If we
manage to decrypt the outer layer ("index.decryption=success"), but
fail on the inner layer ("index.decryption=failure").
Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing. This means it will subsequently
correspond to the actual semantics of the stored index.
This allows us to create new properties that will be automatically set
during indexing, and cleared during re-indexing, just by choice of
property name.
the idea is that you can run
% notmuch search subject:/<your-favourite-regexp>/
% notmuch search from:/<your-favourite-regexp>/
or
% notmuch search subject:"your usual phrase search"
% notmuch search from:"usual phrase search"
This feature is only available with recent Xapian, specifically
support for field processors is needed.
It should work with bindings, since it extends the query parser.
This is easy to extend for other value slots, but currently the only
value slots are date, message_id, from, subject, and last_mod. Date is
already searchable; message_id is left for a followup commit.
This was originally written by Austin Clements, and ported to Xapian
field processors (from Austin's custom query parser) by yours truly.