
48 commits

Author SHA1 Message Date
David Bremner
011d06f4d6 lib/parse-sexp: 'starts-with' wildcard searches
The many tests are potentially overkill, but they could catch typos in the
prefixes table. As a simplifying assumption, for now we assume a
single argument to the wildcard operator, as this matches the Xapian
semantics. The name 'starts-with' is chosen to emphasize the supported
case of wildcards in current (1.4.x) Xapian.
2021-09-04 17:07:19 -07:00
David Bremner
8322f536f5 lib/parse-sexp: add term prefix backed fields
We use "boolean" to describe fields that should generate terms
literally without stemming or phrase splitting.  This terminology
might not be ideal but it is already enshrined in
notmuch-search-terms(7).
2021-09-04 17:07:19 -07:00
David Bremner
90d9c2ad5c lib/parse-sexp: support phrase queries.
Anything that is quoted or not purely word characters is considered a
phrase.  Phrases are not stemmed, because the stems do not have
positional information in the database. It is less efficient to scan
the term twice, but it avoids a second pass to add prefixes, so maybe
it balances out. In any case, it seems unlikely query parsing is very
often a bottleneck.
2021-09-04 17:07:19 -07:00
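The phrase rule in 90d9c2ad5c can be modelled in a few lines. This is a minimal Python sketch, not the C++ code in lib/parse-sexp: `classify_atom` and `split_phrase` are hypothetical names, and approximating "word characters" with Python's `\w` is an assumption.

```python
import re

# Assumption: "word characters" are approximated by Python's \w
# (letters, digits, underscore); the real parser may differ.
WORD_RE = re.compile(r"\w+")


def classify_atom(atom: str, quoted: bool) -> str:
    """Hypothetical helper: decide how one s-expression atom is treated."""
    if quoted or WORD_RE.fullmatch(atom) is None:
        # Phrases are not stemmed: stemmed terms carry no positional data.
        return "phrase"
    # A bare run of word characters is a single term, eligible for stemming.
    return "term"


def split_phrase(atom: str) -> list:
    """Break a phrase into its lower-cased word components (simplified)."""
    return WORD_RE.findall(atom.lower())


print(classify_atom("lunch", quoted=False))    # term
print(classify_atom("lunch", quoted=True))     # phrase
print(classify_atom("foo-bar", quoted=False))  # phrase
print(split_phrase("Foo-Bar baz"))             # ['foo', 'bar', 'baz']
```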
David Bremner
200e164dc7 lib/parse-sexp: support subject field
The broken tests are because we do not yet handle phrase searches.
2021-09-04 17:07:19 -07:00
David Bremner
f83cd2a05a lib/parse-sexp: support and, not, and or.
All operations and (Xapian) fields will eventually have an entry in
the prefixes table. The flags field is just a placeholder for now, but
will eventually distinguish between various kinds of prefixes.
2021-09-04 17:07:19 -07:00
David Bremner
a2785c3919 lib/parse-sexp: stem unquoted atoms
This is somewhat less DWIM than the Xapian query parser, but it has
the advantage of simplicity.
2021-09-04 17:07:19 -07:00
David Bremner
be7e83de96 lib/parse-sexp: parse single terms and the empty list.
There is not much of a parser here yet, but it already does some
useful error reporting. Most functionality sketched in the
documentation is not implemented yet; detailed documentation will
follow with the implementation.
2021-09-04 17:07:19 -07:00
Jani Nikula
ff4e81ac57 doc: cross-reference notmuch man pages with actual links
Add internal hyperlink targets for man pages and cross-reference them
using Sphinx 'any' role references. There are a number of alternatives to
accomplish this, but this seems like the combination that retains the
man page section number and the same boldface style in the man pages.

As a bonus, we get sanity checking on the links; for example
notmuch-search-terms.rst had a reference to notmuch-properties(1)
i.e. the wrong section.

The obvious semantic follow-up change would be to only have meaningful
"see also" references instead of having them all everywhere.
2021-05-22 16:38:56 -03:00
Jani Nikula
3baa61e0e5 doc: use manpage role references to external man pages
Using manpage role references generates helpful links in HTML
documentation, while retaining the same boldface style in the man
pages.

The external man page site is configurable. The Debian manpage site
seems like a good fit for Notmuch.
2021-05-22 09:56:52 -03:00
Tomi Ollila
507d2f07a6 doc: field processor support now always included, adjust manual pages
Features that require field processor support are now
documented without mentioning that **Xapian Field Processors** are
needed for them.

Replaced "compact" and "field_processor" with "retry_lock" in
build_with config option, as it is currently the only one that
is optionally excluded. The former two are now documented as
features always included.

Rewrote one sentence in notmuch-search-terms.rst in the passive voice
to drop its 'we'. It was the only use of 'we', and inconsistent with
the rest of the documentation in that file.

Dropped message about conditional open-ended ranges support, as
those are now always supported.
2020-06-06 07:54:34 -03:00
Daniel Kahn Gillmor
4b1a8fd183 index: repair "Mixed Up" messages before indexing.
When encountering a message that has been mangled in the "mixed up"
way by an intermediate MTA, notmuch should instead repair it and index
the repaired form.

When it does this, it also associates the index.repaired=mixedup
property with the message.  If a problem is found with this repair
process, or an improved repair process is proposed later, this should
make it easy for people to reindex the relevant message.  The property
will also hopefully make it easier to diagnose this particular problem
in the future.

Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
2019-09-15 19:07:06 -04:00
Daniel Kahn Gillmor
9829533e92 index: avoid indexing legacy-display parts
When we notice a legacy-display part during indexing, it makes more
sense to avoid indexing it as part of the message body.

Given that the protected subject will already be indexed, there is no
need to index this part at all, so we skip over it.

If this happens during indexing, we set a property on the message:
index.repaired=skip-protected-headers-legacy-display

Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
2019-09-01 08:45:30 -03:00
Daniel Kahn Gillmor
1b29822cf5 repair: set up codebase for repair functionality
This adds no functionality directly, but is a useful starting point
for adding new repair functionality.

Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
2019-09-01 08:20:25 -03:00
David Bremner
9dedb23b47 doc: document user header indexing.
It's a bit odd that the primary documentation is in notmuch-config,
but it is consistent with the "query:" prefix.
2019-05-25 07:21:21 -03:00
David Bremner
319dd95ebb lib: add 'body:' field, stop indexing headers twice.
The new `body:` field (in Xapian terms), or prefix (in slightly
sloppier notmuch terms), allows matching terms that occur only in the
body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occurs in
headers (demonstrated by the new tests in T530-upgrade.sh).
2019-04-17 08:48:16 -03:00
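To see why 319dd95ebb no longer needs double indexing, here is a toy Python model of query-time expansion. The one-letter prefixes, the field list, and the helpers `index_message`, `expand`, and `matches` are all hypothetical; they stand in for notmuch's real term prefixes and Xapian's QueryParser, which this sketch does not use.

```python
from typing import List, Optional, Set

# Hypothetical one-letter prefixes; notmuch's real internal prefixes differ.
FIELD_PREFIXES = {"body": "B", "subject": "S"}

# Fields that an unprefixed term is expanded over at query time.
UNPREFIXED_FIELDS = ["body", "subject"]


def index_message(subject: str, body: str) -> Set[str]:
    """Index each word exactly once, under the prefix of the field it came from."""
    terms = {FIELD_PREFIXES["subject"] + w for w in subject.lower().split()}
    terms |= {FIELD_PREFIXES["body"] + w for w in body.lower().split()}
    return terms


def expand(term: str, field: Optional[str] = None) -> List[str]:
    """Expand a query term into the prefixed terms that are OR-ed together."""
    fields = [field] if field else UNPREFIXED_FIELDS
    return [FIELD_PREFIXES[f] + term.lower() for f in fields]


def matches(doc_terms: Set[str], term: str, field: Optional[str] = None) -> bool:
    return any(t in doc_terms for t in expand(term, field))


doc = index_message("Meeting notes", "see attached agenda")
print(matches(doc, "meeting"))          # True: unprefixed, found in subject
print(matches(doc, "meeting", "body"))  # False: body: ignores header terms
print(matches(doc, "agenda", "body"))   # True
```

In this model, an index built the old way would also contain header words under the body prefix, so `body:` would keep matching them until the message is reindexed.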
Daniel Kahn Gillmor
fd3c93650d doc: clean up manpages
Many of the manpages didn't treat literal text as literal text.  I've
tried to normalize some of the restructured text to make it a bit more
regular.

Several of the synopsis lines are still untouched by this cleanup, but
i'm not sure what the right way to represent those is in .rst,
actually.

In particular, i find that if i rebuild the manpages, sometimes i end up
with some of the synopsis lines showing – (U+2013 EN DASH) where they
should have -- (2 × U+002D HYPHEN-MINUS) in the generated nroff
output, though i have not tracked down the source of this error yet.
2018-06-24 21:59:37 -03:00
David Bremner
f2e6f76a04 doc: document thread subqueries
Mention both performance and quoting issues.
2018-05-07 08:42:53 -03:00
David Bremner
20ba0b7dfa doc: add a section on quoting to notmuch-search-terms(7)
I think we've diverged enough from the Xapian query parser
that we can't rely on that syntax description [1]. As far as I can
tell, [1] also only discusses quotes in the context of phrases.

[1]: https://xapian.org/docs/queryparser.html
2018-04-24 23:08:10 -03:00
Matthew Lear
0cbe982bfd Clarify the syntax required when searching using timestamps.
Need to be clearer about specifying time ranges using timestamps.
Legacy syntax which predates the date prefix is still supported, but
timestamps used in conjunction with the date prefix require additional
syntax.
2018-03-24 20:07:20 -03:00
Jani Nikula
e5e252de55 doc: unify definition list usage across man pages
Make all parameter descriptions etc. use reStructuredText definition
lists with uniform style and indentation. Remove redundant indentation
from around the lists. Remove blank lines between term lines and
definition blocks. Use four spaces for indentation.

This is almost completely whitespace and paragraph reflow changes.
2017-12-31 09:06:11 -04:00
Jani Nikula
89f651a403 doc: arrange search prefix documentation in a definition list
Having first a list of prefixes followed by detailed descriptions was
viable when we didn't have all that many prefixes. Now, arranging the
prefix descriptions in a definition list makes more sense.

While at it, include all the supported prefix forms, especially some
missing regex ones.
2017-12-14 21:41:39 -04:00
Daniel Kahn Gillmor
29648a137c crypto: actually stash session keys when decrypt=true
If you're going to store the cleartext index of an encrypted message,
in most situations you might just as well store the session key.
Doing this storage has efficiency and recoverability advantages.

Combined with a schedule of regular OpenPGP subkey rotation and
destruction, this can also offer security benefits, like "deletable
e-mail", which is the store-and-forward analog to "forward secrecy".

But wait, i hear you saying, i have a special need to store cleartext
indexes but it's really bad for me to store session keys!  Maybe
(let's imagine) i get lots of e-mails with incriminating photos
attached, and i want to be able to search for them by the text in the
e-mail, but i don't want someone with access to the index to be
actually able to see the photos themselves.

Fret not, the next patch in this series will support your wacky
uncommon use case.
2017-12-08 08:08:47 -04:00
Daniel Kahn Gillmor
f845fb2a51 cli/show, reply: document use of stashed session keys in notmuch-properties
The stashed session keys are stored internally as notmuch properties.
So a user or developer who is reading about those properties might
want to understand how they fit into the bigger picture.

Note here that decrypting with a stored session key no longer needs
--decrypt for "notmuch show" and "notmuch reply".
2017-12-08 08:08:46 -04:00
Daniel Kahn Gillmor
d3964e81ac indexing: Change from try_decrypt to decrypt
The command-line interface for indexing (reindex, new, insert) used
--try-decrypt; and the configuration records used index.try_decrypt.
But by comparison with "show" and "reply", there doesn't seem to be
any reason for the "try" prefix.

This changeset adjusts the command-line interface and the
configuration interface.

For the moment, i've left indexopts_{set,get}_try_decrypt alone.  The
subsequent changeset will address those.
2017-12-08 08:05:53 -04:00
Daniel Kahn Gillmor
a990585408 crypto: use stashed session-key properties for decryption, if available
When doing any decryption, if the notmuch database knows of any
session keys associated with the message in question, try them before
falling back to the default asymmetric crypto.

This changeset does the primary work in _notmuch_crypto_decrypt, which
grows some new parameters to handle it.

The primary advantage this patch offers is a significant speedup when
rendering large encrypted threads ("notmuch show") if session keys
happen to be cached.

Additionally, it permits message composition without access to
asymmetric secret keys ("notmuch reply"); and it permits recovering a
cleartext index when reindexing after a "notmuch restore" for those
messages that already have a session key stored.

Note that we may try multiple decryptions here (e.g. if there are
multiple session keys in the database), but we will ignore and throw
away all the GMime errors except for those that come from the last
decryption attempt.  Since we don't necessarily know at the time of
the decryption that this *is* the last decryption attempt, we'll ask
for the errors each time anyway.

This does nothing if no session keys are stashed in the database,
which is fine.  Actually stashing session keys in the database will
come as a subsequent patch.
2017-12-04 21:48:31 -04:00
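The decryption order described in a990585408 amounts to a fallback loop. The following Python sketch is illustrative only: each attempt is a hypothetical callable returning a `(cleartext, errors)` pair, which is not the signature of `_notmuch_crypto_decrypt`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical shape of one decryption attempt: (cleartext or None, errors).
Result = Tuple[Optional[str], List[str]]
Attempt = Callable[[], Result]


def decrypt_with_stashed_keys(session_key_attempts: Sequence[Attempt],
                              default_attempt: Attempt) -> Result:
    """Try each stashed session key, then fall back to the default path.

    Errors from every attempt except the final one are thrown away.
    """
    for attempt in session_key_attempts:
        cleartext, errors = attempt()
        if cleartext is not None:
            # This successful attempt is the last one, so its errors are kept.
            return cleartext, errors
    # No session key worked (or none were stashed): use the default path.
    return default_attempt()


def wrong_key() -> Result:
    return None, ["wrong session key"]


def right_key() -> Result:
    return "cleartext", []


def no_secret_key() -> Result:
    return None, ["no secret key"]


print(decrypt_with_stashed_keys([wrong_key, right_key], no_secret_key))
# ('cleartext', [])
print(decrypt_with_stashed_keys([wrong_key], no_secret_key))
# (None, ['no secret key'])
```

With no stashed keys the loop body never runs, which mirrors the commit's note that the change does nothing in that case.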
Daniel Kahn Gillmor
d0da7a0a1c config: define new option index.try_decrypt
By default, notmuch won't try to decrypt on indexing.  With this
patch, we make it possible to indicate a per-database preference using
the config variable "index.try_decrypt", which by default will be
false.

At indexing time, the database needs some way to know its internal
defaults for how to index encrypted parts.  It shouldn't be contingent
on an external config file (since that can't be retrieved from the
database object itself), so we store it in the database.

This behaves similarly to the query.* configurations, which are also
stored in the database itself, so we're not introducing any new
dependencies by requiring that it be stored in the database.
2017-10-21 19:54:33 -03:00
Daniel Kahn Gillmor
4dfcc8c9b2 crypto: index encrypted parts when indexopts try_decrypt is set.
If we see index options that ask us to decrypt when indexing a
message, and we encounter an encrypted part, we'll try to descend into
it.

If we can decrypt, we add the property index.decryption=success.

If we can't decrypt (or recognize the encrypted type of mail), we add
the property index.decryption=failure.

Note that a single message may have both values of the
"index.decryption" property: "success" and "failure".  For example,
consider a message that includes multiple layers of encryption.  If we
manage to decrypt the outer layer ("index.decryption=success") but
fail on the inner layer ("index.decryption=failure"), both values will
be set.

Because of the property name, this will be automatically cleared (and
possibly re-set) during re-indexing.  This means it will subsequently
correspond to the actual semantics of the stored index.
2017-10-21 19:53:19 -03:00
Daniel Kahn Gillmor
0bb05ff693 reindex: drop all properties named with prefix "index."
This allows us to create new properties that will be automatically set
during indexing, and cleared during re-indexing, just by choice of
property name.
2017-10-21 19:53:08 -03:00
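Commits 4dfcc8c9b2 and 0bb05ff693 together keep `index.*` properties in step with the current index. A minimal Python model, assuming properties are a list of `(key, value)` pairs so that one key can carry several values; `reindex` is a hypothetical name, not a notmuch function.

```python
from typing import List, Tuple

# (key, value); the same key may appear with several values.
Property = Tuple[str, str]


def reindex(old: List[Property], fresh: List[Property]) -> List[Property]:
    """Drop every property whose key starts with "index.", then add the
    index.* properties produced by the new indexing pass."""
    kept = [(k, v) for (k, v) in old if not k.startswith("index.")]
    return kept + fresh


before = [
    ("index.decryption", "success"),  # outer layer decrypted...
    ("index.decryption", "failure"),  # ...inner layer did not
    ("foo", "bar"),                   # user property: untouched
]
after = reindex(before, [("index.decryption", "success")])
print(after)
# [('foo', 'bar'), ('index.decryption', 'success')]
```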
Daniel Kahn Gillmor
6575b7eb31 doc: add notmuch-properties(7)
We will want a user-facing place to record details about the use of
notmuch properties shortly.  This establishes a new manual page for
that purpose.
2017-10-21 19:52:55 -03:00
Daniel Kahn Gillmor
6499fce391 doc: make SEE ALSO references one-per-line
This will make future diffs cleaner, make it easier to keep them
alphabetical, and make it easier to scan and search the documentation
sources.
2017-10-18 22:36:39 -03:00
Jakub Wilk
073188e690 doc: fix typos
2017-09-28 09:00:20 -03:00
Daniel Kahn Gillmor
e5beec39d6 add "notmuch reindex" subcommand
This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
2017-08-01 21:17:47 -04:00
David Bremner
55524bb063 lib: regexp matching in 'subject' and 'from'
The idea is that you can run

% notmuch search subject:/<your-favourite-regexp>/
% notmuch search from:/<your-favourite-regexp>/

or

% notmuch search subject:"your usual phrase search"
% notmuch search from:"usual phrase search"

This feature is only available with recent Xapian, specifically
support for field processors is needed.

It should work with bindings, since it extends the query parser.

This is easy to extend for other value slots, but currently the only
value slots are date, message_id, from, subject, and last_mod. Date is
already searchable;  message_id is left for a followup commit.

This was originally written by Austin Clements, and ported to Xapian
field processors (from Austin's custom query parser) by yours truly.
2017-03-03 17:46:48 -04:00
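The two query forms in 55524bb063 are told apart by the slashes. The sketch below is a hypothetical Python model of such a field processor: it checks a `/.../` value with Python's `re` against the stored value slot, and otherwise falls back to a much-simplified phrase match. Neither the regex dialect nor the phrase logic is claimed to match notmuch or Xapian.

```python
import re


def words(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def field_matches(query_value: str, slot_value: str) -> bool:
    """Hypothetical model of the subject:/from: field processor."""
    if len(query_value) >= 2 and query_value.startswith("/") and query_value.endswith("/"):
        # subject:/<regexp>/ form: search the stored value with the regexp.
        return re.search(query_value[1:-1], slot_value) is not None
    # Otherwise a phrase: the query words must appear consecutively, in order.
    want, have = words(query_value), words(slot_value)
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


print(field_matches("/^Re:/", "Re: lunch plans"))                # True
print(field_matches("/^Re:/", "Fwd: Re: lunch plans"))           # False
print(field_matches("usual phrase", "my usual phrase search"))  # True
print(field_matches("phrase usual", "my usual phrase search"))  # False
```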
Daniel Kahn Gillmor
693ca8d8a8 add property: query prefix to search for specific properties
We want to be able to query the properties directly, like:

   notmuch count property:foo=bar

which should return a count of messages where the property with key
"foo" has value equal to "bar".
2016-09-21 18:14:25 -03:00
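A toy Python model of the `property:` prefix from 693ca8d8a8. Splitting the argument at the first `=` is an assumption made here, as are the names `parse_property_term` and `count_property` and the per-message set of `(key, value)` pairs.

```python
from typing import Iterable, Set, Tuple

Property = Tuple[str, str]


def parse_property_term(arg: str) -> Property:
    """Split 'key=value' at the first '=' (assumed; values may contain '=')."""
    key, sep, value = arg.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {arg!r}")
    return key, value


def count_property(messages: Iterable[Set[Property]], arg: str) -> int:
    """Count messages carrying the exact (key, value) pair."""
    wanted = parse_property_term(arg)
    return sum(1 for props in messages if wanted in props)


messages = [
    {("foo", "bar")},
    {("foo", "baz")},
    {("foo", "bar"), ("other", "x")},
]
print(count_property(messages, "foo=bar"))  # 2
```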
Daniel Kahn Gillmor
d080b4100a doc: clean up boolean vs. probabilistic prefixes
sphinx-build emits a minor warning:

[...]doc/man7/notmuch-search-terms.rst:223: WARNING: Block quote ends without a blank line; unexpected unindent.

And the tabular representation of boolean or probabilistic prefixes
currently renders like this when i view it in man:

       ┌───────────────────────────┬────────────────────────────┐
       │Boolean                    │ Probabilistic              │
       └───────────────────────────┴────────────────────────────┘

       │          tag: id:         │           from: to:        │
       │                           │                            │
       │       thread:     folder: │        subject:    attach‐ │
       │       path:               │        ment: mimetype:     │
       └───────────────────────────┴────────────────────────────┘

This isn't just ugly: it's confusing, because it seems to imply that
some of the prefixes in the left-hand column are somehow related to
specific other prefixes in the right-hand column.

The Definition List representation introduced by this patch should be
simpler for readers to understand, and doesn't have the warning.
2016-06-07 08:00:40 -03:00
David Bremner
b9bf3f44ea lib: add support for named queries
This relies on the optional presence of Xapian field processors, and the
library config API.
2016-05-25 07:40:44 -03:00
David Bremner
bbf6069252 lib: optionally support single argument date: queries
This relies on the FieldProcessor API, which is only present in xapian
>= 1.3.
2016-05-08 08:17:07 -03:00
Jani Nikula
bf719963a7 man: clarify the parameters for lastmod: range query
<since> and <until> for the lastmod: prefix right below the date:
prefix description give the impression that one could pass last-modified
dates to lastmod:, which is not at all the case. Use
<initial-revision>..<final-revision> instead.
2015-10-21 09:13:33 -03:00
Jani Nikula
23b8ed610a lib: add support for date:<expr>..! to mean date:<expr>..<expr>
It doesn't seem likely we can support simple date:<expr> expanding to
date:<expr>..<expr> any time soon. (This can be done with a future
version of Xapian, or with a custom query parser.) In the
meantime, provide shorthand date:<expr>..! to mean the same. This is
useful, as the expansion takes place before interpretation, and we can
use, for example, date:yesterday..! to match from beginning of
yesterday to end of yesterday.

Idea from Mark Walters <markwalters1009@gmail.com>.
2015-09-25 21:55:24 -03:00
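The shorthand in 23b8ed610a is a purely textual rewrite that runs before any date is interpreted. In this hypothetical Python helper, `expand_bang` only performs the rewrite; covering the whole of "yesterday" is presumably left to the existing date parser, which rounds the start of a range down and the end up.

```python
def expand_bang(range_expr: str) -> str:
    """Rewrite '<expr>..!' to '<expr>..<expr>'; leave anything else unchanged."""
    start, sep, end = range_expr.partition("..")
    if sep and end == "!":
        return f"{start}..{start}"
    return range_expr


print(expand_bang("yesterday..!"))            # yesterday..yesterday
print(expand_bang("2015-09-01..2015-09-25"))  # 2015-09-01..2015-09-25
```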
Austin Clements
cb08a2ee01 lib: Add "lastmod:" queries for filtering by last modification
The implementation is essentially the same as the date range search
prior to Jani's fancy date parser.
2015-08-14 18:23:49 +02:00
David Bremner
d7b6e0cae7 doc: update list of prefixes
'attachment' was missing a colon, and 'mimetype' was not added to this table
at all.
2015-02-24 08:29:01 +01:00
David Bremner
682a362c85 doc: typo fix for prefix discussion.
2015-02-24 08:29:01 +01:00
David Bremner
4313be0a0c doc: add more information on operators.
More material borrowed from the wiki page on "searching"
2015-02-24 08:29:01 +01:00
David Bremner
7fa58b792c doc: add material on stemming and wildcards
This is lightly massaged from the searching page on the wiki.
2015-02-24 08:29:01 +01:00
David Bremner
4d5477a3d5 doc: add details about Xapian search syntax
Questions related to the way that probabilistic prefixes and phrases
are handled come up quite often, and it is nicer to have the
documentation self-contained.  Hopefully putting it in subsections
prevents it from being overwhelming.
2015-01-25 18:36:47 +01:00
Todd
8fb1cbc1c2 Update documentation
Adds new entry to the NEWS file, and updates the search terms section
of the man page.  The search terms section needs to be updated again
once the new section in the documentation covering probabilistic terms
has been committed.
2015-01-24 16:51:20 +01:00
Jani Nikula
0969c8be09 man: update man pages for folder: and path: search terms
Text from review by Austin Clements <amdragon@MIT.EDU>.
2014-03-11 19:51:22 -03:00
David Bremner
d736260385 doc: convert sphinx based docs
This is the output from sphinx-quickstart, massaged a bit, along with
our existing man pages converted to rst.

A skeleton notmuch-emacs manual is also included. It is not suitable
for end user use yet.
2014-03-09 10:41:08 -03:00