This script generates reports based on notmuch queries, and doesn't
really have anything to do with nmbug, except for sharing the NMBGIT
environment variable.
For example:
"query": ["tag:a", "tag:b or tag:c"]
is now converted to:
( tag:a ) and ( tag:b or tag:c )
instead of the old:
tag:a and tag:b or tag:c
This helps us avoid confusion due to Xapian's higher-precedence AND
[1], where the old query would be interpreted as:
( tag:a and tag:b ) or tag:c
[1]: http://xapian.org/docs/queryparser.html
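The joining rule above can be sketched in a few lines of Python. The helper name join_queries is hypothetical (it is not taken from nmbug-status); it only illustrates wrapping each sub-query in parentheses before and-ing them together.

```python
def join_queries(queries):
    """Wrap each sub-query in parentheses, then join them with 'and'."""
    return ' and '.join('( {0} )'.format(query) for query in queries)


# The list mirrors the "query" value from the example config above.
combined = join_queries(['tag:a', 'tag:b or tag:c'])
print(combined)
```

Because every sub-query is parenthesized, the result no longer depends on how Xapian ranks AND against OR.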
These were broken by b70386a4 (Move the generated date from the top of
the page to the footer, 2014-05-31), which moved 'Generated ...' to
the footer with the opening tag, but didn't replace the blurb opening
tag or add a closing tag after 'Generated ...'.
We've been leading off with h2s since 3e5fb88f (contrib/nmbug: add
nmbug-status script, 2012-07-07), but the semantically-correct headers
are:
<h1>{title}</h1>
...
<h2>Views</h2>
...
<h3>View 1</h3>
...
<h3>View 2</h3>
...
We can always add additional CSS if the default h1 formatting is too
intense.
We already have a 'filename' variable with the name, so stay DRY and
use that variable here.
Also fix a missing-whitespace error from bed8b674 (nmbug-status:
Clarify errors for illegible configs, 2014-05-10), wrapping on the
sentence to match similar error-generation earlier in this function.
Let each view have a "sort" key, typically used with values
"oldest-first" or "newest-first" (although all values in Query.SORT
are accepted), and sort the results accordingly. Oldest first remains
the default.
The dynamic approach of mapping sort values is as suggested by
W. Trevor King <wking@tremily.us>.
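A rough sketch of the dynamic mapping, assuming the sort names map onto attribute names by upper-casing and swapping hyphens for underscores. The SORT class below is a stand-in for notmuch.Query.SORT with placeholder values, and sort_value is a hypothetical helper, so this runs without the notmuch bindings installed.

```python
class SORT(object):
    """Stand-in for notmuch.Query.SORT (placeholder values)."""
    OLDEST_FIRST = 0
    NEWEST_FIRST = 1
    MESSAGE_ID = 2
    UNSORTED = 3


def sort_value(name, sort_enum=SORT):
    """Map a config string such as 'newest-first' to an enum attribute."""
    attribute = name.upper().replace('-', '_')
    try:
        return getattr(sort_enum, attribute)
    except AttributeError:
        raise ValueError('unknown sort order: {0!r}'.format(name))


view = {'title': 'Example view', 'sort': 'newest-first'}
# Oldest first remains the default when a view has no "sort" key.
order = sort_value(view.get('sort', 'oldest-first'))
```

Looking the attribute up by name means any value the enum gains later is accepted without touching the mapping code.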
When loading configs from Git, the bare branch name (without a
refs/heads/ prefix or similar) matches all branches of that name
(including remote-tracking branches):
.nmbug $ git show-ref config
48f3bbf1d1492e5f3d2f01de6ea79a30d3840f20 refs/heads/config
48f3bbf1d1492e5f3d2f01de6ea79a30d3840f20 refs/remotes/origin/config
4b6dbd9ffd152e7476f5101eff26747f34497cee refs/remotes/wking/config
Instead of relying on the ordering of the matching references, use
--heads to ensure we only match local branches.
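To illustrate why --heads matters, here is a small sketch that parses show-ref output into a dict. The parse_show_ref helper is hypothetical, and the sample text is copied from the listing above rather than read from a live repository.

```python
def parse_show_ref(output):
    """Parse 'git show-ref' output into a {refname: sha1} dict."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha1, ref = line.split(None, 1)
        refs[ref] = sha1
    return refs


# Command line that restricts the match to local branches.
command = ['git', 'show-ref', '--heads', 'config']

# With --heads, only the refs/heads/ line from the listing remains.
heads_only = parse_show_ref(
    '48f3bbf1d1492e5f3d2f01de6ea79a30d3840f20 refs/heads/config\n')
```

With a single entry left, there is no ordering among matches to rely on.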
Carl Worth pointed out that errors like:
$ ./nmbug-status
fatal: Not a git repository: '/home/cworth/.nmbug'
fatal: Not a git repository: '/home/cworth/.nmbug'
Traceback (most recent call last):
File "./nmbug-status", line 254, in <module>
config = read_config(path=args.config)
File "./nmbug-status", line 73, in read_config
return json.load(fp)
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
are not particularly clear. With this commit, we'll get output like:
$ ./nmbug-status
fatal: Not a git repository: '/home/wking/.nmbug'
No local branch 'config' in /home/wking/.nmbug. Checkout a local
config branch or explicitly set --config.
which is much more accessible. I've also added user-friendly messages
for a number of other config-parsing errors.
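The general pattern is to catch the low-level exception and re-raise something that names the source of the problem. The sketch below is an assumption about the shape of that handling, not the script's actual code: ConfigError and parse_config are hypothetical names, and the message text is invented for illustration.

```python
import json


class ConfigError(Exception):
    """Raised when the config cannot be loaded (hypothetical name)."""


def parse_config(text, source='<config>'):
    """Parse JSON config text, turning decoder errors into ConfigError."""
    try:
        return json.loads(text)
    except ValueError as error:  # JSONDecodeError subclasses ValueError
        raise ConfigError(
            'Could not parse JSON from {0}: {1}'.format(source, error))


try:
    parse_config('', source='config:status-config.json')
except ConfigError as error:
    message = str(error)
```

A top-level caller can then print the message and exit instead of showing a traceback.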
Our repository [1] has a post-update hook that rebuilds the status
page after each push. Since that may happen several times a day, we
might as well show the build time (as well as the date) in the footer.
The trailing 'Z' is the ISO 8601 designator for UTC. Now that we're
showing times, it's nice to be explicit about the timezone we're
using.
The rename from date -> datetime gives us backward-compatibility for
folks that *do* only want the date. We keep the old date formatting
to support those folks.
[1]: http://nmbug.tethera.net/git/nmbug-tags.git
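A sketch of how the two context values might be produced side by side. The exact strftime patterns and the build_context name are assumptions; only the 'date' and 'datetime' keys and the trailing 'Z' come from the description above.

```python
import datetime


def build_context(now):
    """Return footer values for a naive datetime that is already in UTC."""
    return {
        # Old key, kept for templates that only want the date.
        'date': now.strftime('%Y-%m-%d'),
        # New key; the literal 'Z' marks the time as UTC.
        'datetime': now.strftime('%Y-%m-%d %H:%M:%SZ'),
    }


context = build_context(datetime.datetime(2014, 5, 31, 12, 34, 56))
footer = 'Generated: {datetime}'.format(**context)
```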
Rather than splitting this context into header-only and footer-only
groups, just dump it all in a shared dict. This will make it easier
to eventually split the header/footer templates out of this script
(e.g. if we want to load them from the config file).
It's useful reference information, but anyone who wants it will look
for and find it. We don't need this front-and-center. Follow the
pattern set by our header template with a triple-quoted string.
The gray <hr> styling is less aggressive. IE uses 'color' for drawing
the rule, while Gecko and Opera use the border or 'background-color'
[1].
[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=239386
Prefer a docstring to a header comment so we can use it as the
ArgumentParser description (formatted with 'nmbug-status --help').
Script readers still have it near the top of the file. Since it's a
docstring, use PEP 257's summary-line-and-body format [1].
[1]: http://legacy.python.org/dev/peps/pep-0257/#multi-line-docstrings
Make nmbug-status more generally usable outside of nmbug by not
hardcoding notmuch related things.
This lets anyone publish HTML search views of mailing list messages
with a custom config file, independent of nmbug.
Python's dict object has no __values__() method that
OrderedDict().values() (the stub provided in nmbug-status) could call
to provide an ordered list of values. Renaming this thinko to
values() makes our stub work as expected: dict items are listed in
the order in which they were added to the dictionary.
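For reference, a minimal insertion-ordered stub along these lines could look like the sketch below. This is an illustration rather than the stub shipped in nmbug-status; the class name OrderedDictStub and the _keys attribute are assumptions, and key removal is deliberately unsupported.

```python
class OrderedDictStub(dict):
    """Minimal insertion-ordered dict (no support for key removal)."""

    def __init__(self):
        super(OrderedDictStub, self).__init__()
        self._keys = []  # keys in the order they were first set

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)
        super(OrderedDictStub, self).__setitem__(key, value)

    def values(self):
        # Overriding values() (not __values__()) is what callers see.
        for key in self._keys:
            yield self[key]


threads = OrderedDictStub()
threads['thread-b'] = 1
threads['thread-a'] = 2
threads['thread-b'] = 3  # updating a key keeps its original position
ordered = list(threads.values())
```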
David [1] and Tomi [2] both feel that the user's choice of LANG is not
explicit enough to have such a strong effect on nmbug-status. For
example, cron jobs usually default to LANG=C, and that is going to
give you ASCII output:
$ LANG=C python -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968
Trying to print Unicode author names (and other strings) in that
encoding would crash nmbug-status with a UnicodeEncodeError. To avoid
that, this patch hardcodes UTF-8, which can handle generic Unicode,
and is the preferred encoding (regardless of LANG settings) for
everyone who has chimed in on the list so far. I'd prefer trusting
LANG, but in the absence of any users that prefer non-UTF-8 encodings
I'm fine with this approach.
While we could achieve the same effect on the output content by
dropping the previous patch (nmbug-status: Encode output using the
user's locale), Tomi also wanted UTF-8 hardcoded as the config-file
encoding [2]. Keeping the output encoding patch and then adding this
to hardcode both the config-file and output encodings at once seems
the easiest route, now that fd29d3f (nmbug-status: Decode Popen output
using the user's locale, 2014-02-10) has landed in master.
[1]: id="877g8z4v4x.fsf@zancas.localnet"
http://article.gmane.org/gmane.mail.notmuch.general/17202
[2]: id="m2vbwj79lu.fsf@guru.guru-group.fi"
http://article.gmane.org/gmane.mail.notmuch.general/17209
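The failure mode is easy to reproduce without touching LANG, by encoding a non-ASCII string with the 'ascii' codec directly. The author name below is made up for the example.

```python
author = u'J\u00fcrgen'  # hypothetical author name with a non-ASCII letter

# What an ASCII locale would effectively ask for:
try:
    author.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Hardcoding UTF-8 handles the same string without error.
encoded = author.encode('utf-8')
```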
Instead of always writing UTF-8, allow the user to configure the
output encoding using their locale. This is useful for previewing
output in the terminal, for poor souls that don't use UTF-8 locales
;).
We already had the tbody with a blank row separating threads (which is
not colored); this commit adds a bit of spacing to separate messages
within a thread. It will also add a bit of colored padding above the
first message and below the final message, but the main goal is to add
padding *between* two-row message blocks.
<--- new padding
thread-1, message-1, row-1 (class="message-first")
thread-1, message-1, row-2 (class="message-last")
<--- new padding
spacer tbody with a blank row
<--- new padding
thread-2, message-1, row-1 (class="message-first")
thread-2, message-1, row-2 (class="message-last")
<--- new padding
<--- new padding
thread-2, message-2, row-1 (class="message-first")
thread-2, message-2, row-2 (class="message-last")
<--- new padding
'message-id' and 'from' now have sensitive characters escaped using
xml.sax.saxutils.escape [1]. The 'subject' data was already being
converted to a link into Gmane; I've escape()d that too, so it doesn't
need to be handled in the same block as 'message-id' and 'from'.
This prevents broken HTML if subjects etc. contain characters that
would otherwise be interpreted as HTML markup.
[1]: http://docs.python.org/3/library/xml.sax.utils.html#xml.sax.saxutils.escape
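A quick illustration of what escape() does to a 'from' value. The address is invented for the example. Note that, by default, escape() only rewrites '&', '<', and '>'.

```python
from xml.sax.saxutils import escape

sender = 'A & B <ab@example.com>'  # hypothetical 'from' header value
safe_sender = escape(sender)
print(safe_sender)
```

Without the escaping, a browser would treat the bracketed address as an unknown tag and hide it.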
Also allow manual id overrides from the JSON config. Slugging avoids
errors like:
Bad value '#Possible bugs' for attribute href on element a:
Whitespace in fragment component. Use %20 in place of spaces.
from http://validator.w3.org.
I tried just quoting the titles (e.g. 'Possible%20bugs'), but that
didn't work (at least with Firefox 24.2.0). Slugging avoids any
ambiguity over when the quotes are expanded in the client. The specs
are unclear about quoting, saying only [1]:
Value: Any string, with the following restrictions:
must be at least one character long
must not contain any space characters
[1]: http://dev.w3.org/html5/markup/global-attributes.html#common.attrs.id
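One possible slugging rule, sketched under the assumption that lower-casing and collapsing every non-alphanumeric run to a hyphen is acceptable. The slug and view_id helpers are hypothetical, and the real script may use a different rule.

```python
import re


def slug(title):
    """Lowercase the title and collapse non-alphanumeric runs to '-'."""
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')


def view_id(view):
    """Prefer a manual "id" from the config, else slug the title."""
    return view.get('id', slug(view['title']))


automatic = view_id({'title': 'Possible bugs'})
manual = view_id({'title': 'Possible bugs', 'id': 'bugs'})
```

The resulting ids contain no whitespace, so '#possible-bugs' is a valid fragment for the href.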
HTML 5 for the win :). I also de-namespaced the language; the HTML 5
spec allows a vestigial xml:lang attribute, but it's a no-op [1], so I
stripped it.
This shouldn't break anything at tethera, which already serves the
status as text/html:
$ wget -S http://nmbug.tethera.net/status/
--2014-02-02 21:20:39-- http://nmbug.tethera.net/status/
Resolving nmbug.tethera.net... 87.98.215.224
Connecting to nmbug.tethera.net|87.98.215.224|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Type: text/html
...
This also matches the Content-Type in the generated HTML's http-equiv
meta.
[1]: http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes
Tomi Ollila and David Bremner (and presumably others) are running
Python 2.6 on their nmbug-status boxes, so it makes sense to keep
support for that version. This commit adds a really minimal
OrderedDict stub (e.g. it doesn't handle key removal), but it gets the
job done for Page._get_threads. Once we reach a point where Python
2.6 is no longer important (it's already out of its security-fix
window [1]), we can pull this stub back out.
[1]: http://www.python.org/download/releases/2.6.9/
I was having trouble understanding the logic of the longish print_view
function, so I refactored the output generation into modular bits.
The basic text rendering is handled by Page, which has enough hooks
that HtmlPage can borrow the logic and slot-in HTML generators.
By modularizing the logic it should also be easier to build other
renderers if folks want to customize the layout for other projects.
Timezones
=========
This commit has no effect on the output, except that some dates have
been converted from the sender's timezone to UTC due to:
- val = m.get_header(header)
- ...
- if header == 'date':
- val = str.join(' ', val.split(None)[1:4])
- val = str(datetime.datetime.strptime(val, '%d %b %Y').date())
...
+ value = str(datetime.datetime.utcfromtimestamp(
+ message.get_date()).date())
I also tweaked the HTML header date to be utcnow instead of the local
now() to make all times independent of the generator's local time.
This matches Gmane, which converts all Date headers to UTC (although
they use a 'GMT' suffix). Notmuch uses
g_mime_utils_header_decode_date to calculate the UTC timestamps, but
uses a NULL tz_offset which drops the information we'd need to get
back to the sender's local time [1]. With the generator's local time
arbitrarily different from the sender's and viewer's local time,
sticking with UTC seems the best bet.
[1]: https://developer.gnome.org/gmime/stable/gmime-gmime-utils.html#g-mime-utils-header-decode-date
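The conversion can be tried in isolation. This sketch swaps in the timezone-aware datetime.fromtimestamp(..., timezone.utc) call from Python 3 in place of the utcfromtimestamp call shown in the diff above, and the timestamps are example values, not data from a real message.

```python
import datetime


def utc_date(timestamp):
    """Return the UTC calendar date for a Unix timestamp as a string."""
    moment = datetime.datetime.fromtimestamp(
        timestamp, datetime.timezone.utc)
    return str(moment.date())


# 1391299200 is 2014-02-02 00:00:00 UTC; one second earlier is still
# 2014-02-01 in UTC, whatever the sender's local date was.
after_midnight = utc_date(1391299200)
before_midnight = utc_date(1391299199)
```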
Make this all one big string, using '...{date}...'.format(date=...) to
inject the date [1]. This syntax was added in Python 2.6, and is
preferred to %-formatting in Python 3 [1].
[1]: http://docs.python.org/2/library/stdtypes.html#str.format
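A small sketch of the single-string approach. The template text is invented for the example. One detail worth remembering with str.format is that literal braces, such as those in inline CSS, have to be doubled.

```python
template = '''<style>
  hr {{ color: gray; }}
</style>
<p>Generated: {date}</p>
'''

# Named fields are filled in; doubled braces come out as single braces.
page = template.format(date='2014-02-02')
```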
The database is only used for notmuch.Query, so there's no need for
write access. This allows nmbug-status to run while the database is
being updated, without raising:
A Xapian exception occurred opening database: Unable to get write lock on …: already locked
Traceback (most recent call last):
File "./nmbug-status", line 182, in <module>
db = notmuch.Database(mode=notmuch.Database.MODE.READ_WRITE)
File "/…/notmuch/database.py", line 154, in __init__
self.open(path, mode)
File "/…/notmuch/database.py", line 214, in open
raise NotmuchError(status)
notmuch.errors.XapianError
The definitions of Thread, output_with_separator, and print_view were
between the main argparse and view-printing code. Group them together
with our existing read_config at the top of the module, which makes
for easier reading in the main section.
I also:
* Made 'headers' a print_view argument instead of a module-level
global. The list -> tuple conversion avoids having a mutable
default argument, which makes some people jumpy ;).
* Made 'db' a print_view argument instead of relying on the global
namespace to access it from print_view.
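For anyone wondering why a mutable default makes people jumpy, the sketch below shows the pitfall next to the tuple-based alternative. Both functions are hypothetical and unrelated to the real print_view signature.

```python
def collect_bad(item, items=[]):
    """The default list is created once and shared between calls."""
    items.append(item)
    return items


def collect_good(item, items=()):
    """A tuple default cannot be mutated, so calls stay independent."""
    return tuple(items) + (item,)


first_bad = list(collect_bad('date'))
second_bad = list(collect_bad('from'))  # still contains 'date'

first_good = collect_good('date')
second_good = collect_good('from')
```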
Now the suggested usage (listed by 'nmbug-status --help') is:
usage: nmbug-status [-h] [--text] [--config PATH] [--list-views]
[--get-query VIEW]
instead of the less obvious:
usage: nmbug-status [-h] [--text] [--config CONFIG] [--list-views]
[--get-query GET_QUERY]
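The friendlier placeholders come from argparse's metavar argument. The sketch below rebuilds a parser with the same options as the usage text above. The action choices for the flags are assumptions, since only the usage line is known.

```python
import argparse

parser = argparse.ArgumentParser(prog='nmbug-status')
parser.add_argument('--text', action='store_true')
parser.add_argument('--config', metavar='PATH')
parser.add_argument('--list-views', action='store_true')
parser.add_argument('--get-query', metavar='VIEW')

# Without metavar, argparse would show CONFIG and GET_QUERY instead.
usage = parser.format_usage()
```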
Avoid:
$ ./nmbug-status --list-views
Traceback (most recent call last):
File "./nmbug-status", line 47, in <module>
'cat-file', 'blob', sha1+':status-config.json'],
TypeError: can't concat bytes to str
by explicitly converting the byte-stream read from Popen into a
Unicode string. On Python 2, this conversion is str -> unicode; on
Python 3 it is bytes -> str.
_ENCODING is derived from the user's locale (or system default) in an
attempt to match Git's output encoding. It may be more robust to skip
the encoding/decoding by using a Python wrapper like pygit2 [1] for
Git access. That's a fairly heavy dependency though, and using the
locale will probably work.
[1]: http://www.pygit2.org/
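A sketch of the decoding step, assuming _ENCODING falls back to the interpreter default when the locale reports nothing. To stay runnable without a Git repository, the child process here is the Python interpreter printing a fake SHA-1. The read_output helper is hypothetical.

```python
import locale
import subprocess
import sys

# Assumed derivation: locale first, interpreter default as a fallback.
_ENCODING = locale.getpreferredencoding() or sys.getdefaultencoding()


def read_output(command):
    """Run a command and return its stdout decoded to a text string."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    stdout, _ = process.communicate()
    return stdout.decode(_ENCODING)


# Stand-in for the git call that prints the config branch's SHA-1.
sha1 = read_output([sys.executable, '-c', 'print("abc123")']).strip()
blob = sha1 + ':status-config.json'  # text + text, so no TypeError
```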
There seems to be consensus to use presence in contrib as
documentation of limited support by the notmuch developers; in fact
nmbug is pretty integrated into our current development process, so
devel seems more appropriate.
2013-02-16 07:54:33 -04:00
Renamed from contrib/nmbug/nmbug-status