Previously, we used a variety of ad-hoc canonicalizations for JSON
output in the test suite, but were ultimately very sensitive to JSON
irrelevancies such as whitespace. This introduces a new test
comparison function, test_expect_equal_json, that first pretty-prints
*both* the actual and expected JSON and then compares the results.
The current implementation of this simply uses Python's json.tool to
perform pretty-printing (with a fallback to the identity function if
parsing fails). However, since the interface it introduces is
semantically high-level, we could swap in other mechanisms in the
future, such as another pretty-printer or something that does not
re-order object keys (if we decide that we care about that).
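For illustration, the canonicalize-then-compare idea can be sketched in Python. This is a minimal sketch, not the test suite's shell implementation. The names `canonicalize_json` and `expect_equal_json` are hypothetical, and `sort_keys=True` is an assumption chosen to mirror the key re-ordering mentioned above.

```python
import json


def canonicalize_json(text: str) -> str:
    """Pretty-print JSON text, or return it unchanged if it does not parse."""
    try:
        obj = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return text  # fallback: the identity function
    # A fixed indentation erases whitespace differences; sort_keys also
    # re-orders object keys, so key order stops mattering as well.
    return json.dumps(obj, indent=4, sort_keys=True)


def expect_equal_json(actual: str, expected: str) -> bool:
    """Compare two JSON documents after canonicalizing *both* sides."""
    return canonicalize_json(actual) == canonicalize_json(expected)
```

Because both sides go through the same function, the comparison can later be switched to a different canonicalization (for example, one that preserves key order) without touching any callers.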
In general, this patch does not remove the existing ad-hoc
canonicalization because it does no harm. We do have to remove the
newline-after-comma rule from notmuch_json_show_sanitize and
filter_show_json because it results in invalid JSON that cannot be
pretty-printed.
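The sketch below shows why a newline-after-comma rule breaks pretty-printing. The rule cannot tell structural commas from commas inside string values, and a raw newline inside a JSON string is an invalid control character. The function `newline_after_comma` is a hypothetical stand-in for the sanitizer rule, not the actual sed expression from the test suite.

```python
import json


def newline_after_comma(text: str) -> str:
    """Hypothetical sanitizer rule: break the line after every comma."""
    return text.replace(",", ",\n")


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


# Only structural commas: the rewritten text is still valid JSON.
print(is_valid_json(newline_after_comma('{"id": 1, "tags": ["inbox"]}')))  # True
# A comma inside a string turns into a raw newline inside that string,
# which strict JSON parsers reject.
print(is_valid_json(newline_after_comma('{"subject": "Hello, world"}')))  # False
```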
Most of this patch simply replaces test_expect_equal and
test_expect_equal_file with test_expect_equal_json. It changes the
expected JSON in a few places where sanitizers had placed newlines
after commas inside strings.
This new JSON format for replies includes headers generated for a
reply message as well as the headers of the original message. Using
this data, a client can intelligently create a reply. For example, the
emacs client will be able to create replies with quoted HTML parts by
parsing the HTML parts.
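As a rough sketch of how a client might consume this output, the Python below copies the generated reply headers and quotes the original's text/plain parts. It assumes a top-level object with "reply-headers" and "original" keys, with the original's "body" laid out like `notmuch show` JSON (multipart "content" is a list of parts). Treat that shape as an assumption. An HTML-aware client such as the Emacs one would also parse text/html parts, which this sketch skips.

```python
import json


def quote_text_parts(parts, out):
    """Append '> '-quoted lines for each text/plain leaf, recursing into multiparts."""
    for part in parts:
        content = part.get("content")
        if isinstance(content, list):  # assumed: multipart content is a list of parts
            quote_text_parts(content, out)
        elif part.get("content-type") == "text/plain" and isinstance(content, str):
            out.extend("> " + line for line in content.splitlines())


def draft_reply(reply_json: str) -> str:
    data = json.loads(reply_json)
    # Assumed keys: "reply-headers" (generated) and "original" (source message).
    lines = [f"{name}: {value}" for name, value in data["reply-headers"].items()]
    lines.append("")
    original = data["original"]
    lines.append(f"{original['headers']['From']} writes:")
    quote_text_parts(original["body"], lines)
    return "\n".join(lines)


SAMPLE_REPLY = json.dumps({
    "reply-headers": {"Subject": "Re: hi", "To": "alice@example.com"},
    "original": {
        "headers": {"From": "alice@example.com", "Subject": "hi"},
        "body": [{"id": 1, "content-type": "multipart/alternative", "content": [
            {"id": 2, "content-type": "text/plain", "content": "line one\nline two"},
            {"id": 3, "content-type": "text/html", "content": "<p>line one</p>"},
        ]}],
    },
})
```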
This is fully backward-compatible for root and leaf parts, but now has proper
support for interior parts. This requires some design decisions that
were guided by what I would want if I were to save a part.
Specifically:
- Leaf parts are printed without headers and with transfer decoding.
This is what makes sense for saving attachments. (Furthermore, the
transfer decoding is necessary since, without the headers, the
caller would not be able to interpret non-transfer-decoded output.)
- Message parts are printed with their message headers, but without
enclosing part headers. This is what makes sense for saving a
message as a whole (which is a message part) and for saving attached
messages. This is symmetric for whole messages and for attached
messages, though we special-case the whole message for performance
reasons (and corner-case correctness reasons: given malformed input,
GMime may not be able to reproduce it from the parsed
representation).
- Multipart parts are printed with their headers and all child parts.
It's not clear what the best thing to do for multipart is, but this
was the most natural to implement and can be justified because such
parts can't be interpreted without their headers.
As an added benefit, we can move the special-case code for part 0 into
the raw formatter.
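The three rules for raw part output can be sketched with Python's `email` package standing in for GMime. This is an analogy under that assumption, not notmuch's C code, and `raw_part_bytes` is a hypothetical name. Note that Python reports message/rfc822 parts as multipart, so the sketch tests for them first.

```python
import email
from email.message import Message


def raw_part_bytes(part: Message) -> bytes:
    if part.get_content_type() == "message/rfc822":
        # Message part: the embedded message with its own headers,
        # but without the enclosing part's headers.
        return part.get_payload(0).as_bytes()
    if part.is_multipart():
        # Multipart: headers plus all children. The boundary parameter in
        # Content-Type is needed to interpret the body at all.
        return part.as_bytes()
    # Leaf: no headers, transfer-decoded (base64/quoted-printable undone).
    return part.get_payload(decode=True) or b""


RAW_EXAMPLE = (
    b"From: a@example.com\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="XX"\n'
    b"\n"
    b"--XX\n"
    b"Content-Type: text/plain\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"aGVsbG8=\n"
    b"--XX\n"
    b"Content-Type: message/rfc822\n"
    b"\n"
    b"From: c@example.com\n"
    b"Subject: inner\n"
    b"\n"
    b"inner body\n"
    b"--XX--\n"
)
example_msg = email.message_from_bytes(RAW_EXAMPLE)
```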
Previously, there was only one CRLF between the terminating boundary
of the embedded multipart/alternative and the boundary of the
containing multipart. However, according to RFC 1341, section 7.2.1:
    The boundary must be followed immediately either by another CRLF and
    the header fields for the next part, or by two CRLFs, in which case
    there are no header fields for the next part
and
    The CRLF preceding the encapsulation line is considered part of the
    boundary so that it is possible to have a part that does not end
    with a CRLF (line break).
Thus, there must be *two* CRLFs between these boundaries: one that
ends the terminating boundary and one that begins the enclosing
boundary.
While GMime accepted the message we had before, it could not produce
such a message.
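The sketch below serializes a nested multipart following the reading above. Each delimiter carries its own leading CRLF, and the closing delimiter line is terminated by a CRLF, so two CRLFs end up between "--inner--" and the next "--outer" delimiter. The helper `multipart_body` is hypothetical and is not how GMime or notmuch build messages.

```python
import email

CRLF = "\r\n"


def multipart_body(boundary: str, parts: list) -> str:
    """Join already-serialized body parts into a multipart body.

    Every delimiter after the first is preceded by a CRLF that belongs to
    the delimiter, so a part need not end with a CRLF. The closing
    delimiter line gets its own terminating CRLF.
    """
    body = "--" + boundary + CRLF + parts[0]
    for part in parts[1:]:
        body += CRLF + "--" + boundary + CRLF + part
    return body + CRLF + "--" + boundary + "--" + CRLF


inner = multipart_body("inner", [
    "Content-Type: text/plain" + CRLF + CRLF + "plain",
    "Content-Type: text/html" + CRLF + CRLF + "<p>html</p>",
])
alternative = "Content-Type: multipart/alternative; boundary=inner" + CRLF + CRLF + inner
outer = multipart_body("outer", [alternative, "Content-Type: text/plain" + CRLF + CRLF + "tail"])
```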
notmuch show now outputs the exclude flag, so many tests using notmuch
show failed. This commit adds "excluded:0" or "excluded: false" to
the expected outputs. After this commit there should be no failing
tests.
This has three ramifications:
- Blank To and Cc headers are no longer output for messages.
- Dates are now canonicalized for messages, which means they always
have a day of the week, and GMT is printed as +0000 (never -0000).
- Invalid From message headers are handled slightly differently, since
they get parsed by GMime now instead of notmuch.
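Here is the kind of date canonicalization described above, sketched with Python's `email.utils` instead of GMime. The sketch assumes that Python treats "-0000" as an unknown zone and returns a naive datetime, which it then pins to UTC so that it prints as +0000. The day of the week is recomputed from the date itself, so it is always present. `canonicalize_date` is a hypothetical name.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def canonicalize_date(value: str) -> str:
    # Any stated day-of-week is ignored by the parser.
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # "-0000" parses as a naive datetime; treat it as UTC so it
        # is printed as "+0000" rather than "-0000".
        dt = dt.replace(tzinfo=timezone.utc)
    # format_datetime always emits a day of the week.
    return format_datetime(dt)
```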
Previously, top-level message headers were printed as Subject, From,
To, Date, while embedded message headers were printed From, To,
Subject, Date. This makes both cases use the former order and updates
the tests accordingly.
Strangely, the raw format also uses this function, so this also fixes
the two raw format tests affected by this change.
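A single shared routine with a fixed order is one way to keep top-level and embedded headers consistent. In this Python sketch, `HEADER_ORDER` and `format_headers` are hypothetical names, and the stdlib `Message` class stands in for notmuch's own message objects.

```python
from email.message import Message

# One order for both top-level and embedded message headers.
HEADER_ORDER = ("Subject", "From", "To", "Date")


def format_headers(msg: Message) -> str:
    """Render the selected headers in HEADER_ORDER, skipping absent ones."""
    lines = []
    for name in HEADER_ORDER:
        value = msg[name]
        if value is not None:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


example_headers = Message()
example_headers["From"] = "a@example.com"
example_headers["To"] = "b@example.com"
example_headers["Subject"] = "hi"
example_headers["Date"] = "Fri, 05 Jan 2001 15:43:57 +0000"
```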
There's no reason to output "Non-text part:" lines for parts that are
not leaf nodes, e.g., multipart/* and message/rfc822. We fix the text
here to test for their absence. The next patch will fix reply
accordingly.
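As a sketch of the intended text output, the loop below emits a "Non-text part:" line only for leaves. It relies on Python's `email` package, where both multipart/* and message/rfc822 parts report `is_multipart()` as true, so skipping those skips every container. `text_output_lines` is a hypothetical name, not notmuch's formatter.

```python
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def text_output_lines(msg: Message) -> list:
    """One line per leaf part; containers produce no line of their own."""
    lines = []
    for part in msg.walk():
        if part.is_multipart():  # multipart/* and message/rfc822 containers
            continue
        ctype = part.get_content_type()
        if ctype.startswith("text/"):
            lines.append(part.get_payload())
        else:
            lines.append(f"Non-text part: {ctype}")
    return lines


# multipart/mixed holding text, binary data, and an attached message.
mixed_example = MIMEMultipart("mixed")
mixed_example.attach(MIMEText("hello"))
mixed_example.attach(MIMEApplication(b"\x00\x01"))
mixed_example.attach(MIMEMessage(MIMEText("attached body")))
```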
The main goal of this overhaul is to define how message/rfc822 parts
should be handled. message/rfc822 parts should be output in a similar
fashion to the outer message, including some subset of the rfc822
headers. The following decisions about formatting of message/rfc822
parts were made:
The format and content of message/rfc822 parts shall be as similar as
possible to that of full messages. In particular, for formatted
outputs, the "content" of rfc822 part output should include "headers"
and "body" fields.
The "body" field shall include the body of the message.
The "headers" field shall include the following headers, since these
are the ones available from the GMimeMessage:
"From"
"To"
"Cc"
"Subject"
"Date"
However, for the case of --format=raw the raw rfc822 should be output,
including all headers.
A subset of relevant headers shall be output in reply.
The test embedded rfc822 message is also modified to be itself
multipart, so we can more fully test how all sub parts of the message
part are output.
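The Python sketch below follows the decisions above for a formatted message/rfc822 part: a "content" object with "headers" (restricted to the listed five) and "body" (the embedded message's leaf parts). The exact field layout is an assumption made for illustration. `format_rfc822_part` is hypothetical, and the stdlib `email` package stands in for GMime.

```python
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.text import MIMEText

# The headers available from the embedded message, per the list above.
RFC822_HEADERS = ("From", "To", "Cc", "Subject", "Date")


def format_rfc822_part(part: Message) -> dict:
    inner = part.get_payload(0)  # the embedded message
    headers = {name: inner[name] for name in RFC822_HEADERS if inner[name] is not None}
    body = [
        {"content-type": leaf.get_content_type(), "content": leaf.get_payload()}
        for leaf in inner.walk()
        if not leaf.is_multipart()
    ]
    return {"content-type": "message/rfc822",
            "content": {"headers": headers, "body": body}}


embedded = MIMEText("inner body")
embedded["From"] = "alice@example.com"
embedded["Subject"] = "forwarded"
rfc822_example = format_rfc822_part(MIMEMessage(embedded))
```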
Note added by Committer:
Currently, we expect one test (--format=raw --part=3, rfc822 part) to fail.
The test message date, "Tue, 05 Jan 2001 15:43:57 -0000", is not
actually a real date. 05 Jan 2001 was in fact a Friday, not a
Tuesday. Date parsers (such as "date" in coreutils) will return "Fri"
as the day for this string, even if "Tue" is specified.
Also, the time zone "-0000" is actually always returned as "+0000", so
we change that here as well.
This will be relevant for later patches when we begin parsing rfc822
part headers, where gmime returns a parsed date string.
If we do want to test date parsing, we should do that in a separate
test.
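A mismatch like "Tue, 05 Jan 2001" can be caught by recomputing the weekday from the calendar date. This Python sketch is a hypothetical helper that only handles the common "Day, DD Mon YYYY" prefix. It is not part of the test suite.

```python
import re
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # date.weekday() order
_DATE_RE = re.compile(r"\s*([A-Za-z]{3}),\s*(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})")


def stated_weekday_is_correct(value: str) -> bool:
    """Return False when a 'Day, DD Mon YYYY ...' date names the wrong weekday."""
    m = _DATE_RE.match(value)
    if m is None:
        return True  # no day of the week given, so nothing to contradict
    stated, day, month, year = m.groups()
    actual = date(int(year), _MONTHS.index(month.title()) + 1, int(day)).weekday()
    return _DAYS[actual] == stated.title()
```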
There were two "--format=text --part=0" tests. One of them was
supposed to be a test for "--format=text --part=1".
There were also two errant "test_expect_equal_file OUTPUT EXPECTED"
lines, that are removed here.
Various typo fixes in documentation within the code that can be made
available to the user (emacs function help strings, "notmuch help"
output, notmuch man page, etc.).
Signed-off-by: Pieter Praet <pieter@praet.org>
Edited-by: Carl Worth <cworth@cworth.org> Restricted to just
documentation and fixed fix of "comman" to "common" rather than
"command".
Again, this is a much cleaner and more thorough test, and in fact
exposes a bug in the format=text output that will be fixed in the next
commit. Because of this, some of the multipart tests currently fail.
Since commit 2f8871df6e, notmuch has been
using a function (show_part_content) originally written only for text
parts to save all MIME parts. The problem with this is that this
function converts CRLF pairs to LF only and optionally converts to
UTF-8 encoding. These two conversions have the potential to corrupt
binary data when passed through the function.
This test demonstrates that corruption, and so currently fails until
we fix the bug.
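To see why text-only conversions corrupt binary data, consider the PNG file signature, which deliberately contains a CRLF pair. The Python function below is a hypothetical stand-in for the text-part conversions described above (CRLF to LF, then optional charset conversion to UTF-8). It is not show_part_content itself.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def text_part_filter(data: bytes, charset: Optional[str] = None) -> bytes:
    """Conversions suited to text parts only: CRLF -> LF, then optionally to UTF-8."""
    data = data.replace(b"\r\n", b"\n")
    if charset is not None:
        data = data.decode(charset).encode("utf-8")
    return data
```

Either step alone is enough to change the bytes of a binary attachment. Saving parts therefore needs the decoded bytes written out unchanged.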
Not that it affects the correctness of the test, but it's nice to use
proper spelling. This kind of change could invalidate a signature on the
test message, but I think that would have happened previously when the
HTML part was added in the first place.
This patch adds the tag "signed" to messages with any multipart/signed
parts, and the tag "encrypted" to messages with any
multipart/encrypted parts. This only occurs when messages are indexed
during notmuch new, so a database rebuild is required to have old
messages tagged.
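The tagging rule amounts to a walk over every MIME part looking for two content types. notmuch does this in C while indexing during `notmuch new`. The Python below is an illustrative analog, and `crypto_tags` is a hypothetical name.

```python
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

CRYPTO_TAGS = {"multipart/signed": "signed", "multipart/encrypted": "encrypted"}


def crypto_tags(msg: Message) -> set:
    """Tags implied by any part of the message, at any nesting depth."""
    return {CRYPTO_TAGS[part.get_content_type()]
            for part in msg.walk()
            if part.get_content_type() in CRYPTO_TAGS}


# A multipart/signed part nested inside multipart/mixed still counts.
signed = MIMEMultipart("signed")
signed.attach(MIMEText("hello"))
signed.attach(MIMEApplication(b"sig", "pgp-signature"))
nested_signed = MIMEMultipart("mixed")
nested_signed.attach(signed)
```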
The example multipart message is made a bit more complicated by adding
a message/rfc822 message, and all parts are output and tested in
all output formats.
Previously, notmuch show flattened all output, losing information
about the nesting of the MIME hierarchy. Now, the output is properly
nested, (both in the --format=text and --format=json output), so that
clients can analyze the original MIME structure.
Internally, this required splitting the final closing delimiter out of
the various show_part functions and putting it into a new
show_part_end function instead. Also, the show_part function now
accepts a new "first" argument that is set not only for the first MIME
part of a message, but also for each first MIME part within a series
of multipart parts. This "first" argument controls the omission of a
preceding comma when printing a part (for json).
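The split between opening a part, recursing into its children, and closing it with a separate end function can be sketched in Python over a hypothetical in-memory tree of `(content_type, children)` tuples. The names mirror show_part and show_part_end, but the code is illustrative rather than a port of the C functions.

```python
import itertools
import json


def show_part(ctype: str, part_id: int, first: bool, out: list) -> None:
    """Open a part's JSON object; `first` suppresses the preceding comma."""
    if not first:
        out.append(", ")
    out.append('{"id": %d, "content-type": %s' % (part_id, json.dumps(ctype)))


def show_part_end(is_multipart: bool, out: list) -> None:
    """Close the part, including the child array for multiparts."""
    out.append("]}" if is_multipart else "}")


def format_part(node, first: bool, out: list, ids) -> None:
    ctype, children = node  # hypothetical tree: (content_type, [child nodes])
    show_part(ctype, next(ids), first, out)
    if children:
        out.append(', "content": [')
        for index, child in enumerate(children):
            # The first child of each multipart is "first" again.
            format_part(child, index == 0, out, ids)
    show_part_end(bool(children), out)


tree = ("multipart/signed", [
    ("multipart/mixed", [("text/plain", []), ("application/pdf", [])]),
    ("application/pgp-signature", []),
])
pieces = []
format_part(tree, True, pieces, itertools.count(1))
nested = json.loads("".join(pieces))
```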
Many thanks to David Edmondson <dme@dme.org> for originally
identifying the lack of nesting in the json output and submitting an
early implementation of this feature. Thanks as well to Jameson Graef
Rollins <jrollins@finestructure.net> for carefully shepherding David's
patches through a remarkably long review process, patiently explaining
them, and providing a cleaned up series that led to this final
implementation. Jameson also provided the new emacs code here.
Previously, the outer multipart part of any multipart/mixed,
multipart/signed, etc. MIME message was silently omitted from the
"notmuch show" output. This prevented any client from correctly
determining to which parts a signature applies, for example.
Now, we actually emit these parts as their own parts. The output is
still flattened---the contained parts are not yet included "within"
the multipart part---so it's still not possible to determine to which
parts a signature applies, but this is one step along the path.
The test suite is updated to reflect this change, (though we'll
eventually want to fix the emacs interface to not display buttons for
the multipart enclosure parts as there's nothing useful for the user
to actually do with them).
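The intermediate, still-flattened output can be pictured as a depth-first numbering of every part, containers included. This Python sketch uses `Message.walk()` from the stdlib, and `flat_parts` is a hypothetical name. The multipart/mixed container gets its own entry, but nothing in the flat list says which entries it contains, so the coverage of the signature remains unknown.

```python
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def flat_parts(msg: Message) -> list:
    """Number every part depth-first, including multipart containers."""
    return [(n, part.get_content_type()) for n, part in enumerate(msg.walk(), start=1)]


mixed = MIMEMultipart("mixed")
mixed.attach(MIMEText("hello"))
mixed.attach(MIMEText("<p>hello</p>", "html"))
signed_msg = MIMEMultipart("signed")
signed_msg.attach(mixed)
signed_msg.attach(MIMEApplication(b"sig", "pgp-signature"))
```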
This tests "notmuch show" with both --format=text and --format=json on
a message with some non-trivial MIME multipart nesting, (multiple parts
within a multipart/mixed part which is within a multipart/signed part).
The test captures the current behavior (where only the leaf nodes of
the MIME structure are emitted as a flat list---the multipart parts
are effectively ignored). We plan to soon change the json output at
least to emit an actual hierarchy matching the MIME structure, (at
which point we will update this test).