emacs: Observe the charset of MIME parts when reading them.

`notmuch--get-bodypart-raw' previously read every non-binary MIME part
as if it were UTF-8 encoded. This is wrong for parts that declare a
different charset: a part marked as ISO-8859-1 that included accented
characters was rendered incorrectly.

Rather than assuming UTF-8, attempt to use the part's declared charset
when reading it, falling back to US-ASCII if the declared charset is
unknown, unsupported or invalid.
David Edmondson 2016-04-30 07:51:47 +01:00 committed by David Bremner
parent ea5caecec5
commit fdce7eb545

@@ -23,6 +23,7 @@
 ;;; Code:
+(require 'mm-util)
 (require 'mm-view)
 (require 'mm-decode)
 (require 'cl)
@@ -572,7 +573,20 @@ the given type."
                        ,@(when process-crypto '("--decrypt"))
                        ,(notmuch-id-to-query (plist-get msg :id))))
                     (coding-system-for-read
-                     (if binaryp 'no-conversion 'utf-8)))
+                     (if binaryp 'no-conversion
+                       (let ((coding-system (mm-charset-to-coding-system
+                                             (plist-get part :content-charset))))
+                         ;; Sadly,
+                         ;; `mm-charset-to-coding-system' seems
+                         ;; to return things that are not
+                         ;; considered acceptable values for
+                         ;; `coding-system-for-read'.
+                         (if (coding-system-p coding-system)
+                             coding-system
+                           ;; RFC 2047 says that the default
+                           ;; charset is US-ASCII. RFC6657
+                           ;; complicates this somewhat.
+                           'us-ascii)))))
               (apply #'call-process notmuch-command nil '(t nil) nil args)
               (buffer-string))))))
     (when (and cache data)
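
The rendering problem described in the commit message can be reproduced in isolation. The following is a minimal sketch, not part of the commit: `my-decode-sample' is a hypothetical helper, and `decode-coding-string' is used here as a stand-in for the decoding that `coding-system-for-read' applies to the output of `call-process'.

```elisp
;; Hypothetical helper for illustration only; it does not exist in notmuch.
(defun my-decode-sample (coding-system)
  "Decode the single byte #xE9 (\"é\" in ISO-8859-1) using CODING-SYSTEM."
  (decode-coding-string (unibyte-string #xE9) coding-system))

;; Decoded with the declared charset: a one-character string, U+00E9.
(my-decode-sample 'iso-8859-1)

;; Decoded as UTF-8: #xE9 on its own is not valid UTF-8, so Emacs keeps
;; it as a raw byte rather than producing the character "é".
(my-decode-sample 'utf-8)
```

The same byte yields the accented character only under the declared charset, which is why the change consults `:content-charset' instead of hard-coding `utf-8'.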