DKIM Canonicalization – or – why Microsoft breaks your mail

thx1138
One of these things is just like the other

Canonicalization is about comparing things to see if they’re the same. Sometimes you want to do a “fuzzy” comparison, to see if two things are interchangeable for your purposes, even if they’re not exactly identical.
As a concrete example, these two email addresses:

  • (Steve) steve@wordtothewise.com
  • “Also Steve” <steve@WORDTOTHEWISE.COM>

They’re clearly not identical, but they’ll deliver to the same mailbox.
I could compare them with a set of comparison rules (if the string to the left of the @ sign up to white space is the same in both, or one of them has a less than sign in front of it, and the string to the right of the @ is the same, compared case-insensitively, ….).
Or I could canonicalize both email addresses and see if the results are identical. A simple canonicalization algorithm might be “Remove anything in parentheses. Remove any quoted strings. Strip any whitespace and any greater than or less than signs at each end. Convert anything after the @ sign to lower-case”. That’ll give two canonicalized email addresses:

  • steve@wordtothewise.com
  • steve@wordtothewise.com

They’re trivially identical, so I know that the two email addresses I started with are interchangeable.
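As a rough sketch, that canonicalization algorithm could look like this in Python. The function name canonicalize_address is made up for illustration, and this is a toy that follows the four rules above, not a real email address parser.

```python
import re

def canonicalize_address(raw: str) -> str:
    """Toy canonicalizer following the four rules described above."""
    # Remove anything in parentheses.
    s = re.sub(r"\([^)]*\)", "", raw)
    # Remove any quoted strings (straight or curly quotes).
    s = re.sub(r'"[^"]*"|“[^”]*”', "", s)
    # Strip whitespace and angle brackets at each end.
    s = s.strip(" \t<>")
    # Lower-case only what comes after the @ sign.
    local, at, domain = s.rpartition("@")
    return local + at + domain.lower()

a = canonicalize_address("(Steve) steve@wordtothewise.com")
b = canonicalize_address("“Also Steve” <steve@WORDTOTHEWISE.COM>")
print(a == b)  # True
```

The local part is deliberately left alone: only the domain is compared case-insensitively.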
DKIM validation is all about comparing whether things have changed or not. The DKIM-Signature header contains a “fingerprint” of the canonicalized message body and of the canonicalized headers from when the mail was sent. If you canonicalize the body and headers that you received, take the fingerprint in the same way and that’s identical to the one in the DKIM-Signature header, you know the headers and body haven’t been modified since the message was signed (and then you can do some DNS lookups and some cryptography to find who signed the message).
Unfortunately, email was never designed to send messages unchanged. Intermediate servers will often “fix up” messages – by folding long lines, normalizing whitespace, adding missing headers or fixing up invalid ones, by re-encoding content to different encodings and all sorts of other changes. That would break a byte-for-byte comparison of the mail as sent and as received. And because we don’t have a copy of the mail as sent to compare with – we only have a “fingerprint” of it – we can’t do any fancy comparison. So we have to rely on the canonicalization, and hope that even after the “fix ups” made during delivery the canonicalized forms – and hence the fingerprints – will be identical.

DKIM canonicalization

DKIM defines two canonicalization algorithms for the body of the message, simple and relaxed.
Simple body canonicalization does very little: it just strips any blank lines at the end of the body. Relaxed body canonicalization strips those blank lines, removes any whitespace at the end of each line, and replaces any run of white space – spaces or tabs – within a line with a single space. This means that most changes to whitespace within a line, such as converting tabs to spaces, won’t affect the relaxed canonicalized body.
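Here is a minimal Python sketch of the two body algorithms, written from the description in RFC 6376. The function names simple_body and relaxed_body are my own, and the code assumes the body is already a byte string with CRLF line endings.

```python
import re

def simple_body(body: bytes) -> bytes:
    # Drop empty lines at the end; the result always ends in one CRLF.
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"

def relaxed_body(body: bytes) -> bytes:
    lines = body.split(b"\r\n")
    # Collapse runs of spaces/tabs to one space, drop trailing whitespace.
    lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip(b" ") for ln in lines]
    # Drop empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    # Every remaining line ends in CRLF; an empty body stays empty.
    return b"".join(ln + b"\r\n" for ln in lines)

sent = b"Hello\tworld  \r\nBye\r\n\r\n\r\n"
received = b"Hello world\r\nBye\r\n"   # whitespace "fixed up" in transit

print(simple_body(sent) == simple_body(received))    # False
print(relaxed_body(sent) == relaxed_body(received))  # True
```

Both algorithms ignore the extra blank lines at the end, but only the relaxed one survives the tab being turned into a space.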
DKIM also defines two canonicalization algorithms for the headers of the message. They’re also called simple and relaxed, despite doing quite different things. (Yes, this is confusing.)
Simple header canonicalization is as simple as you can get. It makes no changes, so the headers must be byte-for-byte identical to match. Relaxed header canonicalization converts all header names to lower case, unfolds headers so each is a single line, replaces any run of white space with a single space character, and removes any whitespace around the colon and at the end of each header.
(See the DKIM spec if you want all the details.)
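A small Python sketch of relaxed header canonicalization for a single header field, again written from RFC 6376 rather than taken from any particular DKIM library. The name relaxed_header is an assumption for illustration, and the input is assumed to be one complete header field including its folded continuation lines.

```python
import re

def relaxed_header(field: str) -> str:
    name, _, value = field.partition(":")
    # Unfold: remove the CRLFs that split the header over several lines.
    value = value.replace("\r\n", "")
    # Collapse each run of spaces/tabs to a single space, then drop
    # whitespace at both ends of the value (after the colon, at the end).
    value = re.sub(r"[ \t]+", " ", value).strip(" ")
    # Lower-case the header name and drop whitespace before the colon.
    return name.strip(" \t").lower() + ":" + value + "\r\n"

folded = "SUBJECT :  Hello\r\n\tworld  \r\n"
print(repr(relaxed_header(folded)))  # 'subject:Hello world\r\n'
```

Simple header canonicalization would just return the field unchanged, which is why refolding a long Subject line is enough to break it.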
The simple takeaway from this is that simple canonicalization makes DKIM signatures that are broken by even trivial modifications in transit, while relaxed canonicalization makes them more robust.
The really simple takeaway is “use relaxed canonicalization”.

The c= field

The canonicalization you use is recorded in the c= field of the DKIM-Signature header, with the two canonicalization names separated by a slash, header first.
So “c=simple/relaxed” means to use simple canonicalization for the headers and relaxed for the body.
You can also use just a single canonicalization type in the c= field. This does not do what you expect.
“c=simple” is exactly the same as “c=simple/simple”. “c=relaxed” is exactly the same as “c=relaxed/simple”.
Yes, this makes no sense. But it’s what the spec says. If you use just “c=relaxed” you’re using simple canonicalization for the body of the message, and any change to whitespace in the body will break your signature.
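The defaulting rule fits in a few lines of Python. This sketch uses the algorithm names from the spec, simple and relaxed. The function name parse_c_tag is hypothetical; the behaviour with no c= tag at all (simple for both) is the default from RFC 6376.

```python
VALID = {"simple", "relaxed"}

def parse_c_tag(value=None):
    """Return (header_canon, body_canon) for the value of a c= tag."""
    if value is None:
        # No c= tag at all: simple for both headers and body.
        return ("simple", "simple")
    header, _, body = value.strip().partition("/")
    # A missing body algorithm always defaults to simple,
    # whatever the header algorithm is.
    body = body or "simple"
    if header not in VALID or body not in VALID:
        raise ValueError(f"unknown canonicalization: {value!r}")
    return (header, body)

print(parse_c_tag("relaxed"))          # ('relaxed', 'simple')
print(parse_c_tag("relaxed/relaxed"))  # ('relaxed', 'relaxed')
```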

And Microsoft?

Microsoft have a long history of modifying email in transit, often to “fix up” differences between standard Internet email and the expectations of their internal code. This article goes into some of how that breaks DKIM in some cases.
It appears that some paths through outlook.com from MX to inbox are converting tabs in the body of the message into spaces at the moment, while other paths aren’t. If you’re using simple DKIM body canonicalization – either intentionally or accidentally with “c=relaxed” – that means you’ll see apparently random DKIM failures for mail sent to recipients hosted by outlook.com, but only for messages where the body is susceptible to whitespace damage.
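To see why that shows up as a DKIM failure, here is a hedged Python sketch of the body “fingerprint”: the canonicalized body is hashed and base64-encoded, which is what the bh= tag of a DKIM-Signature header carries. SHA-256 is assumed, the helper names canonicalize_body and body_hash are invented for this example, and the message is made up.

```python
import base64
import hashlib
import re

def canonicalize_body(body: bytes, mode: str) -> bytes:
    lines = body.split(b"\r\n")
    if mode == "relaxed":
        # Collapse whitespace runs and drop trailing whitespace per line.
        lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip(b" ") for ln in lines]
    # Both modes ignore empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    out = b"".join(ln + b"\r\n" for ln in lines)
    # An empty body becomes a single CRLF in simple mode only.
    return out or (b"\r\n" if mode == "simple" else b"")

def body_hash(body: bytes, mode: str) -> str:
    digest = hashlib.sha256(canonicalize_body(body, mode)).digest()
    return base64.b64encode(digest).decode("ascii")

sent = b"Dear customer,\r\n\tYour order has shipped.\r\n"
received = sent.replace(b"\t", b" ")  # the tab-to-space rewrite in transit

for mode in ("simple", "relaxed"):
    ok = body_hash(sent, mode) == body_hash(received, mode)
    print(mode, "body hash matches" if ok else "body hash differs")
```

With the simple algorithm the one-byte change produces a completely different hash, so the signature fails; with relaxed, both versions canonicalize to the same bytes.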
Using the right canonicalization (“c=relaxed/relaxed” unless you have a good reason not to) is a good start, but you should also look at making the content you send as clean as possible: beyond just complying with the email standards, avoid structures that risk being rewritten in transit. But that’s another post.
