DKIM Canonicalization – or – why Microsoft breaks your mail

thx1138
One of these things is just like the other

Canonicalization is about comparing things to see if they’re the same. Sometimes you want to do a “fuzzy” comparison, to see if two things are interchangeable for your purposes, even if they’re not exactly identical.
As a concrete example, these two email addresses:

  • (Steve) steve@wordtothewise.com
  • “Also Steve” <steve@WORDTOTHEWISE.COM>

They’re clearly not identical, but they’ll deliver to the same maibox.
I could compare them with a set of comparison rules (if the string to the left of the @ sign up to white space is the same in both, or one of them has a less than sign in front of it, and the string to the right of the @ is the same, compared case-insensitively, ….).
Or I could canonicalize both email addresses and see if the results are identical. A simple canonicalization algorithm might be “Remove anything in parentheses. Remove any quoted strings. Strip any whitespace and any greater than or less than signs at each end. Convert anything after the @ sign to lower-case”. That’ll give two canonicalized email addresses:

  • steve@wordtothewise.com
  • steve@wordtothewise.com

They’re trivially identical, so I know that the two email addresses I started with are interchangeable.
DKIM validation is all about comparing whether things have changed or not. The DKIM-Signature header contains a “fingerprint” of the canonicalized message body and of the canonicalized headers from when the mail was sent. If you canonicalize the body and headers that you received, take the fingerprint in the same way and that’s identical to the one in the DKIM-Signature header, you know the headers and body haven’t been modified since the message was signed (and then you can do some DNS lookups and some cryptography to find who signed the message).
Unfortunately, email was never designed to send messages unchanged. Intermediate servers will often “fix up” messages – by folding long lines, normalizing whitespace, adding missing headers or fixing up invalid ones, by re-encoding content to different encodings and all sorts of other changes. That would break a byte-for-byte comparison of the mail as sent and as received. And because we don’t have a copy of the mail as sent to compare with – we only have a “fingerprint” of it – we can’t do any fancy comparison. So we have to rely on the canonicalization, and hope that even after the “fix ups” made during delivery the canonicalized forms – and hence the fingerprints – will be identical.

DKIM canonicalization

DKIM defines two canonicalization algorithms for the body of the message, simple and relaxed.
Simple body canonicalization does very little: it just strips any blank lines at the end of the body. Relaxed body canonicalization strips those blank lines, and then replaces any run of white space – spaces or tabs – in the body with a single space. This means that any change in whitespace in the body, such as converting tabs to spaces, won’t affect the relaxed canonicalized body.
DKIM also defines two canonicalization algorithms for the headers of the message. They’re also called simple and relaxed, despite doing quite different things. (Yes, this is confusing.)
Simple header canonicalization is as simple as you can get. It makes no changes, so the headers must be byte-for-byte identical to match. Relaxed header canonicalization converts all header names to lower case, unfolds headers so each is a single line, replaces any run of white space with a single space character and removes any trailing whitespace on each line.
(See the DKIM spec if you want all the details.)
The simple takeaway from this is that simple canonicalization makes DKIM signatures that are broken by even trivial modifications in transit, while relaxed canonicalization makes them more robust.
The really simple takeaway is “use relaxed canonicalization”.

The c= field

The canonicalization you use is recorded in the c= field of the DKIM-Signature header, with the two canonicalization names separated by a slash, header first.
So “c=strict/relaxed” means to use strict canonicalization for the headers and relaxed for the body.
You can also use just a single canonicalization type in the c= field. This does not do what you expect.
“c=strict” is exactly the same as “c=strict/strict”. “c=relaxed” is exactly the same as “c=relaxed/strict”.
Yes, this makes no sense. But it’s what the spec says. If you use just “c=relaxed” you’re using strict canonicalization for the body of the message, and any change to whitespace in the body will break your signature.

And Microsoft?

Microsoft have a long history of modifying email in transit, often to “fix up” differences between standard Internet email and the expectations of their internal code. This article goes into some of how that breaks DKIM in some cases.
It appears that some paths through outlook.com from MX to inbox are converting tabs in the body of the message into spaces at the moment, while other paths aren’t. If you’re using strict DKIM body canonicalization – either intentionally or accidentally with “c=relaxed” – that means you’ll see apparently random DKIM failures for mail sent to recipients hosted by outlook.com, but only for messages where the body is susceptible to whitespace damage.
Using the right (“c=relaxed/relaxed” unless you have a good reason not to) canonicalization is a good start but you should also look at making the content you send as clean as possible, beyond just complying with the email standards avoid structures that risk being rewritten in transit. But that’s another post.

Related Posts

August 2016: The Month in Email

August was a busy month for both Word to the Wise and the larger world of email infrastructure.
IMG_0026
A significant subscription attack targeted .gov addresses, ESPs and over a hundred other industry targets. I wrote about it as it began, and Spamhaus chief executive Steve Linford weighed in in our comments thread. As it continued, we worked with M3AAWG and other industry leaders to share data and coordinate efforts to help senders recover from the attack.
In the aftermath, we wrote several posts about abuse, blocklists, how the industry handles these attacks currently, and how we might address these issues going forward. And obviously this has been on my mind before this attack — I posted about ongoing problems with internet security, how open subscription forms contribute to the problem, and other ways that companies inadvertently support phishing operations.
I posted about the history of email, and recounted some of my earliest experiences, when I had a .bitnet and a .gov address. Did you use email before SMTP? Before email clients? I’d be curious to hear your stories.
Speaking of email clients, I did two posts about how mail gets displayed to the end user: Gmail is displaying authentication results, which should provide end users with a bit more transparency about how authentication is used to deliver or block messages, and Microsoft is partnering with Litmus to improve some of the display issues people face using Outlook. These are both notable — if this is not your first time reading this blog, you know about my constant refrain that delivery is a function of sending people mail they want to engage with. If the mail is properly formatted and displayed, and people have a high degree of confidence that it’s been sent from someone they want to get mail from, that goes a long way towards improving engagement in the channel.
On that note, I spoke at length with Derek Harding about how marketers might change their thinking on deliverability, and he wrote that up for ClickZ. I also participated in the creation of Adobe’s excellent Teaching the Email Marketer How to Fish document (no, not phish…).
Steve was very busy behind the scenes this month thinking about abuse-related topics in light of the SBL issues, but he wrote up a quick post about the Traffic Light Protocol, which is used to denote sensitive information as it is shared.
Finally, for my Ask Laura column this month, I answered questions about delivery and engagement metrics and about permissions with purchased lists. As always, if you have a general question about email delivery, send it along and I’ll consider it for the column.

Read More

Improving Outlook Email Display

Today Litmus announced they had partnered with Microsoft to fix many of the rendering issues with Outlook. Congrats, Litmus! This is awesome. I know a lot of folks have tried to get MS to the table to fix some of the problems with Outlook. Take a bow for getting this off the ground.
According to Litmus, the partnership has two parts.

Read More

Don't just follow the HOWTO

speakingIconForBlogThere are so many moving parts to ensure good email deliverability. Email marketers need to know marketing, they need to know email and they need to know design. The technical bits of email can be a challenge to learn, and many folks who write tutorials and How-Tos write them for a different audience than marketers.
One of the things I’m trying to do is demystify the technical end of email for marketers. Today I talked about authentication in the Only Influencers newsletter. Check it out!
Understanding the technical: Authentication
Authentication in general

Read More