It looks like Microsoft are getting pickier about email address syntax, rejecting mail that uses illegal address formats. That might be what’s causing that “550 5.6.0 CAT.InvalidContent.Exception: DataSourceOperationException, proxyAddress: prefix not supported – ; cannot handle content of message” rejection.
Why do we care?
It’s good to send syntactically valid email in a warm fuzzies sort of way – it shows we know what we’re doing, and aren’t dodgy spamware – but it’s increasingly important for delivery, as mailbox providers are tightening up their syntax checks. But why are mailbox providers doing that?
One reason is that authentication tech like DKIM and DMARC is built on the assumption that it’s only ever applied to email. Not to messages that kinda look like email.
There are ways to bypass DKIM protections by sending invalid messages. As one example, if you send multiple copies of the From: header with different values, a DKIM checker will only check one of them. A mail client will also only display one of them, maybe the first, maybe the last, and it isn’t necessarily the one the checker looked at. If it isn’t, an attacker can send validly DKIM-signed and DMARC-authenticated mail with the attacker’s email address in the From field.
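To make that ambiguity concrete, here’s a minimal Python sketch using the standard library’s email parser. The message and addresses are made up, and the two “views” are hypothetical stand-ins for two consumers (an authentication checker and a mail client, say) that each settle on a different copy of the duplicated header.

```python
from email import message_from_string

# A made-up message carrying two From headers (illustration only).
RAW = (
    "From: Legitimate Sender <news@example.com>\n"
    "From: Attacker <attacker@example.net>\n"
    "To: victim@example.org\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

msg = message_from_string(RAW)

# get_all() returns every instance of the header, in message order.
from_values = msg.get_all("From")

# Two hypothetical consumers that resolve the duplicate differently.
checker_view = from_values[0]   # one consumer settles on the first copy
client_view = from_values[-1]   # another settles on the last copy

print(len(from_values))
print(checker_view)
print(client_view)
print(checker_view == client_view)
```

Both consumers are “parsing the From header”, but they end up with different answers, which is exactly the gap an attacker needs.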
This is – obviously – a problem. For the specific case of duplicate headers we can mitigate it by “oversigning” headers: listing each signed header in the DKIM signature one more time than it actually appears in the message, so that the signature breaks if an attacker adds another copy. That’s just a band-aid defence against a specific attack, though.
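Oversigning comes down to what goes in the DKIM signature’s h= tag. RFC 6376 treats an h= entry with no matching header instance as a null input, so listing a header name one more time than it occurs effectively signs “there are no more of these”. Here’s a minimal sketch of building that list by hand; the function name and the choice of headers to sign are hypothetical, and in practice this would be configured in whatever DKIM signing software you use rather than computed yourself.

```python
from collections import Counter


def oversigned_h_tag(header_names, to_sign):
    """Build a DKIM h= value that lists each header one extra time.

    header_names: names of the headers actually present in the message.
    to_sign: names of the headers we want the signature to cover.
    (Both the function and its parameters are hypothetical.)
    """
    # Header field names are case-insensitive, so count them lowercased.
    counts = Counter(name.lower() for name in header_names)
    entries = []
    for name in to_sign:
        key = name.lower()
        # One entry per existing instance, plus one extra entry. The extra
        # entry matches nothing today, so it signs a null value; if an
        # attacker later adds another copy, that entry matches it and the
        # signature no longer verifies.
        entries.extend([key] * (counts[key] + 1))
    return ":".join(entries)


print(oversigned_h_tag(["From", "To", "Subject", "Date"], ["From", "To", "Subject"]))
```

The same trick covers headers that aren’t present at all: oversigning an absent Reply-To stops an attacker adding one without breaking the signature.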
There are many other potential attacks that rely on sending messages that look almost like email, but which aren’t syntactically valid in some way. They mostly rely on the email being parsed by two different systems – the authentication checkers parse the email one way, and the metadata they extract is used to show that the message is validly authenticated, while the mail client parses the email a different way and ends up displaying different – unauthenticated – metadata to the recipient.
There’s not really any way to avoid the email being parsed multiple times, so what we need to do to fix this is to make sure it’s parsed the same way each time. And the only good way to do that is to parse it strictly against the grammar specified by the email RFCs.
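As a rough illustration of the parse-strictly idea, here’s a Python sketch. It leans on the standard library’s email.policy.default parser as a stand-in for a proper grammar check, which is an assumption worth flagging loudly: that parser is deliberately forgiving and is not an RFC 5322 validator. On top of it, the sketch enforces one rule RFC 5322 section 3.6 states plainly, that a message carries exactly one From header. The function name is hypothetical.

```python
from email import message_from_string, policy


def syntax_problems(raw):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical sketch, not a full RFC 5322 validator: it enforces the
    "exactly one From header" rule and reports any defects that Python's
    email.policy.default parser records along the way.
    """
    msg = message_from_string(raw, policy=policy.default)

    # Structural defects the parser noticed in the message as a whole.
    problems = [f"message defect: {d!r}" for d in msg.defects]

    # RFC 5322 section 3.6: From must appear exactly once.
    from_headers = msg.get_all("From", [])
    if len(from_headers) != 1:
        problems.append(
            f"expected exactly one From header, found {len(from_headers)}"
        )

    # Header objects from policy.default carry a tuple of parse defects.
    for header in from_headers:
        for defect in header.defects:
            problems.append(f"From header defect: {defect}")

    return problems


GOOD = "From: Alice <alice@example.com>\nTo: bob@example.org\nSubject: Hi\n\nBody\n"
DUPLICATE = (
    "From: Alice <alice@example.com>\n"
    "From: Mallory <mallory@example.net>\n"
    "To: bob@example.org\nSubject: Hi\n\nBody\n"
)

print(syntax_problems(GOOD))
print(syntax_problems(DUPLICATE))
```

An empty list doesn’t prove a message is valid, since a forgiving parser lets a lot through, but a non-empty one is a strong hint that different parsers may disagree about what the message says.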
As a mailbox provider what should we do with mail that violates that grammar? Treat it as unauthenticated? Drop it in the junk folder? Reject it altogether? Some mix of those?
The DKIM RFC said in 2011:
It is up to the Identity Assessor or some other subsequent agent to act on such messages as needed, such as degrading the trust of the message (or, indeed, of the Signer), warning the recipient, or even refusing delivery.
All components of the mail system that perform loose enforcement of other mail standards will need to revisit that posture when incorporating DKIM, especially when considering matters of potential attacks such as those described.
RFC 6376 #8.15
Mailbox providers are increasingly bearing this in mind.
I’ll be back Monday to talk about the syntax of addresses in email headers, which is where all this started, thanks to a discussion on the M3AAWG Slack last night.
I’m glad they’re making the parsing stricter, but it’s really ironic that it’s Microsoft doing this. Look at any message sent from O365 or Exchange and you’ll see a big pile of invalid headers, particularly Received. Their From headers have been OK as far as I can remember.