The trouble with CNAMEs

When you query DNS for something, you ask your local recursive DNS resolver for all the answers it has about a hostname of a certain type. If you’re going to a website, your browser asks your resolver for all records for “google.com” of type “A” (or “AAAA”, but that’s not important right now). The resolver will either return all the A records for google.com it has cached, or it will do the complex process of looking up the results from the authoritative servers, cache them for as long as the TTL field in the reply says it should, and then return them to you.
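As a rough illustration of the "cache it for as long as the TTL says" part, here's a minimal Python sketch of a resolver cache keyed by (hostname, type). It's a toy, not how any real resolver is built: the `ask_authoritative` callback is a hypothetical stand-in for the whole lookup against the authoritative servers, and the clock is injectable so the expiry can be seen without waiting.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# (hostname, record type) -> (expiry time, cached values)
CacheKey = Tuple[str, str]


class TTLCache:
    """Toy resolver cache: answers are kept until their TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, List[str]]] = {}

    def put(self, name: str, rtype: str, values: List[str], ttl: int) -> None:
        # Remember the answer until "now + TTL seconds".
        self._entries[(name, rtype)] = (self._clock() + ttl, values)

    def get(self, name: str, rtype: str) -> Optional[List[str]]:
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires, values = entry
        if self._clock() >= expires:
            # Expired: drop it so the next lookup goes back upstream.
            del self._entries[(name, rtype)]
            return None
        return values


def resolve(
    cache: TTLCache,
    name: str,
    rtype: str,
    ask_authoritative: Callable[[str, str], Tuple[List[str], int]],
) -> List[str]:
    """Answer from cache if possible, otherwise ask upstream and cache the reply."""
    cached = cache.get(name, rtype)
    if cached is not None:
        return cached
    values, ttl = ask_authoritative(name, rtype)
    cache.put(name, rtype, values, ttl)
    return values
```

Two lookups for “google.com A” inside the TTL only reach the authoritative servers once; after the TTL expires the next lookup goes upstream again.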

There are dozens of different types of records: AAAA for IPv6 addresses, MX for mailservers, TXT for arbitrary text, mostly used for various sorts of authentication (including SPF, DKIM and DMARC). And then there’s CNAME.

CNAME stands for “Canonical Name” and means “Go and ask this different question instead”. If you have a DNS record that looks like “www.example.com CNAME example.net”, then any time you ask your DNS resolver for records of any type (other than CNAME itself) for www.example.com, it will see that there’s a CNAME record and do a query of the same type for example.net instead. So queries for “www.example.com A” will return whatever the answer for “example.net A” is, and queries for “www.example.com MX” will return the same thing as “example.net MX”.
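To make the "ask this different question instead" rule concrete, here's a minimal Python sketch of CNAME chasing over a hypothetical in-memory record set (the names and the 192.0.2.10 documentation address are made up for illustration). If the name has a CNAME and you didn't ask for the CNAME itself, the query is restarted with the same type at the target name.

```python
from typing import Dict, List, Tuple

# Hypothetical record set: (hostname, type) -> values.
Records = Dict[Tuple[str, str], List[str]]

RECORDS: Records = {
    ("www.example.com", "CNAME"): ["example.net"],
    ("example.net", "A"): ["192.0.2.10"],
    ("example.net", "MX"): ["10 mail.example.net"],
}


def lookup(records: Records, name: str, rtype: str, max_hops: int = 8) -> List[Tuple[str, str, str]]:
    """Return the answer chain for (name, rtype), following any CNAMEs."""
    answer: List[Tuple[str, str, str]] = []
    for _ in range(max_hops):
        cname = records.get((name, "CNAME"))
        if cname and rtype != "CNAME":
            # "Go and ask this different question instead": same type, new name.
            answer.append((name, "CNAME", cname[0]))
            name = cname[0]
            continue
        answer.extend((name, rtype, value) for value in records.get((name, rtype), []))
        return answer
    raise RuntimeError("CNAME chain too long (or a loop)")
```

So `lookup(RECORDS, "www.example.com", "A")` returns the CNAME followed by example.net’s A record, and an MX query follows the same CNAME to example.net’s MX. The hop limit is just there so a CNAME loop can’t spin forever.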

For a long time the main use you saw for CNAMEs was making “www.” hostnames work for webhosting, with “www.example.com CNAME example.com” records so that the www version of your website resolved to the same IP address as the non-www version.

One important thing about CNAMEs is that you should never have both CNAME records and any other sort of record for the same hostname. It breaks things, and now that we rely on DNS for more and more complex configuration and authentication it can break things in complex, inconsistent and hard to diagnose ways.

The concrete example I ran into today was diagnosing why SPF was failing, despite DNS apparently being set up correctly.

There were two return paths, email1.example.com and email2.example.com, both for use at the same ESP, one that uses CNAMEs to make user onboarding easy.

email1.example.com 3600 CNAME esp.com
email2.example.com 3600 CNAME esp.com
esp.com             300 TXT "v=spf1 exists:%{i}._spf.esp.com"

Identical DNS was configured for both hostnames, and doing a dig from the command line gave the correct SPF record for both. And yet email2 randomly failed SPF while email1 always passed, even though both were being sent from the same IP address. That … shouldn’t happen.

My first thought was that there was some misconfiguration at esp.com such that it wasn’t handling email2 properly. But the only macro in that SPF record is “%{i}”, the sending IP address, so the ESP doesn’t know anything other than the IP address when answering that query, and it can’t give different answers for different hostnames. (%{d}, the domain being checked, is the SPF macro for that, if you do need it.)
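Here's a small Python sketch of why that is. It expands only the bare %{i} and %{d} macros from RFC 7208 (no transformers, no delimiters, IPv4 only), and uses the documentation address 192.0.2.25 as a stand-in sending IP. With only %{i} in the exists: target, both return paths turn into exactly the same DNS query.

```python
import re


def expand_spf_macros(spec: str, ip: str, domain: str) -> str:
    """Expand the bare %{i} and %{d} macros (RFC 7208, section 7).

    Sketch only: no transformers or delimiters, no other macro letters,
    and IPv4 addresses only (IPv6 %{i} uses a dotted-nibble form).
    """
    values = {"i": ip, "d": domain}
    return re.sub(r"%\{([id])\}", lambda m: values[m.group(1)], spec)


EXISTS_TARGET = "%{i}._spf.esp.com"  # from "v=spf1 exists:%{i}._spf.esp.com"

for return_path in ("email1.example.com", "email2.example.com"):
    print(return_path, "->", expand_spf_macros(EXISTS_TARGET, "192.0.2.25", return_path))
# email1.example.com -> 192.0.2.25._spf.esp.com
# email2.example.com -> 192.0.2.25._spf.esp.com
```

Swap %{i} for %{d} in the target and the two hostnames would produce different queries, which is what an ESP would need if it wanted to treat them differently.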

After poking at the eight authoritative nameservers for the example.com zone, and being sidetracked by some other misconfigurations in their DNS, I found the answer. And, despite causing such weird symptoms, it was surprisingly simple.

Someone had added a google-site-verification TXT record for email2.example.com. That breaks the rule that you should never have a CNAME and any other DNS record for the same hostname. The failure works like this:

If I ask my DNS resolver for the SPF TXT record for email2 (“email2.example.com TXT”) and it doesn’t have it cached, it will go and ask one of the authoritative servers, ns04.example.com say, for “email2.example.com TXT”. ns04 is being asked for a TXT record and it has a matching TXT record, so it ignores the CNAME and returns the Google site verification record:

email2.example.com 300 TXT "google-site-verification=ZbTqQmfwO0C4..."

There’s no SPF TXT record in that response, so SPF fails. The resolver will hang on to that record for the next 300 seconds, and SPF will fail all that time.
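A minimal Python sketch of the authoritative side, mimicking the ns04 behaviour described above (this is an illustration of that behaviour, not a claim about how any particular nameserver software is implemented). The zone data is a hypothetical copy of the broken example.com records: an exact-type match wins, and the CNAME only comes back when there’s nothing of the requested type.

```python
from typing import Dict, List, Tuple

# Hypothetical copy of the broken example.com data: email2 has both a
# CNAME and a TXT record, which the rules forbid.
ZONE: Dict[Tuple[str, str], List[str]] = {
    ("email1.example.com", "CNAME"): ["esp.com"],
    ("email2.example.com", "CNAME"): ["esp.com"],
    ("email2.example.com", "TXT"): ["google-site-verification=ZbTqQmfwO0C4..."],
}


def authoritative_answer(name: str, rtype: str) -> List[Tuple[str, str, str]]:
    """Mimic ns04: an exact-type match wins, and the CNAME is only
    returned when there is no record of the requested type."""
    exact = ZONE.get((name, rtype))
    if exact:
        return [(name, rtype, value) for value in exact]
    cname = ZONE.get((name, "CNAME"))
    if cname:
        return [(name, "CNAME", cname[0])]
    return []


for host in ("email1.example.com", "email2.example.com"):
    print(host, "TXT ->", authoritative_answer(host, "TXT"))
```

For email1 the TXT query gets the CNAME, which the resolver can follow to the SPF record; for email2 it gets the verification string and nothing else.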

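But what if I query for something else, such as the MX record for email2.example.com (“email2.example.com MX”)? Again, my resolver will go and ask ns04. This time ns04 has no MX record for email2, so it returns the CNAME, and my resolver follows that to esp.com and ends up with something like this: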

email2.example.com 3600 CNAME esp.com
esp.com            300  MX    mail.esp.com

The resolver will then cache that result, keeping the CNAME around for the next hour. So if I now ask for a TXT record again (“email2.example.com TXT”), my resolver will find the CNAME record in its cache and go “Alright, there’s a CNAME record, so I should follow it to get the answer!”

email2.example.com 3600 CNAME esp.com
esp.com            300 TXT "v=spf1 exists:%{i}._spf.esp.com"

So now the answer I get has a validly formatted SPF TXT record in the response and so SPF passes for the message.

This means that, depending on the history of queries the recursive resolver at a mailbox provider has seen recently, it may have the (incorrect) TXT record cached and return that, or it may have the (correct) CNAME record cached and return that along with the (correct) set of TXT records. From the outside it looks like you get one or the other set of answers kind of at random (and, just to make it more fun, different DNS resolvers may handle this in different ways).
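Here's a toy Python simulation of that history dependence. It's a sketch under some loud assumptions: the zone data is a hypothetical copy of the records above, the fake authoritative server prefers an exact-type match over the CNAME (as ns04 did), TTLs are ignored entirely, and the resolver follows a cached CNAME before looking for cached records of the requested type. Real resolvers differ on that last point, which is part of why this is so hard to pin down.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]

# Hypothetical authoritative data: example.com carries the clash from the
# story, esp.com is the ESP's zone.
AUTH: Dict[Key, List[str]] = {
    ("email2.example.com", "CNAME"): ["esp.com"],
    ("email2.example.com", "TXT"): ["google-site-verification=ZbTqQmfwO0C4..."],
    ("esp.com", "TXT"): ["v=spf1 exists:%{i}._spf.esp.com"],
    ("esp.com", "MX"): ["mail.esp.com"],
}


def ask_authoritative(name: str, rtype: str) -> List[Tuple[str, str, str]]:
    # Same misbehaviour as ns04: an exact-type match wins over the CNAME.
    if (name, rtype) in AUTH:
        return [(name, rtype, value) for value in AUTH[(name, rtype)]]
    if (name, "CNAME") in AUTH:
        return [(name, "CNAME", AUTH[(name, "CNAME")][0])]
    return []


class CachingResolver:
    """Toy recursive resolver. TTLs are ignored: once cached, always cached."""

    def __init__(self) -> None:
        self.cache: Dict[Key, List[str]] = {}

    def query(self, name: str, rtype: str) -> List[str]:
        # Assumption: a cached CNAME is followed before looking for
        # cached records of the requested type.
        if rtype != "CNAME" and (name, "CNAME") in self.cache:
            return self.query(self.cache[(name, "CNAME")][0], rtype)
        if (name, rtype) in self.cache:
            return self.cache[(name, rtype)]
        for owner, seen_type, value in ask_authoritative(name, rtype):
            self.cache.setdefault((owner, seen_type), []).append(value)
            if seen_type == "CNAME" and rtype != "CNAME":
                return self.query(value, rtype)
        return self.cache.get((name, rtype), [])


def has_spf(txt_records: List[str]) -> bool:
    return any(record.startswith("v=spf1 ") for record in txt_records)


# Resolver A has only ever been asked for the TXT record.
resolver_a = CachingResolver()
print(has_spf(resolver_a.query("email2.example.com", "TXT")))  # False

# Resolver B happened to be asked for the MX record first.
resolver_b = CachingResolver()
resolver_b.query("email2.example.com", "MX")
print(has_spf(resolver_b.query("email2.example.com", "TXT")))  # True
```

Same zone, same question, two different answers, and the only difference is which query each resolver happened to see first.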

So the morals of this story are:

  • Avoid CNAMEs when you can
  • Never have CNAMEs on the same hostname as any other sort of DNS record (which does mean you can never put them at the root of a zone, as they’ll always clash with the SOA and NS records there)
  • If you have weird flaky maybe DNS related failures and a CNAME is involved, check for a clashing record

You can check for clashes like this, assuming you’re expecting to look up a TXT record for foo.example.com:

$ dig +short example.com ns
ns01.example.com.
ns02.example.com.

$ dig +noall +answer foo.example.com txt @ns01.example.com
foo.example.com.  3600  IN  CNAME  esp.com.

This is the response you hope to get: just a CNAME record, meaning there’s no conflicting TXT record. If instead you get a TXT record rather than the CNAME, that TXT record conflicts. Repeat the query against each of the zone’s nameservers, as they may not all be serving the same data.
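If you have access to the zone data itself, you can also look for clashes in bulk. Here's a minimal Python sketch that takes (hostname, type) pairs, for example pulled from a zone file export (the input format is an assumption), and reports every name that has a CNAME alongside anything else. RRSIG and NSEC are let through because DNSSEC permits them next to a CNAME. This complements the dig check rather than replacing it, since what actually matters is what each nameserver is serving.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# DNSSEC signatures and NSEC records are allowed alongside a CNAME;
# anything else at the same name is a clash.
ALLOWED_WITH_CNAME = {"CNAME", "RRSIG", "NSEC"}


def find_cname_clashes(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {hostname: [clashing types]} for every name that has a CNAME
    alongside some other record type. Input is (hostname, type) pairs."""
    types_by_name: Dict[str, Set[str]] = defaultdict(set)
    for name, rtype in records:
        # Hostnames are case-insensitive, and the trailing dot is optional.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return {
        name: sorted(types - ALLOWED_WITH_CNAME)
        for name, types in types_by_name.items()
        if "CNAME" in types and types - ALLOWED_WITH_CNAME
    }


zone = [
    ("email1.example.com.", "CNAME"),
    ("email2.example.com.", "CNAME"),
    ("email2.example.com.", "TXT"),
    ("example.com.", "MX"),
]
print(find_cname_clashes(zone))  # {'email2.example.com': ['TXT']}
```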
