When you query DNS for something you ask your local recursive DNS resolver for all the answers it has about a hostname of a certain type. If you’re going to a website your browser asks your resolver for all records for “google.com” of type “A” (or “AAAA”, but that’s not important right now). The resolver will either return all the A records for google.com it has cached, or it will do the complex process of looking up the results from the authoritative servers, cache them for as long as the TTL field for the reply says it should, then return them to you.
There are dozens of different types of records: AAAA for IPv6 addresses, MX for mailservers, TXT for arbitrary text (mostly used for various sorts of authentication, including SPF, DKIM and DMARC). And then there’s CNAME.
CNAME stands for “Canonical Name” and means “Go and ask this different question instead”. If you have a DNS record that looks like “www.example.com CNAME example.net” then any time you ask your DNS resolver for records of any type for www.example.com it will see that there’s a CNAME record and do a query of the same type for example.net instead. So queries for “www.example.com A” will return whatever the answer for “example.net A” is, queries for “www.example.com MX” will return the same thing as “example.net MX”.
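As a rough illustration of “go and ask this different question instead”, here is a minimal sketch of CNAME following over a toy record table. The table, the IP address and the `resolve` helper are made up for illustration; a real resolver does much more (caching, talking to authoritative servers, and so on).

```python
# Toy record table: (name, type) -> list of values. Hypothetical data.
RECORDS = {
    ("www.example.com", "CNAME"): ["example.net"],
    ("example.net", "A"): ["192.0.2.10"],
    ("example.net", "MX"): ["10 mail.example.net"],
}

def resolve(name, rtype, max_hops=8):
    """Answer a query, following CNAMEs to ask the same question of the target."""
    for _ in range(max_hops):
        cname = RECORDS.get((name, "CNAME"))
        if cname and rtype != "CNAME":
            name = cname[0]  # same query type, different name
            continue
        return RECORDS.get((name, rtype), [])
    raise RuntimeError("CNAME chain too long")

print(resolve("www.example.com", "A"))
print(resolve("www.example.com", "MX"))
```

Whatever type you ask for at www.example.com, the answer comes from example.net.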
For a long time the main use you saw for CNAMEs was making “www.” hostnames work for webhosting, with “www.example.com CNAME example.com” records so that the www version of your website resolved to the same IP address as the non-www version.
One important thing about CNAMEs is that you should never have both CNAME records and any other sort of record for the same hostname. It breaks things, and now that we rely on DNS for more and more complex configuration and authentication it can break things in complex, inconsistent and hard to diagnose ways.
The concrete example of this today was diagnosing why SPF was failing, despite DNS apparently being set up correctly.
Two return paths – email1.example.com and email2.example.com. Both of them for use at the same ESP, one that uses CNAMEs to make user onboarding easy.
    email1.example.com 3600 CNAME esp.com
    email2.example.com 3600 CNAME esp.com
    esp.com 300 TXT "v=spf1 exists:%{i}._spf.esp.com"
Identical DNS configured for both hostnames. Doing a dig from the command line gave the correct SPF record for both hostnames. And yet email2 randomly failed SPF, while email1 always passed SPF, while they were both being sent from the same IP address. That … shouldn’t happen.
My first thought was that there was some misconfiguration at esp.com such that it wasn’t handling email2 properly. But the only macro in that SPF record is “%{i}”, the IP address. So the ESP doesn’t know anything other than the sending IP address when answering that query, so it can’t give different answers for different hostnames. (If you do need that, %{d} is the SPF macro for the domain being checked; %{h} is the HELO hostname.)
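To make that concrete, here is a small sketch of how the name in the “exists:” mechanism gets built. It only handles the %{i} macro and only IPv4 (the IPv6 expansion is different), and the function name and sending IP are hypothetical. The point is that the return path hostname is never an input.

```python
import ipaddress

def expand_exists_target(template, sender_ip):
    """Expand only the %{i} macro in an SPF domain-spec (IPv4 only in this sketch)."""
    ip = ipaddress.ip_address(sender_ip)
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4")
    return template.replace("%{i}", str(ip))

template = "%{i}._spf.esp.com"
for return_path in ("email1.example.com", "email2.example.com"):
    # The return path plays no part in the expansion, only the sending IP does.
    print(return_path, "->", expand_exists_target(template, "192.0.2.25"))
```

Both return paths end up asking the ESP about exactly the same name, so the ESP can’t be the source of the difference.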
After poking at the eight authoritative nameservers for the example.com zone, and being sidetracked by some other misconfigurations in their DNS, I found the answer. And, despite causing such weird symptoms, it was surprisingly simple.
Someone had added a google-site-verification TXT record for email2.example.com. That breaks the rule that you should never have a CNAME and any other DNS record for the same hostname. The failure works like this:
If I ask my DNS resolver for the SPF TXT record for email2 – “email2.example.com TXT” – and it doesn’t have it cached, then it will go ask one of the authoritative servers – ns04.example.com, say – for “email2.example.com TXT”. ns04 is being asked for a TXT record, and it has a matching TXT record, so it ignores the CNAME and returns the Google site validation record:
email2.example.com 300 TXT "google-site-verification=ZbTqQmfwO0C4..."
There’s no SPF TXT record in that response, so SPF fails. The resolver will hang on to that record for the next 300 seconds, and SPF will fail all that time.
But what if I query for something else, the MX record for email2.example.com – “email2.example.com MX”? Again, my resolver will go ask ns04 for the answer and it’ll get back something like this:
    email2.example.com 3600 CNAME esp.com
    esp.com 300 MX mail.esp.com
The resolver will then cache that result, keeping the CNAME around for the next hour, so if I now ask for a TXT record again – “email2.example.com TXT” – my resolver will find the CNAME record in its cache and go “Alright, there’s a CNAME response so I should follow it to get the answer!”
    email2.example.com 3600 CNAME esp.com
    esp.com 300 TXT "v=spf1 exists:%{i}._spf.esp.com"
So now the answer I get has a validly formatted SPF TXT record in the response and so SPF passes for the message.
This means that depending on the history of queries the recursive resolver at a mailbox provider has seen recently it may have the (incorrect) TXT record cached, and return that, or it may have the (correct) CNAME record cached, and return that along with the (correct) set of TXT records. From the outside it looks like you get one or the other set of answers kind of at random. And just to make it more fun, different DNS resolvers may handle this in different ways.
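The query-history dependence can be reproduced with a toy model. This is only a sketch under some loud assumptions: one table stands in for all the authoritative servers, the `Resolver` class and its simulated clock are invented for illustration, and real resolvers may well behave differently. The records and TTLs are the ones from the example above.

```python
# One table standing in for every authoritative server (a simplification).
ZONE = {
    ("email2.example.com", "CNAME"): (3600, "esp.com"),
    ("email2.example.com", "TXT"): (300, "google-site-verification=ZbTqQmfwO0C4..."),
    ("esp.com", "TXT"): (300, "v=spf1 exists:%{i}._spf.esp.com"),
    ("esp.com", "MX"): (300, "mail.esp.com"),
}

def authoritative(name, rtype):
    """Answer like the broken server: an exact type match wins over the CNAME."""
    if (name, rtype) in ZONE:
        ttl, value = ZONE[(name, rtype)]
        return [(name, rtype, ttl, value)]
    if (name, "CNAME") in ZONE:
        ttl, target = ZONE[(name, "CNAME")]
        return [(name, "CNAME", ttl, target)]
    return []

class Resolver:
    def __init__(self):
        self.cache = {}  # (name, type) -> (expires_at, value)
        self.now = 0     # simulated clock, in seconds

    def _cached(self, name, rtype):
        entry = self.cache.get((name, rtype))
        return entry[1] if entry and entry[0] > self.now else None

    def query(self, name, rtype):
        for _ in range(8):
            target = self._cached(name, "CNAME")
            if target is not None:       # a cached CNAME is always followed
                name = target
                continue
            value = self._cached(name, rtype)
            if value is not None:
                return value
            answers = authoritative(name, rtype)
            if not answers:
                return None
            for rname, rt, ttl, val in answers:
                self.cache[(rname, rt)] = (self.now + ttl, val)
        return None

cold = Resolver()
print(cold.query("email2.example.com", "TXT"))    # the Google record: SPF fails

warm = Resolver()
warm.query("email2.example.com", "MX")            # caches the CNAME for an hour
print(warm.query("email2.example.com", "TXT"))    # the SPF record: SPF passes
```

Same zone, same question, different answers, and the only difference is what the resolver happened to be asked earlier.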
So the morals of this story are:
- Avoid CNAMEs when you can
- Never have CNAMEs on the same hostname as any other sort of DNS record (which does mean you can never put them at the root of a zone, as they’ll always clash there)
- If you have weird, flaky, maybe-DNS-related failures and a CNAME is involved, check for a clashing record
You can check for clashes like this, assuming you’re expecting to ask foo.example.com for a TXT record:
    $ dig +short example.com ns
    ns01.example.com
    ns02.example.com
    $ dig +short foo.example.com txt @ns01.example.com
    foo.example.com 3600 CNAME esp.com
This is the response you hope to get – just a CNAME response, meaning there’s no conflicting TXT record. If instead you don’t get a CNAME response but do get a TXT record then that TXT record conflicts.
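If you have the zone contents to hand (an export from your DNS provider, say) you can also check every hostname at once. A minimal sketch, assuming the records have already been reduced to (name, type) pairs; the function name and the sample data are hypothetical. RRSIG and NSEC are skipped because DNSSEC is the one standard case where other records may sit alongside a CNAME.

```python
from collections import defaultdict

# DNSSEC record types that are allowed to coexist with a CNAME.
ALLOWED_WITH_CNAME = {"RRSIG", "NSEC"}

def find_cname_clashes(records):
    """Map each hostname that has a CNAME to the other record types clashing with it."""
    types_by_name = defaultdict(set)
    for name, rtype in records:
        # Normalise case and any trailing dot so the same name groups together.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    clashes = {}
    for name, types in types_by_name.items():
        others = types - {"CNAME"} - ALLOWED_WITH_CNAME
        if "CNAME" in types and others:
            clashes[name] = sorted(others)
    return clashes

records = [
    ("example.com", "SOA"),
    ("example.com", "NS"),
    ("email1.example.com", "CNAME"),
    ("email2.example.com", "CNAME"),
    ("email2.example.com.", "txt"),
]
print(find_cname_clashes(records))
```

Any hostname that shows up in the result needs either the CNAME or the other records removed.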
I remembered that section 5.2.2 of RFC 1123 said that CNAMEs can’t be used in MAIL FROM or RCPT TO, but digging around seems to indicate that RFC 5321 allows them now.
I know that sendmail used to rewrite CNAMEs, so if you sent a message from pat@example.com and example.com was a CNAME for sample.com then the ‘From:’ address would be changed to pat@sample.com after passing through sendmail. Is that sort of thing still a concern?
SMTP allows CNAMEs in email addresses, mostly by not caring about them too much as long as the underlying DNS resolver returns the MX record (or, worst case, A record) it resolves to, so a lot of the hackery around CNAMEs in MTAs has gone away. _But_ you still can’t have CNAMEs at the root of a zone (as they’ll clash with SOA and NS records, apart from anything else) so you can only really use them for email addresses in subdomains.
Email addresses in subdomains are a part of the setup process for most ESP customers, so while CNAMEs are still a horrible bit of the DNS protocol, they’re very useful for some aspects of onboarding small customers at an ESP.
The original plan for CNAMEs was that they were temporary placeholders when something moved. That’s why early mail software resolved the CNAME to the real name it pointed to.
Now they’re mostly used to do indirection across admin boundaries, along with the www CNAMEs which are just lazy. (Since the CNAME is in the same zone as the target, rather than the CNAME you could just copy the target.)
I also wish that name servers would fail rather than flake when there’s a CNAME and something else. It’d make debugging a lot easier.