
Recipients can’t click through if you don’t exist
A tale of misconfigured DNS wrecking someone’s campaign.
I got mail this morning from A Large Computer Supplier, asking me to fill in a survey about them. I had some feedback for them, mostly along the lines of “It’s been two decades since I bought anything other than rackmount servers from you, maybe I’m not a good advertising target for $200 consumer laptops?” so I clicked the link.
 
[Screenshot: the browser’s “Failed to open page” error]
 
(I’ve replaced the real domain with survey.example.com in this post, to protect the innocent, but everything else is authentic).
That’s not good. The friendly error messages web browsers give sometimes hide the underlying problem, but that looks like a DNS problem. Did they do something stupid, like putting the wrong URL in the mail they sent?
 

~ ∙ host survey.example.com
Host survey.example.com not found: 3(NXDOMAIN)

 
“NXDOMAIN”. That means that there are no records in DNS for the hostname I looked up. From my part of the Internet, at least, that hostname doesn’t exist. I used to build DNS software, so I find the variety of ways in which people break their DNS interesting. Time to dig a little deeper.
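That “3(NXDOMAIN)” is the numeric response code (rcode) from the DNS reply header. As a quick reference, here’s a sketch of the common rcodes (values from RFC 1035):

```python
# DNS response codes (rcodes) as defined in RFC 1035.
# "host" prints both the number and the name, e.g. "3(NXDOMAIN)".
RCODES = {
    0: "NOERROR",   # query completed successfully
    1: "FORMERR",   # the server couldn't parse the query
    2: "SERVFAIL",  # the server failed to complete the query
    3: "NXDOMAIN",  # the queried name does not exist
    4: "NOTIMP",    # query type not implemented by the server
    5: "REFUSED",   # the server refused to answer (policy)
}

print(RCODES[3])  # the code behind "3(NXDOMAIN)"
```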
 

~ ∙ host -t ns example.com
example.com name server ns2.dreamhost.com.
example.com name server ns1.dreamhost.com.
example.com name server ns3.dreamhost.com.
~ ∙ host survey.example.com ns1.dreamhost.com
Using domain server:
Name: ns1.dreamhost.com
Address: 66.33.206.206#53
Aliases:
Host survey.example.com not found: 3(NXDOMAIN)

 
Here I look up the authoritative servers for the domain and find it’s hosted by Dreamhost. Then I check the record at one of the authoritative servers, ns1.dreamhost.com, and it’s returning NXDOMAIN too. survey.example.com doesn’t exist. Oops.
I told a little fib
Except … I’ve not been entirely truthful about how I investigate DNS issues. “host” is a user-friendly tool, and it provides nice, brief output for normal queries, so it’s the tool I use when I’m showing queries to clients or putting them on the blog. But I’m a DNS geek, so the tool I actually use is “dig”. Dig is anything but user-friendly. The results it gives you aren’t really interpreted at all, just a human-readable representation of the raw DNS packets – verbose, with lots of output that doesn’t necessarily make sense unless you’re familiar with how DNS works under the covers ([RFC 1034] and [RFC 1035] if you really want to know).
This is what that last query looks like using dig:

~ ∙ dig @ns1.dreamhost.com survey.example.com
; <<>> DiG 9.8.3-P1 <<>> @ns1.dreamhost.com survey.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 54756
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;survey.example.com. IN A
;; ANSWER SECTION:
survey.example.com. 14400 IN CNAME example-com.surveygizmo.com.
;; AUTHORITY SECTION:
surveygizmo.com. 14400 IN SOA ns1.dreamhost.com. hostmaster.dreamhost.com. 2011072000 14618 1800 1814400 14400
;; Query time: 54 msec
;; SERVER: 66.33.206.206#53(66.33.206.206)
;; WHEN: Fri Oct 10 10:25:13 2014
;; MSG SIZE rcvd: 147

Well … now I’m interested. With dig we can see exactly what the response from the authoritative server is – and it’s very broken. It’s returning an NXDOMAIN response, saying definitively that there are no records of any type for survey.example.com. But it’s also returning an answer record for survey.example.com – a CNAME that redirects to the survey vendor. That’s really not allowed.
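To make the contradiction concrete, here’s a sketch (with a synthetic header – the ID and flags are taken from the dig output above) of how the rcode and the answer count sit side by side in the 12-byte DNS header:

```python
import struct

# Build a synthetic 12-byte DNS header matching what dig reported:
# flags qr aa rd, status NXDOMAIN (rcode 3), ANSWER: 1, AUTHORITY: 1.
# Header layout (RFC 1035 §4.1.1): ID, FLAGS, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
QR, AA, RD = 0x8000, 0x0400, 0x0100
flags = QR | AA | RD | 3            # the rcode lives in the low 4 bits
header = struct.pack("!HHHHHH", 54756, flags, 1, 1, 1, 0)

# Parse it back the way a resolver would.
ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", header)
rcode = flags & 0x000F
print(f"rcode={rcode} (NXDOMAIN)")   # rcode=3 (NXDOMAIN)
print(f"answer records: {an}")       # answer records: 1
# rcode 3 says "this name does not exist", yet ANCOUNT is 1 -- that's the
# self-contradiction the authoritative server is putting on the wire.
```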
I contacted the firm running the survey and gave them a heads-up that their DNS was broken – and they replied telling me that it was working fine for them. I wonder how that could be.
I have three different DNS resolvers on my network: PowerDNS, a very solid and standards-compliant resolver; BIND, the oldest resolver, often installed by default and full of both features and bugs; and whatever embedded resolver Mikrotik appliances use, likely similar to the embedded resolvers in consumer routers.
Let’s see what that record looks like through each of them:
First, PowerDNS

~ ∙ dig @192.168.80.100 survey.example.com
; <<>> DiG 9.8.3-P1 <<>> @192.168.80.100 survey.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 21305
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;survey.example.com. IN A
;; ANSWER SECTION:
survey.example.com. 14400 IN CNAME example-com.surveygizmo.com.
;; Query time: 194 msec
;; SERVER: 192.168.80.100#53(192.168.80.100)
;; WHEN: Fri Oct 10 14:18:41 2014
;; MSG SIZE rcvd: 93

 
PowerDNS returns what it received from the authoritative server – an NXDOMAIN and an answer. Most applications are going to see the NXDOMAIN and stop there, unable to resolve the hostname.
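A hypothetical stub client, sketching that behaviour (the function name and shape here are illustrative, not any real API):

```python
# Hypothetical minimal stub-client logic: check the response code first,
# and bail out before ever looking at the answer section.
def resolve(rcode, answers):
    if rcode == 3:                       # NXDOMAIN
        raise LookupError("name does not exist")
    return answers

# What PowerDNS handed back: NXDOMAIN *and* a CNAME answer.
try:
    resolve(3, ["example-com.surveygizmo.com."])
except LookupError:
    print("host not found")   # the browser's "failed to open page"
```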
Secondly, Mikrotik

~ ∙ dig @192.168.80.1 survey.example.com
; <<>> DiG 9.8.3-P1 <<>> @192.168.80.1 survey.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 37002
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;survey.example.com. IN A
;; Query time: 114 msec
;; SERVER: 192.168.80.1#53(192.168.80.1)
;; WHEN: Fri Oct 10 14:18:01 2014
;; MSG SIZE rcvd: 36

 
The embedded resolver Mikrotik uses sees the NXDOMAIN response and provides just that, without the answer record.
 
And finally, BIND

steve@scratch:~$ dig @127.0.0.1 survey.example.com
; <<>> DiG 9.9.5-3-Ubuntu <<>> @127.0.0.1 survey.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1911
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;survey.example.com. IN A
;; ANSWER SECTION:
survey.example.com. 14400 IN CNAME example-com.surveygizmo.com.
example-com.surveygizmo.com. 300 IN A 216.152.128.142
;; AUTHORITY SECTION:
surveygizmo.com. 172800 IN NS jack.ns.cloudflare.com.
surveygizmo.com. 172800 IN NS dina.ns.cloudflare.com.
;; ADDITIONAL SECTION:
dina.ns.cloudflare.com. 172800 IN A 173.245.58.107
dina.ns.cloudflare.com. 172800 IN AAAA 2400:cb00:2049:1::adf5:3a6b
jack.ns.cloudflare.com. 172800 IN A 173.245.59.121
jack.ns.cloudflare.com. 172800 IN AAAA 2400:cb00:2049:1::adf5:3b79
;; Query time: 868 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Oct 10 07:58:13 PDT 2014
;; MSG SIZE rcvd: 253

 
BIND handles it differently (and, I think, wrongly). It sees that there’s an answer, so it returns that answer, along with a lot of other related records. And it returns a NOERROR response, instead of the NXDOMAIN it received. Any client application, such as a web browser, will see that as a perfectly reasonable response, and clicking on the link will work, ending up at SurveyGizmo.
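The three behaviours can be caricatured in a few lines (a simplified sketch, not the resolvers’ actual code):

```python
# Simplified sketch of how each resolver treated the same broken upstream
# response: rcode NXDOMAIN plus a CNAME answer record.
BROKEN = {"rcode": "NXDOMAIN",
          "answers": ["CNAME example-com.surveygizmo.com."]}

def powerdns(resp):                 # pass the contradiction through unchanged
    return resp

def mikrotik(resp):                 # trust the rcode, drop the answer
    return {"rcode": resp["rcode"], "answers": []}

def bind(resp, chased):             # trust the answer, chase the CNAME,
    return {"rcode": "NOERROR",     # and rewrite the status to NOERROR
            "answers": resp["answers"] + chased}

print(powerdns(BROKEN)["rcode"])                        # NXDOMAIN
print(mikrotik(BROKEN)["answers"])                      # []
print(bind(BROKEN, ["A 216.152.128.142"])["rcode"])     # NOERROR
```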
And the moral of this story is…
Steve gets overly excited by obscure DNS bugs, mostly.
But also, it’s possible to mess up your DNS records such that it will work perfectly for you, and some fraction of your recipients, while being broken for the rest of your recipients (anyone at an ISP not using BIND in this example). So if you get reports that your links aren’t working (or your SPF records or DKIM records are bad) don’t assume that it can’t be a DNS problem because it works correctly when you check them.

Related Posts

Collaboration key to fighting crime on the internet

The Pittsburgh Post-Gazette has a good article on the DNS Changer Working Group and how it can serve as a model for future collaboration against cybercrime.


Weird mail problems today? Clear your DNS cache!

A number of sources are reporting this morning that there was a problem with some domains in the .com zone yesterday. These problems caused the DNS records of these domains to become corrupted. The records are now fixed. Some of the domains, however, had long TTLs. If a recursive resolver pulled the corrupted records, it could take up to 2 days for the new records to naturally age out.
Folks can fix this by flushing their DNS cache, thus forcing the recursive resolver to pull the uncorrupted records.
EDIT: Cisco has published some more information about the problem: ‘Hijacking’ of DNS Records from Network Solutions.


DNS, SERVFAIL, firewalls and Microsoft

When you look up a host name, a mailserver or anything else there are three types of reply you can get. The way they’re described varies from tool to tool, but they’re most commonly referred to using the messages dig returns – NXDOMAIN, NOERROR and SERVFAIL.
NXDOMAIN is the simplest – it means the host name you asked about doesn’t exist at all, so no query for that name, whatever the record type, will match anything.
NOERROR is usually what you’re hoping for – it means that there is a DNS record with the host name you asked about. There might be an exact match for your query, or there might not; you’ll need to look at the answer section of the response to see. For example, if you do “dig www.google.com MX” you’ll get a NOERROR response – because there is an A record for that hostname – but no answers, because there’s no MX record for it.
SERVFAIL is the all purpose “something went wrong” response. By far the most common cause for it is that there’s something broken or misconfigured with the authoritative DNS for the domain you’re querying so that your local DNS server sends out questions and never gets any answers back. After a few seconds of no responses it’ll give up and return this error.
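A sketch of that give-up-and-SERVFAIL behaviour (hypothetical names, much simplified – real resolvers also rotate between servers and back off between retries):

```python
# Hypothetical sketch of the resolver behaviour described above: retry the
# query a few times, and return SERVFAIL once every attempt has gone
# unanswered.
def query_with_retries(send_query, attempts=3):
    for _ in range(attempts):
        reply = send_query()
        if reply is not None:
            return reply
    return "SERVFAIL"

dead_server = lambda: None               # simulates queries that never come back
print(query_with_retries(dead_server))   # SERVFAIL
```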
Microsoft
Over the past few weeks we’ve heard from a few people about significant numbers of delivery failures to domains hosted by Microsoft’s live.com / outlook.com, due to SERVFAIL DNS errors. But other people saw no issues – and even the senders whose mail was bouncing could resolve the domains when they queried Microsoft’s nameservers directly rather than via their local DNS resolvers. What’s going on?
A common cause for DNS failures is inconsistent data in the DNS resolution tree for the target domain. There are tools that can mechanically check for that, though, and they showed no issues with the problematic domains. So it’s not that.
Source ports and destination ports
If you’re even slightly familiar with the Internet you’ve heard of ports – they’re the numbered slots that servers listen on to provide services. Webservers listen on port 80, mailservers on port 25, DNS servers on port 53 and so on. But those are just the destination ports – each connection comes from a source port too (it’s the combination of source port and destination port that lets two communicating computers keep track of what data should go where).
Source ports are usually assigned to each connection pretty much randomly, and you don’t need to worry about them. But DNS has a history of the source port being relevant (it used to always use source port 53, but most servers have switched to using random source ports for security reasons). And there’s been an increasing amount of publicity about using DNS servers as packet amplifiers recently, with people being encouraged to lock them down. Did somebody tweak a firewall and break something?
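You can watch the OS hand out a random source port with a few lines of Python (binding to port 0 asks the OS to pick one):

```python
import socket

# A UDP exchange is identified by source and destination ports. If you
# don't bind a source port explicitly, the OS picks one for you.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))          # port 0 = "assign me a source port"
addr, port = sock.getsockname()
print(f"OS-assigned source port: {port}")
sock.close()
```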
Both source and destination ports range between 1 and 65535. There’s no technical distinction between them, just a common understanding that certain ports are expected to be used for particular services. Historically they’ve been divided into three ranges – 1 to 1023 are the “low ports” or “well known ports”, 1024-49151 are “registered ports” and 49152 and up are “ephemeral ports”. On some operating systems normal users are prevented from using ports less than 1024, so they’re sometimes treated differently by firewall configurations.
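Those three ranges as a quick classifier (a sketch of the historical convention described above, nothing normative):

```python
# Classify a port number into the historical ranges: "well known" (1-1023),
# "registered" (1024-49151) and "ephemeral" (49152-65535).
def port_range(port):
    if not 1 <= port <= 65535:
        raise ValueError("ports run from 1 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "ephemeral"

print(port_range(53))     # well-known
print(port_range(1337))   # registered
print(port_range(50000))  # ephemeral
```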
While source ports are usually generated randomly, some tools let you assign them by hand, including dig. Adding the flag -b "0.0.0.0#1337" to dig will make it send queries from source port 1337. For ports below 1024 you need to run dig as root, but that’s easy enough to do.
A (slightly) broken firewall
“sudo dig -b "0.0.0.0#1024" live.com @ns2.msft.net” queries one of Microsoft’s nameservers for their live.com domain, and returns a good answer.
“sudo dig -b "0.0.0.0#1023" live.com @ns2.msft.net” times out. Trying other ports above and below 1024 at random gives similar results. So there’s a firewall or other packet filter somewhere that’s discarding either the queries coming from low ports or the replies going back to those low ports.
Older DNS servers always use port 53 as their source port – blocking that would have caused a lot of complaints.
But “sudo dig -b "0.0.0.0#53" live.com @ns2.msft.net” works perfectly. So the firewall, wherever it is, seems to block DNS queries from all low ports except port 53. It’s definitely a DNS-aware configuration.
DNS packets go through a lot of servers and routers and firewalls between me and Microsoft, though, so it’s possible it could be some sort of problem with my packet filters or firewall. Better to check.
“sudo dig -b "0.0.0.0#1000" google.com @ns1.google.com” works perfectly.
So does “sudo dig -b "0.0.0.0#1000" amazon.com @pdns1.ultradns.net”.
And “sudo dig -b "0.0.0.0#1000" yahoo.com @ns1.yahoo.com”.
The problem isn’t at my end of the connection, it’s near Microsoft.
Is this a firewall misconfiguration at Microsoft? Or should DNS queries not be coming from low ports (other than 53)? My take on it is that it’s the former – DNS servers are well within spec to use randomly assigned source ports, including ports below 1024, and discarding those queries is broken behaviour.
But using low source ports (other than 53) isn’t something most DNS servers will tend to do, as they’re hosted on unix, and using those low ports on unix requires jumping through many more programming hoops and involves more security concerns than just limiting yourself to ports above 1023. There’s no real standard for DNS source port randomization – it was added to many servers in a bit of a hurry in response to a vulnerability that was heavily publicized in 2008. BIND running on Windows seems to use low ports in some configurations. And even unix-hosted nameservers behind a NAT might have their queries rewritten to use low source ports. So discarding DNS queries from low ports is one of the more annoying sorts of network bugs – one that won’t affect most people at all, but those it does affect will see it much of the time.
If you’re seeing DNS issues resolving Microsoft hosted domains, or you’re seeing patterns of unexpected SERVFAILs from other nameservers, check to see if they’re blocking queries from low ports. If they are, take a look and see what ranges of source ports your recursive DNS resolvers are configured to use.
(There’s been some discussion of this recently on the [mailop] mailing list.)
