We often want to know whether two hostnames are controlled by the same person, or not.
One case for that is cookie privacy in web browsers. We want pages at www.blighty.com and images.blighty.com and blighty.com to all be able to set and read cookies for each other – so a user only needs to log in once for pages or images on all of them to work well together. So we allow all of them to access cookies for “*.blighty.com”.
But we don’t want blighty.com and example.com to be able to access each other’s cookies. (That’s both for privacy reasons and for security, so a hostile page at example.com can’t steal the user’s authentication cookies for blighty.com.)
Loosely, we want to be able to say that images.blighty.com and www.blighty.com are sort of the same, while images.example.com and www.blighty.com aren’t.
At first that looks pretty simple to do, especially if you’re in the US. Two hostnames are the same if they’re in the same domain. The relevant domains here are obviously blighty.com and example.com. And there’s an obvious way to find the domain: it’s the last two words in the hostname, sometimes called the second level domain and top level domain.
For www.blighty.com the second-level domain would be “blighty”, the top-level domain would be “com” and the domain would be “blighty.com”. We can say that any hostname that ends in “.blighty.com” is in that domain, and reasonably assume that they’re controlled by the same person.
But … if we’re looking at www.blighty.co.uk then the domain is “blighty.co.uk”. Our simple algorithm to find the “domain” doesn’t work.
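To make the failure concrete, here is a minimal sketch of that “last two words” approach. The function name naive_domain is made up for illustration; it isn’t taken from any browser or library.

```python
def naive_domain(hostname: str) -> str:
    """Return the last two dot-separated labels of a hostname.

    This is the naive "second-level domain + top-level domain" rule,
    shown only to illustrate where it breaks down.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


print(naive_domain("www.blighty.com"))     # blighty.com
print(naive_domain("images.blighty.com"))  # blighty.com
print(naive_domain("www.blighty.co.uk"))   # co.uk, not blighty.co.uk
```

By this rule every hostname under “co.uk” would count as the same domain, which is exactly the grouping we’re trying to avoid.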
You’d think that somebody would have thought about this when designing the domain name system, wouldn’t you? But no. They didn’t.
In DNS there’s not really any such thing as a “domain” or a “subdomain”. It’s all hostnames. “www.blighty.com”? Hostname. “blighty.com”? Hostname. “com”? Hostname. “.”? Hostname. And DNS doesn’t provide any information about common ownership between hostnames, at all.
So if we want to find the “domain” for a hostname we’re going to have to come up with some other way of doing it. One obvious approach would be to manually maintain a list of all the possible “top level domains” under which people can register a domain. That’d be quite ridiculous and excruciatingly painful to maintain. So that’s what we did.
The Public Suffix List is a list of over 8,000 “top level domains”. It includes real ones like “.com”, “.co.uk”, “.zippo” and “.k12.al.us” under which people can register domains, and also domains under which many independent customers can use their own subdomains, such as “herokuapp.com” or “blogspot.com”. Using this list you can finally define a unique “domain” part of any hostname. That’ll often be the same as the intuitive idea of a domain – “blighty.com”, “losaltos.k12.ca.us” or “natwest.co.uk”. Sometimes it won’t be – “lecreuset.us.com”, “myapp.herokuapp.com” or “mynas.diskstation.me” – but in those cases it’s a better description of the subdomains that are under the control of a single user.
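As a rough sketch of how a lookup against the list works: find the longest suffix of the hostname that appears in the list, then keep one more label to its left. The suffix set below is a tiny hand-picked stand-in for the real list, the name registrable_domain is hypothetical, and the real list also has wildcard and exception rules that this sketch ignores.

```python
# ASSUMPTION: a tiny hand-picked subset standing in for the real
# Public Suffix List, which has thousands of entries plus wildcard
# and exception rules that are not handled here.
PUBLIC_SUFFIXES = {
    "com", "uk", "co.uk", "us", "k12.ca.us", "me",
    "us.com", "herokuapp.com", "diskstation.me",
}


def registrable_domain(hostname, suffixes=PUBLIC_SUFFIXES):
    """Return the public suffix plus one label, or None if there isn't one."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is preferred over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no suffix in our small subset matched


print(registrable_domain("www.blighty.com"))         # blighty.com
print(registrable_domain("www.blighty.co.uk"))       # blighty.co.uk
print(registrable_domain("www.losaltos.k12.ca.us"))  # losaltos.k12.ca.us
print(registrable_domain("myapp.herokuapp.com"))     # myapp.herokuapp.com
print(registrable_domain("co.uk"))                   # None
```

In practice you’d load the published list, or use an existing library that does, rather than hard-coding suffixes like this.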
The Public Suffix List is included in every web browser, to handle cookie security. It’s useful for other things too. We use it in several of our internal data-clustering tools to canonicalize URLs and MXes. And it’s a critical part of DMARC.
One of the first new concepts DMARC exposes you to is “aligned domains”.
“DMARC passes if either the message is validly DKIM signed and the DKIM d= domain aligns with the domain in the From: field, or if the message passes SPF with a domain that aligns with the domain in the From: field.”
DMARC defines two different sorts of alignment between domains. The less interesting one is “strict alignment”, communicated via the adkim=s or aspf=s tags in the DMARC record. Strict alignment just means the domains are identical. The much more interesting one is “relaxed alignment”, communicated via adkim=r or aspf=r. With relaxed alignment, two domains are aligned if they’re in the same Organizational Domain – which is the extended, formalized definition of a domain as defined by the Public Suffix List.
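The difference between the two modes can be sketched in a few lines. This is an illustration under assumptions, not a DMARC implementation: the suffix set is a tiny stand-in for the real Public Suffix List, and the names organizational_domain and is_aligned are made up for this example.

```python
# ASSUMPTION: a tiny stand-in for the real Public Suffix List.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}


def organizational_domain(domain, suffixes=PUBLIC_SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def is_aligned(from_domain, auth_domain, mode="r"):
    """Compare the From: domain with a DKIM d= or SPF domain.

    mode "s" is strict (identical domains); mode "r" is relaxed
    (same Organizational Domain).
    """
    a = from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if mode == "s":
        return a == b
    org_a = organizational_domain(a)
    return org_a is not None and org_a == organizational_domain(b)


print(is_aligned("blighty.com", "mail.blighty.com", mode="r"))  # True
print(is_aligned("blighty.com", "mail.blighty.com", mode="s"))  # False
print(is_aligned("blighty.co.uk", "example.co.uk", mode="r"))   # False
```

The last case is the one the naive “last two words” rule would get wrong: both domains end in “co.uk”, but they sit in different Organizational Domains.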
You knew there was going to be something email-related eventually, right?