An email address has two main parts. The local-part is the bit before the @-sign and the domain is the bit after it. Loosely, the domain part tells SMTP how to get an email to the destination mailserver while the local part tells that server whose mailbox to put it in.
I’m just looking at the local part today, the “steve” in “steve@example.com”.
Talkin’ ‘Bout a Specification
The original specification for SMTP email delivery, RFC 821, specifies a few things about the local-part. It can’t be more than 64 ASCII characters long, and it must be wrapped in double quotes if it includes spaces or certain punctuation characters (commas, brackets or a second @, for example). But that’s just syntax, nothing to do with what it means. It does mention that it’s case-sensitive: “steve@example.com” is not the same recipient as “sTeve@example.com”.
The specification for the structure of email messages, RFC 822, tells us a little more. It clarifies that the local-part is case-sensitive, with the sole exception of the “postmaster” account, which is required to be deliverable as “postmaster”, “POSTMASTER”, “POSTmasTER” or any other variant you like.
The domain-dependent string is uninterpreted, except by the final sub-domain; the rest of the mail service merely transmits it as a literal string.
It’s describing the local-part as the “domain-dependent string”, and it says that it has no semantic meaning to anyone or anything other than the final server it’s delivered to.
The specifications have been updated since, but the meaning of an email local-part hasn’t changed. It’s opaque, and as far as the specification is concerned nobody other than the recipient’s email server should make any assumptions about what it means or whether two different local-parts deliver to the same recipient. That also means you shouldn’t modify or normalize it in any way, not even by folding it to lower or upper case.
Behind the Wall
So, that’s what the spec says and what you can universally assume about email addresses. But what semantics do final mailservers actually use, and what can you sometimes assume?
Simplest first. Every non-stunt mailserver I’ve ever seen compares all email addresses case-insensitively. “steve@example.com” delivers to the same mailbox as “STEVE@example.com”.
That doesn’t mean that you should fold all email addresses to lower-case or, much worse, upper-case (NOBODY LIKES BEING SHOUTED AT). People are attached to the way their email address looks, and changing it makes your mail look wrong or out of place or slightly rude, all bad things.
But when you’re handling an unsubscribe request, compare it with your list case-insensitively. And when you’re checking for duplicates, check case-insensitively too. Apart from anything else, if someone is using you for subscription bombing and they sign up steve@, Steve@, sTeve@, STeve@, stEve@, StEve@ and so on, you really don’t want to be sending that recipient 32 copies of each message. And it’s much worse if they’re called Christopher. Or Hubert.
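A minimal Python sketch of comparing case-insensitively while still storing (and sending to) the address exactly as it was given. The function names are hypothetical, and it assumes the domain is everything after the last “@”.

```python
def comparison_key(address: str) -> str:
    """Case-folded key used only for comparing addresses, never for sending."""
    # Assumption: the domain is everything after the last "@".
    local, _, domain = address.rpartition("@")
    return f"{local.lower()}@{domain.lower()}"


def add_subscriber(subscribers: dict, address: str) -> bool:
    """Store the address exactly as typed, keyed by its comparison key.

    Returns False if a case variant of it is already subscribed.
    """
    key = comparison_key(address)
    if key in subscribers:
        return False
    subscribers[key] = address  # the original spelling is what we send to
    return True


def unsubscribe(subscribers: dict, address: str) -> bool:
    """Remove whichever stored variant matches, ignoring case."""
    return subscribers.pop(comparison_key(address), None) is not None
```

Signing up “Steve@example.com” and then “sTeve@example.com” stores one entry, still spelled “Steve@example.com”, and an unsubscribe for “STEVE@example.com” removes it.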
Tagging, boxing, sub-addressing or plus-addressing
Many mail servers allow a “partial alias”, where they treat the local-part as two strings, separated by a special character. The first part is used by the server to decide who to deliver a message to. The second part is ignored by the server, but often used by the recipient’s mail client (or local delivery agent) to route mail to a specific folder. “steve+mailop@example.com” might be delivered to the same recipient as “steve@example.com”, who then might use the “mailop” tag to route it to a specific folder.
This has been around a long time. Sendmail has supported it out-of-the-box since V8.7 in 1995:
Allow “user+detail” to be aliased specially: it will first look for an alias for “user+detail”, then for “user+*”, and finally for “user”. This is intended for forwarding mail for system aliases such as root and postmaster to a centralized hub.
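The lookup order in that Sendmail note can be pictured as a short function. This is a Python sketch of the three steps, not Sendmail’s actual code: a plain dict stands in for the aliases database, and, as an assumption, when no alias matches it falls back to delivering to the untagged user.

```python
def resolve_alias(local: str, aliases: dict, sep: str = "+") -> str:
    """Sketch of the "user+detail" -> "user+*" -> "user" lookup order."""
    # 1. An exact alias for "user+detail".
    if local in aliases:
        return aliases[local]
    user, found, _detail = local.partition(sep)
    if found:
        # 2. A wildcard alias for "user+*".
        wildcard = f"{user}{sep}*"
        if wildcard in aliases:
            return aliases[wildcard]
        # 3. A plain alias for "user".
        if user in aliases:
            return aliases[user]
    # Assumed fallback: no alias, so deliver to the local user, ignoring the detail.
    return user
```

With an alias of “root” pointing at a central hub, “root+cron” and “root+backup” both end up at the hub, which is the use case the Sendmail note describes.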
Tagging may have been around longer than that, as part of MMDF perhaps. It’s certainly widely supported today. Gmail, Outlook.com, Fastmail, Runbox and iCloud support it with a “+” separator. Yahoo supports it with a “-” separator. And most mailservers can be configured to support it, typically with “+”, “-” or “=” separators.
So, should you assume that “steve+mailop@example.com” and “steve@example.com” deliver to the same mailbox? Not in any automatic way, no. Don’t suppress them as duplicates, and don’t let an unsub for one unsub the other (I often end up subscribed to the same list on multiple email addresses, and the last thing I want when I unsub five of them is for you to also remove the sixth address, the one where I wanted to receive it). But you might want to flatten them for metrics, and if a support or abuse desk is dealing with complaints they might want to check for alternative addresses being subscribed.
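If you do flatten tags for metrics, a sketch like this keeps the flattened form strictly apart from anything used for sending or unsubscribing. The per-domain separator table is an illustrative assumption built from the providers named above, not an authoritative list. Guessing a separator for unknown domains is risky, since “-” in particular is common in ordinary local-parts.

```python
# Illustrative, incomplete mapping of domain -> tag separator (an assumption).
TAG_SEPARATORS = {
    "gmail.com": "+",
    "outlook.com": "+",
    "fastmail.com": "+",
    "runbox.com": "+",
    "icloud.com": "+",
    "yahoo.com": "-",
}


def metrics_key(address: str) -> str:
    """Group tagged variants for reporting only; never send or unsubscribe with this."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    sep = TAG_SEPARATORS.get(domain)
    if sep and sep in local:
        # Keep only the part before the first separator.
        local = local.split(sep, 1)[0]
    return f"{local.lower()}@{domain}"
```

Feeding `metrics_key` into something like `collections.Counter` gives per-person counts for a report, while the list itself keeps every tagged address as a separate subscription.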
You definitely shouldn’t assume that “steve+mailop@example.com” and “steve@example.com” are the same person.
Should you reject attempts to sign up tagged addresses? No! They’re typically a sign that the person signing up is technically competent, wants the mail you’re sending, and wants control over where it’s delivered to. I often use tagged addresses to sidestep some more aggressive spam filtering, for example.
And you probably want to check (with a gmail account, perhaps) that your signup form accepts email addresses with a “+” tag in them. In HTML form encoding a “+” stands for a space, and form handling is occasionally broken in a way that converts a “+” in an address to a space.
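One way to see that failure mode, using Python’s standard `urllib.parse` as a stand-in for whatever parses your form data: a “+” that reaches the parser unencoded is decoded as a space, while one that was properly encoded as `%2B` survives the round trip.

```python
from urllib.parse import parse_qs, urlencode

address = "steve+mailop@example.com"

# Broken: pasting the raw address into form-encoded data.
# In application/x-www-form-urlencoded, "+" is decoded as a space.
broken = parse_qs("email=" + address)["email"][0]
print(broken)  # steve mailop@example.com

# Correct: urlencode percent-encodes "+" as %2B (and "@" as %40),
# so the address comes back intact.
body = urlencode({"email": address})
print(body)  # email=steve%2Bmailop%40example.com
fixed = parse_qs(body)["email"][0]
print(fixed == address)  # True
```

A browser submitting a real form does this encoding itself. The bug usually appears when some script or intermediate layer builds the query string by hand.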
Should you strip tags from tagged addresses? No! If you modify the local-part in any way it may not deliver to the person who signed up. And, even if you correctly identify a tag, stripping that tag is violating the relationship you have with the recipient. If you send me bulk email and it’s not to a tagged address I will assume you’re spamming, because if I had given you an address it would have been tagged.
If you see tagged addresses in a customer’s list, is that a red flag? It depends. Tagged addresses are usually intentionally subscribed, rather than being harvested or appended, so the presence of some on a list can be a positive sign. But a recipient is unlikely to sign up multiple times intentionally, so a dozen different tagged variants on “steve@example.com” might be worth a look.
And tags are usually chosen by hand, the recipient choosing them to match who they gave them to. If Britvic are sending email through you and they have “steve+britvic@example.com” on their list, that’s a sign that the address was given explicitly to them, and a sign of healthy address acquisition practices. If, on the other hand, there’s an address on their list tagged for some unrelated company, that’s a really bad sign. Other tags might be good or might be bad, depending on the client, and some tags tell you exactly where they’re harvesting addresses from.
And then there’s gmail. They support normal tagging, using a “+” separator, but they also strip all periods from the local-part before comparing it, so “steve.atkins@gmail” and “steveatkins@gmail” and “s.t.e.v.e.atkins@gmail” would all deliver to the same user’s mailbox. It’s a smart decision on gmail’s part. It avoids the problem of two Steve Atkinses signing up as steve.atkins and steveatkins and getting each other’s email when they accidentally add or remove the period when signing up for something (Steve’s corporate email address is probably Steve.Atkins@example.corp, and his finger memory is going to add that period without him even seeing it).
This is well-enough understood behaviour that it would be reasonable, for gmail addresses only, to compare local-parts with the periods removed for duplicate checking and unsubscriptions. Don’t modify the local-part you send to, though; still send to the version you were given. You might want to store “Steve.Atkins@gmail” in a canonicalized form as “steveatkins@gmail” to make an easy primary key for duplicate checking and unsubscription, but you should still send the mail to “Steve.Atkins@gmail”.
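A minimal Python sketch of that split between a canonical key and the address you actually send to. It assumes only the domain “gmail.com” gets the period treatment, and, conservatively, it removes periods only from the part before any “+” tag, so tagged variants stay distinct, as recommended above. The function names are hypothetical.

```python
GMAIL_DOMAINS = {"gmail.com"}  # assumption: only gmail.com is treated this way


def dedupe_key(address: str) -> str:
    """Canonical key for duplicate checks and unsubscribes; never a send address."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    local = local.lower()
    if domain in GMAIL_DOMAINS:
        # Remove periods only before any "+" tag; the tag itself is left alone.
        user, plus, tag = local.partition("+")
        local = user.replace(".", "") + plus + tag
    return f"{local}@{domain}"


def subscribe(store: dict, address: str) -> bool:
    """Store the address as given, keyed by its canonical form."""
    key = dedupe_key(address)
    if key in store:
        return False  # a period or case variant is already subscribed
    store[key] = address  # mail goes to this exact string
    return True
```

After subscribing “Steve.Atkins@gmail.com”, a second signup as “steveatkins@gmail.com” is rejected as a duplicate, and mail still goes to “Steve.Atkins@gmail.com”.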
“Best” things as seen in bounces:
– Return-paths (which are also email addresses) where the local part is cut down to 64 chars
– Return-paths where the local part is cut after certain characters (e.g. an underscore used by our VERP)
– Return-paths being lower-cased …
In the age of bots, does this advice still stand? I have heard that ESPs/senders should be normalizing the periods Gmail uses, because allowing an address to be sent to multiple times by a sender increases the impact of an attack.
For example, as an ESP, when multiple permutations of an address sign up via a form with Gmail periods, should I normalize them into one address, send them just one message, and unsubscribe them as one contact?
Your call on the normalization. If you don’t want to normalize, you can speed up your test for duplicates as local parts can neither start with nor end with a dot.
So you should know what the local part starts with, what it ends with and that it contains a dot (oh and is also a gmail address). That will allow you to pull fewer addresses from your database that would need to be checked.
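A sketch of that prefilter in Python, with a dict of lists standing in for a database index on the first and last characters of the local part. The class and method names are hypothetical. The sketch assumes every address passed in is already known to be a gmail address, and it ignores “+” tags for simplicity.

```python
from collections import defaultdict


def _local(address: str) -> str:
    return address.rpartition("@")[0]


def _bucket(local: str) -> tuple:
    # A local part can't start or end with a dot, so adding or removing
    # dots never changes its first or last character.
    return (local[0].lower(), local[-1].lower())


class GmailDotIndex:
    """Finds stored gmail addresses that differ from a new one only by dots or case."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, address: str) -> None:
        self._buckets[_bucket(_local(address))].append(address)

    def dot_variants(self, address: str) -> list:
        local = _local(address)
        target = local.replace(".", "").lower()
        # Only addresses sharing the first and last characters need a full comparison.
        candidates = self._buckets.get(_bucket(local), [])
        return [a for a in candidates
                if _local(a).replace(".", "").lower() == target]
```

Checking “steveatkins@gmail.com” against an index holding “steve.atkins@gmail.com” and “sarah@gmail.com” only compares the first of them in full, because “sarah” lands in a different bucket.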
…or you could just flatten gmail 🙂