What is an email address? (part one)

Given that we deal with email addresses every day, dozens or thousands or millions of them, it seems a bit strange to ask what an email address is – but given some of the problems people have with the grubbier corners of address syntax, it's actually an interesting question.
There are two real standards that define what is a valid email address and what isn't. The more complex is RFC 5322 – Internet Message Format, which describes all sorts of things about the structure of an email, including what's valid to put in From: and To: headers. It's really too liberal about what an email address may look like to be terribly useful, but it does provide for one very commonly used feature – the "friendly from", where the name displayed to the recipient is something more readable than just the email address.

    From: "Steve Atkins" <steve@example.com>

Here the string that’s displayed to the user (Steve Atkins) comes first, surrounded by double quotes, then the email address itself (steve@example.com) surrounded by angle brackets. You might see other obsolete formats used, including parentheses or no double quotes, but this is the safe one to use.
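As a sketch of that safe form, here is a hypothetical Python helper, `friendly_from`, that always wraps the display name in double quotes. It doesn't use the standard library's `email.utils.formataddr` because that function adds the quotes only when the name contains special characters. The sketch assumes an ASCII display name (a non-ASCII name would need RFC 2047 encoding, which it doesn't attempt) and uses `email.utils.parseaddr` to read the header back.

```python
from email.utils import parseaddr


def friendly_from(display_name: str, address: str) -> str:
    """Build a From: header value in the quoted "Name" <address> form.

    Hypothetical sketch: assumes an ASCII display name. A non-ASCII
    name would need RFC 2047 encoding, which this does not do.
    """
    # Refuse line breaks so a display name can't inject extra headers.
    if "\r" in display_name or "\n" in display_name:
        raise ValueError("display name must not contain CR or LF")
    # Inside a quoted string, backslash and double quote must be escaped.
    escaped = display_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}" <{address}>'


header = friendly_from("Steve Atkins", "steve@example.com")
print("From: " + header)  # From: "Steve Atkins" <steve@example.com>
print(parseaddr(header))  # ('Steve Atkins', 'steve@example.com')
```

Quoting unconditionally costs nothing and means the same code keeps working when a display name contains a comma, a period or an "@".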
The other standard is RFC 5321 – Simple Mail Transfer Protocol, which describes how email addresses are used to actually send email. It is also far too liberal in what it allows to be operationally useful for an ESP, but it does define some important features:

  • An email address consists of two parts, a local-part and a domain-part separated by an “@” – in steve@example.com, steve is the local-part and example.com is the domain-part.
  • The domain-part is an internet domain – it's all you need to know to work out (via a DNS lookup) where an email needs to be sent.
  • The domain-part is case-insensitive – ExamplE.COM is exactly the same as example.com or EXAMPLE.COM.
  • The local-part is used by the receiving mailserver to work out what to do with the email once it receives it.
  • The local-part is case-sensitive – STEVE@example.com is a different email address to steve@example.com.
  • You can put almost anything in a local-part – letters, numbers, white space, punctuation, quote marks, parentheses – as long as you quote it properly.
  • Only the receiving mailserver can parse the local-part. You might be able to guess what it means, but only the receiving mailserver can say for sure.
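The rules in the list above can be sketched in a few lines of Python. The helper names `split_address` and `canonical_form` are hypothetical, and the code assumes ASCII domains (internationalized domain names need IDNA handling that's out of scope here). It splits on the last "@" because a quoted local-part may itself contain one, lower-cases only the domain-part, and leaves the local-part exactly as given. It does no validation beyond checking that both parts are non-empty.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local_part, domain_part).

    Splits on the *last* "@": a quoted local-part such as
    "steve@home"@example.com may contain "@", but a domain can't.
    Hypothetical sketch; no validation of either part is done.
    """
    local_part, at, domain_part = address.rpartition("@")
    if not at or not local_part or not domain_part:
        raise ValueError(f"not an address: {address!r}")
    return local_part, domain_part


def canonical_form(address: str) -> str:
    """Lower-case the domain-part only; leave the local-part untouched.

    Assumes an ASCII domain, where str.lower() is enough.
    """
    local_part, domain_part = split_address(address)
    return f"{local_part}@{domain_part.lower()}"


print(split_address('"steve@home"@example.com'))  # ('"steve@home"', 'example.com')
print(canonical_form("steve@ExamplE.COM"))  # steve@example.com
# The local-parts differ in case, so per RFC 5321 these are different addresses.
print(canonical_form("STEVE@example.com") == canonical_form("steve@example.com"))  # False
```

Leaving the local-part alone is the conservative choice: only the receiving mailserver knows whether STEVE and steve reach the same mailbox.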

I've dismissed both of the actual email address standards as too liberal to be useful, so what is useful? Tomorrow I'll go into more detail about what it's operationally sensible to allow and forbid when you're capturing email addresses, and how to compare and de-dupe them (unless I'm preempted by breaking news in the world of email deliverability, anyway).
