What is an email address? (part one)

Given we deal with email addresses every day, dozens or thousands or millions of them, it seems a bit strange to ask what an email address is – but given some of the problems people have with the grubbier corners of address syntax it’s actually an interesting question.
There are two real standards that define what is a valid email address and what isn’t. The most complex is RFC 5322 – Internet Message Format, which describes all sorts of things about the structure of an email, including what’s valid to put in From: and To: headers. It’s really too liberal in what it allows an email address to look like to be terribly useful, but it does provide for one very commonly used feature – the friendly from where the name that’s displayed to the recipient is not just the email address.

    From: "Steve Atkins" 

Here the string that’s displayed to the user (Steve Atkins) comes first, surrounded by double quotes, then the email address itself (steve@example.com) surrounded by angle brackets. You might see other obsolete formats used, including parentheses or no double quotes, but this is the safe one to use.
The other standard is RFC 5321 – Simple Mail Transfer Protocol, which describes how email addresses are used to actually send email. It too is far too liberal in what it allows to be operationally useful for an ESP, but it does define some important features:

  • An email address consists of two parts, a local-part and a domain-part separated by an “@” – in steve@example.com, steve is the local-part and example.com is the domain-part.
  • The domain-part is an internet domain – it’s all you need to know to work out (via a DNS lookup) where an email needs to be sent to.
  • The domain-part is case-insensitive – ExamplE.COM is exactly the same as example.com or EXAMPLE.COM.
  • The local-part is used by the receiving mailserver to work out what to do with the email once it receives it.
  • The local-part is case-sensitive – STEVE@example.com is a different email address to steve@example.com.
  • You can put almost anything in a local-part – letters, numbers, white space, punctuation, quote marks, parentheses – as long as you quote it properly.
  • Only the receiving mailserver can parse the local-part. You might be able to guess what it means, but only the receiving mailserver can say for sure.

I’ve dismissed both of the actual email address standards as too liberal to be useful, so what is useful? I’ll go into some more detail about what it’s operationally sensible to allow and forbid as you’re capturing email addresses, and how to compare and de-dupe them tomorrow (unless I’m preempted by breaking news in the world of email deliveribility, anyway).

Related Posts

AOL and DKIM

Yesterday, on an ESPC call, Mike Adkins of AOL announced upcoming changes to the AOL reputation system. As part of these changes, AOL will be checking DKIM on the inbound. Best estimates are that this will be deployed in the first half of 2009, possibly in Q1. This is something AOL has been hinting at for most of 2008.
As part of this, AOL has deployed an address where any sender can check the validity of a DKIM signature against the AOL DKIM implementation. To check a signature, send an email to any address at dkimtest.aol.com.
I have done a couple of tests, from a domain not signing with either DK or DKIM, from a domain signing with DK and from a domain signing with both DK and DKIM. In all cases, the mail is rejected by AOL. The specific rejection messages are different, however.
Unsighng domain: host dkimtest-d01.mx.aol.com[205.188.103.106] said: 554-ERROR: No DKIM header found 554 TRANSACTION FAILED (in reply to
end of DATA command)
DK signing domain: “205.188.103.106 failed after I sent the message.
Remote host said: 554-ERROR: No DKIM header found
554 TRANSACTION FAILED”
DK/DKIM signing domain: “We tried to delivery your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 554 554-PASS: DKIM authentication verified
554 TRANSACTION FAILED (state 18).”
As you can see, in all cases mail is rejected from that address. However, when there is a valid DKIM signature, the failure message is “554-PASS.”
As I have been recommending for months now, all senders should be planning to sign with DKIM early in 2009. AOL’s announcement that they will be using DKIM signatures as part of their reputation scoring system is just one more reason to do so.

Read More

Why do ISPs limit emails per connection?

A few years ago it was “common knowledge” that if you were sending large amounts of email to an ISP the most polite way to do that, the way that would put the least load on the receiving mailserver, was to open a single SMTP session to the mailserver and then to send all the mail for that ISP down that single connection.
That’s because the receiving mailserver is concerned about two main resources when handling inbound email – the pool of “slots” assigned one per inbound SMTP session, and the bandwidth (network and disk, and related resouces such as memory and CPU) consumed by the inbound mail – and this approach means the sender only uses one slot, and it allows the receiving mailserver to control the bandwidth used simply by accepting data on that one connection at a given rate. It also amortizes all the connection setup costs over multiple emails. It’s a beautiful thing – it just doesn’t get any more efficient than that.
That seems perfect for the receiving ISP – but ISPs don’t encourage bulk senders to do this. Instead many of them have been moving from “one connection, lots of mail through it” to “multiple connections, a few messages through each”. They’re even limiting the number of deliveries permitted over a single connection. Why would that be?
The reason for this is driven by three things. One is that the number of simultaneous inbound SMTP sessions that a mailserver can handle is quite tightly limited by the architecture of most mailservers. Another is that the amount of mail that’s being sent to large ISP mailservers keeps going up and up – so there are sometimes more inbound SMTP sessions asking for access than the mailserver can handle. The third is that ISPs know that there are different categories of email being sent to their users – 1:1 mail from their friends that they want to see as soon as possible, wanted bulk mail that their users want to see when it arrives and spam; lots and lots of spam.
So ISPs want to be able to do things like accept 1:1 mail all the time, while deferring bulk mail and spam to allow them to shed traffic at times of peak load. But they can only make decisions about whether to accept or defer delivery in an efficient way at SMTP connection time – they pick and choose amongst the horde of inbound connection attempts to prioritize some and defer others, letting them keep within the number of inbound sessions that they can handle simultaneously.
But once the ISP lets a bulk mailer connect to deliver their mail, they lose most of the ability to further control that delivery as the sender might send thousands of emails down that connection. (Even if the ISP has the ability to throttle bandwidth – as some do to control obvious spam – that just means that the sender would tie up an expensive inbound delivery slot for longer).
So, in order to allow them to prioritize inbound connections effectively the ISP needs to terminate the session after a few deliveries, and then make that sender start competing with other senders for a connection again.
So ISPs aren’t limiting the number of deliveries per SMTP connection to make things difficult for senders, or because they don’t understand how mail works. They’re doing it because that lets them prioritize wanted email to their users. The same is true when they defer your mail with a 4xx response.
It might be annoying to have to deal with these limits on delivery, but for legitimate bulk mail senders all this throttling and prioritization is a good thing. Your mail may be given less priority than 1:1 mail – but, if you maintain a good reputation, you’re given higher priority than all the spam, higher priority than all the email borne viruses, higher priority than all the junk email, higher priority than the 419 spams. And higher priority than mail from those of your competitors who have a worse reputation than yours.

Read More

Update on Yahoo and the PBL

Last week I requested details about Yahoo rejections for IPs pointing to the PBL when the IP was not on the PBL. A blog reader did provide me with extremely useful logs documenting the problem. Thank you!
Based on my examination of the logs, this appears to be a problem only on some of the Yahoo! MXs. In fact, in the logs I was sent, the email was rejected from 2 machines and then eventually accepted by a third.
I have forwarded those logs onto Yahoo who are looking into the issue. I have also talked with one of the Spamhaus volunteers and Spamhaus is aware of the issue as well.
The right people are looking at the issue and Spamhaus and Yahoo are both working on fixing this.
Thanks for the reports and for the logs.

Read More