The anatomy of From:

Compared with some of the more complex pieces of the email protocol the From: header seems deceptively simple. But I’ve heard several people be confused about what it’s made up of over the past couple of months, so I thought I’d dig a bit deeper into how it’s defined and how it’s used in practice.
Here’s a simple example:
 
anatomyfrom
 
There are two interesting parts.
The first is what’s technically called the display-name, but more commonly known as the “friendly from” in the bulk email industry. It has no meaning within the email protocol, it’s just text that’s displayed to the recipient to describe who an email was sent by. Because it’s just text, you can put anything you like in there, but it’s usually either the name of the person who wrote the mail or the name of the company or brand that sent it.
The second is the actual email address, the thing with an at-sign in it. Surprisingly, this isn’t used at all during the actual delivery of the email; there’s a hidden field (called the return path or the 5321.MailFrom or the envelope sender  or the bounce address) that’s used instead. For person-to-person email it’s usually the same address, but for bulk mail it’s often different.
So what does the actual email address, the 5322.From, mean? For that we go to the document that specifies what email headers mean – RFC 5322, “Internet Message Format”. (RFC 5322 is the updated replacement of the older RFC 822 – and that’s why the actual email address is often called the 822.From or 5322.From when people are being precise about exactly which email address they’re talking about).
RFC 5322 says “The From: field specifies the author of the message, that is, the mailbox of the person or system responsible for the writing of the message.” and “In all cases, the From: field SHOULD NOT contain any mailbox that does not belong to the author of the message”. It’s the email address of the author of the message.
(In some cases the email may have been written by the author, but then sent on their behalf by someone else. RFC 5322 says that in that situation the email address in the From field is still the author of the message. The person who sent the message gets their own field, “Sender:”).
What is the 5322.From used for? During the delivery process it’s used for some sorts of filtering and authentication. In particular, if you’re reading about DMARC you’ll see “identifier alignment” mentioned a lot – which basically means “the only domain we care about authenticating is the one in the 5322.From”. It’s also the usual field that’s used in user-visible mail filtering such as whitelisting email addresses that are in the users address book.
In the mail client itself the most obvious use of the 5322.From is that when you hit reply, that’s the email address your reply will go to by default. The author of the mail can override that by adding a Reply-To field, containing one or more email addresses if they want different behaviour. It’s also commonly used to filter email and to group mails by author.
What’s displayed to the end user? Originally the entire content of the From: header was shown in the recipients mailbox but it’s now fairly common to display just the friendly from, with no mention of the email address at all. That started in mobile clients, where space is at a premium and the friendly from is just, well, friendlier – but it’s spread to desktop and webmail clients too. In Yahoo webmail the 5322.From isn’t displayed anywhere at all unless you find the View Full Header menu option and dig through the raw headers, and my phone doesn’t display it anywhere obvious and only recently made it possible to see it at all.

Related Posts

Who leaked my address, and when?

Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, but also to see where and when they’ve leaked to spammers.
I’d really like to know who leaked an email address, and when.
All my inbound mail is sorted into “spam” and “not-spam” by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my “recent spam”. That’s a huge amount of data, though, something like 40,000 pieces of spam a month.
Finding the needle of interesting data in that haystack is going to take some automation. As I’ve mentioned before you can do quite a lot of useful work with a mix of some little perl scripts and some commandline tools.
I’m interested in the first time a tagged address started receiving spam, so I start off with a perl script that will take a directory full of emails, one per file, find the ones that were sent to a tagged address and print out that address and the time I received the email. I can’t rely on the Date: header, as that’s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email – and it records that in the first Received: header in the message.

Read More

You can't technical yourself out of delivery problems

In many cases these days, many more cases than a lot of senders want to admit, delivery problems at the big ISPs are a result of sending mail recipients just don’t care about. The reason your mail is going to bulk? It’s not because you have minor problems in your headers. It’s not because you have some formatting issues. The reason is because your recipients just don’t care if the ISP delivers your mail or not.
A few years ago the bulk of my clients hired me to do technical audits for their mail. I fixed a lot of delivery problems that way. They’d send me their email and I’d run it through tools here and identify things they were doing that were likely to be causing problems. I’d give them some suggestions of things to change. Believe it or not, minor tweaks to headers and configuration actually did make a lot of difference in delivery.
Over time, though those tweaks less effective to fix delivery problems. Some of it is due to the MTA vendors, they’re a lot better at sending technically correct mail than they were before. There are also a lot more people giving good advice on the underlying structure and format of emails so senders can send technically clean email. I started seeing technically perfect emails from clients who were seeing major delivery problems.
There are a number of reasons that technical fixes don’t work like they used to. The short version, though, is that ISPs have dealt with much of the really blatant spam and they can focus more time and energy on the “grey mail”.
This makes my job a little harder. I can no longer just look at an email, maybe run it through some of our tools and provide a few suggestions that fix delivery problems. Delivery isn’t that simple any longer. Filters are really more focused on how the recipients react to mail. That means I need to know a lot more about a clients email program before I can even start to identify what might be causing the delivery issues.
I wish it were still so simple I could give minor technical tweaks that would appear to magically improve a client’s delivery. It was a lot simpler process then. But filters have evolved, and senders must evolve, too.

Read More

Ad-hoc analysis

I often pull emails into a database to analyze them, but sometimes I want something simpler. Emails are typically stored in one of two ways: mbox format, where an entire mailbox is stored in a single file, and maildir format, where a mailbox is a directory with one file in it for each email.
My desktop mail application is Mail.app on OS X, and it stores messages in a maildir-ish format, so I’m going to work with that here. If you’re using mbox format mailboxes it’s a little trickier (but you can use a tool called formmail to split an mbox style format into a maildir directory and go from there).
I want to gather some statistics on mail I’ve sent to abuse desks, so the first thing I do is open up a terminal window and change directory to where my “Sent Messages” mailbox is:
cd Library/Mail/V2/IMAP-steve@misc.wordtothewise.com/Sent Messages.mbox
(Tab completion is really useful for navigating through the mailbox hierarchy.)
Then I need to go through every email (file) in that directory, for each file find the “To:” header and check to see if it was sent to an abuse desk. If it was sent to an abuse desk I want to find the email address for each one, count how many times I see that email address and find the top twenty or so abuse desks I send reports to. I can do all that with a single command line:
find . -type f -exec egrep -m1 '^To:' {} ; | egrep -o 'abuse@[a-zA-Z0-9._-]+' | sort | uniq -c | sort -nr | head -20
(Enter that all as a single line, even though it’s wrapped into two here).
That’s a bit much to understand all at once, so lets redo that in several stages, with an intermediate file so we can see what’s going on.
find . -type f -exec egrep -m1 '^To:' {} ; >tolines.txt
The find command finds all the files in a directory and does something with them. In this case we start looking in the current directory (“.”), look just for files (“-type f”) and for each file we find we run that file through another command (“-exec egrep -m1 ‘^To:’ {} ;”) and write the result of that command to a file (“>tolines.txt”). The egrep command we run for each file goes through the file and prints out the first (“-m1”) line it finds that begins with “To:” (“‘^To:'”). If you run that and take a look at the file it creates you can see one line for each message, containing the “To:” header (or at least the first line of it).
The next thing to do is to go through that and pull out just the email addresses – and just the ones that are sent to abuse desks:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt
This uses egrep a second time, this time to look for lines that look like an email address (“‘abuse@[a-zA-Z0-9._-]+'”) and when it finds one print out just the part of the line that matched the pattern (“-o”).
Running that gives us one line of output for each email we’re interested in, containing the address it was sent to. Next we want to count how many times we see each one. There’s a command line idiom for that:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt | sort | uniq -c
This takes all the lines and sorts (“sort”, reasonably enough) them – so that identical lines will be next to each other – then counts runs of identical lines (“uniq -c”). We’re nearly there – the result of this is a count and an email address on each line. We just need to find the top 20:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt | sort | uniq -c | sort -nr | head -20
Each line begins with the count, so we can use sort again, this time telling it to sort by number, high to low (“sort -nr”). Finally, “head -20” will print just the first 20 lines of the result.
The final result is this:

Read More