Content-based filtering

A spam filter looks at many things when it’s deciding whether or not to deliver a message to the recipient’s inbox, usually divided into two broad categories – the behaviour of the sender and the content of the message.
When we talk about sender behaviour we’ll often dive headfirst into the technical details of how that’s monitored and tracked – history of mail from the same IP address, SPF records, good reverse DNS, send rates and ramping, polite SMTP level behaviour, DKIM and domain-based reputation and so on. If all of those are OK and the mail still doesn’t get delivered then you might throw up your hands, fall back on “it’s content-based filtering” and leave it at that.
There’s just as much detail and scope for diagnosis in content-based filtering, though; it’s just a bit more complex, so some delivery folks tend to gloss over it. If you’re sending mail that people want to receive, you’re sure the mail is technically correct, and you have a decent reputation as a sender, then it’s time to look at the content.
You want your mail to look just like wanted mail from reputable, competent senders and to look different to unwanted mail, viruses, phishing emails, botnet spoor and so on. And not just to mechanical spam filters – if a postmaster looks at your email, you want it to look clean, honest and competently put together to them too.
Some of the distinctive content differences between wanted and unwanted email are due to the content as written by the sender, some of them are due to senders of unwanted email trying to hide their identity or their content, but many of them are due to the differing quality of the software used to send each sort of mail. Mail clients used by individuals, and content composition software used by high quality ESPs, tend to be well written and to comply with both the email and MIME RFCs and the unwritten best common practices for email composition. The software used by spammers, botnets, viruses and low quality ESPs tends not to do so well.
Here’s a (partial) list of some of the things to consider:

MIME Structure
Good email tends to be either plain text or a multipart mail consisting of two versions of the same message, one in HTML and one in plain text.
Bad email often doesn’t have the plain text part. Either it’s missing altogether, or it’s completely different (much shorter) content than the HTML part.
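As a rough illustration of the "good" structure described above, here is a minimal sketch using Python's standard library `email` package. The addresses, subject and body text are made up for the example, and `build_message` is a hypothetical helper name, not something from any particular ESP's tooling.

```python
from email.message import EmailMessage


def build_message(subject, sender, recipient, text_body, html_body):
    """Build a multipart/alternative message with matching text and HTML parts."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain text part goes in first. It should say the same thing as
    # the HTML part, not be a one-line stub.
    msg.set_content(text_body)
    # add_alternative() turns the message into multipart/alternative and
    # appends the HTML version after the plain text one.
    msg.add_alternative(html_body, subtype="html")
    return msg


# Illustrative values only.
msg = build_message(
    subject="October newsletter",
    sender="news@example.com",
    recipient="reader@example.org",
    text_body="Hello,\n\nOur autumn catalogue is out: https://example.com/autumn\n",
    html_body=(
        "<html><body><p>Hello,</p>"
        '<p>Our <a href="https://example.com/autumn">autumn catalogue</a> is out.</p>'
        "</body></html>"
    ),
)

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type())
```

The point is only that both versions exist and carry the same message; how you generate the text version from the HTML is up to your composition tooling.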
Text Encoding
Bad email often tries to hide its content from spam filters. One common way of doing this is to use base64 encoding for text where quoted-printable encoding would be appropriate.
Lazy software developers sometimes base64 encode everything, as it’s less work than deciding which encoding is appropriate for a message part. Doing that looks dishonest or incompetent to filters and postmasters.
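To make the difference concrete, here is a small sketch, again with Python's standard `email` package, that asks explicitly for quoted-printable on a mostly-ASCII text part. The sample body is invented for the example. If you leave the `cte` argument out, the library picks an encoding for you, so naming it is just a way of being deliberate.

```python
from email.message import EmailMessage

# Mostly ASCII text with one accented character (illustrative content).
body = "Café opening hours have changed.\nSee you soon!\n"

msg = EmailMessage()
msg["Subject"] = "Opening hours"
# Ask for quoted-printable so the text stays readable in the raw message;
# only the non-ASCII bytes are escaped.
msg.set_content(body, charset="utf-8", cte="quoted-printable")

print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())
```

With quoted-printable, someone reading the raw message can still see nearly all of the text, which is exactly what a postmaster looking at a complaint wants.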
Images
Another way bad email tries to hide its content is by misuse of images. The most obvious example of this is mail that consists of just a single huge image – sometimes that’s just because it’s easier for the graphic designer to do it that way, but more often it’s a spammer trying to hide their content from filters. Either way, it’s much less likely to be delivered.
Including CAN-SPAM required boilerplate (such as the postal address) purely as an image is another thing that’s distinctive to bad email. Senders of bad email hide the contact address that way so that people can’t search on it to track the sender’s behaviour across brands and shell companies, and can’t use it as a key for targeted spam filters. Good email doesn’t need to do that.
HTML Structure
If your email is completely unreadable with images not displayed, it’s not going to be a good marketing piece in the (common) case that images aren’t shown. Including appropriate ALT text for each image not only makes it look better to recipients when images are turned off, it also makes it look more legitimate to postmasters with ticketing systems that don’t display images, or only show the raw HTML. It sometimes makes spam filters happier too.
That’s just one example of sending “good” HTML.
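A check for missing ALT text is easy to automate before a campaign goes out. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and treating an empty `alt=""` as "missing" is an assumption made for this example (empty ALT text can be legitimate for purely decorative images).

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        # Flag images with no alt attribute, or with an empty/blank one.
        if alt is None or not alt.strip():
            self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


sample = (
    '<p><img src="logo.png" alt="Acme logo">'
    '<img src="hero.jpg">'
    '<img src="spacer.gif" alt=""/></p>'
)
print(images_missing_alt(sample))
```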
Phishy URLs
Bad email sent by phishers often includes links that look like <a href="http://phisher.ru/">bank.com</a>, where the message is trying to look like legitimate email from bank.com, but it’s sending readers to phisher.ru instead. <a href="http://bank.com.whatever.phisher.ru/">bank.com</a> is an even more obvious attempt to defraud the recipient.
Otherwise good email sent by naive ESPs often includes links that look like <a href="http://click.esp.com?trackdata=xxxx&target=bank.com/">bank.com</a>. To a spam filter, that looks much the same as a typical phishing URL, and the delivery is not going to go well.
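Here is a rough sketch of the kind of comparison a filter might make, written with Python's standard library. It is an assumption about the general shape of such a check, not any real filter's rule: it flags links whose visible text looks like a hostname that doesn't match the host the link actually points to. All names in it are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Visible link text that looks like a hostname, optionally with a scheme or path.
HOSTNAME_RE = re.compile(
    r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)(?:[/:?#].*)?$", re.IGNORECASE
)


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return (shown host, actual host) for links whose text names another host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, text in collector.links:
        match = HOSTNAME_RE.match(text)
        if not match:
            continue  # text such as "click here" makes no claim about the host
        shown = match.group(1).lower()
        actual = (urlsplit(href).hostname or "").lower()
        # Accept the same host, or a subdomain of the host shown in the text.
        if actual != shown and not actual.endswith("." + shown):
            flagged.append((shown, actual))
    return flagged


sample = (
    '<a href="http://phisher.ru/">bank.com</a> '
    '<a href="http://click.esp.com?trackdata=xxxx&target=bank.com/">bank.com</a> '
    '<a href="https://www.bank.com/login">bank.com</a> '
    '<a href="http://click.esp.com/abc">Read the full story</a>'
)
print(mismatched_links(sample))
```

Note that the tracking link is flagged just like the phishing link, while the same tracking host behind descriptive text ("Read the full story") is not. That suggests one practical fix: don't use a bare hostname as the visible text of a redirected link.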
Bad Phrasing or Appearance
Even if 100% of your recipients desperately wait for every issue of your newsletter there are some phrases that will cause you more problems than others. “Looking spammy” is one of the worst things for your email if you need to discuss a delivery issue with a postmaster or a filter vendor – if it “looks like spam” they’re much less likely to believe it’s really wanted by recipients.
If your newsletter is about “Moustache Rides” (real example, I’m not making this up) then you might not be able to fix the phrasing, but you should try and make the rest of the newsletter look professionally put together, as much as you can anyway.
URL Reputation
If two emails received “look similar” and the recipient complained about the first one, it’s likely the second one will be unwanted too. But mechanically detecting similar content is complex and expensive to do, so a common trick is to “fingerprint” each email by looking for distinctive features in it, and considering messages that share a fingerprint to be similar.
One of the simpler fingerprints to use is the URLs used in links in the mail, more specifically the hostnames of the links. If someone is sending bad email and you send email using the same URLs or hostnames, it’s likely to be treated poorly.
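A hostname fingerprint of this sort can be sketched in a few lines of standard-library Python. This is only an illustration of the idea; real filters are more elaborate, and the function names and the `known_bad` set below are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def link_hostnames(html):
    """Return the set of hostnames linked from an HTML body."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    hosts = set()
    for href in collector.hrefs:
        host = urlsplit(href).hostname  # None for mailto: and relative links
        if host:
            hosts.add(host)
    return hosts


# Hypothetical set of hostnames already seen in mail that drew complaints.
known_bad = {"click.example.net"}

body = (
    '<a href="http://click.example.net/a?id=1">Offer</a> '
    '<a href="https://Shop.Example.com/b">Shop</a> '
    '<a href="mailto:help@example.org">Contact</a>'
)
hosts = link_hostnames(body)
print(sorted(hosts))
print(sorted(hosts & known_bad))
```

This is why sharing a link-tracking hostname with other senders matters: one hostname in common with a bad sender is enough to give two otherwise unrelated messages the same fingerprint.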
Fiddly Trivia
There are lots of other fiddly little things that spam filters key on too. You shouldn’t obsess about them too much, but it’s worth being aware of the sort of things that can make a difference. SpamAssassin publish some of the rules they use. If you look at the rules, look at the scores too – a rule with a score of 0.001 isn’t very relevant.
