Content-based filtering

A spam filter looks at many things when it’s deciding whether to deliver a message to the recipient’s inbox. They usually fall into two broad categories: the behaviour of the sender and the content of the message.
When we talk about sender behaviour we’ll often dive headfirst into the technical details of how it’s monitored and tracked – history of mail from the same IP address, SPF records, good reverse DNS, send rates and ramping, polite SMTP-level behaviour, DKIM and domain-based reputation and so on. If all of those are OK and the mail still doesn’t get delivered, then you might throw up your hands, fall back on “it’s content-based filtering” and leave it at that.
There’s just as much detail and scope for diagnosis in content-based filtering, though. It’s just a bit more complex, so some delivery folks tend to gloss over it. If you’re sending mail that people want to receive, you’re sure you’re sending it in a technically correct way and you have a decent reputation as a sender, then it’s time to look at the content.
You want your mail to look just like wanted mail from reputable, competent senders and to look different to unwanted mail, viruses, phishing emails, botnet spoor and so on. And not just to mechanical spam filters – if a postmaster looks at your email, you want it to look clean, honest and competently put together to them too.
Some of the distinctive content differences between wanted and unwanted email are due to the content as written by the sender, and some are due to senders of unwanted email trying to hide their identity or their content, but many are due to differences in the quality of the software used to send each sort of mail. Mail clients used by individuals, and the content composition software used by high-quality ESPs, tend to be well written and comply with both the email and MIME RFCs and the unwritten best common practices for email composition. The software used by spammers, botnets, viruses and low-quality ESPs tends not to do so well.
Here’s a (partial) list of some of the things to consider:

MIME Structure
Good email tends to be either plain text or a multipart mail consisting of two versions of the same message, one in HTML and one in plain text.
Bad email often doesn’t have the plain text part. Either it’s missing altogether, or it’s completely different (much shorter) content than the HTML part.
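As a rough illustration of the structure that good mail tends to have, here’s a minimal sketch using Python’s standard email package. It builds a multipart/alternative message with the plain text part first and the HTML part second. It also includes a simple check for whether a message offers a plain text version at all. The function names and sample addresses are hypothetical. The check doesn’t try to detect a plain text part that is much shorter than, or different from, the HTML.

```python
from email.message import EmailMessage


def build_newsletter(subject: str, sender: str, recipient: str,
                     text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative message: plain text first, then HTML."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain text part should carry the same content as the HTML part.
    msg.set_content(text_body)
    # add_alternative() turns the message into multipart/alternative and
    # appends the HTML version after the plain text one.
    msg.add_alternative(html_body, subtype="html")
    return msg


def has_text_alternative(msg: EmailMessage) -> bool:
    """True for plain text mail, or multipart/alternative with a text/plain part."""
    content_type = msg.get_content_type()
    if content_type == "text/plain":
        return True
    if content_type != "multipart/alternative":
        return False
    return any(part.get_content_type() == "text/plain"
               for part in msg.iter_parts())


# Usage (hypothetical addresses)
msg = build_newsletter("October news", "news@example.com", "reader@example.org",
                       "Hello, readers.\n", "<p>Hello, readers.</p>\n")
print([part.get_content_type() for part in msg.iter_parts()])
# ['text/plain', 'text/html']
```

The plain text part comes first because clients show the last alternative they can render. Readers with HTML-capable mail clients therefore get the HTML version.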
Text Encoding
Bad email often tries to hide its content from spam filters. One common way of doing this is to use base64 encoding for text where quoted-printable encoding would be appropriate.
Lazy software developers sometimes base64 encode everything, as it’s less work than deciding which encoding is appropriate for a message part. Doing that looks dishonest or incompetent to filters and postmasters.
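Here’s a sketch of the decision that lazy software skips, written with Python’s email package. It picks 7bit for plain ASCII, quoted-printable for mostly-ASCII text, and base64 only for text that is largely non-ASCII. The 30% cut-off and the helper names are arbitrary assumptions for illustration. They are not a rule that any particular filter is known to use. The second helper lists the text parts that were sent as base64, which is the pattern filters and postmasters notice.

```python
from email.message import EmailMessage


def choose_text_cte(text: str, max_line: int = 78) -> str:
    """Hypothetical heuristic for picking a transfer encoding for a text part."""
    if text.isascii() and all(len(line) <= max_line for line in text.splitlines()):
        return "7bit"  # nothing needs encoding at all
    non_ascii = sum(1 for ch in text if not ch.isascii())
    # Mostly-ASCII text stays human-readable as quoted-printable; base64 is
    # kept for largely non-Latin text, where QP would bloat. The 30% cut-off
    # is an arbitrary assumption.
    if non_ascii <= 0.3 * max(len(text), 1):
        return "quoted-printable"
    return "base64"


def base64_text_parts(msg: EmailMessage) -> list[str]:
    """Content types of text/* parts that were sent base64-encoded."""
    return [
        part.get_content_type()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
        and part.get("Content-Transfer-Encoding", "").lower() == "base64"
    ]


# Usage
body = "Café opening hours: Monday to Friday"
msg = EmailMessage()
msg.set_content(body, cte=choose_text_cte(body))
print(msg["Content-Transfer-Encoding"])  # quoted-printable
```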
Images
Another way bad email tries to hide its content is by misusing images. The most obvious example of this is mail that consists of just a single huge image. Sometimes that’s just because it’s easier for the graphic designer to do it that way, but more often it’s a spammer trying to hide their content from filters. Either way, it’s much less likely to be delivered.
Including the boilerplate required by CAN-SPAM (such as the postal address) purely as an image is another thing that’s distinctive to bad email. Senders of bad email hide the contact address that way so that people can’t search for it to track their behaviour across brands and shell companies, and can’t use it as a key for targeted spam filters. Good email doesn’t need to do that.
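Here’s a crude, hypothetical check for both image problems. It uses only Python’s standard html.parser. It flags HTML that contains images but very little visible text. It also checks whether a postal address appears as real, searchable text rather than only inside an image. The 200-character threshold and the function names are assumptions for illustration. Real filters weigh image and text signals in their own ways.

```python
from html.parser import HTMLParser


class VisibleContent(HTMLParser):
    """Counts <img> tags and collects the text a reader would actually see."""

    def __init__(self) -> None:
        super().__init__()
        self.images = 0
        self.chunks: list[str] = []
        self._hidden = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("style", "script"):
            self._hidden += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden and data.strip():
            self.chunks.append(data.strip())

    @property
    def visible_text(self) -> str:
        # Collapse whitespace so text split across tags can still be matched.
        return " ".join(" ".join(self.chunks).split())


def analyse(html: str) -> VisibleContent:
    parser = VisibleContent()
    parser.feed(html)
    parser.close()
    return parser


def looks_image_only(html: str, min_text_chars: int = 200) -> bool:
    """Images present but hardly any real text (the threshold is arbitrary)."""
    content = analyse(html)
    return content.images > 0 and len(content.visible_text) < min_text_chars


def address_is_text(html: str, postal_address: str) -> bool:
    """True if the postal address appears as real, searchable text."""
    wanted = " ".join(postal_address.split())
    return wanted in analyse(html).visible_text
```

Joining the text fragments means that an address split across a line break, such as "123 Main St,<br>Springfield", still matches.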
HTML Structure
If your email is completely unreadable with images not displayed, it’s not going to be a good marketing piece in the (common) case that images aren’t shown. Including appropriate ALT text for each image not only makes it look better to recipients when images are turned off, it also makes it look more legitimate to postmasters with ticketing systems that don’t display images, or only show the raw HTML. It sometimes makes spam filters happier too.
That’s just one example of sending “good” HTML.
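Here’s a small sketch of an ALT text audit, again using Python’s html.parser. It lists the src of every image that has no alt attribute at all. Images with alt="" are deliberately let through, because an empty alt is the usual way to mark a purely decorative image such as a spacer. The function name is hypothetical.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # alt="" is allowed: it marks a purely decorative image.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html: str) -> list[str]:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Usage
print(images_missing_alt('<img src="logo.png" alt="Example Co logo"><img src="hero.png">'))
# ['hero.png']
```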
Phishy URLs
Bad email sent by phishers often includes links that look like <a href="http://phisher.ru/">bank.com</a>, where the message is trying to look like legitimate email from bank.com, but it’s sending readers to phisher.ru instead. <a href="http://bank.com.whatever.phisher.ru/">bank.com</a> is an even more obvious attempt to defraud the recipient.
Otherwise good email sent by naive ESPs often includes links that look like <a href="http://click.esp.com?trackdata=xxxx&target=bank.com/">bank.com</a>. To a spam filter, that looks much the same as a typical phishing URL, and the delivery is not going to go well.
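To see why the ESP tracking link looks phishy, here’s a rough sketch of the kind of comparison a filter might make. When the visible link text looks like a hostname, it compares that name with the hostname the href actually points to. This is an assumed, simplified heuristic written with Python’s standard library. It is not any particular filter’s rule. The "looks like a hostname" regex is deliberately crude, and a leading "www." in the link text is ignored.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Crude test for link text that looks like a bare hostname, e.g. "bank.com".
HOSTNAME_TEXT = re.compile(r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)/?$", re.I)


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None  # href of the <a> currently open, if any
        self._link_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._link_text = []

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._link_text).strip()))
            self._href = None


def same_site(link_host: str, shown_host: str) -> bool:
    """True if link_host is shown_host itself or a subdomain of it."""
    return link_host == shown_host or link_host.endswith("." + shown_host)


def mismatched_links(html: str) -> list[tuple[str, str]]:
    """Links whose text names one host but whose href points somewhere else."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, text in collector.links:
        match = HOSTNAME_TEXT.match(text)
        if not match:
            continue  # the text isn't a hostname, so there's nothing to compare
        shown = match.group(1).lower().removeprefix("www.")
        actual = (urlsplit(href).hostname or "").lower()
        if not same_site(actual, shown):
            flagged.append((href, text))
    return flagged
```

All three links from the examples above are flagged. The phishing links are caught, and so is the ESP’s click-tracking link, because its hostname is click.esp.com rather than bank.com. A link to https://www.bank.com/login with the text "bank.com" is not flagged.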
Bad Phrasing or Appearance
Even if 100% of your recipients are eagerly waiting for every issue of your newsletter, some phrases will cause you more problems than others. “Looking spammy” is one of the worst things for your email if you need to discuss a delivery issue with a postmaster or a filter vendor – if it “looks like spam” they’re much less likely to believe it’s really wanted by recipients.
If your newsletter is about “Moustache Rides” (real example, I’m not making this up), then you might not be able to fix the phrasing, but you should try to make the rest of the newsletter look as professionally put together as you can.
URL Reputation
If two emails received “look similar” and the recipient complained about the first one, it’s likely the second one will be unwanted too. But mechanically detecting similar content is complex and expensive to do, so a common trick is to “fingerprint” each email by looking for distinctive features in it, and considering messages that share a fingerprint to be similar.
One of the simpler fingerprints to use is the URLs used in links in the mail, more specifically the hostnames of the links. If someone is sending bad email and you send email using the same URLs or hostnames, it’s likely to be treated poorly.
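Here’s a minimal sketch of hostname-based fingerprinting. It assumes that the fingerprint is simply a hash of the sorted set of link hostnames. Real filters use their own, more elaborate features, and many also look hostnames up in URI blocklists. The sketch also shows how a message that shares a click-tracking hostname with a spammer ends up sharing a feature with them. The regex is a rough stand-in for proper HTML parsing, and the hostnames in the example are made up.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Rough stand-in for real HTML parsing: pull out quoted href values.
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.I)


def link_hostnames(html: str) -> set[str]:
    """Lower-cased hostnames of every http(s) link in the HTML."""
    hosts = set()
    for url in HREF.findall(html):
        parts = urlsplit(url.strip())
        if parts.scheme in ("http", "https") and parts.hostname:
            hosts.add(parts.hostname)
    return hosts


def url_fingerprint(html: str) -> str:
    """Hash of the sorted hostname set: same hostnames, same fingerprint."""
    joined = "\n".join(sorted(link_hostnames(html)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def shared_hosts(html_a: str, html_b: str) -> set[str]:
    """Hostnames that two messages have in common."""
    return link_hostnames(html_a) & link_hostnames(html_b)


# Usage (made-up hostnames)
spam = ('<a href="http://click.sharedesp.example/t/1">Buy now</a>'
        '<a href="https://pills.example/">Pills</a>')
newsletter = ('<a href="http://click.sharedesp.example/t/99">Read more</a>'
              '<a href="https://news.example/">Archive</a>')
print(shared_hosts(spam, newsletter))  # {'click.sharedesp.example'}
```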
Fiddly Trivia
There are lots of other fiddly little things that spam filters key on too. You shouldn’t obsess about them too much, but it’s worth being aware of the sort of things that can make a difference. SpamAssassin publish some of the rules they use. If you look at the rules, look at the scores too – a rule with a score of 0.001 isn’t very relevant.
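If you do dig into SpamAssassin’s rules, a few lines of Python can separate the noise from the rules that matter. This sketch assumes the plain “score RULE_NAME value” line format, with either one score or four (one per SpamAssassin score set). It ranks rules by their largest absolute score. Negative scores push a message towards ham, so they count too. The rule names and numbers in the sample are invented, and the 1.0 threshold is an arbitrary choice.

```python
def parse_scores(cf_text: str) -> dict[str, list[float]]:
    """Map rule name -> scores, from lines like 'score RULE_NAME 0 1.2 0 1.5'."""
    scores: dict[str, list[float]] = {}
    for raw_line in cf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 3 or fields[0] != "score":
            continue
        try:
            scores[fields[1]] = [float(value) for value in fields[2:]]
        except ValueError:
            continue  # not a plain numeric score line; skip it
    return scores


def significant_rules(scores: dict[str, list[float]], threshold: float = 1.0) -> list[str]:
    """Rules whose largest absolute score reaches threshold, biggest first."""
    weight = {name: max(abs(v) for v in values) for name, values in scores.items()}
    return sorted((name for name, w in weight.items() if w >= threshold),
                  key=lambda name: weight[name], reverse=True)


# Usage with invented rules and scores
sample = """
# Invented rules and scores, for illustration only
score HYPOTHETICAL_TINY  0.001
score HYPOTHETICAL_BIG   0 2.5 0 3.1
score HYPOTHETICAL_HAMMY -1.5
"""
print(significant_rules(parse_scores(sample)))
# ['HYPOTHETICAL_BIG', 'HYPOTHETICAL_HAMMY']
```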