Content based filtering

A spam filter looks at many things when it’s deciding whether or not to deliver a message to the recipient’s inbox. They usually fall into two broad categories: the behaviour of the sender and the content of the message.
When we talk about sender behaviour we’ll often dive headfirst into the technical details of how that’s monitored and tracked: history of mail from the same IP address, SPF records, good reverse DNS, send rates and ramping, polite SMTP level behaviour, DKIM and domain-based reputation and so on. If all of those are OK and the mail still doesn’t get delivered then you might throw up your hands, fall back on “it’s content-based filtering” and leave it at that.
There’s just as much detail and scope for diagnosis in content-based filtering, though. It’s just a bit more complex, so some delivery folks tend to gloss over it. If you’re sending mail that people want to receive, you’re sure you’re sending the mail technically correctly and you have a decent reputation as a sender then it’s time to look at the content.
You want your mail to look just like wanted mail from reputable, competent senders and to look different to unwanted mail, viruses, phishing emails, botnet spoor and so on. And not just to mechanical spam filters – if a postmaster looks at your email, you want it to look clean, honest and competently put together to them too.
Some of the distinctive content differences between wanted and unwanted email are due to the content as written by the sender, some of them are due to senders of unwanted email trying to hide their identity or their content, but many of them are due to the different quality of software used to send each sort of mail. Mail clients used by individuals and content composition software used by high quality ESPs tend to be well written and to comply with both the email and MIME RFCs and the unwritten best common practices for email composition. The software used by spammers, botnets, viruses and low quality ESPs tends not to do so well.
Here’s a (partial) list of some of the things to consider:

MIME Structure
Good email tends to be either plain text or a multipart mail consisting of two versions of the same message, one in HTML and one in plain text.
Bad email often doesn’t have the plain text part. Either it’s missing altogether, or it’s completely different (much shorter) content than the HTML part.
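As a rough illustration of the “good” structure described above, here is a minimal sketch using Python’s standard email library to build a multipart/alternative message carrying the same content as plain text and as HTML. The addresses, subject and body text are made up for the example, and real composition software will do a good deal more than this.

```python
from email.message import EmailMessage


def build_newsletter(plain_text: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message with matching text and HTML parts."""
    msg = EmailMessage()
    msg["From"] = "news@example.com"      # hypothetical sender
    msg["To"] = "reader@example.net"      # hypothetical recipient
    msg["Subject"] = "Monthly newsletter"
    msg.set_content(plain_text)                 # becomes the text/plain part
    msg.add_alternative(html, subtype="html")   # adds the text/html part
    return msg


msg = build_newsletter(
    "Hello!\nThis month: new flavors and a coupon.\n",
    "<html><body><h1>Hello!</h1>"
    "<p>This month: new flavors and a coupon.</p></body></html>",
)
part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), part_types)
```

The point is that both parts say the same thing; a one-line plain text part sitting next to a long HTML part is exactly the mismatch described above.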
Text Encoding
Bad email often tries to hide its content from spam filters. One common way of doing this is to use base64 encoding for text where quoted-printable encoding would be appropriate.
Lazy software developers sometimes base64 encode everything, as it’s less work than deciding which encoding is appropriate for a message part. Doing that looks dishonest or incompetent to filters and postmasters.
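If you want to check your own mail for this, something like the following sketch might help. It walks a message and flags text parts that are base64 encoded even though their decoded content is almost entirely plain ASCII. The function name and the 0.9 threshold are my own arbitrary choices for illustration, not anything a particular filter is known to use.

```python
from email.message import EmailMessage


def needlessly_base64_parts(msg, ascii_threshold=0.9):
    """Return content types of text parts that are base64 encoded but mostly ASCII."""
    flagged = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte != "base64":
            continue
        raw = part.get_payload(decode=True) or b""
        if not raw:
            continue
        ascii_bytes = sum(1 for byte in raw if byte < 128)
        # Mostly-ASCII text would have been readable as 7bit or quoted-printable.
        if ascii_bytes / len(raw) >= ascii_threshold:
            flagged.append(part.get_content_type())
    return flagged


lazy = EmailMessage()
lazy.set_content("Plain English text that needs no special encoding.\n", cte="base64")

careful = EmailMessage()
careful.set_content("Caf\u00e9 prices rose 5% this month.\n", cte="quoted-printable")

print(needlessly_base64_parts(lazy), needlessly_base64_parts(careful))
```

Text that is genuinely mostly non-ASCII may reasonably be base64 encoded, which is why the sketch looks at the decoded bytes rather than flagging every base64 text part.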
Images
Another way bad email tries to hide its content is by misuse of images. The most obvious example of this is mail that consists of just a single huge image – sometimes that’s just because it’s easier for the graphic designer to do it that way, but more often it’s a spammer trying to hide their content from filters. Either way, it’s much less likely to be delivered.
Including CAN-SPAM required boilerplate (such as the postal address) purely as an image is another thing that’s distinctive to bad email. Bad email hides the contact address in that way so as to avoid people being able to search based on it to track their behaviour across brands and shell companies, and to stop people using it to key targeted spam filters on. Good email doesn’t need to do that.
HTML Structure
If your email is completely unreadable with images not displayed, it’s not going to be a good marketing piece in the (common) case that images aren’t shown. Including appropriate ALT text for each image not only makes it look better to recipients when images are turned off, it also makes it look more legitimate to postmasters with ticketing systems that don’t display images, or only show the raw HTML. It sometimes makes spam filters happier too.
That’s just one example of sending “good” HTML.
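Checking for missing ALT text is easy to automate before a send. This is a small sketch using Python’s built-in HTML parser; the class and function names are hypothetical. It treats an empty ALT attribute as missing, which is a simplification, since a purely decorative image can legitimately have empty ALT text.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        if alt is None or not alt.strip():
            self.missing.append(attr_map.get("src") or "")


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


sample = (
    "<p>Spring sale</p>"
    '<img src="https://example.com/banner.png" alt="Spring sale: 20% off">'
    '<img src="https://example.com/hero.png">'
)
missing = images_missing_alt(sample)
print(missing)
```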
Phishy URLs
Bad email sent by phishers often includes links that look like <a href="http://phisher.ru/">bank.com</a>, where the message is trying to look like legitimate email from bank.com, but it’s sending readers to phisher.ru instead. <a href="http://bank.com.whatever.phisher.ru/">bank.com</a> is an even more obvious attempt to defraud the recipient.
Otherwise good email sent by naive ESPs often includes links that look like <a href="http://click.esp.com?trackdata=xxxx&target=bank.com/">bank.com</a>. To a spam filter, that looks much the same as a typical phishing URL, and the delivery is not going to go well.
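To make the comparison concrete, here is a sketch of the sort of check being described: if the visible text of a link looks like a hostname, compare it with the hostname the link really points at. The names and the matching rule are assumptions of mine, and real filters are certainly more sophisticated, but it shows why the ESP click-tracking link gets lumped in with the phishing ones.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Link text that looks like a bare hostname or URL, e.g. "bank.com".
HOSTNAME_RE = re.compile(
    r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)(?:[/:?#].*)?$", re.IGNORECASE
)


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return (shown host, actual host) for links whose text names another host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    suspicious = []
    for href, text in collector.links:
        match = HOSTNAME_RE.match(text)
        if not match:
            continue  # text such as "Click here" makes no claim about the host
        shown = match.group(1).lower()
        actual = (urlsplit(href).hostname or "").lower()
        if actual != shown and not actual.endswith("." + shown):
            suspicious.append((shown, actual))
    return suspicious


sample = (
    '<a href="http://phisher.ru/">bank.com</a>'
    '<a href="http://bank.com.whatever.phisher.ru/">bank.com</a>'
    '<a href="http://click.esp.com?trackdata=xxxx&target=bank.com/">bank.com</a>'
    '<a href="https://www.bank.com/login">bank.com</a>'
    '<a href="https://example.com/">Click here</a>'
)
flagged = mismatched_links(sample)
print(flagged)
```

Only the first three links are flagged. The fourth points at a subdomain of the host it displays, and the fifth doesn’t display a hostname at all.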
Bad Phrasing or Appearance
Even if 100% of your recipients desperately wait for every issue of your newsletter there are some phrases that will cause you more problems than others. “Looking spammy” is one of the worst things for your email if you need to discuss a delivery issue with a postmaster or a filter vendor – if it “looks like spam” they’re much less likely to believe it’s really wanted by recipients.
If your newsletter is about “Moustache Rides” (real example, I’m not making this up) then you might not be able to fix the phrasing, but you should try and make the rest of the newsletter look professionally put together, as much as you can anyway.
URL Reputation
If two emails received “look similar” and the recipient complained about the first one, it’s likely the second one will be unwanted too. But mechanically detecting similar content is complex and expensive to do, so a common trick is to “fingerprint” each email by looking for distinctive features in it, and considering messages that share a fingerprint to be similar.
One of the simpler fingerprints to use is the URLs used in links in the mail, more specifically the hostnames of the links. If someone is sending bad email and you send email using the same URLs or hostnames, it’s likely to be treated poorly.
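Here is a toy version of that idea, assuming the fingerprint is simply a hash of the sorted set of link hostnames in the message body. The regular expression and function names are mine, and I’m not claiming any particular filter works exactly this way; it is only meant to show why two messages with completely different wording can still end up “looking similar”.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Deliberately crude URL matcher; good enough for an illustration.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_hostnames(body):
    """Return the set of lowercased hostnames of all http(s) URLs in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed URLs rather than fail
        if host:
            hosts.add(host.lower())
    return hosts


def hostname_fingerprint(body):
    """Hash the sorted hostnames so messages sharing them share a fingerprint."""
    joined = "\n".join(sorted(link_hostnames(body)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


first = (
    'Buy now <a href="http://shop.example.com/a?id=1">here</a> '
    "or see http://cdn.example.net/img.png"
)
second = (
    "Totally different words http://CDN.example.net/other.gif "
    "and http://shop.example.com/b"
)
third = "An unrelated message linking to http://other.example.org/"

print(hostname_fingerprint(first) == hostname_fingerprint(second))
print(hostname_fingerprint(first) == hostname_fingerprint(third))
```

The first two messages share a fingerprint despite having no wording or paths in common, so a complaint about one would count against the other.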
Fiddly Trivia
There are lots of other fiddly little things that spam filters key on too. You shouldn’t obsess about them too much, but it’s worth being aware of the sort of things that can make a difference. SpamAssassin publish some of the rules they use. If you look at the rules, look at the scores too – a rule with a score of 0.001 isn’t very relevant.

Don't forget to check out the forest

I have the #emailmarketing feed on Twitter scrolling live across my screen while I’m working. It’s been an interesting experience as many of the people who tweet #emailmarketing aren’t part of my social network.
Over the last week or so there’s been a lot of tweeting going on about Ben and Jerry’s GIVING UP EMAIL MARKETING!!! Only, come to find out, that’s not what they’re doing. Yes, they are moving more into the social networking arena but they will be continuing to connect with subscribers through email. Today many are tweeting that perhaps they “jumped the cow” with their initial reports of email abandonment by B&J.
Watching the ongoing discussions led me to wonder if a lot of email marketers are so focused on the trees that they miss the forest. Are they so disconnected from how people actually use email, and social networks for that matter, that they spend way too much time chasing a response and not enough time thinking about what they’re saying and doing?
Email marketing discussions often focus on a limited number of things; the biggest are how to get mail to the inbox and how to get recipients to engage. Many marketers spend time and money looking for the elusive combination of factors that will get their mail to the inbox and impel the recipient to give the sender money. The focus is on details like color and pre-headers and length and timing and content above and below the fold and the perfect call to action.
The discussions focus almost exclusively on the sender and only mention the subscriber in passing. That is understandable on one level. Senders can only control one end of the equation and figuring out what inputs compel the best response from the other side is what marketing is all about.
But there’s another part of email marketing, and that is that subscribers invite marketers into their inboxes. When someone subscribes to a newsletter or mail from a company they’re offering that company the opportunity to interact with them in their personal space. This is, in fact, the holy grail of marketing: having the customer invite contact from a seller.
I suspect this is why the rumors of Ben and Jerry’s abandoning email had people all up in arms. A company abandoning a channel where they had an engaged and interested audience? PREPOSTEROUS! What’s happening to email as marketing?
I’ll be honest, I didn’t pay much attention because it was such a silly idea. Any marketer worth their salt wouldn’t give up a way to interact with customers. Ben and Jerry’s is a company with an almost cult-like following. Anyone who was going to subscribe to a B&J newsletter was going to want that mail (new flavors! coupons! new locations! inside information!).
Someone started a rumor, though, that B&J were abandoning email marketing and everyone focusing on the trees grabbed that story and ran with it. They were so focused on the details they didn’t take a step back and think about what they were repeating. Had they taken a step back and thought about the forest they would have realized how silly the idea of B&J abandoning email as a customer communication channel was.

The good, the typical and the ugly

In the theme of the ongoing discussions about ESPs and their role in the email ecosystem, I thought I’d present some examples of how different ESPs work.
The good ESPs are those that set and enforce higher standards than the ISPs. They invest money and time in both proactive and reactive policy enforcement. On Monday I’ll talk about these standards, and the benefits of implementing these policies.
The typical ESPs are those that have standards equivalent to those of the ISPs. They suspend or disconnect customers when the customers generate problems at the ISPs. They have some proactive policy enforcement, but most of their enforcement is reactive. On Tuesday I’ll talk about these standards and how they’re perceived by the ISPs and spam filtering companies.
The ugly ESPs are those that have low standards and few enforcement policies. They let customers send mail without permission. Some of the ugly ESPs even abuse other ESPs to send some of their mail, thus sharing their bad reputations across the industry. On Wednesday I’ll look at some of their practices and discuss how they affect other players in the industry.

Getting removed from an ISP block

A question came up on a mailing list about how long it typically took to resolve a spam block at an ISP. I don’t think that question actually has a single answer, as each ISP has their own, special, process.
ISPA takes 5 minutes. You fill out a form, it runs through their automated system and you’re usually delisted.
ISPB asks a lot of questions in their form, so it takes about 15 minutes to collect all the data they want and 10 minutes to fill out their form. Then, using very, very short words you keep repeating what you need to the tier 1 person who initially responded. That person eventually figures out they can’t blow you off and throws your request to tier 2, who handles it immediately.
ISPC has a different, somewhat long form. Again, you spend time collecting all the data and then fill out the somewhat obscure form. You get a response, but it’s a boilerplate totally unrelated to the initial request, so you keep answering until you find a tier 1 rep who can read and do what you initially asked.
ISPD has a form that takes about 2 minutes to fill out. Unfortunately, it goes to an outsourced postmaster team in the Far East and response times are ranging from days to months right now.
ISPE has an email address and if you catch them on a good day, they’re very helpful. Sometimes there’s no response, though.
ISPF has a troubleshooting page and accepts requests to fix things, but never responds in any visible manner.
ISPG tells you to talk to Spamfiltering Company H.
Spamfiltering company H answers their email in a prompt and friendly manner. OK, sometimes the answers are just “wow, your client/customer/IP range is sending lots of spam,” but hey, it’s an answer.
Spamfiltering company I is a useless bag of protoplasm and doesn’t even answer the email address they give you on their webpages. In a fit of fairness, I have heard they will occasionally respond, but usually that response is to tell you to go pay some apparently unrelated company a bribe to get delisted.
Spamfiltering company J doesn’t have a lot of ways to contact them, but has a lot of folks that participate in various semi-public arenas so if you’re even slightly part of the community, you can email them and they’re very helpful.
Spamfiltering company K is totally useless, but will tell you to have recipients whitelist you.
