Content-based filtering

Content filtering is often hard to explain to people, and I’m not sure I’ve yet come up with a good way to explain it.
A lot of people think content reputation is about specific words in the message. The traditional explanation is that words like “Free” or too many exclamation points in the subject line are bad and will be filtered. But the words themselves aren’t the issue; the issue is that those words are often found in spam. These days filters are a lot smarter than that. Rather than looking at individual words, they look at the overall context of the message.
Even when we’re talking about content filters, the content is just a way to identify mail that might cause problems. Those problems are evaluated the same way IP reputation is: complaints, engagement, bad addresses. But there’s a lot more to content filtering than just the engagement piece. What else is part of content evaluation?

  1. Does the mail have hashbusters? Hashbusters are blocks of text, sometimes invisible to the recipient, that are put in an email in order to break some types of filtering. Common ways to hide the text include putting it inside HTML comments and making the foreground and background colors the same.
  2. Does the mail have valid HTML? Spammers have frequently used invalid HTML tags as a way to avoid filters by breaking up content or as hashbusters.
  3. Does this mail contain malicious content? These filters look for virus signatures or code that may compromise a recipient’s computer. Very few legitimate mailers have mail caught in virus filters, but every incoming message is still scanned for viruses or malicious code.
  4. Does this mail look like a phish? These filters look at the domains and authentication, but also look for common words and tricks phishers use. This filter is most likely to catch legitimate mail that uses tracking links whose visible link text shows a different URL from the one the link actually points to. An example of this kind of trigger is <a href="http://tracking.example.com/login.html">http://paypal.com</a>. Keeping URLs, email addresses and hostnames out of the visible text of a link generally avoids this kind of filter.
  5. Is this an industry with a bad reputation? The most obvious example here is payday loans. There are so many horrible players in the online payday loan industry that it doesn’t really matter how good or clean individual mailers are. Payday loans are filtered heavily. Stock and financial messages also face challenges because there are so many pump-and-dump spammers out there.
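To make items 1 and 4 a bit more concrete, here is a minimal sketch of two checks a filter might run on an HTML body: flagging text hidden inside HTML comments, and flagging links whose visible text names a different host than the link actually points to. This is an illustration under simplifying assumptions, not how any particular ISP or filter does it. The names `ContentInspector` and `inspect_html` and the `HOSTLIKE` pattern are hypothetical; only Python’s standard `html.parser`, `urllib.parse` and `re` modules are used. The same-color text trick would need CSS parsing and is not attempted here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re

# Hypothetical heuristic: link text containing a URL, a bare hostname,
# or the domain part of an email address.
HOSTLIKE = re.compile(r"(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}", re.IGNORECASE)


class ContentInspector(HTMLParser):
    """Collects HTML comments and (href, visible text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.links = []
        self._href = None
        self._text = []

    def handle_comment(self, data):
        # Text inside <!-- ... --> is never rendered, so it is a classic hiding place.
        if data.strip():
            self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def inspect_html(body):
    """Return a list of warnings for one HTML message body."""
    parser = ContentInspector()
    parser.feed(body)
    parser.close()
    warnings = [f"hidden comment text: {c!r}" for c in parser.comments]
    for href, text in parser.links:
        if href.lower().startswith("mailto:"):
            continue  # mailto links have no web host to compare against
        match = HOSTLIKE.search(text)
        if not match:
            continue  # plain-language link text such as "Read more" is not compared
        shown = match.group(0)
        shown_host = urlparse(shown if "://" in shown else "http://" + shown).hostname
        real_host = urlparse(href).hostname
        if shown_host != real_host:
            warnings.append(f"link text shows {shown_host} but points to {real_host}")
    return warnings
```

Run against the example link above, `inspect_html` reports that the text shows paypal.com while the link points to tracking.example.com. Link text like “Read more” produces no warning, which is why keeping hostnames out of link text avoids this class of filter. Note that plenty of legitimate HTML email contains comments (Outlook conditional comments, for instance), so a real filter would treat a comment as one weak signal among many rather than as proof of a hashbuster.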

Changing content can cause an improvement in delivery. But if that content was flagged because of user complaints or bad recipient profiles, the content filters will catch up. Continuing to attempt to evade filters by changing content can result in IP based filtering.
These are just a few of the things companies look at when evaluating content.

Related Posts

Horses, not zebras

I was first introduced to the maxim “When you hear hoofbeats, think horses not zebras” when I worked in my first molecular biology lab 20-some-odd years ago. I’m no longer a gene jockey, but I still find myself applying this to troubleshooting delivery problems for clients.
It’s not that I think all delivery problems are caused by “horses”, or that “zebras” never cause problems for email delivery. It’s more that there are some very common causes of delivery problems and it’s a more effective use of time to address those common problems before getting into the less common cases.
This was actually something that one of the mailbox provider reps said at M3AAWG in SF last month. They have no problem with personal escalations when there’s something unusual going on. But the majority of issues can be handled through the standard channels.
What are the horses I look for with delivery problems?


Ever changing filtering

One of the ongoing challenges of sending email and managing a high-volume outbound mail server is dealing with constant changes in filtering. Filters are not static, nor can they be. As ISPs and filtering companies identify new ways to separate wanted email from unwanted email, spammers find new ways to make their mail look more like wanted mail.
This is one reason traps are useful to filtering companies. With traps there is no discussion about whether or not the mail was requested. No one with any connection to the email address opted in to receive mail. The mail was never requested. While it is possible for trap addresses to get onto any list, monitoring mail sent to spam traps is a way to identify which senders don’t have good practices.
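Because any list can occasionally pick up a trap, a rate per sender says more than a single hit. The sketch below assumes you have delivery records as (sender, recipient) pairs and a set of known trap addresses; the function name `trap_hit_rates` and the record format are hypothetical, and no particular threshold is implied.

```python
from collections import Counter


def trap_hit_rates(deliveries, trap_addresses):
    """Return {sender: fraction of that sender's messages that went to trap addresses}.

    deliveries: iterable of (sender, recipient) pairs (hypothetical log format).
    trap_addresses: addresses that never opted in to receive any mail.
    """
    traps = {addr.lower() for addr in trap_addresses}
    totals = Counter()
    hits = Counter()
    for sender, recipient in deliveries:
        totals[sender] += 1
        if recipient.lower() in traps:
            hits[sender] += 1
    # Counter returns 0 for senders that never hit a trap.
    return {sender: hits[sender] / totals[sender] for sender in totals}


log = [
    ("news.example.org", "trap1@isp.example"),
    ("news.example.org", "reader@isp.example"),
    ("shop.example.org", "buyer@isp.example"),
]
print(trap_hit_rates(log, {"TRAP1@isp.example"}))
```

Here half of news.example.org’s mail hit a trap and none of shop.example.org’s did. Addresses are compared case-insensitively, which is a simplification.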
New filtering techniques are always evolving. I mentioned yesterday that Gmail was making filtering changes, and that this was causing a lot of delivery issues for senders. The other major challenge at Gmail is its personalized delivery. It’s harder and harder for senders to monitor their inbox delivery because almost every inbox is different at Gmail. I’ve seen different delivery in some of my own mailboxes at Gmail.
All of this makes email delivery an ongoing challenge.


The death of IP based reputation

Back in the dark ages of email delivery the only thing that really mattered to get your email into the inbox was having a good IP reputation. If your IP sent good mail most of the time, then that mail got into the inbox and all was well with the world. All that mattered was that good IP reputation. Even better for the people who wanted to game the system and get their spam into the inbox, there were many ways to get around IP reputation.
Every time the ISPs and spam filtering companies would work out a way to block spam using IP addresses, spammers would figure out a way around the problem. ISPs started blocking IPs, so spammers moved to open relays. Filters started blocking open relays, so spammers moved to open proxies. Filters started blocking open proxies, so spammers created botnets. Filters started blocking botnets, so spammers started stealing IP reputation by compromising ESP and ISP user accounts. Filters were constantly playing catch-up with the next new method of getting a good IP reputation while still sending spam.
While spammers were adapting to and subverting IP-based filtering, a number of other things were happening. Many smart people in the email space were looking at improving authentication technology. SPF was the beginning, but problems with SPF led to DomainKeys and DKIM. Now we’re even seeing protocols (DMARC) layered on top of SPF and DKIM. Additionally, data storage and processing got cheaper and data mining software got better.
The improvements in processing power, data mining and data storage made it feasible for ISPs and filtering companies to analyze content at standard email delivery speeds. Since all IPv4 addresses are now allocated, most companies are planning for mail services to migrate to IPv6. There are too many IPv6 addresses to rely on IP reputation for delivery decisions.
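A little arithmetic shows the scale of the problem. An IPv4 address is 32 bits and an IPv6 address is 128 bits, and a single /64 is a common size for one IPv6 subnet. The sketch below uses Python’s standard `ipaddress` module to count addresses; the /64 is taken from the 2001:db8::/32 documentation prefix purely as an example, and `address_counts` is a hypothetical helper name.

```python
import ipaddress


def address_counts():
    """Return (addresses in all of IPv4, addresses in one IPv6 /64)."""
    ipv4_all = ipaddress.ip_network("0.0.0.0/0").num_addresses    # 2**32
    one_64 = ipaddress.ip_network("2001:db8::/64").num_addresses  # 2**64
    return ipv4_all, one_64


ipv4_all, one_64 = address_counts()
print(f"Entire IPv4 space:   {ipv4_all:,} addresses")
print(f"One IPv6 /64 subnet: {one_64:,} addresses")
print(f"One /64 is {one_64 // ipv4_all:,} times the whole IPv4 space")
```

One /64 holds 2^32 times as many addresses as the entire IPv4 Internet, so a sender with even one subnet can move to a fresh address long before any per-address reputation builds up.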
What this means is that in the modern email filtering system, IPs are only a portion of the information filters look at when making delivery decisions. Now filters look at the overall content of the email, including images and URLs. Many filters even follow URLs to confirm the landing pages aren’t hosting malicious software or content that’s been blocked before. Some filters look at DNS entries like nameservers and check whether those nameservers are associated with bad mail. That’s even before we get to user feedback, in the form of “this is spam” or “this is not spam” clicks, which now seems to affect content, domain and IP reputation.
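As a rough sketch of the first step of URL-based filtering, the code below pulls the hostnames out of every http(s) URL in a message body and checks them against a list of known-bad domains, matching subdomains too. The `bad_domains` set is a hypothetical stand-in for whatever reputation data a filter actually holds, and the function names are made up for illustration. Following the URLs to landing pages and looking up nameservers both need network access and are deliberately left out.

```python
import re
from urllib.parse import urlparse

# Simplified pattern: an http(s) URL runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_domains(body):
    """Return the set of hostnames of http(s) URLs found anywhere in a message body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        host = urlparse(url).hostname  # lowercased, port and path removed
        if host:
            hosts.add(host)
    return hosts


def flagged_domains(body, bad_domains):
    """Return hostnames that equal, or are subdomains of, a listed bad domain."""
    flagged = set()
    for host in url_domains(body):
        for bad in bad_domains:
            if host == bad or host.endswith("." + bad):
                flagged.add(host)
    return flagged


body = 'Shop <a href="https://shop.bad.example/sale">here</a> or see http://good.example/page.'
print(flagged_domains(body, {"bad.example"}))
```

The `"." + bad` suffix check keeps a lookalike such as notbad.example from matching bad.example.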
I don’t expect IP reputation to become a complete non-issue. I think it’s still valuable data for ISPs and filters to evaluate as part of the delivery decision process. That being said, IP reputation is much less of a guiding factor in good email delivery than it was 3 or 4 years ago. Just having an IP with a great reputation is not sufficient for inbox delivery. You have to have a good IP reputation and good content and good URLs.
Anyone who wants good email delivery should consider their IP reputation, but only as one piece of the delivery strategy. Focusing on a great IP reputation will not guarantee good inbox delivery. Look at the whole program, not just a small part of it.
