Content based filtering

laura
September 10, 2014
Industry

Content filtering is often hard to explain to people, and I’m not sure I’ve yet come up with a good way to explain it.
A lot of people think content reputation is about specific words in the message. The traditional content explanation is that words like “Free” or too many exclamation points in the subject line are bad and will be filtered. But it’s not the words that are the issue; it’s that the words are often found in spam. These days filters are a lot smarter than that. Rather than just looking at individual words, they look at the overall context of the message.
Even when we’re talking content filters, the content is just a way to identify mail that might cause problems. Those problems are evaluated the same way IP reputation is measured: complaints, engagement, bad addresses. But there’s a lot more to content filtering than just the engagement piece. What else is part of content evaluation?

Does the mail have hashbusters? Hashbusters are blocks of text, sometimes invisible to the recipient, that are put in an email in order to break some types of filtering. Ways to hide text include placing it in HTML comments and making the text the same color as the background.
Does the mail have valid HTML? Spammers have frequently used invalid HTML tags as a way to avoid filters by breaking up content or as hashbusters.
Does this mail contain malicious content? These filters look for virus signatures or code that may compromise a recipient’s computer. Very few legitimate mailers have mail caught in virus filters, but every incoming mail is still scanned for viruses or malicious code.
Does this mail look like a phish? These filters look at the domains and authentication, but also look for common words and tricks phishers use. This filter is most likely to catch legitimate mail that uses tracking links where the visible text of the link shows a different URL than the one the link actually points to. An example of this kind of trigger is <a href="http://tracking.example.com/login.html">http://paypal.com</a>. Making sure there aren’t URLs, email addresses or hostnames in the text portion of a link generally avoids this kind of filter.
Is this an industry with a bad reputation? The most obvious examples here are payday loans. There are so many horrible players in the online payday loan industry that it doesn’t really matter how good or clean individual mailers are. Payday loans are filtered heavily. Stock and financial messages also have challenges because there are so many pump-n-dump spammers out there.
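
To make the phish check in the list above a little more concrete, here is a minimal sketch in Python of the link test it describes: find links whose visible text looks like a URL or hostname but names a different host than the href. This is not how any particular ISP implements it. The names LinkCollector, mismatched_links and URL_LIKE are my own, and the pattern for "looks like a hostname" is a rough assumption.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed pattern: link text that starts with something shaped like a hostname,
# optionally preceded by http:// or https://.
URL_LIKE = re.compile(
    r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,})", re.IGNORECASE
)


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return links whose visible text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        match = URL_LIKE.match(text)
        if not match:
            continue  # plain wording such as "Log in" is not URL-like
        shown_host = match.group(1).lower()
        real_host = (urlparse(href).hostname or "").lower()
        if shown_host != real_host:
            flagged.append((href, text))
    return flagged
```

Run against the example link above, mismatched_links returns that link, while a link whose text is ordinary wording such as "Log in to your account" is ignored. The comparison is an exact hostname match, so www.example.com and example.com count as different hosts. A real filter would almost certainly be more forgiving than that, and would weigh this signal together with authentication and reputation rather than on its own.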

Changing content can cause an improvement in delivery. But if that content was flagged because of user complaints or bad recipient profiles, the content filters will catch up. Continuing to attempt to evade filters by changing content can result in IP based filtering.
These are just a few of the things companies look at when evaluating content.

Content, trigger words and subject lines

laura
Jan 11, 2012

Industry

There’s been quite a bit of traffic on Twitter this afternoon about a recent blog post by Hubspot identifying trigger words senders should avoid in an email subject line. A number of email experts are assuring the world that content doesn’t matter and are arguing on Twitter and in the post comments that no one will block an email because those words are in the subject line.
As usual, I think everyone else is a little bit right and a little bit wrong.
The words and phrases posted by Hubspot are pulled out of the SpamAssassin rule set. Using those words or exact phrases will cause a spam score to go up, sometimes by a little (0.5 points) and sometimes by a lot (3+ points). Most SpamAssassin installations consider anything with more than 5 points to be spam, so a 3 point score for a subject line may cause mail to be filtered.
The folks who are outraged at the blog post, though, don’t seem to have read the article very closely. Hubspot doesn’t actually say that using trigger words will get mail blocked. What they say is a lot more reasonable than that.
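
The scoring arithmetic is easier to see with a small sketch. The Python below adds up points for phrases found in a subject line and compares the total against a threshold of 5. The phrases and point values are invented for illustration and are not taken from the real SpamAssassin rule set. The function names are mine, and treating a total at or above the threshold as spam is my understanding of how the threshold setting behaves, not something the Hubspot post states.

```python
# Hypothetical rules: phrases and point values are made up for this example
# and do not come from the actual SpamAssassin rule set.
SUBJECT_RULES = {
    "free": 0.5,
    "act now": 1.0,
    "100% guaranteed": 3.0,
}

# 5 points is the commonly used threshold mentioned above.
SPAM_THRESHOLD = 5.0


def subject_score(subject):
    """Sum the points of every hypothetical rule phrase found in the subject."""
    lowered = subject.lower()
    return sum(points for phrase, points in SUBJECT_RULES.items() if phrase in lowered)


def is_spam(subject, other_points=0.0):
    """Combine subject points with points from the rest of the message."""
    return subject_score(subject) + other_points >= SPAM_THRESHOLD
```

With these made-up values, a subject line that hits the 1 point and 3 point phrases scores 4 points. That is not enough to be filtered on its own, but it leaves very little room: another 1.5 points from the body or headers pushes the message over the threshold. That is the sense in which a subject line "may cause" mail to be filtered without being the sole reason for it.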

Return Path on Content Filtering

laura
May 22, 2012

Asides, Industry

Return Path have an interesting post up about content filtering. I like the model of 3 different kinds of filters. In fact, it’s one I’ve been using with clients for over 18 months. Spam filtering isn’t really about one number or one filter result; it’s a complex interaction of lots of different heuristics designed to answer the question: do recipients want this kind of mail?

Filters and windmills

laura
Mar 27, 2012

Industry

A colleague of mine was dealing with a client who is experiencing some difficulty delivering to the bulk folder. Said client spent much of a one-hour phone call repeating “This is not how a free society works!!”
After the call my colleague commented, “I refuse to get ranty about filter systems.”
I know that filters, and the people who write and maintain them, are a frequent scapegoat for senders. The filters are always the problem, not anything the senders do.
Now, I’ll be the last person to claim spam filters are perfect; they’re not. Filters sometimes do unexpected things, sometimes they do boneheaded things, sometimes they are broken.
We can’t forget, though, that filters perform a vital role in protecting users from malicious emails. Phishing emails, scams, fake products, viruses are a constant threat. Many end users don’t need to worry about this because filters are so good. But an unfiltered account can get thousands of scams and spams a day (ask me how I know).
Most of us in the delivery space can tell when a filter is working as intended and when there’s an underlying problem. And when the filter is working as intended there’s not a lot of use complaining about it. Ranting about filtering systems often delays a resolution. Senders that focus on what they can control tend to have more success reaching the inbox than those senders that focus on ranting about filtering systems.
Tilting at windmills doesn’t get the mail through.

Related Posts

Content, trigger words and subject lines

Return Path on Content Filtering

Filters and windmills
