Spamassassin
Filtering by gestalt
One of those $5.00 words I learned in the lab was gestalt. We were studying fetal alcohol syndrome (FAS) and, at the time, there were no consistent measurements or numbers that would drive a diagnosis of FAS. Diagnosis was by gestalt – that is by the patient looking like someone who had FAS.
It’s a funny word to say, it’s a funny word to hear. But it’s a useful term to describe the future of spam filtering. And I think we need to get used to thinking about filtering acting on more than just the individual parts of an email.
Filtering is not just IP reputation or domain reputation. It’s about the whole message. It’s mail from this IP with this authentication containing these URLs. Earlier this year, I wrote an article about Gmail filtering. The quote demonstrates the sum of the parts, but I didn’t really call it out at the time.
Content, trigger words and subject lines
There’s been quite a bit of traffic on twitter this afternoon about a recent blog post by Hubspot identifying trigger words senders should avoid in an email subject line. A number of email experts are assuring the world that content doesn’t matter and are arguing on twitter and in the post comments that no one will block an email because those words are in the subject line.
As usually, I think everyone else is a little bit right and a little bit wrong.
The words and phrases posted by Hubspot are pulled out of the Spamassassin rule set. Using those words or exact phrases will cause a spam score to go up, sometimes by a little (0.5 points) and sometimes by a lot (3+ points). Most spamassassin installations consider anything with more than 5 points to be spam so a 3 point score for a subject line may cause mail to be filtered.
The folks who are outraged at the blog post, though, don’t seem to have read the article very closely. Hubspot doesn’t actually say that using trigger words will get mail blocked. What they say is a lot more reasonable than that.
The cult of SPF lives
Years ago, prior to the public discussions of Domain Keys, there was SPF as the solution to all our email authentication problems. SPF was going to let people do all sorts of things with email. The proponents even privately asserted that it would solve the spam problem. In essence, SPF was a cult. BoF sessions at meetings had the flavor of a big tent style revival. Those of us who didn’t support SPF were shunned and belittled. How could we not support such a brilliant protocol? Did we want spam to continue being a problem? All our objections no matter how rooted in reality were dismissed out of hand. SPF was an evangelical, cult-like movement.
I am somewhat sad to announce that the cult of SPF still lives. The most recent example is the number of people that have taken me to task for a recent post I wrote pointing out that SPF records aren’t actually that important for email delivery. My example was that a client of mine had incorrect SPF records (with a -all even) but was still getting inbox delivery at Hotmail. We repaired the records, re-registered them with Hotmail and Hotmail not only isn’t checking them but also sent mail to me admitting they don’t check SPF for incoming email.
My statement was that SPF wasn’t really important to getting email delivered. This seems to have upset a number of people. Someone on twitter pointed out that a valid SPF record gave you a positive score with SpamAssassin. What they didn’t mention was that a valid SPF record gives you an entire -0.001 with SpamAssassin.
Today I get a comment from Tom (which seems more like an ad for his company than an actual comment) that says
SpamAssassin Problems
The default SpamAssassin configuration considers any date far in the future to be extremely suspicious, which is pretty reasonable.
However, as @schampeo points out, it also seems to consider any date later than 2009 to be “far in the future”.
That means that until the SpamAssassin folks roll out a fix, and that gets deployed by SpamAssassin users pretty much all email will get an additional 2-3.5 spamminess points. That’s likely to cause a lot of content-based blocking over the next few weeks, until fixed rules are deployed both by SpamAssassin users and by all the various spam filtering appliances that use SpamAssassin rulesets.
(If you’re a SpamAssassin user, add “score FH_DATE_PAST_20XX 0.0” to your local.cf file to disable that rule).
EDIT: Mike has some more background on the bug.
EDIT: Fix it out on the spamassassin homepage.