Pattern matching primates

Why do we see faces where there are none? Paradolia
Why do we look at random noise and see patterns? Patternicity
Why do we think we have discovered what’s causing filtering if we change one thing and email gets through?
It’s all because we’re pattern matching primates, or as Michael Shermer puts it “people believe weird things because of our evolved need to believe nonweird things.”
Our brains are amazing and complex and filter a lot of information so we don’t have to think of it. Our brains also fill in a lot of holes. We’re primed at seeing patterns, even when there’s no real pattern. Our brains can, and do, lie to us all the time. For me, some of the important part of my Ph.D. work was learning to NOT trust what I thought I saw, and rather to effectively observe and test. Testing means setting up experiments in different ways to make it easier to not draw false conclusions.
Humans are also prone to confirmation bias: where we assign more weight to things that agree with our preconceived notions.
Take the email marketer who makes a number of changes to a campaign. They change some of the recipient targeting, they add in a couple URLs, they restructure the mail to change the text to image ratio and they add the word free to the subject line. The mail gets filtered to the bulk folder and they immediately jump to the word free as the proximate cause of the filtering. They changed a lot of things but they focus on the word free. 
Then they remove the word free from the subject line and all of a sudden the emails are delivering. Clearly the filter in question is blocking mail with free in the subject line.
Well, no. Not really. Filters are bigger and more complex than any of us can really understand. I remember a couple years ago, when a few of my close friends were working at AOL on their filter team. A couple times they related stories where the filters were doing things that not even the developers really understood.
That was a good 5 or 6 years ago, and filters have only gotten more complex and more autonomous. Google uses an artificial neural network as their spam filter.  I don’t really believe that anything this complex just looks at free in the subject line and filters based on that.
It may be that one thing used to be responsible for filtering, but those days are long gone. Modern email filters evaluate dozens or hundreds of factors. There’s rarely one thing that causes mail to go to the bulk folder. So many variables are evaluated by filters that there’s really no way to pinpoint the EXACT thing that caused a filter to trigger. In fact, it’s usually not one thing. It could be any number of things all adding up to mean this may not be mail that should go to the inbox.
There are, of course, some filters that are one factor. Filters that listen to p=reject requests can and do discard mail that fails authentication. Virus filters will often discard mail if they detect a virus in the mail. Filters that use blocklists will discard mail simply due to a listing on the blocklist.
Those filters address the easy mail. They leave the hard decisions to the more complex filters. Most of those filters are a lot more accurate than we are at matching patterns. Us pattern matching primates want to see patterns and so we find them.
 

Related Posts

Email problems are costly

Last week Zulily released their quarterly earnings. Their earnings’ report was disappointing, resulting in a drop in their stock prices. The chairman of the company told reporters on a conference call that part of the reason for the drop in earnings were due to deliverability problems “at a large ISP.”

Read More

AHBL Wildcards the Internet

AHBL (Abusive Host Blocking List) is a DNSBL (Domain Name Service Blacklist) that has been available since 2003 and is used by administrators to crowd-source spam sources, open proxies, and open relays.  By collecting the data into a single list, an email system can check this blacklist to determine if a message should be accepted or rejected. AHBL is managed by The Summit Open Source Development Group and they have decided after 11 years they no longer wish to maintain the blacklist.
A DNSBL works like this, a mail server checks the sender’s IP address of every inbound email against a blacklist and the blacklist responses with either, yes that IP address is on the blacklist or no I did not find that IP address on the list.  If an IP address is found on the list, the email administrator, based on the policies setup on their server, can take a number of actions such as rejecting the message, quarantining the message, or increasing the spam score of the email.
The administrators of AHBL have chosen to list the world as their shutdown strategy. The DNSBL now answers ‘yes’ to every query. The theory behind this strategy is that users of the list will discover that their mail is all being blocked and stop querying the list causing this. In principle, this should work. But in practice it really does not because many people querying lists are not doing it as part of a pass/fail delivery system. Many lists are queried as part of a scoring system.
Maintaining a DNSBL is a lot of work and after years of providing a valuable service, you are thanked with the difficulties with decommissioning the list.  Popular DNSBLs like the AHBL list are used by thousands of administrators and it is a tough task to get them to all stop using the list.  RFC6471 has a number of recommendations such as increasing the delay in how long it takes to respond to a query but this does not stop people from using the list.  You could change the page responding to the site to advise people the list is no longer valid, but unlike when you surf the web and come across a 404 page, a computer does not mind checking the same 404 page over and over.
Many mailservers, particularly those only serving a small number of users, are running spam filters in fire-and-forget mode, unmaintained, unmonitored, and seldom upgraded until the hardware they are running on dies and is replaced. Unless they do proper liveness detection on the blacklists they are using (and they basically never do) they will keep querying a list forever, unless it breaks something so spectacularly that the admin notices it.
So spread the word,

Read More

When spam filters fail

Spam filters aren’t perfect. They sometimes catch mail they shouldn’t, although it happens less than some people think. They sometimes fail to catch mail they should.
One of the reason filters fail to catch mail they should is because some spammers invest a lot of time and energy in figuring out how to get past the filters. This is nothing new, 8 or 9 years ago I was in negotiations with a potential client. They told me they had people who started working at 5pm eastern. Their entire job was to craft mail that would get through Hotmail’s filters that day. As soon as they found a particular message that made it to the inbox, they’d blast to their list until the filters caught up. When the filters caught up, they’d start testing again. This went on all night or until the full list was sent.
Since then I’ve heard of a lot of other filter bypass techniques. Some spammers set up thousands of probe accounts at ISPs and would go through and “not spam” their mail to fool the filters (ISPs adapted). Some spammers set up thousands of IPs and rotate through them (ISPs adapted). Some spammers register new domains for every send (ISPs adapted). Some spammers used botnets (ISPs adapted)
I’m sure, even now, there are spammers who are creating new techniques to get through filters. And the ISPs will adapt.

Read More