Email without filters

… or Find the False Positive.
Anyone sending a lot of email has complained about spam filters and false positives at some point. But most people haven’t run a mailbox with no spam filters in front of it in recent years, so they don’t have much of a feel for what an unfiltered mailbox looks like, how important filters are and how difficult their job is.
I run no transaction-level filters in front of my mailbox, just content filters that route mail to one of several inboxes or a junk folder, so I can look at what unfiltered email looks like whenever I want to. I took data from all the mail that was sent to me yesterday and put it in a format that shows the problem filters face, especially the difficulty of spotting which mail in the junk folder is a false positive.
An inbox with no filters looks like this.

Running a spam filter against it, and simply categorizing each email as spam (pink) or not-spam (green), gives this.
 

Even with the messages categorized as spam vs not-spam it’s hard to work out which messages are important and which aren’t, let alone where the false positives might be.
If I sort the categories by hand I get this, where you can see that out of 1200 or so mails about three quarters were spam. Of the three false positives, two were bulk email that I didn’t mind missing and only one was email that I considered important.
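To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The figures are rounded from the description above ("1200 or so", "about three quarters"), and it assumes that no spam slipped through to an inbox, which the data above does not say either way. Treat the results as ballpark values only.

```python
# Approximate figures from the post; the exact counts are not given.
total_mail = 1200               # "1200 or so" messages in one day
spam_fraction = 0.75            # "about three quarters" were spam
false_positives = 3             # wanted mail that landed in the junk folder
important_false_positives = 1   # the one message that actually mattered

actual_spam = round(total_mail * spam_fraction)
legitimate = total_mail - actual_spam

# Assumption (not stated in the post): no spam reached an inbox,
# so the junk folder holds all the spam plus the false positives.
junk_folder = actual_spam + false_positives

# Share of legitimate mail that was wrongly filed as spam.
fp_rate = false_positives / legitimate

# Share of the junk folder that is actually legitimate mail.
fp_share_of_junk = false_positives / junk_folder

print(f"spam: {actual_spam}, legitimate: {legitimate}")
print(f"false positive rate: {fp_rate:.1%}")
print(f"false positives as a share of the junk folder: {fp_share_of_junk:.2%}")
print(f"important mail hidden in the junk folder: "
      f"{important_false_positives} in {junk_folder}")
```

Under these assumptions the filter misfiles roughly one legitimate message in a hundred, but finding the single important one means looking through about nine hundred junk messages.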
 
 

Related Posts

Why do ISPs do that?

One of the most common things I hear is “but why does the ISP do it that way?” The generic answer to that question is: because it works for them and meets their needs. Anyone designing a mail system has to implement some sort of spam filtering and will have to accept the potential for lost mail. Even those recipients who run no software filtering may lose mail. Their spam filter is the delete key, and sometimes they’ll delete a real mail.
Every mailserver admin, whether managing an MTA for a corporation, an ISP or themselves, inevitably looks at the question of false positives and false negatives. Some are more sensitive to false negatives and would rather block real mail than have to wade through a mailbox full of spam. Others are more sensitive to false positives and would rather deal with unfiltered spam than risk losing mail.
At the ISPs, many of these decisions aren’t made by one person; they are driven by the business philosophy, requirements and technology. The different consumer ISPs have different philosophies, and these show in their spam filtering.
Gmail, for instance, has a lot of faith in their ability to sort, classify and rank text. This is, after all, what Google does. Therefore, they accept most of the email delivered to Gmail users and then sort after the fact. This fits their technology, their available resources and their business philosophy. They leave as much filtering at the enduser level as they can.
Yahoo, on the other hand, chooses to filter mail at the MTA. While their spam foldering algorithms are good, they don’t want to waste CPU and filtering effort on mail that they think may be spam. So they choose to block heavily at the edge, going so far as to rate limit mail from senders that they don’t know. End users are protected from malicious mail, and senders have the ability to retry mail until it is accepted.
The same types of entries could be written about Hotmail or AOL. They could even be written about the various spam filter vendors and blocklists. Every company has their own way of doing things and their way reflects their underlying business philosophy.


Spamtraps

There is a lot of mythology surrounding spamtraps, what they are, what they mean, how they’re used and how they get on lists.
Spamtraps are very simply unused addresses that receive spam. They come from a number of places, but the most common spamtraps can be classified in a few ways.


Comments on Holomaxx post

I’m putting together a longer analysis of the Holomaxx case that will look at the claims against the various defendants. There’s some deep misunderstanding of how various things work (hint: wiretapping? not so much).
There was one comment from “The Other Barry” about complaints that I think bears highlighting.
