Email without filters

… or Find the False Positive.
Anyone sending a lot of email has complained about spam filters and false positives at some point. But most people haven’t run a mailbox with no spam filters in front of it in recent years, so they don’t have much of a feel for what an unfiltered mailbox looks like, how important filters are, and how difficult their job is.
I run no transaction-level filters in front of my mailbox, just content filters that route mail to one of several inboxes or a junk folder, so I can look at what unfiltered email looks like whenever I want to. I took data from all the mail that was sent to me yesterday and put it in a format that really shows the problem filters face, especially the difficulty of spotting which mail in the junk folder is a false positive.
An inbox with no filters looks like this.

Running a spam filter against it, simply categorizing each email as spam (pink) or not-spam (green), looks like this.
 

Even with the messages categorized as spam vs not-spam, it’s hard to work out which messages are important and which aren’t, let alone where the false positives might be.
Sorting the categories by hand gives this, where you can see that out of 1200 or so mails about three quarters were spam. Of the three false positives, two were bulk email that I didn’t mind missing and only one was email that I considered important.
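To put rough numbers on how rare a false positive is in that junk folder, here is a minimal back-of-the-envelope sketch. The post only gives approximate figures ("1200 or so", "about three quarters"), so the round values below are assumptions, as is the reading that the three-quarters figure describes the junk folder and that the three false positives sit inside it. The variable and function names are made up for illustration.

```python
# Approximate figures taken from the post; exact counts are not given.
total_mail = 1200               # "1200 or so" messages in one day
spam_fraction = 0.75            # "about three quarters" were spam
false_positives = 3             # non-spam mail that landed in the junk folder
important_false_positives = 1   # only one of those three actually mattered

# Assumption: the three-quarters figure is the size of the junk folder,
# and the three false positives are counted inside it.
junk_folder = round(total_mail * spam_fraction)
inbox_mail = total_mail - junk_folder
legitimate_mail = inbox_mail + false_positives


def share(part: int, whole: int) -> float:
    """Return part/whole as a plain fraction."""
    return part / whole


# How much of the junk folder is a false positive at all?
junk_share = share(false_positives, junk_folder)
# How much of the junk folder is a false positive the recipient cares about?
important_share = share(important_false_positives, junk_folder)
# How much of the legitimate mail went missing?
lost_share = share(false_positives, legitimate_mail)

print(f"junk folder size:             {junk_folder}")
print(f"false positives in junk:      {junk_share:.2%}")
print(f"important false positives:    {important_share:.2%}")
print(f"legitimate mail misfiled:     {lost_share:.2%}")
```

Under those assumptions, someone checking the junk folder by hand would have to read around 300 messages for each false positive they find, and around 900 to find the one that mattered.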
 
 
