Filtering more than spam

The obvious application of machine learning for email is to send spam to the junk/bulk folder. Most services use some level of machine learning in their filters. Providers like Gmail have extensive machine learning filters to keep spam and unwanted mail away from their users.
Some organizations are taking the filtering process a step further. Almost every mail client more advanced than PINE lets users create rules to sort mail into folders. Late last year, Office 365 rolled out a feature, Clutter, that tracks how a user interacts with mail and filters out unimportant mail. This gives each user their own filters without the overhead of having to create them.
The Clutter engine looks at both how the user interacts with mail and what it knows about the organization. For example, if Exchange is tied into Active Directory, then mail from a manager will be prioritized while mail from a co-worker may end up in the Clutter folder.
Email is a critical business tool. A significant number of companies rely on email for internal and external communication. Many users treat their inbox as a to-do list, prioritizing what they work on based on what’s in their mailbox. Despite the needs of users, the mail client hasn’t really changed.
Over the last few years, we’ve seen different online services attempt to build a more effective email client. Gmail added features like tabs and priority inbox. Microsoft created the “sweep” feature for Outlook/Hotmail users to manage inbox clutter. Third parties have created services to try to improve the mailbox experience for their users.
Many of the email filters, up to this point, have really been focused on protecting users from spam and malicious emails. Applying that filtering knowledge to the different kinds of email, not just spam, makes sense to me. I’ve always had a fairly extensive set of filters, initially procmail but now sieve, to process and organize incoming mail. But I kinda like the idea that my mail client learns how I filter messages and does the right thing on its own.
I’d love to see some improvements in the mail client that make it easier to manage and organize incoming email. It remains to be seen whether this is a feature that takes off and makes its way to other clients.
 
 

Related Posts

Thoughts on Hotmail filtering

One of the new bits of information to come out of the EEC15 deliverability discussions is how Hotmail is looking at engagement differently than other webmail providers.
Many webmail providers really do look at overall engagement with a mailing when making delivery decisions. And this really impacts new subscribers the most. If there is a mailing where a lot of subscribers are engaged, then new subscribers will see the mail in their inbox. Based on what was said at the webinar earlier this week, engagement has no effect at Hotmail outside of the individual user’s box.
I’ve certainly seen this with clients who’ve tried trimming their subscriber lists, only to find that it doesn’t really help get mail moved from the Hotmail bulk folder to the inbox.
 
Instead of subscriber lists, Hotmail is really looking at bounces. They’re watching and counting the number of nonexistent accounts senders are mailing to. If a sender hits too many bad addresses, that is a major hit to their reputation.
All of this makes remediation at Hotmail challenging. Right now, we can remediate a bad reputation at a lot of ISPs, and the filters catch up and mail starts flowing back to the inbox. Hotmail has set up a system that they say is “hard for spammers to game.” This seems to translate into a system where it is hard for legitimate senders to fix their reputation.
Hotmail is, IMO, the current tough nut in terms of deliverability. Develop a bad reputation there and it’s difficult to fix it. I’m sure it’s possible, though.

Read More

Deliverability and IP addresses

Almost 2 years ago I wrote a blog post titled The Death of IP Based Reputation. These days I’m even more sure that IP based reputation is well and truly dead for legitimate senders.
There are a lot of reasons for this continued change. Deliverability is hard when some people like the same email other people think is spam

Read More

AHBL Wildcards the Internet

AHBL (Abusive Host Blocking List) is a DNSBL (DNS-based blacklist) that has been available since 2003 and is used by administrators to crowd-source information about spam sources, open proxies, and open relays. By collecting the data into a single list, an email system can check this blacklist to determine if a message should be accepted or rejected. AHBL is managed by The Summit Open Source Development Group, and after 11 years they have decided they no longer wish to maintain the blacklist.
A DNSBL works like this: a mail server checks the sending IP address of every inbound email against a blacklist, and the blacklist responds with either “yes, that IP address is on the blacklist” or “no, I did not find that IP address on the list.” If an IP address is found on the list, the email administrator, based on the policies set up on their server, can take a number of actions such as rejecting the message, quarantining the message, or increasing the spam score of the email.
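To make the lookup concrete, here is a minimal sketch of how an IPv4 DNSBL check is conventionally done: reverse the octets of the address, append the list’s zone, and do an ordinary DNS lookup. The zone name `dnsbl.example` and the function names are placeholders I made up for illustration, not AHBL’s real zone or anyone’s production code.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL answers for this IP, False otherwise.

    `resolve` is injectable so the logic can be exercised without network
    access. Any resolution failure is treated as "not listed", which is a
    simplification: a real filter would tell NXDOMAIN apart from timeouts.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    # By convention, listed entries resolve to an address in 127.0.0.0/8.
    return answer.startswith("127.")
```

A call like `is_listed("192.0.2.99", "dnsbl.example")` would query the name `99.2.0.192.dnsbl.example`. What the server does with a “yes” is a local policy decision.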
The administrators of AHBL have chosen to list the world as their shutdown strategy. The DNSBL now answers ‘yes’ to every query. The theory behind this strategy is that users of the list will discover that their mail is all being blocked and stop querying the list causing this. In principle, this should work. But in practice it really does not because many people querying lists are not doing it as part of a pass/fail delivery system. Many lists are queried as part of a scoring system.
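A toy example shows why a “list the world” answer can go unnoticed in a scoring setup. The weights, threshold, and list names below are invented for illustration and are not taken from any real filter’s configuration.

```python
def spam_score(base_score, list_hits, weights):
    """Add the weight of every blacklist that reported a hit to the base score."""
    return base_score + sum(
        weights[name] for name, hit in list_hits.items() if hit
    )


def verdict(score, threshold=5.0):
    """Reject only when the combined score reaches the threshold."""
    return "reject" if score >= threshold else "accept"


# Hypothetical weights: the wildcarded list contributes only part of the threshold.
WEIGHTS = {"wildcarded_list": 1.5, "other_list": 3.0}

# The wildcarded list now says "yes" to everything.
hits = {"wildcarded_list": True, "other_list": False}

clean_mail = spam_score(1.0, hits, WEIGHTS)       # 2.5, still accepted
borderline_mail = spam_score(4.0, hits, WEIGHTS)  # 5.5, now rejected
print(verdict(clean_mail), verdict(borderline_mail))
```

In this sketch, most mail still gets through and only borderline messages tip over the threshold, so nothing breaks loudly enough to make the admin look at which lists are configured.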
Maintaining a DNSBL is a lot of work, and after years of providing a valuable service, you are thanked with the difficulties of decommissioning the list. Popular DNSBLs like AHBL are used by thousands of administrators, and it is a tough task to get them all to stop using the list. RFC 6471 has a number of recommendations, such as increasing the delay in how long it takes to respond to a query, but this does not stop people from using the list. You could change the list’s website to advise people the list is no longer valid, but unlike a person who surfs the web and comes across a 404 page, a computer does not mind checking the same 404 page over and over.
Many mailservers, particularly those only serving a small number of users, are running spam filters in fire-and-forget mode, unmaintained, unmonitored, and seldom upgraded until the hardware they are running on dies and is replaced. Unless they do proper liveness detection on the blacklists they are using (and they basically never do) they will keep querying a list forever, unless it breaks something so spectacularly that the admin notices it.
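For what it’s worth, a liveness check does not need to be complicated. As I understand RFC 5782, an IPv4 DNSBL is expected to list the test address 127.0.0.2 and must not list 127.0.0.1, so a filter can probe both before trusting a list. The sketch below assumes a `lookup(ip, zone)` callable that returns True when the list answers for an address; that callable and the status strings are placeholders of mine.

```python
def list_health(zone, lookup):
    """Classify a DNSBL zone using the two conventional test addresses.

    `lookup(ip, zone)` is a hypothetical callable returning True when the
    list reports the address as listed.
    """
    test_entry_listed = lookup("127.0.0.2", zone)
    forbidden_entry_listed = lookup("127.0.0.1", zone)

    if forbidden_entry_listed:
        # A list that answers for 127.0.0.1 is answering for everything.
        return "wildcarded"
    if not test_entry_listed:
        # The test entry is missing: the list is dead, empty, or unreachable.
        return "dead"
    return "ok"
```

A filter that ran something like this on a schedule and dropped any list that came back as anything other than `"ok"` would have stopped querying a wildcarded list on its own.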
So spread the word,

Read More