Filtering more than spam

The obvious application of machine learning for email is to send spam to the junk/bulk folder. Most services use some level of machine learning in their filters. Services like Gmail have extensive machine learning filters to keep spam and unwanted mail away from their users.
Some organizations are taking the filtering process a step further. Almost every mail client more advanced than PINE lets users create rules to sort mail into folders. Late last year, Office 365 rolled out a feature, Clutter, that tracks how a user interacts with mail and filters out unimportant mail. This gives each user their own filters, without the overhead of having to create them.
The Clutter engine looks at both how the user interacts with mail and things it knows about the organization. For example, if Exchange is tied into Active Directory, then mail from a manager will be prioritized while mail from a co-worker may end up in the clutter folder.
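The following is not Clutter's implementation. It is a toy Python sketch of the general idea described above: combine a per-user engagement signal with an organizational signal. The engagement numbers, the `manager_of_user` value standing in for directory data, and the 0.2 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str


# Hypothetical per-user history: the fraction of mail from each sender
# that this user opened or replied to (0.0 to 1.0).
engagement = {"newsletter@vendor.example": 0.05, "alice@corp.example": 0.9}

# Hypothetical directory lookup, standing in for organizational data
# such as the user's manager in Active Directory.
manager_of_user = "boss@corp.example"


def importance(msg: Message, engagement: dict, manager: str) -> float:
    """Score a message; higher scores are more likely to stay in the inbox."""
    score = engagement.get(msg.sender, 0.5)  # unknown senders start neutral
    if msg.sender == manager:
        score += 1.0  # organizational signal: boost mail from the manager
    return score


def route(msg: Message, engagement: dict, manager: str, threshold: float = 0.2) -> str:
    """Return the folder a message would land in under this toy model."""
    return "Inbox" if importance(msg, engagement, manager) >= threshold else "Clutter"


print(route(Message("newsletter@vendor.example", "Spring sale"), engagement, manager_of_user))  # Clutter
print(route(Message("boss@corp.example", "Budget review"), engagement, manager_of_user))  # Inbox
```

A real system would learn these weights from each user's behaviour rather than hard-coding them.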
Email is a critical business tool. A significant number of companies rely on email for internal and external communication. Many users treat their inbox as a to-do list, prioritizing what they work on based on what’s in their mailbox. Despite these needs, the mail client hasn’t really changed.
Over the last few years, we’ve seen different online services attempt to build a more effective email client. Gmail added features like tabs and Priority Inbox. Microsoft created the “sweep” feature for Outlook/Hotmail users to manage inbox clutter. Third parties have created services to try to improve the mailbox experience for their users.
Up to this point, most email filters have focused on protecting users from spam and malicious email. Applying that filtering knowledge to the different kinds of legitimate email, not just spam, makes sense to me. I’ve always had a fairly extensive set of filters, initially procmail but now sieve, to process and organize incoming mail. But I kinda like the idea that my mail client learns how I filter messages and does the right thing on its own.
I’d love to see improvements in the mail client that make it easier to manage and organize incoming email. It remains to be seen whether this feature takes off and makes its way to other clients.
 
 

Related Posts

Email filtering: not going away.

I don’t do a whole lot of filtering of comments here. There are a couple of people who are moderated, but generally if the comments contribute to a discussion they get posted. I do get the occasional angry or incoherent comment. And sometimes I get a comment that triggers me to write an entire blog post pointing out the problems with it.
Today a comment from Joe King showed up for The Myth of the Low Complaint Rate.


ISPs speak at M3AAWG

Last week at M3AAWG, representatives from AOL, Yahoo, Gmail and Outlook spoke about their anti-spam technologies and what their organizations look for in email.
This session was questions and answers, with the moderator asking most of the questions. These answers are paraphrased from my notes or the MAAWG twitter stream from the session.
What are your biggest frustrations?
AOL: When senders complain they can’t get mail in, we go look at their stats and find that complaints are high. Users just don’t love that mail. If complaints are high, look at what you may have done differently; content does have an effect on complaints.
Outlook: When we tightened down filters 8 years ago, we had to do it. Half of the mail in our users’ inboxes was spam and we were steadily losing customers. The filter changes disrupted a lot of senders and caused a lot of pain. But these days only 0.5% of mail in the inbox is spam. Things happen so fast, though, that the stress can frustrate the team.
Gmail: Good senders do email badly sometimes and their mail gets bulked. Senders have to get the basic email hygiene practices right. Love your users and they’ll love you back.
What’s your philosophy and approach towards mail?
AOL: There is a balance that needs to be struck between good and bad mail. The postmaster team reminds the blocking team that not all mail is bad or malicious. They are the sender advocates inside AOL. But the blocking team deals with so much bad mail, they sometimes forget that some mail is good.
Yahoo: User experience. The user always comes first. We strive to protect them from malicious mail and provide them with the emails they want to see. Everything else is secondary.
Gmail: The faster we stop spam the less spam that gets sent overall. We have highly adaptive filters that can react extremely quickly to spam. This frustrates the spammers and they will give up.
Outlook: The core customer is the mailbox user and they are a priority. We think we have most of the hardcore spam under control, and now we’re focused on personalizing the inbox for each user. Everyone online should hold partners accountable and they should expect to be held accountable in turn. This isn’t just a sender / ESP thing, ISPs block each other if there are spam problems.
What are some of your most outrageous requests?
We’ve been threatened with lawsuits because senders just don’t want to do the work to fix things. Some senders try to extort us. Other senders go to the advertising execs and get the execs to yell at the filtering team.
Coming to MAAWG and getting cornered to talk about a particular sender problem. Some senders have even offered money just to get mail to the spam folder.
Senders who escalate through the wrong channels. We spent all this money and time creating channels where you can contact us, and then senders don’t use them.
Confusing business interests with product interests. These are separate things and we can’t change the product to match your business interest.
What are your recommendations for changing behaviors?
Outlook: We provide lots of tools to let you see what your recipients are doing. USE THE TOOLS. Pay attention to how your recipients interact with your mail. Re-opt-in recipients periodically. Think about the mail that is never opened. When you have a problem, use our webpages and our forms. Standard delivery problems have a playbook. We’re going to follow that playbook, and if you try to get personal attention it’s going to slow things down. If there’s a process problem, we are reachable and can handle it personally. But use the postmaster page for most things.
Gmail: Get your hygiene right. If you get your hygiene right, deliverability just works. If you’re seeing blocking, that’s because users are marking your mail as spam. Pay attention to what the major receivers publish on their postmaster pages. Don’t just follow the letter of the law; follow the spirit as well. Our responsibility, as an ISP, is to tell spam from not spam. Good mailers make that harder for us when they do things that look like what spammers do. This doesn’t get spammers’ mail in more; it gets legitimate mail in less. Use a real opt-in system; don’t just rely on implied opt-in because someone made a purchase or something.
Yahoo: ESPs are pretty good about screening their customers, so pay attention to what your ESPs are saying. Send mail people want. Verify that the email addresses given to you actually belong to people who want your mail. Have better sender practices.
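Outlook’s advice to re-opt-in recipients periodically and to think about mail that is never opened can be turned into a simple list-hygiene job. A minimal Python sketch, assuming you already track each recipient’s last open or click date (`None` meaning never); the 180-day window is an arbitrary example, not a number any of the panelists gave.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def needs_reconfirmation(
    last_engaged: Dict[str, Optional[date]],
    today: date,
    window_days: int = 180,  # example window; pick one that fits your mail stream
) -> List[str]:
    """Return recipients with no recorded engagement inside the window."""
    cutoff = today - timedelta(days=window_days)
    return sorted(
        address
        for address, seen in last_engaged.items()
        if seen is None or seen < cutoff  # never engaged, or engaged too long ago
    )


# Hypothetical engagement data.
last_engaged = {
    "active@example.com": date(2024, 2, 20),
    "never@example.com": None,
    "lapsed@example.com": date(2023, 1, 5),
}
print(needs_reconfirmation(last_engaged, today=date(2024, 3, 1)))
# ['lapsed@example.com', 'never@example.com']
```

The recipients it returns would get a re-permission message, and anyone who doesn’t confirm would drop out of regular sends.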
What do you think about seed accounts?
The panel wasn’t very happy about the use of seed accounts. Seeds are not that useful any longer, as the ISPs move to more and more personalized delivery. Too much time and too many cycles are used debugging seed accounts. The dynamic delivery works all ways.
When things go wrong what should we do?
AOL: Open a ticket. We know we’ve been lax recently, but we have worked through our backlog and are caught up. Using the ticketing system also justifies more headcount for us and makes everyone’s experience better. Also, don’t continue what you’re doing: pause sending while you’re troubleshooting the issue. We won’t adjust your reputation for you, but we may be able to help you.
Gmail: Do not jump the gun and open a ticket on the first mail to the spam folder. Our filters are so dynamic that they update every few minutes in some cases. Be sure there is a problem. If you are sure you’re following both the spirit and the letter of the sender guidelines, you can submit a ticket. We don’t respond to tickets, but we work every single one. When you open a ticket, provide complete information and full headers, and use the headers from your own email address, not from a seed account. Give us a clear and concise description of the problem. Also, use the Gmail product forum; it is monitored by employees and it’s our preferred way of getting information to the anti-abuse team. Common issues that lots of senders are having get addressed faster.
Outlook: Dig in and do your own troubleshooting; don’t rely on us to tell you what to fix. The support teams don’t have a lot of resources, so use our public information. If you make our job harder, it takes longer to get things done. But tell us what changes you’ve made. If you’ve fixed something and tell us, our process is different than if you’re just asking for a delisting or for information. When you’ve fixed things, we will respond faster.
How fast should users expect filters to respond after making changes?
Filters update continually, so senders should start seeing delivery changes almost immediately. What we find is that people tell us they’ve made changes, but they haven’t made enough changes or the right ones. If the filters don’t update, then you haven’t fixed the problem.


URL reputation and shorteners

A bit of a throwback post from Steve a few years ago. The problem has gotten a little better as some shortening companies are actually disabling spammed URLs and blocking URLs with problematic content. I still don’t recommend using a public URL shortener in email messages, though.
Any time you put a URL in mail you send out, you’re sharing the reputation of everyone who uses URLs with that hostname. So if other people send unwanted email that has the same URL in it, that can cause your mail to be blocked or sent to the bulk folder.
That has a bunch of implications. If you run an affiliate programme where your affiliates use your URLs, then spam sent by your affiliates can cause your (clean, opt-in, transactional) email to be treated as spam. If you send a newsletter with advertisers’ URLs in it, then bad behaviour by other senders with the same advertisers can cause your email to be spam foldered. And, as we discussed yesterday, if spammers use the same URL shortener you do, that can cause your mail to be marked as spam.
Even if the hostname you use for your URLs is unique to you, if it resolves to the same IP address as a URL that’s being used in spam, that can cause delivery problems for you.
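To make the sharing concrete, here is a minimal Python sketch that pulls the hostnames out of a message body and checks each one, and the address it resolves to, against reputation data. The `bad_hosts` and `bad_ips` sets and the `resolve` function are hypothetical stand-ins; real filters use blocklists and their own reputation systems, and a live `resolve` might be something like `socket.gethostbyname`.

```python
import re
from typing import Callable, List, Optional, Set, Tuple
from urllib.parse import urlsplit

# Rough pattern for http(s) URLs in text; real filters parse MIME parts properly.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def hostnames_in(body: str) -> Set[str]:
    """Collect the lowercase hostnames of every URL found in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # .hostname is returned lowercased
        if host:
            hosts.add(host)
    return hosts


def shared_reputation_hits(
    body: str,
    bad_hosts: Set[str],
    bad_ips: Set[str],
    resolve: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Report URLs whose hostname, or the IP it resolves to, has a bad reputation."""
    hits = []
    for host in sorted(hostnames_in(body)):
        if host in bad_hosts:
            hits.append((host, "hostname listed"))
        elif resolve(host) in bad_ips:
            hits.append((host, "shares an IP with listed URLs"))
    return hits


# Hypothetical reputation data and DNS answers.
body = "Sale! https://short.example/abc123 https://shop.example.com/x http://promo.example.net/y"
dns = {"shop.example.com": "198.51.100.5", "promo.example.net": "192.0.2.10"}
print(shared_reputation_hits(body, {"short.example"}, {"192.0.2.10"}, dns.get))
```

The sender’s own link on shop.example.com is clean, but the message is still flagged twice: once for a shared shortener hostname and once for a hostname that merely sits on a listed IP address.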
What does this mean when it comes to using URL shorteners (such as bit.ly, tinyurl.com, etc.) in email you send out? That depends on why you’re using those URL shorteners.
The URLs in the text/html parts of my message are big and ugly
Unless the URL you’re using is, itself, part of your brand identity, you really don’t need to make the URL in the HTML part of the message visible at all. Instead of using ‘<a href="long_ugly_url">long_ugly_url</a>’ or ‘<a href="shortened_url">shortened_url</a>’, use ‘<a href="long_ugly_url">friendly phrase</a>’.
(Whatever you do, though, don’t use ‘<a href="long_ugly_url">different_url</a>’; that leads to you falling foul of phishing filters.)
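Phishing filters look for exactly that mismatch. A rough Python sketch of one such heuristic, using only the standard library’s `html.parser`: it flags a link when its visible text looks like a URL for a different host than the `href`. Treating text as URL-like when it contains `://` or starts with `www.` is an assumption of this sketch, and real filters are considerably more sophisticated.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def text_host(text: str) -> Optional[str]:
    """Hostname the visible text appears to name (a scheme is added if missing)."""
    return urlsplit(text if "://" in text else "http://" + text).hostname


def mismatched_links(html: str) -> List[Tuple[str, str]]:
    """Links whose visible text looks like a URL on a different host than the href."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, text in collector.links:
        looks_like_url = "://" in text or text.lower().startswith("www.")
        if looks_like_url and text_host(text) != urlsplit(href).hostname:
            flagged.append((href, text))
    return flagged


html = (
    '<a href="https://example.com/report?id=42">Read the full report</a> '
    '<a href="https://tracker.example.net/r/1">https://example.com/report</a>'
)
print(mismatched_links(html))
# [('https://tracker.example.net/r/1', 'https://example.com/report')]
```

The “friendly phrase” link passes untouched, while the link whose text names one host and whose `href` points at another is flagged, even though the second host here is only a click tracker.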
The URLs in the text/plain parts of my message are big and ugly
The best solution is to fix your web application so that the URLs are smaller and prettier. That will make you seem less dated and clunky both when you send email, and when your users copy and paste links to your site via email or IM or twitter or whatever. “Cool” or “friendly” URLs are great for a lot of reasons, and this is just one. Tim Berners-Lee has some good thoughts on this, and AListApart has two good articles on how to implement them.
If you can’t do that, then using your own, branded URL shortener is the next best thing. Your domain is part of your brand – you don’t want to hide it.
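Friendly URLs are usually built from a readable slug rather than opaque query parameters. A minimal Python sketch of one common approach; the function name, the eight-word cap and the example path are illustrative assumptions, not something the post prescribes.

```python
import re
import unicodedata


def slugify(title: str, max_words: int = 8) -> str:
    """Turn a page title into a short, lowercase, hyphen-separated URL segment."""
    # Decompose accented characters and drop the accents, e.g. "é" -> "e".
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


# A hypothetical site could serve /articles/url-reputation-and-shorteners
# instead of something like /index.php?id=1234&cat=7&sessionid=...
print("/articles/" + slugify("URL reputation and shorteners"))
# /articles/url-reputation-and-shorteners
```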
I want to use a catchy URL shortener to enhance my brand
That’s quite a good reason. But if you’re doing that, you’re probably planning to use your own domain for your URL shortener (Google uses goo.gl, Word to the Wise use wttw.me, etc.). That will avoid many of the problems with using a generic URL shortener, whether you implement it yourself or use a third party service to run it.
I want to hide the destination URL from recipients and spam filters
Then you’re probably spamming. Stop doing that.
I want to be able to track clicks on the link, using bit.ly’s neat click track reporting
Bit.ly does have pretty slick reporting. But it’s very weak compared to even the most basic clickthrough reporting an ESP offers. An ESP can tell you not just how many clicks you got on a link, but also which recipients clicked and how many clicks there were for all the links in a particular email or email campaign, and how that correlates with “opens” (however you define that).
So bit.ly’s tracking is great if you’re doing ad-hoc posts to twitter, but if you’re sending bulk email you (or your ESP) can do so much better.
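The difference comes from the ESP knowing who each message went to, so every tracked click can be tied to a recipient and a campaign. A minimal Python sketch of that aggregation, assuming a hypothetical click log of (campaign, recipient, link) records and a per-campaign set of recipients with a recorded open; it is not any particular ESP’s reporting.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Click = Tuple[str, str, str]  # (campaign, recipient, link)


def click_report(clicks: Iterable[Click], opens: Dict[str, Set[str]]):
    """Summarize clicks per link and per campaign, and relate clickers to opens."""
    clicks = list(clicks)
    per_link = Counter((campaign, link) for campaign, _, link in clicks)
    clickers: Dict[str, Set[str]] = defaultdict(set)
    for campaign, recipient, _ in clicks:
        clickers[campaign].add(recipient)

    per_campaign = {}
    for campaign, who in clickers.items():
        opened = opens.get(campaign, set())
        per_campaign[campaign] = {
            "total_clicks": sum(n for (c, _), n in per_link.items() if c == campaign),
            "unique_clickers": len(who),
            # Clickers with no recorded open (images blocked, for example).
            "clicked_without_open": len(who - opened),
            # Share of openers who also clicked.
            "click_to_open": len(who & opened) / len(opened) if opened else 0.0,
        }
    return per_link, per_campaign


clicks = [
    ("spring", "a@example.com", "/sale"),
    ("spring", "a@example.com", "/sale"),
    ("spring", "b@example.com", "/faq"),
    ("summer", "c@example.com", "/sale"),
]
opens = {"spring": {"a@example.com", "d@example.com"}}
per_link, per_campaign = click_report(clicks, opens)
print(per_campaign["spring"])
# {'total_clicks': 3, 'unique_clickers': 2, 'clicked_without_open': 1, 'click_to_open': 0.5}
```

A public shortener only sees anonymous clicks on a link, so none of the per-recipient or per-campaign figures above are available from it.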
I want people to have a short URL to share on twitter
Almost all twitter clients will abbreviate a URL using some URL shortener automatically if it’s long. Unless you’re planning on using your own branded URL shortener, using someone else’s will just hide your brand. It’s all probably going to get rewritten as t.co/UgLy in the tweet itself anyway.
If your ESP offers their own URL shortener, integrated into their reporting system for URLs in email or on twitter, that’s great: they’ll be policing users of it just as they police users of their email service, so you’re unlikely to be sharing it with bad spammers for long enough to matter.
All the cool kids are using bit.ly, so I need to use it to look cool
This one I can’t help with. You’ll need to decide whether bit.ly links really look cool to your recipient demographic (Spoiler: probably not) and, if so, whether it’s worth the delivery problems they risk causing.
And, remember, your domain is part of your brand. If you’re hiding your domain, you’re hiding your branding.
So… I really do need a URL shortener. Now what?
It’s cheap and easy to register a domain for just your own use as a URL shortener. Simply by having your own domain, you avoid most of the problems. You can run a URL shortener yourself; there are a bunch of freely available packages that do it, or it’s only a few hours’ work for a developer to create one from scratch.
Or you can use a third-party provider to run it for you. (Using a third-party provider does mean that you’re sharing the same IP address as other URL shorteners, but everyone you’re sharing with is probably someone like you, running a private URL shortener, so the risk is much, much smaller than using a freely available public URL shortener service.)
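For a sense of how little is involved in running your own, here is a minimal in-memory Python sketch of the core of a private shortener: sequential IDs encoded in base 62, stored in both directions. The domain `go.example.com` is a hypothetical placeholder for your own short domain; a real deployment would also need persistent storage and a web handler that answers each code with an HTTP redirect.

```python
import string
from typing import Dict, Optional

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters


def base62(n: int) -> str:
    """Encode a non-negative integer using ALPHABET."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    """Private shortener: only your own links live under your own domain."""

    def __init__(self, base_url: str = "https://go.example.com/") -> None:
        self.base_url = base_url  # hypothetical short domain
        self._by_code: Dict[str, str] = {}
        self._by_url: Dict[str, str] = {}
        self._next_id = 1

    def shorten(self, long_url: str) -> str:
        """Return the short URL, reusing the existing code for a known URL."""
        code = self._by_url.get(long_url)
        if code is None:
            code = base62(self._next_id)
            self._next_id += 1
            self._by_code[code] = long_url
            self._by_url[long_url] = code
        return self.base_url + code

    def resolve(self, code: str) -> Optional[str]:
        """Destination for a code, or None (the web handler would return 404)."""
        return self._by_code.get(code)


s = Shortener()
print(s.shorten("https://www.example.com/articles/url-reputation-and-shorteners"))
# https://go.example.com/1
print(s.resolve("1"))
# https://www.example.com/articles/url-reputation-and-shorteners
```

Sequential codes are predictable, which matters little for links you are mailing out anyway. Because you are the only one creating links on the domain, you control every destination it points to, which is the property that keeps its reputation yours.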
These are fairly simple fixes for a problem that’s here today, and is going to get worse in the future.
