Gmail, machine learning, filters
I’m sure by now readers have seen the article from Gmail “Spam does not bring us joy — ridding Gmail of 100 million more spam messages with TensorFlow.” If you haven’t seen it, go read it. It’s not often companies write about their filtering philosophy and what tools they’re using to manage incoming bad mail.
There were a few parts of the article that confirmed some of my theories about Gmail and a few things that were unexpected.
Open source tools
It’s no surprise that Google uses a machine learning engine built in house. What I didn’t know was that it was called TensorFlow and that Google had open sourced it. Many companies in the email space open source some of their tools. ExactTarget open sourced Fuel UX long before they were SFMC and maintains a GitHub account with a number of tools. Mailchimp also maintains an account with their open source code. Steve releases a bunch of tools and code he writes both for work and for fun.
Open source software runs a whole lot more of the internet than many people know. Some of the primary contributors do the work on their own time. But many companies, large and small, understand how vital open source tools are to their business. They hire and support open source developers to maintain and extend the software.
Catching the hard spam
Google catches a lot of spam, and they’re always trying to catch the stuff that falls through the cracks. The recent volume of calls I’ve been getting about mail going to spam at Gmail told me that Gmail had implemented some new filters. Many people were telling me that things were fine and then, with no change in what they were doing, mail started going to bulk. Other delivery folks were also talking about their customers getting caught up in filters.
We’ve gotten to the point, particularly with Google but also with the other webmail providers, where the bulk of egregious spam is blocked. What’s left is not some spammer sending 10MM messages, but a much more difficult problem. Spam that reaches the inbox is sent in much smaller quantities. It’s also heavily targeted. Spammers are trying to look like legitimate marketers but still sending mail without permission.
This targeted spam is something I’ve been thinking about a lot lately, mostly because anti-spammers did a pretty good job of making not-spamming look like it was beneficial to senders. Many deliverability recommendations boil down to “stop spamming,” phrased in a way that makes the advice more palatable. Much of the spam that’s getting caught in the new filters follows deliverability recommendations. The piece it misses is that it’s not being sent with the permission of the recipient.
Believe it or not, spam filters started out as protecting users from mail they didn’t ask for. As the internet has grown and email has become a channel for crime, the focus of filters has changed. But, fundamentally, deep down, the original purpose of keeping mailboxes useful by stopping unsolicited mail is still there. The ML filters are giving Google, and others, tools to actually address that mail better.
The trend is clear. Filters are getting more and more able to address unsolicited email in a complex sender and user environment. Machine learning is driving a lot of that, and Google is at the front of the pack. They’re doing their best to stop the small-scale spammers that have avoided a lot of the last generation of filters.