Gmail, machine learning, filters

I’m sure by now readers have seen the article from Gmail “Spam does not bring us joy — ridding Gmail of 100 million more spam messages with TensorFlow.” If you haven’t seen it, go read it. It’s not often companies write about their filtering philosophy and what tools they’re using to manage incoming bad mail.

There were a few parts of the article that confirmed some of my theories about Gmail and a few things that were unexpected.

Open source tools

It’s no surprise that Google uses a machine learning engine built in house. What I didn’t know was it was called TensorFlow and was open sourced by Google. Many companies in the email space open source some of their tools. Exacttarget open sourced FuleUX long before they were SFMC and maintain a GitHub account with a number of tools. Mailchimp also maintains an account with their open source code. Steve releases a bunch of tools and code he writes both for work and for fun.

Open source software runs a whole lot more of the internet than many people know. Some of the primary contributors do the work on their own time. But many companies, large and small, understand how vital open source tools are to their business. They hire and support open source developers to maintain and extend the software.

Catching the hard spam

Google catches a lot of spam, and they’re always trying to catch the stuff that falls through the cracks. My recent call volume about going to spam at Gmail told me that Gmail had implemented some new filters. Many people were telling me that things were fine and then, with no change in what they were doing, mail started going to bulk. Other delivery folks were also talking about their customers getting caught up in filters.

We’ve gotten to the point, particularly with Google but also with the other webmail providers, where the bulk of egregious spam is blocked. What’s left is not some spammer sending 10MM messages, but a much more difficult problem. Spam that reaches the inbox is sent in much smaller quantities. It’s also heavily targeted. Spammers are trying to look like legitimate marketers but still sending mail without permission.

This targeted spam is something I’ve been thinking about a lot lately. Mostly because anti-spammers did a pretty good job making not-spamming look like it was beneficial to senders. Many deliverability recommendations boil down to stop spamming but phrased in a way that makes the advice more palatable. Much of the type of spam that’s getting caught in the new filters follows deliverability recommendations. The piece it misses is that it’s not being sent with the permission of the recipient.

Believe it or not, spam filters started out as protecting users from mail they didn’t ask for. As the internet as grown and email has become a channel for crime the focus of filters have changed. But, fundamentally, deep down, the original purpose of keeping mail boxes useful by stopping unsolicited mail is still there. The ML filters are giving Google, and others, tools to actually address that mail better.

The trend is clear. Filters are getting more an more able to address unsolicited email in a complex sender and user environment. Machine learning is driving a lot of that, and Google is at the front of the pack. They’re doing their best to stop the small scale spammers that have avoided a lot of the last generation of filters.


Related Posts

Parasites hurt email marketing

As a small business owner I am a ripe target for many companies. They buy my address from some lead generation firm, or they scrape it off LinkedIn, and they send me a message that pretends to be personalized but isn’t really.
“I looked at your website… we have a list of email addresses to sell you.”
“We offer cold calling services… can I set up a call with you?”
“I have scheduled a meeting tomorrow so I can tell you about our product that will solve all your technical issues and is also a floor wax.”
None of these emails are anything more than spam. They’re fake personalized. There’s no permission. On a good day they’ll have an opt out link. On a normal day they might include an actual name.
These are messages coming to an email address I’ve spent years trying to protect from getting onto mailing lists. I don’t do fishbowls, I’m careful about who I give my card to, I never use it to sign up for anything. And, still, that has all been for naught.
I don’t really blame the senders, I mean I do, they’re the ones that bought my address and then invested in business automation software that sends me regular emails trying to get me to give them a phone number. Or a contact for “the right person at your business to talk to about this great offer that will change your business.”
The real blame lies with the people who pretend that B2B spam is somehow not spam. Who have pivoted their businesses from selling consumer lists to business lists because permission doesn’t matter when it comes to businesses. The real blame lies with companies who sell “marketing automation software” that plugs into their Google Apps account and hijacks their reputation to get to the inbox. The real blame lies with list cleansing companies who sell list buyers a cleansing service that only hides the evidence of spamming.
There are so many parasites in the email space. They take time, energy and resources from large and small businesses, offering them services that seem good, but really are worthless.
The biologically interesting thing about parasites, though, is that they do better if they don’t overwhelm the host system. They have to stay small. They have to stay hidden. They have to not cause too much harm, otherwise the host system will fight back.
Email fights back too. Parasites will find it harder and harder to get mail delivered in any volume as the host system adapts to them. Already if I look in my junk folder, my filters are correctly flagging these messages as spam. And my filters see a very small portion of mail. Filtering companies and the business email hosting systems have a much broader view and much better defenses.
These emails annoy me, but I know that they are a short term problem.  As more and more businesses move to hosted services, like Google Apps and Office365 the permission rules are going to apply to business addresses as well as consumer addresses. The parasites selling products and services to small business owners can’t overwhelm email. The defenses will step in first.
 

Read More

Want some history?

I was doing some research today for an article I’m working on. The research led me to a San Francisco Law Review article from 2001 written by David E. Sorkin. Technical and Legal Approaches to Unsolicited Electronic Mail (.pdf link). The text itself is a little outdated, although not as much as I expected. There’s quite a good discussion of various ways to control spam, most of which are still true and even relevant.

From a historical perspective, the footnotes are the real meat of the document. Professor Sorkin discusses many different cases that together establish the rights of ISPs to filter mail, some of which I wasn’t aware of. He also includes links to then-current news articles about filtering and spam. He also mentions different websites and articles written by colleagues and friends from ‘back in the day’ discussing spam on a more theoretical level.
CNET articles on spam and filtering was heavily referenced by Professor Sorkin. One describes the first Yahoo spam folder. Some things never change, such as Yahoo representatives refusing to discuss how their system works. There were other articles discussing Hotmail deploying the MAPS RBL (now a part of Trend Micro) and then adding additional filters into the mix a few weeks later.
We were all a little naive back then. We thought the volumes of email and spam were out of control. One article investigated the effectiveness of filters at Yahoo and Hotmail, and quoted a user who said the filters were working well.

Read More

Filtering is not just about spam

A lot of filters started out just as filters against spam. But over the years they’ve morphed into more general blocks against dangerous or problematic email. There’s a lot of crime and bad behavior on the internet, much of it using email as a conduit or vector. Filtering is so much more than stopping spam now. It’s as much, or more, about stopping crime.
Email filters are essential to protect us from scammers. Sometimes I forget this, and then I read about a grandmother getting swindled by a Nigerian scammer and ending up dead.
There are real consequences to poor filtering and there is real crime facilitated by email. It’s easy to forget this as we deal with the email that gets caught in filters when they shouldn’t.
Filters are one of the first lines of defense against online crime.
Not only does filtering stop crime, but they also keep email working. An unfiltered mail stream is an ugly, unreadable, unworkable mess.

Read More