Filter complexity

During the Q&A last week, I gave an example of one type of filter to demonstrate how complex filters are. There was some confusion about what I was saying, so I thought I’d write a blog post explaining it.

Background

This story came from another deliverability person, let’s call her ESPer. Some of her ESP’s customers (Customers) are using a 3rd party service that provides tracking links (Tracker). Tracker sent email to its customers saying that mail with more than 3 links was getting blocked.

It has come to our attention that Google has recently started flagging emails with multiple tracked links as suspicious or malicious. For example, if you have an email with more than 3 links (including any in your signature) and have Tracker link tracking turned on, recipients who use Gmail may see your message flagged with a warning. If your email contains 3 or fewer tracked links then you will be unaffected by this issue.

This prompted some Customers to call the ESP and ask whether Google was blocking mail with 3 or more links.

The Investigation

Multiple ESP folks checked their systems and found no correlation between multiple links in an email and bulk foldering at Gmail. I checked my Gmail account and a number of emails in my inbox have 4 or 5 or 6 links in them. None with the Tracker tracking cookie, though.
To test this a little more, I tried to sign up for a free account with Tracker. Tracker is used through a Firefox add-on, but the add-on is unsigned, so I decided not to install it. It’s probably not malware, but if they can’t be bothered to sign their add-on, I’m not going to risk installing it on my machine, even for my readers.

What we know

  1. Gmail is blocking mail with 3 or more links when one of them is a Tracker link.
  2. Remove the Tracker link and the mail goes to the inbox.
  3. Send with fewer than 3 links, including a Tracker link, and the mail goes to the inbox.

What we speculate

One of Tracker’s customers is sending spam with 3 or more links plus the tracking links. Google has identified this mail as a problem and is blocking mail that has the same characteristics.
Removing the Tracker link should get the mail into the inbox.
Removing links so there are fewer than 3 links should get the mail to the inbox.

What this tells us

Filtering is complex. Like Really Really Complex. It’s not the presence of the tracking URL, it’s the presence of the tracking URL and 3 other URLs. Generally when we here at Word to the Wise try and test “what’s wrong” we’ll start removing URLs to see if one particular URL is causing a problem. In this case, that testing would have led us to an erroneous conclusion. We might find one URL “responsible” but only because we’d lowered the total number of URLs under 3.
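To make the testing pitfall concrete, here is a minimal sketch of a toy filter with a combined rule. It is not Gmail's filter, which is not public. The domain `tracker.example`, the threshold of 3 total links, and the function names are all assumptions made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical values: the real Tracker domain and Gmail's actual rule are unknown.
TRACKER_DOMAIN = "tracker.example"
LINK_THRESHOLD = 3  # assumed cutoff on the total number of links


def is_flagged(urls, tracker_domain=TRACKER_DOMAIN, threshold=LINK_THRESHOLD):
    """Toy rule: flag only when a tracker link is present AND
    the message has at least `threshold` links in total."""
    has_tracker = any(urlparse(u).hostname == tracker_domain for u in urls)
    return has_tracker and len(urls) >= threshold


def leave_one_out(urls):
    """Remove each URL in turn and record whether the rest is still flagged."""
    return {u: is_flagged(urls[:i] + urls[i + 1:]) for i, u in enumerate(urls)}


# A made-up message with one tracked link and two ordinary links.
message = [
    "https://tracker.example/c/abc123",
    "https://shop.example/sale",
    "https://shop.example/unsubscribe",
]

print("original flagged:", is_flagged(message))
for url, still_flagged in leave_one_out(message).items():
    print(url, "-> still flagged" if still_flagged else "-> delivered")
```

With this made-up message, the original is flagged, and removing any single URL gets it delivered, including the two perfectly ordinary links. A tester who stops after the first removal would blame whichever URL they happened to pull first.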
I’ve been telling people and clients that filters are complex. More than 3 URLs + a specific URL is something that people wouldn’t normally identify as a filter criterion. But the neural net / machine learning / AI filters in use at Gmail noticed that mail with a particular number of links plus the Tracker link isn’t wanted by the recipients. The filters then started blocking mail selectively based on those criteria.
Filters aren’t magic, but sometimes the complexity makes them seem like it.
 
 
 
