Filter complexity

During the Q&A last week, I mentioned an example of a type of filter to demonstrate how complex filters are. There was some confusion about what I was saying, so I thought I’d write a blog post explaining it.

Background

This story came from another deliverability person; let’s call her ESPer. One of their customers (Customers) is using a third-party service that provides tracking links (Tracker). Tracker sent email to their customers saying that mail with more than 3 links was getting blocked:

It has come to our attention that Google has recently started flagging emails with multiple tracked links as suspicious or malicious. For example, if you have an email with more than 3 links (including any in your signature) and have Tracker link tracking turned on, recipients who use Gmail may see your message flagged with a warning. If your email contains 3 or fewer tracked links then you will be unaffected by this issue.

This triggered some Customers to call the ESP and start asking if Google was blocking mail with 3 or more links.

The Investigation

Multiple ESP folks checked their systems and found no correlation between multiple links in an email and bulk foldering at Gmail. I checked my Gmail account and a number of emails in my inbox have 4 or 5 or 6 links in them. None with the Tracker tracking cookie, though.
In an effort to test this a little more, I tried to sign up for a free account with Tracker. Tracker is used through a Firefox add-on, but the add-on is unsigned, so I decided not to install it. It’s probably not malware, but if they can’t be bothered to sign their add-on, I’m not going to risk installing it on my machine, even for my readers.

What we know

  1. Gmail is blocking mail that has 3 or more links when one of them is a Tracker link.
  2. Remove the Tracker link and the mail goes to the inbox.
  3. Send with fewer than 3 links and a Tracker link, and the mail goes to the inbox.
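To make the interaction concrete, here is a toy model of the three observations as a single rule. It is only a sketch of the reported behavior, not Gmail's actual filter: the function name, the idea of counting total links, and the exact threshold of 3 are all assumptions taken from the reports above.

```python
def observed_outcome(total_links: int, has_tracker_link: bool) -> str:
    """Toy model of the reported behavior (an assumption, not Gmail's real logic).

    Mail is treated as blocked only when BOTH conditions hold:
    a Tracker link is present AND the message has 3 or more links.
    """
    if has_tracker_link and total_links >= 3:
        return "blocked"
    return "inbox"


# Print the small grid of cases covered by the observations.
for links in (2, 3, 5):
    for tracker in (True, False):
        print(links, tracker, observed_outcome(links, tracker))
```

Neither condition is enough on its own in this model. Only the combination changes the outcome.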

What we speculate

One of Tracker’s customers is sending spam with 3 or more links plus the tracking links. Google has identified this mail as a problem and is blocking mail that has the same characteristics.
Removing the Tracker link should get the mail into the inbox.
Removing links so there are fewer than 3 links should get the mail to the inbox.

What this tells us

Filtering is complex. Like Really Really Complex. It’s not the presence of the tracking URL, it’s the presence of the tracking URL and 3 other URLs. Generally, when we here at Word to the Wise try to test “what’s wrong”, we start removing URLs to see if one particular URL is causing a problem. In this case, that testing would have led us to an erroneous conclusion. We might find one URL “responsible”, but only because we’d lowered the total number of URLs under 3.
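Here is a rough sketch of why remove-one-URL-at-a-time testing goes wrong against a rule like this. The filter below is a stand-in I made up for illustration (Tracker link present and 3 or more links in total). The host `tracker.example`, the sample URLs and the function names are all hypothetical, and none of this is Gmail's real logic.

```python
from urllib.parse import urlparse

TRACKER_HOST = "tracker.example"  # hypothetical tracking domain


def is_blocked(urls):
    """Stand-in filter (an assumption): Tracker link present AND 3+ links."""
    has_tracker = any(urlparse(u).hostname == TRACKER_HOST for u in urls)
    return has_tracker and len(urls) >= 3


def single_url_suspects(urls):
    """Remove one URL at a time and return every URL whose removal
    turns a blocked message into a delivered one."""
    suspects = []
    for i, url in enumerate(urls):
        remaining = urls[:i] + urls[i + 1:]
        if is_blocked(urls) and not is_blocked(remaining):
            suspects.append(url)
    return suspects


three = [
    "https://tracker.example/c/abc",
    "https://shop.example/sale",
    "https://shop.example/unsubscribe",
]
five = three + ["https://shop.example/about", "https://shop.example/help"]

# With exactly 3 links, removing ANY single link unblocks the mail,
# so every URL looks "responsible".
print(single_url_suspects(three))

# With 5 links, only removing the Tracker link unblocks the mail.
print(single_url_suspects(five))
```

In the three-link message, the test blames whichever URL you happen to remove first, which is exactly the erroneous conclusion described above. Testing combinations (removing a URL and replacing it with a neutral one to keep the count the same) would separate the two factors.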
I’ve been telling people and clients that filters are complex. More than 3 URLs plus a specific URL is something that people wouldn’t normally identify as a filter criterion. But the neural net / machine learning / AI filters in use at Gmail noticed that mail with a particular number of links plus the Tracker link isn’t wanted by the recipients. The filters then started blocking mail selectively based on those criteria.
Filters aren’t magic, but sometimes the complexity makes them seem like it.
 
 
 
