Mentally modelling filters

When we talk about filters, we often think there is one filter. But in many cases there are multiple stages of filters, each examining mail in a different way.

[Figure: simple model of an email filter that takes mail and puts it in the inbox or spam folder]

In deliverability terms the easiest filters to ignore are the individual user filters, mostly because there’s nothing we can do about them. These are the Bayesian-style filters built into a lot of email clients, as well as specific filters users create to handle their own mail. Senders have to accept that users will do whatever they want with mail. Sometimes it benefits senders, like when a user writes a rule to mark a particular message as important. Other times it doesn’t, like when a user decides to trash a message without reading it. In both cases, senders don’t get a say.

It’s these user filters, and individual user actions on messages, that feed back into what we generally describe as “machine learning” filters. These are the black box style filters that measure thousands of different things about an email and make decisions about the whole mailstream. Many email delivery folks understand how SpamAssassin works. I think of SA as the precursor to a lot of the machine learning filters. While ML is much more complicated, the filters basically look at everything about an email and work out a score. That score determines where an email is delivered to the “average” user that doesn’t have any specific filters for that sender.
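To make the scoring idea concrete, here is a minimal sketch of a SpamAssassin-style additive score. The rule names, weights, and threshold are all made up for illustration; real filters measure far more signals and tune their weights continuously.

```python
# Hypothetical rules and weights, for illustration only.
# Positive weights push toward spam, negative weights toward the inbox.
RULES = {
    "on_blocklist": 3.0,
    "failed_authentication": 2.0,
    "many_user_complaints": 2.5,
    "recipients_often_reply": -2.0,
}

SPAM_THRESHOLD = 5.0  # assumed cutoff, not any provider's real value


def score_message(signals):
    """Sum the weights of every rule whose signal is present."""
    return sum(weight for name, weight in RULES.items() if signals.get(name))


def place_message(signals):
    """Deliver to the spam folder at or above the threshold, else the inbox."""
    return "spam" if score_message(signals) >= SPAM_THRESHOLD else "inbox"
```

For example, a message that trips both `on_blocklist` and `failed_authentication` reaches the assumed threshold and lands in spam, while the same message sent to recipients who often reply scores lower and stays in the inbox.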

Machine learning filters are extremely conditional and will deliver mail to different places for different recipients. They’re adaptive and they learn. They’re under constant development and refinement to catch types of bad mail they missed and to let through types of good mail that they caught.
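One way to picture "different places for different recipients" is a per-recipient adjustment applied to a score for the whole mailstream. This is only a sketch: the function name, the `recipient_engagement` value, and the threshold are assumptions for illustration, not how any particular mailbox provider works.

```python
def place_for_recipient(stream_score, recipient_engagement, threshold=5.0):
    """Place one message for one recipient.

    stream_score: hypothetical score for the sender's whole mailstream.
    recipient_engagement: hypothetical credit earned when this recipient
        opens, replies to, or rescues the sender's mail (0.0 if none).
    """
    adjusted = stream_score - recipient_engagement
    return "spam" if adjusted >= threshold else "inbox"
```

With these made-up numbers, the same message can go to spam for a recipient who has never engaged and to the inbox for one who has.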

There’s another level of filter here: the SMTP level filters. These are very non-conditional filters. They’re basically hard and fast rules that are pushed out to the MX by the machine learning filter. The questions this filter asks are almost all yes or no questions. Examples of these kinds of questions:

  • Is this IP or domain on a blocklist? If yes, reject. If no, pass it on.
  • Does this email mention a URL we’ve seen in phishing mail? If yes, reject the message.
  • Is this email part of a stream we like? If yes, let it in and let it in fast.

There are other parts to these filters as well, but again, the MX filters really ask simple yes or no questions:

  • Does this email address exist?
  • Is this message authenticated?
  • Is there a DMARC record and does the message pass DMARC?
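The yes or no questions above can be sketched as a short chain of checks at the MX. The field names, the order of the checks, and the decision strings are assumptions chosen for illustration; a real MX consults blocklists, DNS, and internal reputation data to answer each question.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical yes/no facts the MX already knows about a message."""
    ip_on_blocklist: bool = False
    recipient_exists: bool = True
    has_phishing_url: bool = False
    fails_dmarc_reject_policy: bool = False  # fails DMARC and the policy says reject
    trusted_stream: bool = False


def smtp_decision(conn):
    """Answer each question in turn; the first 'reject' answer wins."""
    if conn.ip_on_blocklist:
        return "reject"
    if not conn.recipient_exists:
        return "reject"
    if conn.has_phishing_url:
        return "reject"
    if conn.fails_dmarc_reject_policy:
        return "reject"
    if conn.trusted_stream:
        return "accept-fast"  # a stream we like: let it in, and fast
    return "accept"  # pass it on to the later filter stages
```

Unlike the scoring filters, nothing here is weighed against anything else: in this sketch a single "yes" to a blocking question ends the conversation, even for a stream that would otherwise be trusted.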

This isn’t a model that encompasses all the complexity of email filters. But it does help drive what we can and should do to troubleshoot delivery problems.

Related Posts

Filters do what we tell them

In the email space we talk about filters as if they were sentient beings. “The filters decided…” “The filters said…” This is convenient shorthand, but tends to mask that filters aren’t actually deciding or saying anything. Filters are software processes that follow rules dictated by the people who create and maintain them. The rules flow from the goals set by the mailbox provider. The mailbox provider sets goals based on what their users tell them. Users communicate what they want by how they interact with email.

What we end up with is a model where a set of people make decisions about what mail should be let in. They pass that decision on to the people who write the filters. The people who write the filters create software that evaluates email based on those goals using information collected from many places, including the end users.
What mail should be let in is an interesting question, with answers that differ depending on the environment the filter is deployed in.
Consumer ISPs typically want to keep their users happy and safe. Their goals are to stop harmful mail like phishing, or mail containing viruses or malware. They also want to deliver mail that makes their users happy. As one ISP employee put it, “We want our users to be delighted with your mail.”
Businesses have a few other goals when it comes to filters. They, too, need filters to protect their network from malicious actors. As businesses are often directly targeted by bad actors, this is even more important. They also want to get business related email, whether that be from customers or vendors. They may want to ensure that certain records are kept and laws are followed.
Governments have another set of goals. Universities and schools have yet another set of goals. And, of course, there are folks who run their own systems for their own use.
Complicating the whole thing is that some groups have different tolerances for mistakes. For instance, many of our customers are folks dealing with being blocked by commercial filters. Therefore, we don’t run commercial filters. That does mean we see a lot of viruses and malware and rely on other strategies to stop a compromise, strategies that wouldn’t be as viable in a different environment.
Filters are built to meet specific user needs. What they do isn’t random, and it’s not unknowable. They are designed to accomplish certain goals and generally they’re pretty good at what they do. Understanding the underlying goals of filters can help drive solutions to poor delivery.
Use the shorthand, talk about what filters are doing. But remember that there are people behind the filters. Those filters are constantly maintained in order to keep up with ever changing mail streams. They aren’t static and they aren’t forgotten. They are updated regularly. They are fluid, just like the mail they act on.


Getting removed from an ISP block

A question came up on a mailing list about how long it typically took to resolve a spam block at an ISP. I don’t think that question actually has a single answer, as each ISP has their own, special, process.
ISPA takes 5 minutes. You fill out a form, it runs through their automated system and you’re usually delisted.
ISPB asks a lot of questions in their form, so it takes about 15 minutes to collect all the data they want and 10 minutes to fill out their form. Then, using very, very short words you keep repeating what you need to the tier 1 person who initially responded. That person eventually figures out they can’t blow you off and throws your request to tier 2, who handles it immediately.
ISPC has a different, somewhat long form. Again, you spend time collecting all the data and then fill out the somewhat obscure form. You get a response, but it’s a boilerplate totally unrelated to the initial request, so you keep answering until you find a tier 1 rep who can read and do what you initially asked.
ISPD has a form that takes about 2 minutes to fill out. Unfortunately, it goes to an outsourced postmaster team in the Far East and response times are ranging from days to months right now.
ISPE has an email address and if you catch them on a good day, they’re very helpful. Sometimes there’s no response, though.
ISPF has a troubleshooting page and accepts requests to fix things, but never responds in any visible manner.
ISPG tells you to talk to Spamfiltering Company H.
Spamfiltering company H answers their email in a prompt and friendly manner. OK, sometimes the answers are just “wow, your client/customer/IP range is sending lots of spam,” but hey, it’s an answer.
Spamfiltering company I is a useless bag of protoplasm and doesn’t even answer the email address they give you on their webpages. In a fit of fairness, I have heard they will occasionally respond, but usually that response is to tell you to go pay some apparently unrelated company a bribe to get delisted.
Spamfiltering company J doesn’t have a lot of ways to contact them, but has a lot of folks who participate in various semi-public arenas, so if you’re even slightly part of the community, you can email them and they’re very helpful.
Spamfiltering company K is totally useless, but will tell you to have recipients whitelist you.


Politics and Delivery

Last week I posted some deliverability advice for the DNC based on their acquisition of President Obama’s 2012 campaign database. Paul asked a question on that post that I think is worth some attention.
