Mentally modelling filters
When we talk about filters, we often talk as if there is a single filter. In many cases, though, there are multiple stages of filters, each examining mail in a different way.
In deliverability terms the easiest filters to ignore are the individual user filters, mostly because there’s nothing we can do about them. These are the Bayesian-style filters built into a lot of email clients, as well as the specific filters users create to handle their own mail. Senders have to accept that users will do whatever they want with mail. Sometimes that benefits senders, like when a user writes a rule to mark a particular message as important. Other times it doesn’t, like when a user decides to trash a message without reading it. In both cases, senders don’t get a say.
It’s these user filters, and individual user actions on messages, that feed back into what we generally describe as “machine learning” filters. These are the black box style filters that measure thousands of different things about an email and make decisions about the whole mailstream. Many email delivery folks understand how SpamAssassin (SA) works, and I think of SA as the precursor to a lot of the machine learning filters. While ML is much more complicated, the filters basically look at everything about an email and work out a score. That score determines where an email is delivered for the “average” user who doesn’t have any specific filters for that sender.
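To make the “work out a score” idea concrete, here is a minimal sketch of SpamAssassin-style additive scoring. It is not how SA or any mailbox provider is actually implemented. The rule names, weights, and the dictionary shape of a message are all made up for illustration, and the 5.0 threshold simply mirrors SpamAssassin’s default required score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A message is modelled as a plain dict. This shape is an assumption for the sketch.
Message = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    weight: float  # positive = more spam-like, negative = more wanted
    test: Callable[[Message], bool]


# Hypothetical rules and weights. Real filters weigh thousands of signals.
RULES: List[Rule] = [
    Rule("SUBJECT_ALL_CAPS", 1.5, lambda m: str(m.get("subject", "")).isupper()),
    Rule("MANY_LINKS", 2.0, lambda m: str(m.get("body", "")).count("http") > 5),
    Rule("URGENT_WORDING", 2.5, lambda m: "act now" in str(m.get("body", "")).lower()),
    Rule("AUTHENTICATED", -1.0, lambda m: bool(m.get("authenticated", False))),
]

THRESHOLD = 5.0  # assumed cut-off between inbox and spam folder


def score(message: Message) -> float:
    """Add up the weights of every rule that matches the message."""
    return sum(rule.weight for rule in RULES if rule.test(message))


def placement(message: Message) -> str:
    """Decide where mail lands for a user with no filters of their own."""
    return "spam folder" if score(message) >= THRESHOLD else "inbox"


shouty = {
    "subject": "FREE MONEY",
    "body": "Act now! " + "http://example.com " * 6,
    "authenticated": False,
}
receipt = {
    "subject": "Your receipt",
    "body": "Thanks for your order.",
    "authenticated": True,
}
print(placement(shouty), "/", placement(receipt))
```

The point of the sketch is that no single rule decides the outcome. Each signal nudges the total, and the total decides placement. A machine learning filter differs mainly in that the signals and weights are learned and keep changing, rather than being written by hand.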
Machine learning filters are extremely conditional and will deliver mail to different places for different recipients. They’re adaptive and they learn. They’re under constant development and refinement to catch types of bad mail they missed and to let through types of good mail that they caught.
There’s another level of filter here: the SMTP-level filters. These are very non-conditional filters. They’re basically hard and fast rules that are pushed out to the MX by the machine learning filter. The questions these filters ask are almost all yes or no questions. Examples of these kinds of questions:
- Is this IP or domain on a blocklist? If yes, reject. If no, pass it on.
- Does this email mention a URL we’ve seen in phishing mail? If yes, reject the message.
- Is this email part of a stream we like? If yes, let it in and let it in fast.
There are other parts to these filters as well, but again the MX filters really ask simple yes or no questions:
- Does this email address exist?
- Is this message authenticated?
- Is there a DMARC record and does the message pass DMARC?
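The yes or no nature of the SMTP-level checks can be sketched as a short chain of lookups. This is an illustration only, under several assumptions: the `Connection` fields, the lookup sets, the order of the checks, and the outcome labels are all hypothetical, and rejecting mail to a nonexistent address is my assumption about what a “no” answer to that question leads to. Authentication and DMARC checks are left out because they need DNS lookups and the outcome depends on the domain’s published policy.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Connection:
    """The few facts the MX looks at. Field names are assumptions."""
    client_ip: str
    sender_domain: str
    recipient: str
    urls: List[str] = field(default_factory=list)


# Hypothetical data that the upstream filters would push out to the MX.
BLOCKLIST: Set[str] = {"192.0.2.10", "bad.example"}
PHISHING_URLS: Set[str] = {"http://phish.example/login"}
LIKED_STREAMS: Set[str] = {"news.example"}
KNOWN_RECIPIENTS: Set[str] = {"user@mailbox.example"}


def smtp_decision(conn: Connection) -> str:
    """Answer each yes/no question in turn and stop at the first decisive one."""
    # Is this IP or domain on a blocklist?
    if conn.client_ip in BLOCKLIST or conn.sender_domain in BLOCKLIST:
        return "reject"
    # Does this email address exist?
    if conn.recipient not in KNOWN_RECIPIENTS:
        return "reject"
    # Does this email mention a URL seen in phishing mail?
    if any(url in PHISHING_URLS for url in conn.urls):
        return "reject"
    # Is this email part of a stream we like?
    if conn.sender_domain in LIKED_STREAMS:
        return "accept-fast"
    # No hard rule fired, so hand the message to the later filters.
    return "pass-on"


liked = Connection("198.51.100.7", "news.example", "user@mailbox.example")
print(smtp_decision(liked))
```

Unlike the scoring filters, nothing here is weighed against anything else. Each check is a plain lookup with a fixed outcome, which is what lets the MX make these decisions quickly during the SMTP transaction.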
This isn’t a model that encompasses all the complexity of email filters. But it does help drive what we can and should do to troubleshoot delivery problems.