Filtering by gestalt

One of those $5.00 words I learned in the lab was gestalt. We were studying fetal alcohol syndrome (FAS) and, at the time, there were no consistent measurements or numbers that would drive a diagnosis of FAS. Diagnosis was by gestalt: that is, by the patient looking like someone who had FAS.
It’s a funny word to say, it’s a funny word to hear. But it’s a useful term to describe the future of spam filtering. And I think we need to get used to thinking about filtering acting on more than just the individual parts of an email.

Filtering is not just IP reputation or domain reputation. It’s about the whole message. It’s mail from this IP with this authentication containing these URLs. Earlier this year, I wrote an article about Gmail filtering. The following quote from it demonstrates the sum of the parts, but I didn’t really call it out at the time.

Gmail uses a 10+ year old neural network that analyzes thousands of factors, related to email, IP, and web, integrated with all Google products, and with 99.9%+ accuracy for identifying certain types of messages, combined with an email-specific domain-based reputation system that combines IP reputation, content, read rates, reputation of other senders with similar content.

With filters, Gmail looks at the whole picture. They look at all the data and assess the whole. Gmail filters by gestalt. I think other companies are catching up and this is the filtering of the future.

So… what’s that mean?

That means that we’re not looking at warming up an IP or a domain. Instead, we’re warming up a domain on an IP. Take the domain to another IP, and the reputation doesn’t carry. Change the domain on an IP, and the new combination needs to be warmed up as a domain/IP pair.
But even that is an oversimplification of reality. It’s not a domain/IP pair, it’s this SPF domain, that d= domain, this IP, this DMARC alignment, these URLs, and on and on. A recent talk referred to warming up resources in relation to each other, where resources were things like IPs, domains, and URLs.
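
One way to picture warming resources in relation to each other is a reputation table keyed on the whole combination rather than on any single piece. The sketch below is purely illustrative: the MailIdentity fields, the ReputationStore class, and the numeric scores are assumptions made up for this example, not how Gmail or any other receiver actually stores reputation.

```python
from typing import NamedTuple


class MailIdentity(NamedTuple):
    """Hypothetical combination of resources seen on one mailstream."""
    ip: str
    spf_domain: str
    dkim_domain: str  # the d= domain in the DKIM signature


class ReputationStore:
    """Toy store: reputation belongs to the combination, not to its parts."""

    def __init__(self, default: float = 0.0) -> None:
        self._default = default
        self._scores: dict[MailIdentity, float] = {}

    def record(self, identity: MailIdentity, delta: float) -> None:
        """Add the outcome of one send (positive or negative) to the combination."""
        self._scores[identity] = self.lookup(identity) + delta

    def lookup(self, identity: MailIdentity) -> float:
        """Unknown combinations start at the default, however familiar the parts."""
        return self._scores.get(identity, self._default)


def warm_and_move() -> tuple[float, float]:
    store = ReputationStore()
    warmed = MailIdentity("192.0.2.10", "mail.example.com", "example.com")
    for _ in range(3):  # three good sends on the original IP
        store.record(warmed, 1.0)
    moved = warmed._replace(ip="198.51.100.20")  # same domains, new IP
    return store.lookup(warmed), store.lookup(moved)


if __name__ == "__main__":
    print(warm_and_move())  # (3.0, 0.0)
```

In this toy model the warmed combination has a score of 3.0, while the same domains on a new IP start over at 0.0. A real filter would presumably give partial credit for familiar pieces, but the point stands: the unit being warmed is the combination.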

SpamAssassin with relative scores

I think most readers have a good feeling for how SpamAssassin works. It has a bunch of rules, and assigns a score to each rule. All the scores are added together, and if the total is higher than a certain value the mail is filtered.
In more modern filtering, particularly at Gmail, scoring is dynamic. There are still rules and they still assign scores. But the scores themselves can be modified by other scores in the process. It’s not a simple sum of scores, so changing anything can change the overall status of a message.
Take two identical messages and two IP addresses, one with an arbitrary reputation of 5 and another with an arbitrary reputation of 10. By the score and sum method, the final email reputation scores would be message+5 and message+10. With relative scoring, though, the other signals change how much the IP counts, and the effective IP reputations might turn out to be 2 and 13.
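
The difference between the two methods can be sketched in a few lines. This is a hedged illustration only: the neutral point of 7.5 and the gain of 2.2 are numbers I picked so the arithmetic reproduces the 2 and 13 above, and the function names are hypothetical. Nothing here describes Gmail’s actual model.

```python
def additive_score(message_score: float, ip_reputation: float) -> float:
    """Score and sum: each rule contributes a fixed amount."""
    return message_score + ip_reputation


def relative_ip_reputation(
    ip_reputation: float, neutral: float = 7.5, gain: float = 2.2
) -> float:
    """Hypothetical relative scoring.

    `gain` stands in for the effect of every other signal in the message:
    it stretches (or shrinks) the IP's distance from a neutral point
    instead of leaving the IP's contribution fixed.
    """
    return neutral + gain * (ip_reputation - neutral)


def relative_score(
    message_score: float,
    ip_reputation: float,
    neutral: float = 7.5,
    gain: float = 2.2,
) -> float:
    """Same message score, but the IP's contribution is adjusted first."""
    return message_score + relative_ip_reputation(ip_reputation, neutral, gain)


if __name__ == "__main__":
    message_score = 20.0  # the same message sent from both IPs
    for ip_rep in (5.0, 10.0):
        print(
            ip_rep,
            additive_score(message_score, ip_rep),
            round(relative_score(message_score, ip_rep), 2),
        )
```

With these assumed numbers, the additive totals are message+5 and message+10, while the relative totals work out to roughly message+2 and message+13. Change the gain (that is, change anything else about the mail) and the same two IPs contribute different amounts.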

Look at the whole picture

There’s a West Wing episode where Jed is playing chess with multiple members of the White House staff while negotiating the international crisis of the week. Throughout the episode he tells staff to “look at the whole board.” This is really what we have to be doing in deliverability right now. We have to look at the whole board. We have to look at the whole face. We have to see the gestalt.
We can’t just look at the domains and URLs in a message; we have to consider them in context with the IP addresses. All mailstreams affect each other. No longer can we look at transactional messages as separate from marketing messages. The reputation of each affects the other.
This is actually good. It means that different mailstreams, even with the same URLs from the same IPs, can develop independent reputations. It makes it easier to use shared IPs. Reputation isn’t reliant on keeping everything separate. It’s the whole picture that’s important.
Email is much more than the sum of its parts.
 
 

Related Posts

IP Reputation

A throwback post from a few years ago on IP reputation.


Gmail filtering in a nutshell

Gmail’s approach to filtering, as described by one of the old timers. This person was dealing with network abuse back when I was still slinging DNA around as my job and just reading headers as a hobby.


Reputation is about behavior

Reputation is calculated based on actions. Send mail people want and like and interact with and get a good reputation. Send mail people don’t want and don’t like and don’t interact with and get a bad reputation.
 
Reputation is not
… about who the sender is.
… about legitimacy.
… about speech.
… about message.
Reputation is
… about sender behavior.
… about recipient behavior.
… about how wanted a particular mail is forecast to be.
… based on facts.
Reputation isn’t really that complicated, but there are a lot of different beliefs about reputation that seem to make it complicated.
The reputation of a sender can be different at different receivers.
Senders sometimes target domains differently. That means one receiver may see acceptable behavior but another receiver may see a completely different behavior.  
Receivers sometimes have different standards. These include standards for what bad behavior is and how it is measured. They may also have different thresholds for things like complaints and bounces.
What this means is that delivery at one receiver has no impact on delivery at another. Just because ISP A delivers a particular mail to the inbox doesn’t mean that ISP B will accept the same mail. Each receiver has its own standards, and sometimes senders need to tune mail for a specific receiver. One of my clients, for instance, tunes engagement filters based on the webmail domain in the email address. Webmail domain A needs a different level of engagement than webmail domain B.
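
As a rough sketch of what tuning engagement filters by webmail domain can look like on the sending side: the domains, the day counts, and the function names below are all invented for illustration, and real thresholds have to come from watching delivery at each receiver.

```python
# Hypothetical engagement windows, in days since the recipient last
# opened or clicked. These numbers are placeholders, not recommendations.
ENGAGEMENT_WINDOW_DAYS = {
    "webmail-a.example": 90,
    "webmail-b.example": 180,
}
DEFAULT_WINDOW_DAYS = 365


def recipient_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def should_mail(address: str, days_since_last_engagement: int) -> bool:
    """Keep a recipient only if they engaged within their receiver's window."""
    window = ENGAGEMENT_WINDOW_DAYS.get(
        recipient_domain(address), DEFAULT_WINDOW_DAYS
    )
    return days_since_last_engagement <= window


if __name__ == "__main__":
    # The same 120-day-old engagement is too stale for one receiver
    # and acceptable for the other.
    print(should_mail("pat@webmail-a.example", 120))  # False
    print(should_mail("pat@webmail-b.example", 120))  # True
```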
Public reputation measures are based on data feeds.
There are multiple public sources where senders can check their reputation. Most of these sources depend on data feeds from receiver partners. Sometimes they curate and maintain their own data sources, often in the form of spamtrap feeds. But these public sources are only as good as their data analysis. Sometimes, they can show a good reputation where there isn’t one, or a bad reputation where there isn’t one.
Email reputation is composed of lots of different reputations. 
Email reputation determines delivery. Getting to the inbox takes more than sending from an IP with a good reputation. IP reputation is combined with domain reputation and content reputation to get the email reputation. IP reputation is often treated as the only valuable reputation because of the prevalence of IP-based blocking. But there are SMTP-level blocks against domains as well, often for phishing or virus links. Good IP reputation is necessary but not sufficient for good email delivery.
Reputation is about what a sender does, not about who a sender is.
Just because a company is a household name doesn’t mean their practices are good enough to make it to the inbox. Email is a meritocracy. Send mail that merits the inbox and it will get to recipients. Send email that doesn’t, and suffer the repercussions.
