Filtering by gestalt
One of those $5.00 words I learned in the lab was gestalt. We were studying fetal alcohol syndrome (FAS) and, at the time, there were no consistent measurements or numbers that would drive a diagnosis of FAS. Diagnosis was by gestalt – that is, by the patient looking like someone who had FAS.
It’s a funny word to say, it’s a funny word to hear. But it’s a useful term to describe the future of spam filtering. And I think we need to get used to thinking about filtering acting on more than just the individual parts of an email.
Filtering is not just IP reputation or domain reputation. It’s about the whole message. It’s mail from this IP with this authentication containing these URLs. Earlier this year, I wrote an article about Gmail filtering. This quote from it demonstrates the sum of the parts, though I didn’t really call it out at the time.
Gmail uses a 10+ year old neural network that analyzes thousands of factors, related to email, IP, and web, integrated with all Google products, and with 99.9%+ accuracy for identifying certain types of messages, combined with an email-specific domain-based reputation system that combines IP reputation, content, read rates, reputation of other senders with similar content.
With filters, Gmail looks at the whole picture. They look at all the data and assess the whole. Gmail filters by Gestalt. I think other companies are catching up and this is the filtering of the future.
So… what’s that mean?
That means that we’re not looking at warming up an IP or a domain. Instead we’re warming up a domain on an IP. Take the domain to another IP, and the reputation doesn’t carry. Change a domain on an IP and that needs to be warmed up as a domain/IP pair.
But even that is overly simplified from reality. It’s not a domain/IP pair, it’s this SPF domain, that d= domain, this IP, this DMARC alignment, these URLs, and on and on. A recent talk referred to warming up resources in relationship to each other, where resources were things like IPs, domains, and URLs.
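One way to picture "warming up resources in relationship to each other" is a reputation table keyed by the whole combination rather than by any single resource. The sketch below is only an illustration: the class names, fields, and scores are hypothetical, and real filters are certainly fuzzier than an exact-match lookup.

```python
from typing import NamedTuple, Optional


class MailIdentity(NamedTuple):
    """The combination of resources a message arrives with (hypothetical)."""
    ip: str
    spf_domain: str
    dkim_domain: str  # the d= domain


class ReputationStore:
    """Toy store that tracks reputation per combination, not per resource."""

    def __init__(self) -> None:
        self._scores = {}

    def record(self, identity: MailIdentity, score: int) -> None:
        self._scores[identity] = score

    def lookup(self, identity: MailIdentity) -> Optional[int]:
        # None means "never seen this exact combination", i.e. not warmed up.
        return self._scores.get(identity)


store = ReputationStore()
warmed = MailIdentity("192.0.2.10", "bounce.example.com", "example.com")
store.record(warmed, 10)

# Same domains, new IP: the earlier reputation does not carry over.
moved = warmed._replace(ip="198.51.100.20")
print(store.lookup(warmed))  # 10
print(store.lookup(moved))   # None
```

In this toy model, moving the same domains to a new IP produces a combination the store has never seen, which is the "needs to be warmed up as a pair" situation described above.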
SpamAssassin with relative scores
I think most readers have a good feeling for how SpamAssassin works. It has a bunch of rules and assigns a score to each rule. All the scores for the rules a message hits are added together, and if the total is higher than a certain value the mail is filtered.
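The score-and-sum approach is simple enough to sketch in a few lines. The rule names and scores here are made up for illustration, not taken from any real ruleset; the threshold of 5.0 is an assumption that mirrors SpamAssassin's default required score.

```python
# Hypothetical rules and scores, for illustration only.
RULE_SCORES = {
    "url_shortener": 2.5,
    "all_caps_subject": 1.5,
    "no_dkim_signature": 2.0,
}

THRESHOLD = 5.0  # assumed cutoff


def additive_score(rules_hit):
    """Add up the fixed score of every rule the message triggered."""
    return sum(RULE_SCORES[name] for name in rules_hit)


def is_filtered(rules_hit):
    """Filter the message once the total reaches the threshold."""
    return additive_score(rules_hit) >= THRESHOLD


two_hits = ["url_shortener", "all_caps_subject"]
three_hits = two_hits + ["no_dkim_signature"]

print(additive_score(two_hits), is_filtered(two_hits))      # 4.0 False
print(additive_score(three_hits), is_filtered(three_hits))  # 6.0 True
```

Each rule contributes the same amount no matter what else is going on in the message, which is exactly the property that changes with relative scoring.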
In more modern filtering, particularly at Gmail, scoring is dynamic. There are still rules and they still assign scores. But the scores themselves can be modified by other scores in the process. It’s not a simple sum of scores, so changing any one thing can change the overall status of a message.
Take two identical messages and two IP addresses, one with an arbitrary reputation of 5 and the other with an arbitrary reputation of 10. By the score-and-sum method, the final email reputation scores would be message+5 and message+10. With relative scoring, though, the other signals can shift what each IP contributes, so the IP reputations might turn out to be 2 and 13.
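Here is a toy version of that example. Nobody outside the mailbox providers knows how the real adjustments work, so the mechanism below is purely an assumption: the IP's raw reputation gets nudged by 3 points up or down depending on whether the sending domain has history on that IP. The field names and the size of the nudge are hypothetical, chosen only to reproduce the 2 and 13 above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    message_score: int         # score of the message content itself
    ip_reputation: int         # raw, standalone IP reputation
    domain_warmed_on_ip: bool  # hypothetical signal: domain has history on this IP


def additive_total(s: Signals) -> int:
    """Score-and-sum: the IP always contributes its raw reputation."""
    return s.message_score + s.ip_reputation


def relative_ip_reputation(s: Signals) -> int:
    """Relative scoring: another signal modifies what the IP contributes."""
    adjustment = 3 if s.domain_warmed_on_ip else -3  # assumed adjustment
    return s.ip_reputation + adjustment


def relative_total(s: Signals) -> int:
    return s.message_score + relative_ip_reputation(s)


# Two identical messages (same message_score) sent from two different IPs.
via_ip_a = Signals(message_score=20, ip_reputation=5, domain_warmed_on_ip=False)
via_ip_b = Signals(message_score=20, ip_reputation=10, domain_warmed_on_ip=True)

print(additive_total(via_ip_a), additive_total(via_ip_b))                  # 25 30
print(relative_ip_reputation(via_ip_a), relative_ip_reputation(via_ip_b))  # 2 13
print(relative_total(via_ip_a), relative_total(via_ip_b))                  # 22 33
```

The raw IP reputations never changed. What changed is how much weight they carry once the rest of the picture is taken into account.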
Look at the whole picture
There’s a West Wing episode where Jed is playing chess with multiple members of the White House staff while negotiating the international crisis of the week. Throughout the episode he tells staff to “look at the whole board.” This is really what we have to be doing in deliverability right now. We have to look at the whole board. We have to look at the whole face. We have to see the gestalt.
We can’t just look at the domains and URLs in a message, we have to consider them in context with the IP addresses. All mailstreams affect each other. No longer can we look at transactional messages as separate from marketing messages. The reputation of each affects the other.
This is actually good. It means that different mailstreams, even with the same URLs from the same IPs, can develop independent reputations. It makes it easier to use shared IPs. Reputation isn’t reliant on keeping everything separate. It’s the whole picture that’s important.
Email is much more than the sum of its parts.