Filtering by gestalt

One of those $5.00 words I learned in the lab was gestalt. We were studying fetal alcohol syndrome (FAS) and, at the time, there were no consistent measurements or numbers that would drive a diagnosis of FAS. Diagnosis was by gestalt – that is, by the patient looking like someone who had FAS.
It’s a funny word to say, it’s a funny word to hear. But it’s a useful term to describe the future of spam filtering. And I think we need to get used to thinking about filtering acting on more than just the individual parts of an email.

Filtering is not just IP reputation or domain reputation. It’s about the whole message. It’s mail from this IP with this authentication containing these URLs. Earlier this year, I wrote an article about Gmail filtering. This quote from it shows how the parts add up to a whole, but I didn’t really call it out at the time.

Gmail uses a 10+ year old neural network that analyzes thousands of factors, related to email, IP, and web, integrated with all Google products, and with 99.9%+ accuracy for identifying certain types of messages, combined with an email-specific domain-based reputation system that combines IP reputation, content, read rates, reputation of other senders with similar content.

With filters, Gmail looks at the whole picture. They look at all the data and assess the whole. Gmail filters by gestalt. I think other companies are catching up and this is the filtering of the future.

So… what’s that mean?

That means that we’re not looking at warming up an IP or a domain. Instead we’re warming up a domain on an IP. Take the domain to another IP, and the reputation doesn’t carry. Change a domain on an IP and that needs to be warmed up as a domain/IP pair.
But even that is overly simplified from reality. It’s not a domain/IP pair, it’s this SPF domain, that d= domain, this IP, this DMARC alignment, these URLs, and on and on. A recent talk referred to warming up resources in relationship to each other, where resources were things like IPs, domains, and URLs.
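To make the idea concrete, here is a minimal sketch of reputation attached to a combination of resources rather than to any single one. It is a toy, not a description of how any mailbox provider stores reputation: the `SendingIdentity` and `ReputationStore` names, the three fields chosen, and the neutral starting value of 0.0 are all assumptions made for illustration.

```python
from typing import Dict, NamedTuple


class SendingIdentity(NamedTuple):
    """Hypothetical bundle of resources that gets warmed up together."""
    ip: str
    spf_domain: str
    dkim_domain: str  # the d= domain in the DKIM signature


class ReputationStore:
    """Toy store: reputation belongs to the combination, not to each part."""

    def __init__(self, neutral: float = 0.0) -> None:
        self.neutral = neutral  # assumed starting point for unseen combinations
        self._scores: Dict[SendingIdentity, float] = {}

    def record(self, identity: SendingIdentity, delta: float) -> None:
        """Add (or subtract) reputation earned by mail sent with this combination."""
        self._scores[identity] = self.lookup(identity) + delta

    def lookup(self, identity: SendingIdentity) -> float:
        """Return the stored reputation, or the neutral value if never seen."""
        return self._scores.get(identity, self.neutral)


store = ReputationStore()
warmed = SendingIdentity("192.0.2.10", "bounce.example.com", "example.com")
store.record(warmed, 5.0)

# Same domains on a different IP: a new combination with no history.
moved = warmed._replace(ip="192.0.2.20")
```

In this sketch `store.lookup(warmed)` returns the earned reputation while `store.lookup(moved)` falls back to the neutral value, which is the "take the domain to another IP and the reputation doesn’t carry" behavior described above. A real filter would track many more resources, such as DMARC alignment and URLs.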

SpamAssassin with relative scores

I think most readers have a good feeling for how SpamAssassin works. It has a bunch of rules and assigns a score to each rule. All the scores are added together, and if the total is higher than a certain value the mail is filtered.
In more modern filtering, particularly at Gmail, scoring is dynamic. There are still rules and they still assign scores. But the scores themselves can be modified by other scores in the process. It’s not a simple sum of scores, so changing anything can change the overall status of a message.
Take two identical messages and two IP addresses, one with an arbitrary reputation of 5 and another with an arbitrary reputation of 10. By the score-and-sum method, the final email reputation scores would be message+5 and message+10. With relative scoring, though, the IP reputations might turn out to be 2 and 13.
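The difference between the two approaches can be sketched in a few lines. The interaction rule below is entirely made up: it pushes the IP's contribution away from a midpoint by a weight that stands in for "everything else the filter knows." The `midpoint` and `context_weight` values were picked only so the arithmetic reproduces the 5 and 10 versus 2 and 13 example, and say nothing about how Gmail or any other filter actually computes scores.

```python
# Toy comparison of score-and-sum filtering and context-dependent scoring.
# Every number and the interaction rule itself are invented for illustration.


def additive_score(message_score: float, ip_reputation: float) -> float:
    """Classic score-and-sum: each signal contributes independently."""
    return message_score + ip_reputation


def relative_ip_reputation(ip_reputation: float,
                           midpoint: float = 7.5,
                           context_weight: float = 1.2) -> float:
    """Hypothetical rule: other signals stretch the IP score away from a midpoint.

    context_weight stands in for whatever the rest of the message and its
    history contribute; a real filter would derive it from other scores.
    """
    return ip_reputation + (ip_reputation - midpoint) * context_weight


def relative_score(message_score: float, ip_reputation: float) -> float:
    """Sum the message score with the context-adjusted IP reputation."""
    return message_score + relative_ip_reputation(ip_reputation)


message = 0.0  # identical messages, so the message score is the same for both
additive = [additive_score(message, ip) for ip in (5, 10)]
relative = [relative_score(message, ip) for ip in (5, 10)]
```

With these made-up parameters, the additive results keep the IP reputations at 5 and 10, while the relative version turns them into 2 and 13. The gap between the two senders widens even though nothing about the messages changed.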

Look at the whole picture

There’s a West Wing episode where Jed is playing chess with multiple members of the White House staff while negotiating the international crisis of the week. Throughout the episode he tells staff to “look at the whole board.” This is really what we have to be doing in deliverability right now. We have to look at the whole board. We have to look at the whole face. We have to see the gestalt.
We can’t just look at the domains and URLs in a message; we have to consider them in context with the IP addresses. All mailstreams affect each other. No longer can we look at transactional messages as separate from marketing messages. The reputation of each affects the other.
This is actually good. It means that different mailstreams, even with the same URLs from the same IPs, can develop independent reputations. It makes it easier to use shared IPs. Reputation isn’t reliant on keeping everything separate. It’s the whole picture that’s important.
Email is much more than the sum of its parts.
 
 
