Troubleshooting: part 3

As I continue to think about how people troubleshoot email delivery I keep finding other things to talk about. Today we’re going to talk about the question most folks start with when troubleshooting delivery. “Did ISP change something?”

At least once a week I check some delivery or email fora and some form of the question is sitting there.

“Did X change something? We haven’t done anything different and our delivery went way down overnight.”

“Did Y change their filters? Our delivery is tanking and all our authentication is fine.”

“Anyone hear of a change at Z? We have been having increasing difficulty reaching the inbox and we don’t understand why. Looking for suggestions.”

In reality, the answer to this question Does Not Matter and asking it is only going to delay actually resolving your delivery issue.

When filters change

The reality is, filters are continually changing. ISPs and filtering companies are always tuning filters. These changes fall roughly into 3 categories.

  • Ongoing tweaking and improvement to provide a better experience for their users
  • Changes done to address an emergent threat (Yahoo deploying p=reject is one example of this)
  • Specific changes to catch a type of spam they had previously been unable to effectively identify and filter.

Filters are not static. They are continually adjusting based on a number of things. We can always assume the answer to the question is yes. Something changed. Now what?

There are basically 3 situations here.

  • The filters did something unexpected and caught mail they weren’t intended to catch, causing recipients to complain to the ISP.
  • The filter change was intentional but caught more mail than was intended, causing recipients to complain to the ISP.
  • The filter change was intentional and caught exactly the mail that was intended and the recipients didn’t care enough to notice that mail was missing.

In the first two cases, the ISP is going to fix things. They’re going to listen to their users and adjust the filters. In the first case, I expect to see changes and rollback within 24 – 48 hours. In the second, I expect to see changes in 24 – 96 hours.

The third case is the interesting one. Does anyone care when mail they don’t care about goes to the bulk folder? Mail, even opt-in mail, that users don’t complain about when it’s missing is the definition of grey mail. Filter maintainers listen to their users. If users complain, they’ll change things; if users don’t complain, they’ll assume the filters are working as intended.

The answer to the question “did the filters change?” tells you nothing. Of course the filters changed. Either they’re doing something that the maintainers don’t intend, which means they’ll be fixed, or they’re catching mail they’re intended to catch.

Instead of asking if the filters changed, flip the question. Why are my users not interested enough in my mail to notice it when it’s gone? Start your troubleshooting from that perspective.

Related Posts

Who are mimecast?

Mimecast is a filter primarily used by businesses. They’re fairly widely used. In some of the data analysis I’ve done for clients, they’re a top 10 or top 20 filter.
Earlier today someone asked on Facebook if Mimecast may be blocking emails based on the TLD. The short answer is it’s unlikely. I’ve not seen huge issues with them blocking based on the TLD of the domain. They’re generally more selective than that.

The good news is Mimecast is really pretty good about giving you explanations for why they’re blocking. They’ll even tell you if it’s Mimecast related or if it’s a specific user / user-company block.
Some example rejection messages from a recent dive into some bounce logs.

Read More

Troubleshooting delivery is hard, but doable

Even for those of us who’ve been around for a while, and who have a lot of experience troubleshooting delivery problems, things are getting harder. It used to be we could identify something about an email and if that thing was removed then the email would get to the inbox. Often this was a domain or a URL in the message that was triggering bulk foldering.
Filters aren’t so simple now. And we can’t just randomly send a list of URLs to a test account and discover which URL is causing the problem. Sure, one of the URLs could be the issue, but that’s typically in context with other things. It’s rare that I can identify the bad URLs sending mail through my own server these days.
There are also a lot more “hey, help” questions on some of the deliverability mailing lists. Most of these questions are sticky problems that don’t map well onto IP or domain reputation.
One of my long-term clients recently had a bad mailing that caused some warnings at Gmail.
We tried a couple of different things to try and isolate the problem, but never could discover what was triggering the warnings. Even more importantly, we weren’t getting the same results for identical tests done hours apart. After about 3 days, all the warnings went away and all their mail was back in the inbox.
It seemed that one mailing was really bad and resulted in a bad reputation, temporarily. But as the client fixed the problem and kept mailing, their reputation recovered.
Deliverability troubleshooting is complicated and this flowchart sums up what it’s like.

Here at Word to the Wise, we get a lot of clients who have gone through the troubleshooting available through their ESPs and sometimes even other deliverability consultants. We get the tough cases that aren’t easy to figure out.
What we do is start from the beginning. First thing is to confirm that there aren’t technical problems, and generally we’ll find some minor problems that should be fixed, but aren’t enough to cause delivery problems. Then we look at the client’s data. How do they collect it? How do they maintain it? What are they doing that allows false addresses onto their list?
Once we have a feel for their data processes, we move on to how to fix those processes. What can we do to collect better, cleaner data in the future? How can we improve their processes so all their recipients tell the ISP that this is wanted mail?
The challenging part is what to do with existing data, but we work with clients individually to make sure that bad addresses are expunged and good addresses are kept.
Our solutions aren’t simple. They’re not easy. But for clients who listen to us and implement our recommendations it’s worth it. Their mail gets into the inbox and deliverability becomes a solved problem.

Read More

AHBL Wildcards the Internet

AHBL (Abusive Hosts Blocking List) is a DNSBL (DNS-based Blackhole List) that has been available since 2003 and is used by administrators as a crowd-sourced list of spam sources, open proxies, and open relays. Because the data is collected into a single list, an email system can check this blacklist to determine if a message should be accepted or rejected. AHBL is managed by The Summit Open Source Development Group, and they have decided after 11 years that they no longer wish to maintain the blacklist.
A DNSBL works like this: a mail server checks the sending IP address of every inbound email against a blacklist, and the blacklist responds with either “yes, that IP address is on the blacklist” or “no, I did not find that IP address on the list.” If an IP address is found on the list, the email administrator, based on the policies set up on their server, can take a number of actions such as rejecting the message, quarantining the message, or increasing the spam score of the email.
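The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular mail server’s implementation: the zone `dnsbl.example.org` and the function names are made up for the example, and only IPv4 is handled. It relies on the usual DNSBL convention of reversing the IP address’s octets, appending the list’s zone, and treating any A record in the answer as “listed.”

```python
import ipaddress
import socket
from typing import Callable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets, then the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def system_resolver(name: str) -> Optional[str]:
    """Return the A record for name, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # covers socket.gaierror (no such name, lookup failure)
        return None


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], Optional[str]] = system_resolver) -> bool:
    """Any answer means 'yes, listed'; no answer means 'not found on the list'."""
    return resolve(dnsbl_query_name(ip, zone)) is not None


if __name__ == "__main__":
    # Hypothetical zone; substitute a list you are actually permitted to query.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

The resolver is passed in as a parameter so the logic can be exercised without touching the network. What the server does with a “yes” (reject, quarantine, or add to a score) is a separate policy decision.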
The administrators of AHBL have chosen to list the world as their shutdown strategy. The DNSBL now answers ‘yes’ to every query. The theory behind this strategy is that users of the list will discover that all their mail is being blocked and stop querying the list that is causing it. In principle, this should work. In practice it really does not, because many people querying lists are not doing it as part of a pass/fail delivery system. Many lists are queried as part of a scoring system.
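To see why a “yes to everything” list is obvious in a pass/fail setup but easy to miss in a scoring setup, consider this small sketch. The threshold and weight are invented for illustration and are not taken from any real filter.

```python
REJECT_THRESHOLD = 5.0  # hypothetical cut-off for rejecting a message
DNSBL_WEIGHT = 1.5      # hypothetical points added when the list says "yes"


def pass_fail_verdict(listed: bool) -> str:
    """Pass/fail system: a listing alone decides the outcome."""
    return "reject" if listed else "accept"


def scored_verdict(other_points: float, listed: bool) -> str:
    """Scoring system: a listing is just one contribution to a total."""
    total = other_points + (DNSBL_WEIGHT if listed else 0.0)
    return "reject" if total >= REJECT_THRESHOLD else "accept"


if __name__ == "__main__":
    # With the list wildcarded, every lookup comes back listed=True.
    print(pass_fail_verdict(True))    # every message is rejected
    print(scored_verdict(0.5, True))  # clean mail still gets through
    print(scored_verdict(4.0, True))  # only borderline mail flips
```

In the pass/fail case all mail stops and the admin notices immediately. In the scoring case most mail still arrives, and only messages that were already close to the threshold are affected, so the broken list can go unnoticed for a long time.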
Maintaining a DNSBL is a lot of work, and after years of providing a valuable service, the maintainers are rewarded with the difficulty of decommissioning the list. Popular DNSBLs like the AHBL are used by thousands of administrators, and it is a tough task to get them all to stop using the list. RFC 6471 has a number of recommendations, such as increasing the delay in how long it takes to respond to a query, but this does not stop people from using the list. You could change the list’s website to advise people the list is no longer valid, but unlike a person who comes across a 404 page while surfing the web, a computer does not mind checking the same 404 page over and over.
Many mailservers, particularly those only serving a small number of users, are running spam filters in fire-and-forget mode: unmaintained, unmonitored, and seldom upgraded until the hardware they are running on dies and is replaced. Unless they do proper liveness detection on the blacklists they are using (and they basically never do), they will keep querying a list forever, or until it breaks something so spectacularly that the admin notices it.
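One way to do that kind of liveness detection is sketched below. It assumes the list follows the test-entry convention from RFC 5782, under which an IPv4 DNSBL lists 127.0.0.2 and never lists 127.0.0.1. The function names, the status strings, and the zone are made up for the example.

```python
import socket
from typing import Callable, Optional


def system_resolver(name: str) -> Optional[str]:
    """Return the A record for name, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def check_list_health(zone: str,
                      resolve: Callable[[str], Optional[str]] = system_resolver) -> str:
    """Classify a DNSBL zone as 'healthy', 'wildcarded', or 'dead'."""
    test_entry = resolve(f"2.0.0.127.{zone}")      # 127.0.0.2: expected to be listed
    must_be_absent = resolve(f"1.0.0.127.{zone}")  # 127.0.0.1: expected to be absent
    if must_be_absent is not None:
        return "wildcarded"  # the list answers 'yes' to things it never should
    if test_entry is None:
        return "dead"        # the list no longer answers even its test entry
    return "healthy"


if __name__ == "__main__":
    # Hypothetical zone; run this periodically against each list you query.
    print(check_list_health("dnsbl.example.org"))
```

A filter that ran a check like this on a schedule and stopped using any list that was not “healthy” would have dropped a wildcarded list on its own, without waiting for an admin to notice.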
So spread the word,

Read More