Troubleshooting: part 3

As I continue to think about how people troubleshoot email delivery, I keep finding other things to talk about. Today we’re going to talk about the question most folks start with when troubleshooting delivery: “Did the ISP change something?”

At least once a week I check some delivery or email fora and some form of the question is sitting there.

“Did X change something? We haven’t done anything different and our delivery went way down overnight.”

“Did Y change their filters? Our delivery is tanking and all our authentication is fine.”

“Anyone hear of a change at Z? We have been having increasing difficulty reaching the inbox and we don’t understand why. Looking for suggestions.”

In reality, the answer to this question Does Not Matter and asking it is only going to delay actually resolving your delivery issue.

When filters change

The reality is, filters are continually changing. ISPs and filtering companies are always tuning filters. These changes fall roughly into 3 categories.

  • Ongoing tweaking and improvement to provide a better experience for their users
  • Changes done to address an emergent threat (Yahoo deploying p=reject is one example of this)
  • Specific changes to catch a type of spam they had previously been unable to effectively identify and filter

Filters are not static. They are continually adjusting based on a number of things. We can always assume the answer to the question is yes. Something changed. Now what?

There are basically 3 situations here.

  • The filters did something unexpected and caught mail it wasn’t intended to catch, causing recipients to complain to the ISP.
  • The filter change was intentional but caught more mail than was intended, causing recipients to complain to the ISP.
  • The filter change was intentional and caught exactly the mail that was intended and the recipients didn’t care enough to notice that mail was missing.

In the first two cases, the ISP is going to fix things. They’re going to listen to their users and adjust the filters. In the first case, I expect to see changes and rollback within 24 – 48 hours. In the second, I expect to see changes in 24 – 96 hours.

The third case is the interesting one. Does anyone care about mail they don’t care about going to the bulk folder? Mail, even opt-in mail, that users don’t complain about when it’s missing is the definition of grey mail. Filter maintainers listen to their users. If users complain, they’ll change things; if users don’t complain, they’ll assume the filters are working as intended.

The answer to the question “did the filters change?” tells you nothing. Of course the filters changed. Either they’re doing something that the maintainers don’t intend, which means they’ll be fixed, or they’re catching mail they’re intended to catch.

Instead of asking if the filters changed, flip the question. Why are my users not interested enough in my mail to notice it when it’s gone? Start your troubleshooting from that perspective.

Related Posts

Why do ISPs do that?

One of the most common things I hear is “but why does the ISP do it that way?” The generic answer for that question is: because it works for them and meets their needs. Anyone designing a mail system has to implement some sort of spam filtering and will have to accept the potential for lost mail. Even those recipients who run no software filtering may lose mail. Their spamfilter is the delete key and sometimes they’ll delete a real mail.
Every mailserver admin, whether managing an MTA for a corporation, an ISP or themselves, inevitably looks at the question of false positives and false negatives. Some are more sensitive to false negatives and would rather block real mail than have to wade through a mailbox full of spam. Others are more sensitive to false positives and would rather deal with unfiltered spam than risk losing mail.
At the ISPs, many of these decisions aren’t made by one person, but the decisions are driven by the business philosophy, requirements and technology. The different consumer ISPs have different philosophies and these show in their spamfiltering.
Gmail, for instance, has a lot of faith in their ability to sort, classify and rank text. This is, after all, what Google does. Therefore, they accept most of the email delivered to Gmail users and then sort after the fact. This fits their technology, their available resources and their business philosophy. They leave as much filtering at the enduser level as they can.
Yahoo, on the other hand, chooses to filter mail at the MTA. While their spamfoldering algorithms are good, they don’t want to waste CPU and filtering effort on mail that they think may be spam. So, they choose to block heavily at the edge, going so far as to rate limit mail from senders that they don’t know. Endusers are protected from malicious mail and senders have the ability to retry mail until it is accepted.
The same types of entries could be written about Hotmail or AOL. They could even be written about the various spam filter vendors and blocklists. Every company has their own way of doing things and their way reflects their underlying business philosophy.

Feedback loops

There are a lot of different perspectives on Feedback Loops (FBLs) and “this is spam” buttons across the email industry.
Some people think FBLs are the best thing since sliced bread and can’t figure out why more ISPs don’t offer them. These people use the data to clean addresses off their lists, lower complaints and send better mail. They use the complaints as a data source to help them send mail their recipients want. Too many recipients opted out on a particular offer? Clearly there is a problem with the offer or the segmentation or something.
Other people, though, think the existence of “this is spam” buttons and FBLs is horrible. They call people who click “this is spam” terrorists or anti-commerce-net-nazis. They want to be able to dispute every click of the button. They think that too many ISPs offer “this is spam” buttons and too many ESPs and network providers pay way too much attention to complaints. They argue ISPs should remove these buttons and stop paying attention to what recipients think.
Sadly, I’m not actually making up the terminology in the last paragraph. There really are people who think that the problem isn’t with the mail that they’re sending but that the recipients can actually express an opinion about it and the ISPs listen to those opinions. “Terrorists” and “Nazis” are the least of the things they have called people who complain about their mail.
One of the senior engineers at Cloudmark recently posted an article talking about FBLs and “this is spam” buttons. I think it’s a useful article to read as it explains what role FBLs play in helping spam filters become more accurate.

AHBL Wildcards the Internet

AHBL (Abusive Host Blocking List) is a DNSBL (Domain Name System Blacklist) that has been available since 2003 and is used by administrators as a crowd-sourced list of spam sources, open proxies, and open relays. Because the data is collected into a single list, an email system can check this blacklist to determine if a message should be accepted or rejected. AHBL is managed by The Summit Open Source Development Group, and after 11 years they have decided they no longer wish to maintain the blacklist.
A DNSBL works like this: a mail server checks the sending IP address of every inbound email against a blacklist, and the blacklist responds with either “yes, that IP address is on the blacklist” or “no, I did not find that IP address on the list.” If an IP address is found on the list, the email administrator, based on the policies set up on their server, can take a number of actions such as rejecting the message, quarantining the message, or increasing the spam score of the email.
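As a rough sketch of the lookup described above: for IPv4, a DNSBL query is conventionally built by reversing the octets of the sending IP address, appending the list’s zone, and asking DNS for an A record. Any answer means “listed”; a failed lookup means “not listed.” The zone name dnsbl.example.org and the function names are placeholders for illustration, not AHBL’s actual configuration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the DNSBL answers with an A record for this IP.

    `resolve` is injectable (an assumption of this sketch) so the logic
    can be exercised without touching the network.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN, or any other lookup failure, is treated as "not listed".
        return False
    return True


# Hypothetical zone; 192.0.2.99 is a documentation-only address.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

What the server does with a “yes” (reject, quarantine, or add to a spam score) is local policy and is deliberately left out of this sketch.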
The administrators of AHBL have chosen to list the world as their shutdown strategy. The DNSBL now answers ‘yes’ to every query. The theory behind this strategy is that users of the list will discover that their mail is all being blocked and stop querying the list that is causing it. In principle, this should work. But in practice it really does not, because many people querying lists are not doing it as part of a pass/fail delivery system. Many lists are queried as part of a scoring system.
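A toy comparison shows why listing the world is loud for pass/fail users but nearly silent for scoring users. The rule names, weights and threshold below are made up for illustration and do not come from any real filter.

```python
def pass_fail(dnsbl_hit: bool) -> str:
    """Reject outright on any DNSBL hit."""
    return "reject" if dnsbl_hit else "accept"


def scored(hits: dict[str, bool], weights: dict[str, float], threshold: float) -> str:
    """Sum the weights of the rules that fired; file as spam at or above threshold."""
    total = sum(weights[rule] for rule, fired in hits.items() if fired)
    return "spam" if total >= threshold else "inbox"


# Hypothetical weights: the dead list is one small signal among several.
WEIGHTS = {"dead_dnsbl": 1.0, "other_dnsbl": 3.0, "content": 4.0}
THRESHOLD = 5.0

# Once the list answers "yes" to everything, dead_dnsbl fires on all mail.
clean_mail = {"dead_dnsbl": True, "other_dnsbl": False, "content": False}
borderline_mail = {"dead_dnsbl": True, "other_dnsbl": False, "content": True}

print(pass_fail(True))                              # reject
print(scored(clean_mail, WEIGHTS, THRESHOLD))       # inbox
print(scored(borderline_mail, WEIGHTS, THRESHOLD))  # spam
```

In the pass/fail setup every message bounces, so the admin notices quickly. In the scoring setup most mail still arrives, and only borderline messages are quietly pushed over the threshold.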
Maintaining a DNSBL is a lot of work, and after years of providing a valuable service, you are thanked with the difficulty of decommissioning the list. Popular DNSBLs like the AHBL are used by thousands of administrators and it is a tough task to get them all to stop using the list. RFC 6471 has a number of recommendations, such as increasing the delay in how long it takes to respond to a query, but this does not stop people from using the list. You could change the page served by the site to advise people the list is no longer valid, but unlike a person who surfs the web and comes across a 404 page, a computer does not mind checking the same 404 page over and over.
Many mailservers, particularly those only serving a small number of users, are running spam filters in fire-and-forget mode, unmaintained, unmonitored, and seldom upgraded until the hardware they are running on dies and is replaced. Unless they do proper liveness detection on the blacklists they are using (and they basically never do) they will keep querying a list forever, unless it breaks something so spectacularly that the admin notices it.
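One way to do that kind of liveness detection is to use the test entries RFC 5782 defines for IPv4 DNSBLs: 127.0.0.2 must always be listed and 127.0.0.1 must never be. A list that wildcards the world fails the second check, and a list that has gone dark fails the first. This is a minimal sketch; the function names and the injectable `resolve` parameter are assumptions, and a real filter would also want timeouts and caching.

```python
import socket


def _answers(name: str, resolve) -> bool:
    """True if the name resolves to an A record, False on a failed lookup."""
    try:
        resolve(name)
    except socket.gaierror:
        return False
    return True


def dnsbl_is_healthy(zone: str, resolve=socket.gethostbyname) -> bool:
    """Check the RFC 5782 test points before trusting a DNSBL zone."""
    must_be_listed = _answers(f"2.0.0.127.{zone}", resolve)      # 127.0.0.2
    must_not_be_listed = _answers(f"1.0.0.127.{zone}", resolve)  # 127.0.0.1
    return must_be_listed and not must_not_be_listed
```

A filter could run this check periodically and skip any zone for which it returns False, rather than waiting for an admin to notice.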
So spread the word,