GFI/SORBS considered harmful, part 2

Act 1Act 2IntermezzoAct 3Act 4Act 5
Management Summary, Redistributable Documents and Links
Yesterday I talked about GFI responsiveness to queries and delisting requests about SORBS listings. Today I’m going to look at data accuracy.
The two issues are tightly intertwined – a blacklist that isn’t responsive to reports of false positive listings will end up with a lot of stale or inaccurate data, and a blacklist that has many false positives will likely be overwhelmed with complaints and delisting requests, and won’t be able to respond to them – leading to a spiral of dissatisfaction and inaccurate data feeding off each other.

Because it is so difficult to remove an IP address from the list, SORBS as a blacklist produces many false positives.
XS4ALL also consider SORBS harmful

GFI/SORBS maintains nine different IP address based blacklists, but they’re usually bundled together and treated as a single “Don’t accept email from this address” blacklist. Each of the nine lists have somewhat different listing policies, though there’s been some scope creep and blurring of the lines over the years.
I’m going to focus on just one of them, dul.dnsbl.sorbs.net. This is intended to list “Dynamic IP Address ranges” – consumer internet connections, such as DSL lines, cable modems and dialup modem pools where the IP address assigned to a user will change over time. These systems don’t typically send legitimate email directly to recipients (rather they send mail via their ISPs smarthost) and often contain a lot of consumer windows machines, which tend to get infected and send viruses and spam, so declining to accept mail from this sort of address pool is a fairly sensible decision. (Spamhaus maintain a list with similar goals, the PBL, as do Trend Micro).
GFI/SORBS have had a number of database accidents that have repeatedly caused false listings in a number of their lists, but because the DUL zone tends to list large ranges of IP addresses, data handling mistakes there tend to cause more visible problems.

the SORBS DUHL list has become badly broken, flagging thousands maybe millions of static IP’s as dynamic. This setting will flag as spam email from numerous legitimate sources incorrectly. Numerous attempts by mail admins the world over have failed to get Sorbs to fix the mess yet.SmarterMail Support Forum

ISPs tell me that GFI/SORBS also refuse to accept notifications about false positive listings in their DUL zone. Or they do update their database, but then reload the bad data a few weeks later. And if the ISP asks GFI for a status update about a false listing, their policy is to move that request to “the bottom of the pile”, ensuring that the inaccurate data that’s causing noticeable problems continues to be published.

If an ISP has reported to SORBS that a CIDR is no longer dynamic, and the (repeat) notifications have been ignored for 6-12 months… at what point does it go from lack of responsiveness, to data quality, to negligence, to willful malice?frustrated anonymous system administrator

The dul list was originally seeded based on data acquired from the dynablock list in 2003. I’m told that stale data, possibly dating back to 2003, is repeatedly being loaded into the GFI/SORBS DUL list, leading to a huge number of false positives. I can’t tell whether that is the case, but there’s certainly a lot of bad data leading to false listings.

Your problem in researching bad data in SORBS is not going to be finding examples of false listings, it’s going to be whittling that forest down to a manageable stack of wood.Comment from IRC

Very true. Lets choose a particular example: “n2.bullet.mail.sp2.yahoo.com” aka 67.195.134.51. This is one of the mailservers for Yahoo Groups, and sends a lot of mailing list mail. There’s nothing at all to suggest that it’s an end-user, dynamically assigned address machine. Just the opposite, it’s listed at dnswl.org as a Yahoo server that shouldn’t be blacklisted. It’s listed by ARIN as part of a /16 (65536 addresses) assigned to Yahoo. There’s nothing in the hostname to suggest it’s dynamically assigned, it even has the word “mail” in the hostname, a common sign of a legitimate mailserver. McAfee TrustedSource list it as a clean mailserver with a history of sending significant volumes of email, as do SenderBase.
And yet it’s listed in the dnsbl.sorbs.net zone, with a return value of 127.0.0.10 meaning GFI/SORBS are claiming it’s a dynamic IP address.
Looking up that IP address on the SORBS website was a fairly painful exercise (I’ll go into more detail about that, and other SORBS operational problems on Monday) but this is what I found:
That IP address is categorized, wrongly, by GFI/SORBS as a dynamic address.
Why did I choose this particular server as an example, rather than one of the countless other false positives I could have picked? Well, it’s not just a single IP address that’s listed as a false psitive: GFI/SORBS are listing all of 67.194.0.0/15 – that’s 131,072 Yahoo servers that are categorized wrongly. And i know that GFI staff were explicitly notified about that particular listing early yesterday morning. Yet GFI are still publishing that data (as well as at least dozens of other false positive listings of similar size).

An outage usually means someone works quickly to resolve it. Having it still be an issue after 5 days is gross negligence. #sorbsTwitter

A blacklist should have checks in place that make it unlikely that badly wrong data is published, though even the best blacklists will very occasionally have a problem and publish bad data. How they respond to false positives is really important. If a blacklist is notified of false positive listings of this magnitude the safe thing to do is to pull all dubious listings from the published blacklist data (or if it can’t be narrowed down, pull all listings) until the problem is resolved.
That will eliminate the loss of legitimate email to the blacklists customers (and most of the spam the blacklist might have stopped will likely be blocked or filtered by other parts of the spam filters they use). GFI/SORBS have not done this, rather they’re following the same practice they’ve used during previous database catastrophes – continuing to publish known bad data.

I do not doubt that there are mail admins rationally fearing for their jobs this week. I am lucky enough to no longer be in the sort of pathological enterprise where a burst of excess false positives is a risk to an admin’s employment, but I am sure that not everyone who was using the SORBS DUHL until this week is so fortunate.Senior Security Consultant

More on Monday.

Related Posts

Content based filtering

A spam filter looks at many things when it’s deciding whether or not to deliver a message to the recipients inbox, usually divided into two broad categories – the behaviour of the sender and the content of the message.
When we talk about sender behaviour we’ll often dive headfirst into the technical details of how that’s monitored and tracked – history of mail from the same IP address, SPF records, good reverse DNS, send rates and ramping, polite SMTP level behaviour, DKIM and domain-based reputation and so on. If all of those are OK and the mail still doesn’t get delivered then you might throw up your hands, fall back on “it’s content-based filtering” and not leave it at that.
There’s just as much detail and scope for diagnosis in content-based filtering, though, it’s just a bit more complex, so some delivery folks tend to gloss over it. If you’re sending mail that people want to receive, you’re sure you’re sending the mail technically correctly and you have a decent reputation as a sender then it’s time to look at the content.
You want your mail to look just like wanted mail from reputable, competent senders and to look different to unwanted mail, viruses, phishing emails, botnet spoor and so on. And not just to mechanical spam filters – if a postmaster looks at your email, you want it to look clean, honest and competently put together to them too.
Some of the distinctive content differences between wanted and unwanted email are due to the content as written by the sender, some of them are due to senders of unwanted email trying to hide their identity or their content, but many of them are due to the different quality software used to send each sort of mail. Mail clients used by individuals, and content composition software used by high quality ESPs tends to be well written and complies with both the email and MIME RFCs, and the unwritten best common practices for email composition. The software used by spammers, botnets, viruses and low quality ESPs tends not to do so well.
Here’s a (partial) list of some of the things to consider:

Read More

It's not illegal to block mail

My post “We’re going to party like it’s 1996” is still getting a lot of comments from people. Based on the comments, either people aren’t reading or my premise wasn’t clear.
Back in 1996 the first lawsuits were brought against ISPs to stop ISPs from blocking email. These suits were failures. Since that time, other senders have attempted to sue ISPs and lost. Laws have been written protecting the rights of the ISPs to block content they deem to be harmful.
Dela says that he was just attempting to open up a conversation, but I don’t see what he thinks the  conversation is. That ISPs shouldn’t block mail their customers want? Sure, OK. We’re agreed on that. Now, define what mail recipients want. I want what mail I want, not what someone else decides I might want.
Marketers need to get over the belief that they own end users mailboxes and that they have some right to send mail to people. You don’t.
When marketers actually start sending wanted mail, to people who actually subscribe – not just make a purchase, or register online or happen to have an easily discoverable email address – then perhaps marketers will have some standing to claim they are being treated illegally. Until and unless that happens, the ISPs are well within their rights to block mail that their users don’t want.

Read More

Legitimate mail in spamfilters

It can be difficult and frustrating for a sender to understand they whys and wherefores of spam filtering. Clearly the sender is not spamming, so why is their mail getting caught in spam filters?
I have a client that goes through this frustration on rare occasions. They send well crafted, fun, engaging content that their users really want. They have a solid reputation at the ISPs and their inbox stats are always above 98%. Very, very occasionally, though, they will see some filtering difficulties at Postini. It’s sad for all of us because Postini doesn’t tell us enough about what they’re doing to understand what my client is doing to trigger the filters. They get frustrated because they don’t know what’s going wrong; I get frustrated because I can’t really help them, and I’m sure their recipients are frustrated because they don’t get their wanted mail.
Why do a lot of filter vendors not communicate back to listees? Because not all senders are like my clients. Some senders send mail that recipients can take or leave. If the newsletter shows up in their inbox they may read it. If the ad gets in front of their face, they may click through. But, if the mail doesn’t show up, they don’t care. They certainly aren’t going to look for the mail in their bulk folder. Other senders send mail that users really don’t want. It is, flat out, spam.
The thing is, all these senders describe themselves as legitimate email marketers. They harvest addresses, they purchase lists, they send mail to spamtraps, and they still don’t describe themselves as spammers. Some of them have even ended up in court for violating various anti-spam laws and they still claim they’re not spammers.
Senders are competing with spammers for bandwidth and resources at the ISPs, they’re competing for postmaster attention at the ISPs and they’re competing for eyeballs in crowded inboxes.
It’s the sheer volume of spam and the crafty evilness of spammers that drives the constant change and improvement in spamfilters. It’s tough to keep up with the spamfilters because they’re trying to keep up with the spammers. And the spammers are continually looking for new ways to exploit recipients.
It can be a challenge to send relevant, engaging email while dealing with spamfilters and ISPs. But that’s what makes this job so much fun.

Read More