GFI/SORBS – a DDoS Intermezzo

steve
December 6, 2010
Industry

I’ve been stage-managing for a production of The Nutcracker this week, so musical terminology is on my mind. In opera, the intermezzo is a comedic interlude between acts of an opera series.
This comedic interlude is about the “DDoS” – a distributed denial of service attack. What is a denial of service attack?

… an attempt to make a computer resource unavailable to its intended users.
One common method of attack involves saturating the target machine with external communications requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable.
– Wikipedia on DoS attacks

That’s pretty much what we’re discussing here. There are a variety of ways to mount a DoS attack, but by far the most common, and the sort that’s characterized by descriptions of “gigabits per second” or “packets per second”, is simply sending high volumes of network traffic aimed at a webserver or at the network routers the webserver relies on. The network traffic might be pure garbage, or it might be valid web requests, but the goal of the attacker is to overwhelm either the server itself or, more commonly, the network pipe connecting the server to the internet, so that it can’t provide its service to the public. At its most basic level, the symptom of a DoS attack is that you can’t reach any web page on the server that’s under attack.
(What’s a distributed denial of service attack? It’s an implementation detail – the attacker uses multiple machines to attack simultaneously, enabling them to provide more attack traffic and to make that traffic a little harder to block.)
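The two ways flood sizes get quoted are related by packet size. As a rough sketch, assuming every packet is a fixed-size Ethernet frame and adding the 20 bytes of preamble and inter-frame gap each frame occupies on the wire, the conversion looks like this (the function name is made up for illustration):

```python
# Rough conversion between "bits per second" and "packets per second".
# Assumption: every packet is the same size; real floods mix sizes.

ETHERNET_OVERHEAD_BYTES = 20  # preamble (8) + inter-frame gap (12), not part of the frame


def packets_per_second(bits_per_second: float, frame_bytes: int) -> float:
    """Packets per second needed to fill a link of the given speed."""
    bits_per_packet = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return bits_per_second / bits_per_packet


if __name__ == "__main__":
    one_gbit = 1_000_000_000
    for size in (64, 512, 1518):
        print(f"{size:>5}-byte frames: {packets_per_second(one_gbit, size):,.0f} pps")
```

A 1 Gbit/s flood of minimum-size 64-byte frames works out to roughly 1.49 million packets per second, which is why the same bandwidth can be much harder on routers when it arrives as small packets.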
So that’s what a DoS attack can do – make a service unavailable. What can’t a DoS attack do? It can’t make any changes to the server(s) under attack. It can’t deface a web page. It can’t cause a web service to give wrong answers. It can’t corrupt information stored in a database. It can’t cause a blacklist to add false listings. That last point is fairly important – no DDoS against any blacklist infrastructure can cause it to add false listings.

How does a DDoS insert bad records into your blacklist? That is the real issue. If that would stop happening, then a DDoS would have no impact on removal of the bad records.
– insightful comment on Friday’s post

And yet, GFI/SORBS has blamed their operational problems (false positives, publishing of stale data, refusal to delist addresses and so on) on “DDoS attacks” so often that it’s a running joke in the email industry.

Once again SORBS has reactivated old DUL-Listings, e.g. for 85.25.230.x. This happened back in October as well, and that time you also claimed DDoS.
– From a comment on Friday’s post

So let’s look at the evolution of the latest SORBS incident. I don’t usually pay that much attention, as SORBS listings don’t affect me or my customers at all, but this time around I was researching this series of posts, so I watched what was going on reasonably carefully.
On Thursday the www.sorbs.net webserver was reasonably responsive: static pages and files were returned fairly quickly, but pages that needed to access the backend database (for account creation, authentication etc.) had some problems. They were either throwing database errors (Perl or PHP, apparently, and the errors looked like race conditions or invariant violations caused by page reloads) or the SCGI web application process was taking too long, causing the webserver to return a “500 Internal Error” page. Reloading would (eventually) get the page. All of this behaviour will be very familiar to any web developer who’s messed up their database design to the extent that the queries needed to render a web page take too long. There was no sign of any network or webserver level problems, and certainly no obvious symptoms that looked like any sort of DoS, just a database that wasn’t returning data quickly enough. If GFI were doing batch database operations against the production database as part of trying to fix whatever was going on, that could cause the database to be sluggish, but so could a huge number of people trying to log in (to try and get their false listings removed) or just poor database design.
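The pattern described here (fast static pages alongside database-backed pages that either error out or come back as a 500) is what you would expect when a front-end server gives a backend application a fixed time budget. A minimal sketch, assuming a thread pool stands in for the webserver and a sleep stands in for a slow query; all names are hypothetical, not SORBS code:

```python
import concurrent.futures
import time
from typing import Callable, Tuple

# A shared worker pool standing in for the webserver's backend processes.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def serve(handler: Callable[[], str], timeout_s: float) -> Tuple[int, str]:
    """Run a page handler with a time budget; return (HTTP status, body)."""
    future = _pool.submit(handler)
    try:
        return 200, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The handler may still be running (stuck on a slow query); we just stop waiting.
        return 500, "Internal Server Error: backend timed out"
    except Exception as exc:
        # The handler raised, e.g. a database error surfacing mid-request.
        return 500, f"Internal Server Error: {exc}"


def static_page() -> str:
    # Hypothetical page that never touches the database.
    return "<html>static</html>"


def slow_db_page() -> str:
    # Hypothetical page whose query takes longer than the budget.
    time.sleep(0.5)
    return "<html>account page</html>"


if __name__ == "__main__":
    print(serve(static_page, timeout_s=0.2))
    print(serve(slow_db_page, timeout_s=0.2))
```

Nothing in that picture requires hostile traffic: a slow query produces the same 500 whether the cause is batch jobs, a crowd of users trying to log in, or a poor schema.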

I’ve been trying to create an account on Sorbs to request a delisting (from DUHL), but I keep getting an error, with perl error code echoed to screen, after waiting for minutes for the register an account to process.
– another satisfied commenter

On Saturday the situation changed entirely. I was unable to access the www.sorbs.net website at all (though the corporate gfi.com website was fine). That change in “DDoS” behaviour seemed very strange, and the timing (relative to GFI employees noticing that I was using the sorbs website to investigate the false listings) seemed rather convenient, so I looked in a bit more detail.
The sorbs webserver is hosted on five separate IP addresses (which one you end up at will be picked semi-randomly). That’s quite a lot – most websites, including the GFI corporate website, are hosted on a single address and even facebook.com only needs three.
None of those five addresses are in the address space allocated to GFI, rather three of them are in 111.125.160.128/26 – address space allocated to Matthew Sullivan personally while the other two are in 208.43.0.0/16 – address space assigned to softlayer technologies, most likely colocated servers or virtual servers being rented by GFI. (Note: This Matthew Sullivan, and the Michelle Sullivan who commented on Friday’s post are the same person, the GFI employee who founded SORBS originally and, to the best of my knowledge, still operates it.)
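Checking which of the two blocks a given web server address falls in is easy with Python’s standard ipaddress module. A sketch, assuming you have already collected the addresses www.sorbs.net resolves to (the hypothetical resolve_all helper does that with getaddrinfo); the sample addresses below are made up for illustration and are not actual SORBS servers:

```python
import ipaddress
import socket
from typing import List

# The two blocks named above; the labels are just descriptions.
NETWORKS = {
    "111.125.160.128/26 (Matthew Sullivan)": ipaddress.ip_network("111.125.160.128/26"),
    "208.43.0.0/16 (softlayer)": ipaddress.ip_network("208.43.0.0/16"),
}


def resolve_all(hostname: str, port: int = 80) -> List[str]:
    """Every distinct address DNS currently returns for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def which_network(addr: str) -> str:
    """Label of the block containing addr, or "other"."""
    ip = ipaddress.ip_address(addr)
    for label, net in NETWORKS.items():
        if ip in net:  # an IPv6 address is never "in" an IPv4 network
            return label
    return "other"


if __name__ == "__main__":
    # Hypothetical example addresses, for illustration only.
    for addr in ("111.125.160.130", "208.43.10.20", "192.0.2.1"):
        print(addr, "->", which_network(addr))
```

Running which_network over resolve_all("www.sorbs.net") reproduces the classification, though the published addresses may well have changed since this was written.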
If the web server cluster were under a sustained DDoS I’d expect the five addresses to be attacked in the same way. Yet the symptoms were quite different. The three servers directly controlled by Matthew Sullivan (111.*) were completely unresponsive – no packets were being returned, 100% packet loss. The two softlayer hosted servers (208.*) were responsive at a packet level, accepting connections immediately on port 80 but not returning any results at the http level.
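The two symptom sets can be told apart with nothing more than a TCP connect followed by a minimal HTTP request. A sketch, assuming Python’s standard socket module and a HEAD request as the probe (the function and its result strings are illustrative); running it from several different networks rules out a problem local to one path:

```python
import socket


def probe(host: str, port: int = 80, timeout_s: float = 5.0) -> str:
    """Describe how a web server answers: at the TCP level, then at the HTTP level."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except socket.timeout:
        # No SYN-ACK at all: what 100% packet loss looks like from outside.
        return "no response: TCP connect timed out"
    except ConnectionRefusedError:
        return "refused: host reachable, nothing listening on that port"
    except OSError as exc:
        return f"unreachable: {exc}"
    with sock:
        try:
            # create_connection leaves the same timeout set for these calls.
            sock.sendall(f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
            first = sock.recv(1024)
        except socket.timeout:
            # Handshake completed, but nothing came back at the HTTP level.
            return "accepts connections, but no HTTP response"
        except OSError as exc:
            return f"connection error after connect: {exc}"
        if not first:
            return "accepts connections, then closes without a response"
        status_line = first.split(b"\r\n", 1)[0].decode("latin-1")
        return f"HTTP response: {status_line}"


if __name__ == "__main__":
    import sys
    for target in sys.argv[1:]:
        print(target, "->", probe(target))
```

Note what the probe cannot tell you: a connect timeout looks the same whether packets are being dropped by a flood or by a deliberate drop-all rule, and a listening socket whose application never reads from it already produces the “accepts connections, but no HTTP response” result, because the kernel completes the handshake on the application’s behalf.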
The behaviour of the softlayer servers is hard to explain in a DDoS related way, particularly as it was repeatable from several different networks. If all the apache sessions were occupied but there were still space available in the kernel level accept queue, that would explain the symptoms – but that would require an implausibly careful DDoS.
If the sorbs web application were broken in a way that it hung or crashed even on trivial static page requests then I’d expect the webserver to time out the app and return a 500 “I’m Broken” error, which it didn’t seem to do. That also wouldn’t explain the different behavior of the 111.* servers and the 208.* servers.
If the 208.* servers were pure proxy servers that just tunneled all web requests through to the 111.* servers, that would explain what’s seen: a request to a 208.* server is accepted, then forwarded to the 111.* servers, which just hang. This would be an implausibly badly designed network architecture (it adds two servers which do nothing but add bandwidth costs, increase latency and reduce system reliability), but it’s just barely plausible. Given that there are no obvious packet-level issues connecting to the 208.* servers, though, it would also require that the anonymous DDoSers don’t bother attacking the 208.* servers, because they know those are “fake” servers that need not be attacked to take www.sorbs.net off the air.

Shouldn’t any competently-run DNSBL have plans in place for handling DDOS attacks? Why is Spamhaus stable but SORBS down relatively often?
– @delivery_kitty

So what else could explain the differences in behavior between the two sets of servers? The 111.* servers are in network space assigned directly to Matthew Sullivan, and he presumably has full, network level control over them, while the 208.* servers are hosted, so GFI may have a different level of access, perhaps mostly at an application or control panel level.
There is one explanation for the symptoms that explains the odd behaviour seen, and also explains the “elephant in the room” – that the degraded website behavior tends to appear soon after there’s been a rash of false data added to the database.

Or – excuse my wild speculation here – but maybe it’s not actually a DDOS attack against SORBS, but mistakes on the part of the operators?
– @delivery_kitty

If I were to fake up the symptoms of a DDoS where I had complete control over the network, I’d pretend that I was being “packeted to death” by gigabits of traffic and configure my router to drop all inbound packets. That would simulate the effects of a massive DDoS reasonably accurately, and it is also the defensive approach you’d put into place to defend against a real DDoS. It’s exactly the behavior I see from the natively hosted sorbs web servers in 111.*.
If I only had access at the web application level (either because I was running on a hosted server, or because I didn’t have sufficient private control over the server to create packet filtering without notice), the best I could do would be to make the web application hang, and possibly configure the webserver to have an extremely long timeout. That wouldn’t simulate a DDoS particularly well, but it would be good enough to convince anyone who was just using a web browser rather than looking at the lower level traffic. It’s exactly the behavior I see from the softlayer hosted sorbs web servers in 208.*.

Every time when SORBS makes just another mistake, you claim DDoS for either the problem or your inability to fix the mistake. And because you claim this every time, nobody believes you any longer.
– Hans

I’d be loath to suggest such a theory, even though it’s the most plausible explanation of the symptoms I’ve seen, if it weren’t for the reputation sorbs has for “convenient” DDoS attacks that explain away false positive listings (which, as we explained earlier, cannot possibly be caused by any sort of DoS attack). Additionally, even though GFI had been claiming to be under a DDoS since early last week, when they loaded millions of false positives into their database, the SORBS webserver had been up and basically functional, if slow. Shortly after a GFI employee – the same employee who has direct control over the 111.* webservers – commented on my Friday blog post explaining that I was looking into data inaccuracy, the “DDoS” symptoms changed to something entirely different, something that prevented me from looking at further SORBS data. Given the history of SORBS with respect to “DDoS attacks”, I’m suspicious of both the timing and the details of the symptoms.
Fortunately, I’d actually gathered most of the data I needed for tomorrow’s post by Friday, so the “DDoS attack” didn’t really inconvenience me anywhere near as much as it did all the postmasters trying to investigate and resolve SORBS false listings.

the last response I got back … was that the entire /24 block was ‘inelligible’ for de-listing. The parent company cites a DDoS attack as well and says their management team is aware of the issue and working to resolve it ASAP. We’ll see what happens…
– anonymous commenter

GFI would benefit from some transparency about their processes, SORBS day-to-day operations and details about the mistakes they’re making, how they’re fixing them and how they’re ensuring they’re not repeated again and again. And some explicit details about exactly what sort of “DDoS” they’re seeing might help them gain some credibility. This level of communication isn’t helping with that.
More tomorrow.

Guide to resolving ISP issues

laura
Sep 14, 2010

Best Practices

I often get a chuckle out of watching people who are normally on the blocking end of the delivery equation struggle through their own blocking issues. A recent situation came up on a mailing list involving someone who has very vehement opinions about how people should approach her particular blocklist for delisting, and who maintains that the list’s policies are immutable. The company she works for is having some delivery issues and she’s looking for a contact to resolve them.
While digging through my blog posts to see if there was any help I could provide, I realized I don’t have a guide to resolving blocking issues at ISPs. Much of the troubleshooting can be done without ever contacting the ISPs or the blocklists.
Identify the issue.
There are a number of techniques that ISPs use to protect their users from malicious or problematic mail, including rate-limiting incoming mail, putting mail in the bulk folder, and blocking specific IP addresses. Step one in resolving any delivery problem is to identify what is happening to the mail: you can’t resolve the issue until you know what the issue is.
All too often, the description of a delivery problem is: My mail isn’t getting delivered. But that isn’t very clear as to what the actual problem is. Are you being temp failed? Is mail being blocked? Is mail going to the bulk folder? Is this something affecting just you or is it a widespread problem?
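A first cut at “what is actually happening to the mail” can come straight from the SMTP reply codes in your logs and bounces. A sketch, assuming you have already pulled out the reply text the receiving server sent (for example the part after “said:” in many MTA logs); the reply classes are the ones defined in the SMTP specification, RFC 5321, and the function names are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable, Optional

# RFC 5321 reply classes: 2yz accepted, 4yz transient failure, 5yz permanent failure.
# Whether an accepted message reached the inbox or the bulk folder is not visible here.
_REPLY = re.compile(r"^\s*([245])\d\d(?:[ -]|$)")

_MEANING = {
    "2": "accepted",
    "4": "temp failed",
    "5": "rejected",
}


def classify_reply(reply: str) -> Optional[str]:
    """Classify one SMTP reply line; None if it doesn't start with a reply code."""
    m = _REPLY.match(reply)
    return _MEANING[m.group(1)] if m else None


def summarize(replies: Iterable[str]) -> Counter:
    """Count replies by class, to see whether failures are temporary or permanent."""
    return Counter(c for c in map(classify_reply, replies) if c is not None)


if __name__ == "__main__":
    sample = ["250 2.0.0 OK", "421 4.7.0 Try again later", "550 5.7.1 Blocked"]
    print(summarize(sample))
```

A mix dominated by 4yz replies points toward deferrals and rate limiting, while 5yz replies point toward a block; mail that is accepted with a 2yz reply but never seen in the inbox is a filtering question that logs alone won’t answer.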
Troubleshoot your side.
Collect as much data about the problem as you can. Dig through logs and get copies of any rejection messages. Follow any URLs that are present in the bounce messages. Try sending a bare-bones email to yourself at that ISP with just the URLs. Is it still blocked? What if you send from a different IP; does the same thing happen?
There is a lot of troubleshooting a sender can do without contacting an ISP, and the information gathered can lead to a resolution that doesn’t involve the ISP at all. Also, many current ISP blocks are dynamic: they come up and go down without any human intervention. Those blocks that do require contact to get them resolved have clear instructions in the bounce message.
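Pulling the URLs out of a pile of bounces is a quick way to make sure none get missed. A sketch, assuming bounces are available as plain text; the regular expression is deliberately loose (it is not a full URL parser), and the function name and sample addresses are hypothetical:

```python
import re
from typing import List

# Loose URL matcher: good enough to pull links out of a bounce for a human to follow.
_URL = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def urls_in_bounce(text: str) -> List[str]:
    """Return the distinct URLs in a bounce message, in the order they appear."""
    seen: List[str] = []
    for url in _URL.findall(text):
        url = url.rstrip(".,;:")  # drop sentence punctuation stuck to the end
        if url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    bounce = "550 5.7.1 Refused, see http://postmaster.example.net/block?id=abc."
    print(urls_in_bounce(bounce))
```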
Fix your stuff.
Whether it’s a reputation issue or a minor technical issue, fix the problem on your end. Just moving IP addresses or changing a URL isn’t a sustainable fix. There is a reason mail is being blocked or filtered and if you don’t fix that issue, the blocks are just going to come back. After you do fix your stuff, expect to see changes in a few days or a week. The ISP filters are generally quite responsive to sender improvements so if you’ve fixed the stuff you should see changes pretty quickly. Expect unblocking or filtering to take a little longer than the block was in place.
If you can’t figure out what the problem is, hire a consultant. Here at Word to the Wise we can often quickly identify a problem and provide a path to resolution. Sometimes the problem isn’t even at the ISP: we’ve had multiple cases where clients were using custom software that wasn’t SMTP compliant, and we were able to identify the problem and get their mail working again. There are a host of other independent consultants out there who can also help you identify and resolve blocking problems.
Contact the ISPs.
If there is a hard block or after fixing what you think the underlying problem is, you’ll have to contact the ISP. Many ISPs provide self service websites and contact forms to facilitate this process. Generally, though, most issues aren’t going to require contact.

I'm on a blocklist! HELP!

laura
Jul 13, 2010

Best Practices

Recently, an abuse desk rep asked what to do when customers were complaining about being assigned an IP address located on a blocklist. Because not every blocklist actually affects mail delivery it’s helpful to identify if the listing is causing a problem before diving in and trying to resolve the issue.
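Checking whether an address is listed at all is the easy half: DNS-based blocklists are queried by reversing the IPv4 address’s octets, prepending them to the list’s zone, and looking up an A record, as described in RFC 5782. A sketch, assuming IPv4 only and treating any answer as a listing (some lists use particular return codes for other meanings, so check the list’s own documentation); dnsbl.example.org is a placeholder zone and the function names are hypothetical:

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name queried to check an IPv4 address against a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the zone publishes an A record for the address (simplification)."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN: not listed
    return True


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

A hit only says the address is listed. Whether that listing matters depends on whether the sites you actually send to consult that list, which is the question to answer before spending time on delisting.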

Getting removed from an ISP block

laura
Jul 17, 2010

Industry

A question came up on a mailing list about how long it typically took to resolve a spam block at an ISP. I don’t think that question actually has a single answer, as each ISP has their own, special, process.
ISPA takes 5 minutes. You fill out a form, it runs through their automated system and you’re usually delisted.
ISPB asks a lot of questions in their form, so it takes about 15 minutes to collect all the data they want and 10 minutes to fill out their form. Then, using very, very short words you keep repeating what you need to the tier 1 person who initially responded. That person eventually figures out they can’t blow you off and throws your request to tier 2, who handles it immediately.
ISPC has a different, somewhat long form. Again, you spend time collecting all the data and then fill out the somewhat obscure form. You get a response, but it’s a boilerplate totally unrelated to the initial request, so you keep answering until you find a tier 1 rep who can read and do what you initially asked.
ISPD has a form that takes about 2 minutes to fill out. Unfortunately, it goes to an outsourced postmaster team in the Far East, and response times currently range from days to months.
ISPE has an email address and if you catch them on a good day, they’re very helpful. Sometimes there’s no response, though.
ISPF has a troubleshooting page and accepts requests to fix things, but never responds in any visible manner.
ISPG tells you to talk to Spamfiltering company H.
Spamfiltering company H answers their email in a prompt and friendly manner. OK, sometimes the answers are just “wow, your client/customer/IP range is sending lots of spam,” but hey, it’s an answer.
Spamfiltering company I is a useless bag of protoplasm and doesn’t even answer the email address they give on their webpages. In a fit of fairness, I have heard they will occasionally respond, but usually that response is to tell you to go pay some apparently unrelated company a bribe to get delisted.
Spamfiltering company J doesn’t have a lot of ways to contact them, but has a lot of folks who participate in various semi-public arenas, so if you’re even slightly part of the community you can email them and they’re very helpful.
Spamfiltering company K is totally useless, but will tell you to have recipients whitelist you.

Related Posts

Guide to resolving ISP issues

I'm on a blocklist! HELP!

Getting removed from an ISP block