GFI/SORBS – a DDoS Intermezzo
Act 1 • Act 2 • Intermezzo • Act 3 • Act 4 • Act 5
Management Summary, Redistributable Documents and Links
I’ve been stage-managing a production of The Nutcracker this week, so musical terminology is on my mind. In opera, an intermezzo is a comedic interlude performed between the acts of an opera seria.
This comedic interlude is about the “DDoS” – a distributed denial of service attack. What is a denial of service attack?
… an attempt to make a computer resource unavailable to its intended users.
One common method of attack involves saturating the target machine with external communications requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable. (Wikipedia on DoS attacks)
That’s pretty much what we’re discussing here. There are a variety of ways to mount a DoS attack but by far the most common, and the sort that’s characterized by descriptions of “gigabits per second” or “packets per second”, is simply sending high volumes of network traffic aimed at a webserver or network routers the webserver relies on. The network traffic might be pure garbage, or it might be valid web requests, but the goal of the attacker is to overwhelm either the server itself or, more commonly, the network pipe connecting the server to the internet such that it can’t provide its service to the public. At its most basic level the symptom of a DoS attack is that you can’t reach any web page on the server that’s under attack.
(What’s a distributed denial of service attack? It’s an implementation detail – the attacker uses multiple machines to attack simultaneously, enabling them to provide more attack traffic and to make that traffic a little harder to block.)
So that’s what a DoS attack can do – make a service unavailable. What can’t a DoS attack do? It can’t make any changes to the server(s) under attack. It can’t deface a web page. It can’t cause a web service to give wrong answers. It can’t corrupt information stored in a database. It can’t cause a blacklist to add false listings. That last point is fairly important – no DDoS against any blacklist infrastructure can cause it to add false listings.
How does a DDoS insert bad records into your blacklist? That is the real issue. If that would stop happening, then a DDoS would have no impact on removal of the bad records. (insightful comment on Friday’s post)
And yet, GFI/SORBS has blamed operational problems such as false positives, publishing of stale data and refusal to delist addresses on “DDoS attacks” so often that it’s a running joke in the email industry.
Once again SORBS has reactivated old DUL-Listings, e.g. for 85.25.230.x. This happened back in october as well, and that time you also claimed DDoS. (From a comment on Friday’s post)
So let’s look at the evolution of the latest SORBS incident. I don’t usually pay that much attention, as SORBS listings don’t affect me or my customers at all, but this time around I was researching this series of posts so I watched what was going on reasonably carefully.
On Thursday the www.sorbs.net webserver was reasonably responsive. Static pages and files were returned fairly quickly, but pages that needed to access the backend database (for account creation, authentication etc.) had some problems. They were either throwing database errors (Perl or PHP, apparently, and the errors looked like race conditions or invariant violations caused by page reloads) or the SCGI web application process was taking too long, causing the webserver to return a “500 Internal Error” page. Reloading would (eventually) get the page. All of this behaviour will be very familiar to any web developer who’s messed up their database design to the extent that the queries needed to render a web page take too long. There was no sign of any network or webserver level problems, certainly no obvious symptoms that looked like any sort of DoS, just a database that wasn’t returning data quickly enough. If GFI were doing batch database operations against the production database, as part of trying to fix whatever was going on, that could cause the database to be sluggish, but it could also be caused by a huge number of people trying to log in (to try and get their false listings removed) or just poor database design.
I’ve been trying to create an account on Sorbs to request a delisting (from DUHL), but I keep getting an error, with perl error code echoed to screen, after waiting for minutes for the register an account to process. (another satisfied commenter)
On Saturday the situation changed entirely. I was unable to access the www.sorbs.net website at all (though the corporate gfi.com website was fine). That change in “DDoS” behaviour seemed very strange, and the timing (relative to GFI employees noticing that I was using the sorbs website to investigate the false listings) seemed rather convenient, so I looked in a bit more detail.
The sorbs webserver is hosted on five separate IP addresses (which one you end up at will be picked semi-randomly). That’s quite a lot – most websites, including the GFI corporate website, are hosted on a single address and even facebook.com only needs three.
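For anyone who wants to repeat that sort of count, here is a minimal sketch using Python’s standard library. It is an illustration rather than the method used for the figures above, and the function name `resolve_ipv4` is made up for this example. The answer you get depends on your resolver and on when you ask.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("www.sorbs.net")` and comparing the length of the result with that for other hostnames gives the kind of comparison made above. A failed lookup raises `socket.gaierror`.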
None of those five addresses is in the address space allocated to GFI. Three of them are in 126.96.36.199/26, address space allocated to Matthew Sullivan personally, while the other two are in 188.8.131.52/16, address space assigned to SoftLayer Technologies, most likely colocated servers or virtual servers being rented by GFI. (Note: This Matthew Sullivan and the Michelle Sullivan who commented on Friday’s post are the same person, the GFI employee who founded SORBS originally and, to the best of my knowledge, still operates it.)
If the web server cluster were under a sustained DDoS I’d expect the five addresses to be attacked in the same way. Yet the symptoms were quite different. The three servers directly controlled by Matthew Sullivan (111.*) were completely unresponsive – no packets were being returned, 100% packet loss. The two SoftLayer hosted servers (208.*) were responsive at a packet level, accepting connections immediately on port 80 but not returning any results at the HTTP level.
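The distinction between “nothing comes back at all” and “the connection is accepted but no HTTP response arrives” can be checked from any machine. The sketch below is one hedged way to do it in Python. The function name `probe_http` and its result labels are inventions for this example, and it only looks at TCP and HTTP, so it says nothing about ICMP ping loss.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Classify how a server behaves: at the TCP level, then at the HTTP level."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "no-reply"      # no answer at all, consistent with dropped packets
    except ConnectionRefusedError:
        return "refused"       # host is reachable but nothing listens on the port
    except OSError:
        return "unreachable"   # routing, DNS or other network-level failure
    with sock:
        try:
            request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            return "tcp-only"  # connection accepted, but no HTTP response
        except OSError:
            return "reset"     # connection dropped after being accepted
    return "http" if data.startswith(b"HTTP/") else "other"
```

In these terms, the 111.* servers described above would come back as `"no-reply"` and the 208.* servers as `"tcp-only"`.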
The behaviour of the SoftLayer servers is hard to explain in a DDoS-related way, particularly as it was repeatable from several different networks. If all the Apache sessions were occupied but there were still space available in the kernel-level accept queue, that would explain the symptoms – but that would require an implausibly careful DDoS.
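The accept queue behaviour is easy to see on a local machine. In this small Python sketch (the function name and parameters are made up for illustration) a listening socket never calls `accept()`, which roughly stands in for a webserver whose workers are all busy or hung. The kernel still completes the TCP handshake for each client, so connections succeed immediately, but no response is ever sent.

```python
import socket


def demo_unaccepted_connections(clients: int = 3, timeout: float = 0.5):
    """Connect to a listener that never calls accept(); return (connected, answered)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(8)               # backlog comfortably larger than `clients`
    port = listener.getsockname()[1]
    connected = answered = 0
    conns = []
    try:
        for _ in range(clients):
            # The handshake is completed by the kernel, not by the application.
            conn = socket.create_connection(("127.0.0.1", port), timeout=timeout)
            conns.append(conn)
            connected += 1
            conn.sendall(b"GET / HTTP/1.0\r\n\r\n")
            try:
                if conn.recv(1024):
                    answered += 1
            except socket.timeout:
                pass                 # nobody is reading the request, so no reply
    finally:
        for conn in conns:
            conn.close()
        listener.close()
    return connected, answered
```

Every client connects and none gets an answer, which is the same outward symptom as a server that accepts connections on port 80 and then returns nothing.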
If the sorbs web application were broken in a way that it hung or crashed even on trivial static page requests then I’d expect the webserver to time out the app and return a 500 “I’m Broken” error, which it didn’t seem to do. That also wouldn’t explain the different behavior of the 111.* servers and the 208.* servers.
If the 208.* servers were pure proxy servers that just tunneled all web requests through to the 111.* servers that would explain what’s seen – a request to the 208.* server is accepted, then forwarded to the 111.* servers, which just hang. This would be an implausibly badly designed network architecture (it adds two servers which do nothing but add bandwidth costs, increase latency and reduce system reliability) but it’s just barely plausible. Given there are no obvious packet level issues connecting to the 208.* servers, though, it would require that the anonymous DDoSers don’t bother attacking the 208.* servers because they know they’re “fake” servers that need not be attacked to take www.sorbs.net off the air.
Shouldn’t any competently-run DNSBL have plans in place for handling DDOS attacks? Why is Spamhaus stable but SORBS down relatively often? (@delivery_kitty)
So what else could explain the differences in behavior between the two sets of servers? The 111.* servers are in network space assigned directly to Matthew Sullivan, and he presumably has full, network level control over them, while the 208.* servers are hosted, so GFI may have a different level of access, perhaps mostly at an application or control panel level.
There is one explanation for the symptoms that explains the odd behaviour seen, and also explains the “elephant in the room” – that the degraded website behavior tends to appear soon after there’s been a rash of false data added to the database.
Or – excuse my wild speculation here – but maybe it’s not actually a DDOS attack against SORBS, but mistakes on the part of the operators? (@delivery_kitty)
If I were to fake up the symptoms of a DDoS where I had complete control over the network I’d pretend that I was being “packeted to death” by gigabits of traffic, and configure my router to drop all inbound packets. That would simulate reasonably accurately the effects of a massive DDoS, and also be the defensive approach you’d put into place to defend against a real DDoS. It’s exactly the behavior I see from the natively hosted sorbs web servers in 111.*.
If I only had access to the web application level (either because I was running on a hosted server, or because I didn’t have sufficient private control over the server to add packet filtering without it being noticed) the best I could do would be to make the web application hang, and possibly configure the webserver to have an extremely long timeout. That wouldn’t simulate a DDoS particularly well, but it would be good enough to convince anyone who was just using a web browser rather than looking at the lower level traffic. It’s exactly the behavior I see from the SoftLayer hosted sorbs web servers in 208.*.
Every time when SORBS makes just another mistake, you claim DDoS for either the problem or your inability to fix the mistake. And because you claim this every time, nobody believes you any longer. (Hans)
I’d be loath to suggest such a theory, even though it’s the most plausible explanation of the symptoms I’ve seen, if it weren’t for the reputation sorbs has of having “convenient” DDoS attacks to explain false positive listings (which, as we explained earlier, cannot possibly be caused by any sort of DoS attack). Additionally, even though GFI were claiming to have been under a DDoS since early last week, when they loaded millions of false positives into their database, the SORBS webserver had been up and basically functional, if slow. On Friday a GFI employee – the same employee who has direct control over the 111.* webservers – commented on my blog post, which explained that I was looking into data inaccuracy. Shortly afterwards the “DDoS” symptoms changed to something entirely different, something that prevented me from looking at further SORBS data. Given SORBS’s history with respect to “DDoS attacks” I’m suspicious of both the timing and the details of the symptoms.
Fortunately, I’d actually gathered most of the data I needed for tomorrow’s post by Friday, so the “DDoS attack” didn’t really inconvenience me anywhere near as much as it did all the postmasters trying to investigate and resolve SORBS false listings.
the last response I got back … was that the entire /24 block was ‘inelligible’ for de-listing. The parent company sites a DDoS attack as well and says their management team is aware of the issue and working to resolve it ASAP. We’ll see what happens… (anonymous commenter)
GFI would benefit from some transparency about their processes, SORBS’s day-to-day operations and details about the mistakes they’re making, how they’re fixing them and how they’re ensuring they’re not repeated again and again. And some explicit details about exactly what sort of “DDoS” they’re seeing might help them gain some credibility. The current level of communication isn’t helping with that.