Analysing a data breach – CheetahMail

I often find myself having to analyze volumes of email, looking for common factors, source addresses, URLs and so on as part of some “forensics” work, analyzing leaked emails or received spam for use as evidence in a case.
For large volumes of mail where I might want to dig down in a lot of detail or generate graphical or statistical reports I tend to use Abacus to slurp in and analyze all the emails, store them in a SQL database in an easy to handle format and then do the ad-hoc work from a SQL commandline. For smaller work, though, you can get a long way with unix commandline tools and some basic perl scripting.
This morning I received Ukrainian bride spam to a tagged address that I’d only given to one vendor, RedEnvelope, so that address has leaked to criminal spammers from somewhere. Looking at a couple of RedEnvelope’s emails I see they’re sending from a number of sources, so I decided to dig a little deeper.
I started by searching for all emails to that tagged address in my mail client, then copied all the matching emails to a newly created folder. Then I took a copy of that folder and split it into one file per email using a shell one-liner:

formail -ds sh -c 'cat >msg.$FILENO' 

I'm interested in the IP address they were sent from, so I write a tiny perl script, getips.pl, that'll look for the first Received line, and print out the IP address:

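#!/usr/bin/perl
use strict;
use warnings;

# Print the bracketed IP address from the first Received: header of each file.
foreach my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    while (<$fh>) {
        chomp;
        last if /^$/;    # a blank line marks the end of the headers
        # Only matches when the [a.b.c.d] literal is on the Received: line itself
        if (/^Received:.*\[(\d+\.\d+\.\d+\.\d+)\]/) {
            print "$1\n";
            last;
        }
    }
    close $fh;
}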

I use it, along with the standard tools "sort" and "uniq" to summarize the sending IP addresses:

./getips.pl msg.* | sort | uniq -c | sort -nr

That takes the list of IP addresses generated by the perl script, sorts them (so identical IP addresses are adjacent to each other), counts how many times each distinct IP address is found, then sorts the results from most to fewest. If you want to see how it does that, play around with the command line, removing commands off the end one-by-one to see the intermediate data it produces.
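To make the stages concrete, here is a small sketch that runs the same pipeline on a handful of made-up input lines and prints what each stage produces. The sample_ips function is a hypothetical stand-in for the output of getips.pl, not real data from my mailbox.

```shell
# Hypothetical stand-in for the output of ./getips.pl: one IP address per line.
sample_ips() {
    printf '%s\n' \
        208.49.63.243 38.107.108.146 208.49.63.240 \
        208.49.63.243 38.107.108.146 208.49.63.243
}

echo "--- after sort: identical addresses are now adjacent"
sample_ips | sort

echo "--- after uniq -c: each distinct address, prefixed by its count"
sample_ips | sort | uniq -c

echo "--- after sort -nr: largest count first"
sample_ips | sort | uniq -c | sort -nr
```

The first sort matters because uniq only collapses duplicates that sit next to each other.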
The result looks like this:

  27 208.49.63.243
  20 208.49.63.242
  20 208.49.63.240
  15 208.49.63.245
  15 208.49.63.241
  11 208.49.63.244
   3 38.107.108.146
   2 38.107.108.149
   2 209.112.253.83
   1 89.230.132.139
   1 38.107.108.144
   1 209.112.253.90
   1 209.112.253.85

The IP address beginning with 89 is the Ukrainian bride spam itself. Of the remainder it's easy to see that they come from three main groups of addresses - 208.49.63.0/24, 38.107.108.0/24 and 209.112.253.0/24.
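If the grouping isn't obvious by eye, the same tools can do it. This is a rough sketch that takes the count and address pairs from the listing above and totals them per /24. The function names counts and by_slash24 are mine, and treating the first three octets as the network is an assumption that happens to suit this data rather than a general rule.

```shell
# The "count address" pairs from the listing above.
counts() {
    cat <<'EOF'
27 208.49.63.243
20 208.49.63.242
20 208.49.63.240
15 208.49.63.245
15 208.49.63.241
11 208.49.63.244
3 38.107.108.146
2 38.107.108.149
2 209.112.253.83
1 89.230.132.139
1 38.107.108.144
1 209.112.253.90
1 209.112.253.85
EOF
}

# Sum the counts for every address that shares its first three octets.
by_slash24() {
    awk '{ split($2, o, "."); net = o[1] "." o[2] "." o[3] ".0/24"; total[net] += $1 }
         END { for (n in total) print total[n], n }' | sort -nr
}

counts | by_slash24
# prints:
# 108 208.49.63.0/24
# 6 38.107.108.0/24
# 4 209.112.253.0/24
# 1 89.230.132.0/24
```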
The 208 and 209 ranges are both IP space owned by RedEnvelope, so that was mail sent by them directly (both transactional and advertising mail). The 38 range is space owned by the ESP Cheetahmail. (I'll go into how I worked all that out in a future post).
What does this mean? It means that either a Russian bride spammer just happened to guess the email address I'd created solely to give to RedEnvelope (incredibly unlikely in this particular case, due to how I handle tagged addresses) or they stole it from somewhere. There are four places they could plausibly have stolen it from:

  1. My mail client, by compromising my laptop
  2. My mail server, by compromising the machine or the staff who run it
  3. RedEnvelope, by compromising their servers, employees or employee machines
  4. Cheetahmail, by compromising their servers, employees or employee machines

I run my own mailserver, so I know exactly which email addresses it handles. There are many hundreds or thousands of email addresses which it handles, and to which mail has been sent legitimately (as well as countless billions of addresses that it would accept email to if you made them up, but which have never been used). If it had been compromised in any way, I would expect many of those email addresses to be sent spam as part of this spam run (it's not at all unusual for 40 or 50 of my email addresses to receive copies of any given spam). But only the RedEnvelope-specific email address received the Ukrainian bride spam.
Similarly for my laptop. It has hundreds of email addresses in its mailboxes. If it had been compromised, I'd have expected to see this spam sent to many of those email addresses, and I don't. If someone had stolen multiple email addresses of mine, I'd expect them to be sending spam to all of them, unless they were doing something clever and deceptive like spear-phishing - and Ukrainian bride spam isn't clever, subtle or targeted.
That leaves just RedEnvelope or CheetahMail as likely sources of the stolen address. Conveniently, Laura also has an account with RedEnvelope, and also uses a tagged address with them. She's seen no spam at all to her RedEnvelope-specific address. Doing the same analysis with the legitimate RedEnvelope mail she's received to that address I get this:

   2 209.112.253.90
   2 209.112.253.85
   2 208.49.63.242
   2 208.49.63.241
   1 209.112.253.83
   1 208.49.63.245

There are only three significant differences between Laura's account and mine. Mine was created in June 2010, while hers was created in December. Mine has been emailed via CheetahMail, hers hasn't. And mine received Russian bride spam, hers didn't.
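The sending-source difference can be checked mechanically too. This sketch lists the /24 ranges that sent legitimate mail to each tagged address (taken from the two listings, with the spam source left out) and prints the ranges that appear only in mine. The function names are hypothetical, and the trick relies on neither list containing duplicates.

```shell
# /24 ranges that sent legitimate mail to each tagged address.
mine() { printf '%s\n' 208.49.63.0/24 209.112.253.0/24 38.107.108.0/24; }
hers() { printf '%s\n' 208.49.63.0/24 209.112.253.0/24; }

# List hers twice so anything she has seen occurs at least twice,
# then keep only the lines that occur exactly once: ranges unique to mine.
only_mine() { { mine; hers; hers; } | sort | uniq -u; }

only_mine
# prints: 38.107.108.0/24
```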
One possibility is that RedEnvelope were compromised prior to December, so only my address was taken. But if that were the case I'd have expected to see that address misused before today. It's possible, but not the most likely explanation.
More likely is that CheetahMail were compromised some time in the past few days, and the email address was stolen from there.
This isn't conclusive proof by any means, but if I were RedEnvelope or CheetahMail I'd be looking very closely at other reports of stolen addresses, to see if there are patterns of theft from RedEnvelope lists sent across multiple ESPs or compromises of data from multiple CheetahMail customers.
 

Related Posts

Real. Or. Phish?

After Epsilon lost a bunch of customer lists last week, I’ve been keeping an eye open to see if any of the vendors I work with had any of my email addresses stolen – not least because it’ll be interesting to see where this data ends up.
Yesterday I got mail from Marriott, telling me that “unauthorized third party gained access to a number of Epsilon’s accounts including Marriott’s email list.”. Great! Let’s start looking for spam to my Marriott tagged address, or for phishing targeted at Marriott customers.
I hit what looks like paydirt this morning. Plausible looking mail with Marriott branding, nothing specific to me other than name and (tagged) email address.
It’s time to play Real. Or. Phish?
1. Branding and spelling is all good. It’s using decent stock photos, and what looks like a real Marriott logo.
All very easy to fake, but if it’s a phish it’s pretty well done. Then again, phishes often steal real content and just change out the links.
Conclusion? Real. Maybe.
2. The mail wasn’t sent from marriott.com, or any domain related to it. Instead, it came from “Marriott@marriott-email.com”.
This is classic phish behaviour – using a lookalike domain such as “paypal-billing.com” or “aolsecurity.com” so as to look as though you’re associated with a company, yet to be able to use a domain name you have full control of, so as to be able to host websites, receive email, sign with DKIM, all that sort of thing.
Conclusion? Phish.
3. SPF pass
Given that the mail was sent “from” marriott-email.com, and not from marriott.com, this is pretty meaningless. But it did pass an SPF check.
Conclusion? Neutral.
4. DKIM fail
Authentication-Results: m.wordtothewise.com; dkim=fail (verification failed; insecure key) header.i=@marriott-email.com;
As the mail was sent “from” marriott-email.com it should have been possible for the owner of that domain (presumably the phisher) to sign it with DKIM. That they didn’t isn’t a good sign at all.
Conclusion? Phish.
5. Badly obfuscated headers
From: =?iso-8859-1?B?TWFycmlvdHQgUmV3YXJkcw==?= <Marriott@marriott-email.com>
Subject: =?iso-8859-1?B?WW91ciBBY2NvdW50IJYgVXAgdG8gJDEwMCBjb3Vwb24=?=

Base64 encoding of headers is an old spammer trick used to make them more difficult for naive spam filters to handle. That doesn’t work well with more modern spam filters, but spammers and phishers still tend to do it so as to make it harder for abuse desks to read the content of phishes forwarded to them with complaints. There’s no legitimate reason to encode plain ASCII fields in this way. SpamAssassin didn’t like the message because of this.
Conclusion? Phish.
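For anyone who wants to read headers like these, here is a minimal sketch that pulls the base64 payload out of an encoded word and decodes it. The function name decode_b_word is hypothetical, the -d flag is the GNU coreutils spelling, and the sketch ignores the declared character set and handles only the first encoded word in the string, so treat it as a quick inspection aid rather than a full decoder.

```shell
# Extract the payload from a "=?charset?B?payload?=" encoded word and decode it.
decode_b_word() {
    printf '%s\n' "$1" |
        sed -n 's/.*=?[^?]*?[Bb]?\([^?]*\)?=.*/\1/p' |
        base64 -d
}

from='=?iso-8859-1?B?TWFycmlvdHQgUmV3YXJkcw==?= <Marriott@marriott-email.com>'
printf '%s\n' "$(decode_b_word "$from")"
# prints: Marriott Rewards
```

The Subject line can be passed through the same function.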
6. Well-crafted multipart/alternative mail, with valid, well-encoded (quoted-printable) plain text and html parts
Just like the branding and spelling, this is very well done for a phish. But again, it’s commonly something that’s stolen from legitimate email and modified slightly.
Conclusion? Real, probably.
7. Typical content links in the email
Most of the content links in the email are to things like “http://marriott-email.com/16433acf1layfousiaey2oniaaaaaalfqkc4qmz76deyaaaaa”, which is consistent with the from address, at least. This isn’t the sort of URL a real company website tends to use, but it’s not that unusual for click tracking software to do something like this.
Conclusion? Neutral.
8. Atypical content links in the email
We also have other links:


Clicktracking 2: Electric Boogaloo

A week or so back I talked about clicktracking links, and how to put them together to avoid abuse and blocking issues.
Since then I’ve come across another issue with click tracking links that’s not terribly obvious, and that you’re not that likely to come across, but if you do get hit by it could be very painful – phishing and malware filters in web browsers.
Visiting this site may harm your computer
First, some background about how a lot of malware is distributed, what’s known as “drive-by malware”. This is where the hostile code infects the victim’s machine without them taking any action to download and run it, rather they just visit a hostile website and that website silently infects their computer.
The malware authors get people to visit the hostile website in quite a few different ways – email spam, blog comment spam, web forum spam, banner ads purchased on legitimate websites and compromised legitimate websites, amongst others.
That last one, compromised legitimate websites, is the type we’re interested in. The sites compromised aren’t usually a single, high-profile website. Rather, they tend to be a whole bunch of websites that are running some vulnerable web application – if there’s a security flaw in, for example, WordPress blog software then a malware author can compromise thousands of little blog sites, and embed malware code in each of them. Anyone visiting any of those sites risks being infected, and becoming part of a botnet.
Because the vulnerable websites are all compromised mechanically in the same way, the URLs of the infected pages tend to look much the same, just with different hostnames – http://example.com/foo/bar/baz.html, http://www.somewhereelse.invalid/foo/bar/baz.html and http://a.net/foo/bar/baz.html – and they serve up just the same malware (or, just as often, redirect the user to a site in Russia or China that serves up the malware that infects their machine).
A malware filter operator might receive a report about http://example.com/foo/bar/baz.html and decide that it was infected with malware, adding example.com to a blacklist. A smart filter operator might decide that this might be just one example of a widespread compromise, and go looking for the same malware elsewhere. If it goes to http://a.net/foo/bar/baz.html and finds the exact same content, it’ll know that that’s another instance of the infection, and add a.net to the blacklist.
What does this have to do with clickthrough links?
Well, an obvious way to implement clickthrough links is to use a custom hostname for each customer (“click.customer.com”), and have all those pointing at a single clickthrough webserver. It’s tedious to set up the webserver to respond to each hostname as you add a new customer, though, so you decide to have the webserver ignore the hostname. That’ll work fine – if you have customer1 using a clickthrough link like http://click.customer1.com/123/456/789.html you’d have the webserver ignore “click.customer1.com” and just read the information it needs from “123/456/789.html” and send the redirect.
But that means that if you also have customer2, using the hostname click.customer2.com, then the URL http://click.customer2.com/123/456/789.html will redirect to customer1’s content.
If a malware filter decides that http://click.customer1.com/123/456/789.html redirects to a phishing site or a malware download – either due to a false report, or due to the customer’s page actually being infected – then they’ll add click.customer1.com to their blacklist, meaning no http://click.customer1.com/ URLs will work. So far, this isn’t a big problem.
But if they then go and check http://click.customer2.com/123/456/789.html and find the same redirect, they’ll blacklist click.customer2.com, and so on for all the clickthrough hostnames of yours they know about. That’ll cause any click on any URL in any email a lot of your customers send out to go to a “This site may harm your computer!” warning – which will end up a nightmare even if you spot the problem and get the filter operators to remove all those hostnames from the blacklist within a few hours or a day.
Don’t let this happen to you. Make sure your clickthrough webserver pays attention to the hostname as well as the path of the URL.
Use different hostnames for different customers’ clickthrough links. And if you pick a link from mail sent by Customer A, and change the hostname of that link to the clickthrough hostname of Customer B, then that link should fail with an error rather than displaying Customer A’s content.
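As a rough illustration of that check, here is a sketch of a lookup that only returns a redirect when the hostname and the path both match. Everything in it is hypothetical: the link_table contents, the example.com destinations and the resolve_click name are made up for the example, and a real clickthrough server would do this inside its request handler rather than in a shell function.

```shell
# Hypothetical link table, one "hostname path destination" entry per line.
link_table() {
    cat <<'EOF'
click.customer1.com /123/456/789.html http://example.com/customer1/offer
click.customer2.com /987/654/321.html http://example.com/customer2/sale
EOF
}

# Print the redirect target only when BOTH hostname and path match an entry;
# otherwise print nothing and return non-zero so the caller can send an error.
resolve_click() {
    link_table | awk -v host="$1" -v path="$2" '
        $1 == host && $2 == path { print $3; found = 1 }
        END { exit !found }'
}

# Customer1's own link resolves to its destination.
resolve_click click.customer1.com /123/456/789.html

# The same path under customer2's hostname is rejected.
resolve_click click.customer2.com /123/456/789.html ||
    echo "404: no such link for this hostname"
```

With this shape, a filter that probes customer1's path on customer2's hostname gets an error page, not a second copy of the same redirect.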


What is Two Factor Authentication?

Two factor authentication, or the snappy acronym 2FA, is something that you’re going to be hearing a lot about over the next year or so, both for use by ESP employees (in an attempt to reduce the risks of data theft) and by ESP customers (attempting to reduce the chance of an account being misused to send spam).
What is Authentication?
In computer security terms authentication is proving who you are – when you enter a username and a password to access your email account you’re authenticating yourself to the system using a password that only you know.
Authentication (“who you are”) is the most visible part of computer access control, but it’s usually combined with two other A’s – authorization (“what you are allowed to do”) and accounting (“who did what”) to form an access control system.
And what are the two factors?
Two factor authentication means using two independent sources of evidence to demonstrate who you are. The idea behind it is that it means an attacker needs to steal two quite different bits of information, with different weaknesses and attack vectors, in order to gain access. This makes the attack scenario much more complex and difficult for an attacker to carry out.
It’s important that the different factors are independent – requiring two passwords doesn’t count as 2FA, as an attack that can get the first password can just as easily get the second password. Generally 2FA requires the user to demonstrate their identity via two out of three broad ways:
