I often find myself having to analyze volumes of email, looking for common factors, source addresses, URLs and so on as part of some “forensics” work, analyzing leaked emails or received spam for use as evidence in a case.
For large volumes of mail, where I might want to dig into a lot of detail or generate graphical or statistical reports, I tend to use Abacus to slurp in and analyze all the emails, store them in a SQL database in an easy-to-handle format, and then do the ad-hoc work from a SQL command line. For smaller jobs, though, you can get a long way with Unix command-line tools and some basic Perl scripting.
This morning I received Ukrainian bride spam to a tagged address that I’d only given to one vendor, RedEnvelope, so that address has leaked to criminal spammers from somewhere. Looking at a couple of RedEnvelope’s emails, I saw they’re sending from a number of sources, so I decided to dig a little deeper.
I started by searching for all emails to that tagged address in my mail client, then copied all the matching emails to a newly created folder. Then I took a copy of that folder and split it into one file per email by feeding it to formail (part of the procmail package) on standard input:
formail -ds sh -c 'cat >msg.$FILENO'
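If formail isn't installed, a rough awk stand-in is sketched below. It is not what I used, and it assumes a plain mbox file in which every message begins with a line starting "From "; it does no unescaping of ">From " lines, so treat it as an approximation of formail -ds rather than a replacement. split_mbox is just an illustrative name.

```shell
#!/bin/sh
# split_mbox (hypothetical helper): a rough awk stand-in for
# "formail -ds sh -c 'cat >msg.$FILENO'". Assumes a plain mbox file in
# which every message starts with a line beginning "From ". Each message,
# including its "From " line, is written to msg.1, msg.2, ... in the
# current directory. No ">From " unescaping is attempted.
split_mbox() {
    awk '
        /^From / {                       # separator: a new message starts
            if (out != "") close(out)    # finish the previous file
            n++
            out = "msg." n
        }
        out != "" { print > out }        # copy the line into the current file
    ' "$@"
}
```

Usage would be something like split_mbox copy-of-folder, run in an empty directory so the msg.* files don't mix with anything else.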
I'm interested in the IP address each message was sent from, so I wrote a tiny Perl script, getips.pl, that looks for the first Received: header line containing a bracketed IP address and prints that address:
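#!/usr/bin/perl
# getips.pl: print the sending IP address from each message file given
use strict;
use warnings;

foreach my $file (@ARGV) {
    open(my $in, '<', $file) or die "Can't open $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        last if $line =~ /^$/;    # a blank line marks the end of the headers
        # Received: headers are prepended in transit, so the first one that
        # carries a bracketed IP is normally my own server's record of the
        # host that connected to it.
        if ($line =~ /^Received:.*\[(\d+\.\d+\.\d+\.\d+)\]/) {
            print "$1\n";
            last;                 # only the first match per message
        }
    }
    close($in);
}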
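As a cross-check on the Perl version, roughly the same extraction can be written in awk. This is a sketch, not the script used here: like getips.pl it stops at the blank line that ends the headers and prints the first bracketed dotted-quad found on a line beginning "Received:", and like getips.pl it ignores header folding, so an address sitting on a continuation line is missed. getips_awk is a hypothetical name.

```shell
#!/bin/sh
# getips_awk (hypothetical helper): print the first bracketed IPv4 address
# found on a Received: line in the headers of each message file given.
# Like getips.pl it looks only at single physical lines, so an address on
# a folded continuation line is not seen.
getips_awk() {
    awk '
        FNR == 1 { done = 0 }         # new file: start looking again
        done     { next }             # this file already handled
        /^$/     { done = 1; next }   # blank line: end of the headers
        /^Received:/ && match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+]/) {
            print substr($0, RSTART + 1, RLENGTH - 2)  # strip the brackets
            done = 1
        }
    ' "$@"
}
```

It drops into the same pipeline: getips_awk msg.* | sort | uniq -c | sort -nr.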
I used it, along with the standard tools "sort" and "uniq", to summarize the sending IP addresses:
./getips.pl msg.* | sort | uniq -c | sort -nr
That takes the list of IP addresses generated by the Perl script and sorts it (so identical IP addresses are adjacent to each other), then counts how many times each distinct IP address occurs, then sorts the counts from most to fewest. If you want to see how it does that, play around with the command line, removing commands from the end one by one to see the intermediate data each stage produces.
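Here's that peeling-back exercise written out, using a short made-up list of addresses in place of real getips.pl output. The extra stage that runs uniq -c without sorting first shows why the first sort matters: uniq only merges duplicates that sit on adjacent lines. count_ips is a hypothetical name for the last three commands of the pipeline.

```shell
#!/bin/sh
# count_ips (hypothetical helper): read one IP address per line and print
# each distinct address with how often it occurs, most frequent first.
count_ips() {
    sort | uniq -c | sort -nr
}

# A made-up list standing in for getips.pl output (one line per message).
sample='208.49.63.243
38.107.108.146
208.49.63.243
209.112.253.83
38.107.108.146
208.49.63.243'

echo "--- raw: one address per message, in file order"
printf '%s\n' "$sample"
echo "--- uniq -c without sort: duplicates aren't adjacent, so nothing merges"
printf '%s\n' "$sample" | uniq -c
echo "--- sort: identical addresses are now adjacent"
printf '%s\n' "$sample" | sort
echo "--- sort | uniq -c: each distinct address with its count"
printf '%s\n' "$sample" | sort | uniq -c
echo "--- sort | uniq -c | sort -nr: highest count first"
printf '%s\n' "$sample" | count_ips
```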
The result looks like this:
27 208.49.63.243
20 208.49.63.242
20 208.49.63.240
15 208.49.63.245
15 208.49.63.241
11 208.49.63.244
3 38.107.108.146
2 38.107.108.149
2 209.112.253.83
1 89.230.132.139
1 38.107.108.144
1 209.112.253.90
1 209.112.253.85
The IP address beginning with 89 is the Ukrainian bride spam itself. Of the remainder, it's easy to see that they come from three main groups of addresses: 208.49.63.0/24, 38.107.108.0/24 and 209.112.253.0/24.
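Spotting the groups by eye works fine for a dozen lines. For a longer list, one option is to total the counts per /24 with awk. This sketch assumes the input is the "count address" output of the pipeline above and uses /24 purely as a convenient grouping, not as a statement about who owns the space; slash24_totals is a hypothetical name, and the counts are the ones from the results above.

```shell
#!/bin/sh
# slash24_totals (hypothetical helper): read "count address" lines (the
# output of sort | uniq -c) and print the total count for each /24
# network, largest first. Assumes plain dotted-quad IPv4 addresses.
slash24_totals() {
    awk '{
        split($2, octet, ".")
        net = octet[1] "." octet[2] "." octet[3] ".0/24"
        total[net] += $1
    }
    END {
        for (net in total) print total[net], net
    }' | sort -nr
}

# The per-address counts from the results above.
post_counts='27 208.49.63.243
20 208.49.63.242
20 208.49.63.240
15 208.49.63.245
15 208.49.63.241
11 208.49.63.244
3 38.107.108.146
2 38.107.108.149
2 209.112.253.83
1 89.230.132.139
1 38.107.108.144
1 209.112.253.90
1 209.112.253.85'

printf '%s\n' "$post_counts" | slash24_totals
```

On those numbers it reports 108 messages from 208.49.63.0/24, 6 from 38.107.108.0/24, 4 from 209.112.253.0/24 and the single spam from 89.230.132.0/24. In practice you'd feed it straight from the pipeline: ./getips.pl msg.* | sort | uniq -c | slash24_totals.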
The 208 and 209 ranges are both IP space owned by RedEnvelope, so that was mail sent by them directly (both transactional and advertising mail). The 38 range is space owned by the ESP CheetahMail. (I'll go into how I worked all that out in a future post.)
What does this mean? It means that either a Russian bride spammer just happened to guess the email address I'd created solely to give to RedEnvelope (incredibly unlikely in this particular case, given how I handle tagged addresses) or they stole it from somewhere. There are four places they could plausibly have stolen it from:
- My mail client, by compromising my laptop
- My mail server, by compromising the machine or the staff who run it
- RedEnvelope, by compromising their servers, employees or employee machines
- Cheetahmail, by compromising their servers, employees or employee machines
I run my own mail server, so I know exactly which email addresses it handles. It handles many hundreds, if not thousands, of email addresses to which mail has been sent legitimately (as well as countless billions of addresses that it would accept email for if you made them up, but which have never been used). If it had been compromised in any way, I would expect many of those addresses to receive copies of this spam run (it's not at all unusual for 40 or 50 of my email addresses to receive copies of any given spam). But only the RedEnvelope-specific email address received the Ukrainian bride spam.
Similarly for my laptop. It has hundreds of email addresses in its mailboxes. If it had been compromised, I'd have expected to see this spam sent to many of those email addresses, and I don't. If someone had stolen multiple email addresses of mine, I'd expect them to be sending spam to all of them, unless they were doing something clever and deceptive like spear-phishing; Ukrainian bride spam isn't clever, subtle or targeted.
That leaves just RedEnvelope or CheetahMail as likely sources of the stolen address. Conveniently, Laura also has an account with RedEnvelope, and also uses a tagged address with them. She's seen no spam at all to her RedEnvelope-specific address. Doing the same analysis on the legitimate RedEnvelope mail she's received at that address, I get this:
2 209.112.253.90
2 209.112.253.85
2 208.49.63.242
2 208.49.63.241
1 209.112.253.83
1 208.49.63.245
There are only three significant differences between Laura's account and mine. Mine was created in June 2010, while hers was created in December. Mine has been emailed via CheetahMail; hers hasn't. And mine received Russian bride spam; hers didn't.
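The comparison between the two accounts can be made mechanical too. This sketch takes two files of "count address" lines, reduces each address to its /24 and prints the networks that show up in the first file but not the second. only_in_first and the temporary file names are hypothetical; the data is the two result lists above.

```shell
#!/bin/sh
# only_in_first A B (hypothetical helper): print the /24 networks that
# appear in file A but not in file B. Both files hold "count address"
# lines, as produced by sort | uniq -c. B must be non-empty, because the
# NR == FNR test is what tells awk it is still reading the first file.
only_in_first() {
    awk '
        function net(ip,    o) {                 # o is a local array
            split(ip, o, ".")
            return o[1] "." o[2] "." o[3] ".0/24"
        }
        NR == FNR { seen[net($2)] = 1; next }    # pass 1: networks in B
        { n = net($2); if (!(n in seen)) print n } # pass 2: A lines not in B
    ' "$2" "$1" | sort -u
}

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

# My RedEnvelope-tagged mail, from the first set of results.
cat > "$dir/mine" <<'EOF'
27 208.49.63.243
20 208.49.63.242
20 208.49.63.240
15 208.49.63.245
15 208.49.63.241
11 208.49.63.244
3 38.107.108.146
2 38.107.108.149
2 209.112.253.83
1 89.230.132.139
1 38.107.108.144
1 209.112.253.90
1 209.112.253.85
EOF

# Laura's RedEnvelope-tagged mail.
cat > "$dir/laura" <<'EOF'
2 209.112.253.90
2 209.112.253.85
2 208.49.63.242
2 208.49.63.241
1 209.112.253.83
1 208.49.63.245
EOF

echo "Networks seen in my mail but not in Laura's:"
only_in_first "$dir/mine" "$dir/laura"
```

Run with my list first, it prints 38.107.108.0/24 (CheetahMail) and 89.230.132.0/24 (the spam itself). Run the other way round it prints nothing, since every network Laura's mail came from also appears in mine.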
One possibility is that RedEnvelope were compromised before December, so only my address was taken. But if that were the case, I'd have expected to see that address misused before today. It's possible, but not the most likely explanation.
More likely is that CheetahMail were compromised some time in the past few days, and the email address was stolen from there.
This isn't conclusive proof by any means, but if I were RedEnvelope or CheetahMail I'd be looking very closely at other reports of stolen addresses, to see if there are patterns of theft from RedEnvelope lists sent across multiple ESPs or compromises of data from multiple CheetahMail customers.
I believe CheetahMail has been compromised. I wrote to them about it after I received a v!@gr@ message to an email address tagged to The Children's Place (a Cheetah client), and Cheetah did not respond. Very unprofessional and rude.
The Children's Place account at CheetahMail was misused a while back – quite a different issue, most likely, and not a sign of any real security issue at CheetahMail (monitoring and mitigation weaknesses, maybe, but that’s rather a different thing).
We know that CheetahMail has been breached a couple of times over the past year. But why do you think they have been breached again ‘in the last few days’?
Is there anything to suggest that this isn't simply data stolen in a previous attack that has only now been used?
I think this is a very important distinction. It was very serious when CheetahMail were breached, possibly twice or more, last year; but if they are still being successfully targeted now, then I would have very serious concerns.
Steve H... I am sure they are still being targeted and will continue to be a target in the future. Why would you think anything less? Once breached, always breached, especially when it was easy in the first place. Common sense says we are all breached in this ‘connected’ world we live in. Stay safe by assuming that nothing is safe online.
Steve W: http://www.databreaches.net/?p=17874 is a discussion of The Children's Place issue. That post does claim that recipients were notified, but I’m not on that list, so I have no first-hand information about it.