Who leaked my address, and when?

Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, and also see where and when it has leaked to spammers.
I’d really like to know who leaked an email address, and when.
All my inbound mail is sorted into “spam” and “not-spam” by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my “recent spam”. That’s a huge amount of data, though, something like 40,000 pieces of spam a month.
Finding the needle of interesting data in that haystack is going to take some automation. As I’ve mentioned before, you can do quite a lot of useful work with a mix of little perl scripts and some command-line tools.
I’m interested in the first time a tagged address started receiving spam, so I start off with a perl script that takes a list of email files, one message per file, finds the ones that were sent to a tagged address and prints out that address and the time I received the email. I can’t rely on the Date: header, as that’s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email – and it records that in the first (topmost) Received: header in the message.

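#!/usr/bin/perl
use strict;
use warnings;
use Date::Parse;
foreach my $file (@ARGV) {
    open my $fh, '<', $file or die "Failed to open '$file': $!\n";
    my @headers;
    while(<$fh>) {
        s/[\r\n]//g;
        # A blank line marks the end of the headers
        last if /^$/;
        # A line starting with whitespace continues the previous header
        push @headers, '' unless /^\s/ && @headers;
        $headers[$#headers] .= $_;
    }
    close $fh;
    my $date;
    my $timestamp;
    # The topmost Received: header is the one my own server added
    foreach my $header (@headers) {
        if($header =~ /^Received:.*;([^(]+)/i) {
            ($date = $1) =~ s/^\s+|\s+$//g;
            $timestamp = str2time($date);
            last;
        }
    }
    # Skip messages with no parseable Received: timestamp
    next unless defined $timestamp;
    # Replace this regex with something that
    # matches your tagged addresses
    if(join(' ', @headers) =~
                 /(foo\+[a-z0-9]+\@[a-z.-]+)/) {
        print "$timestamp $1 $date\n";
    }
}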

Dates and times are annoying to work with on the command line, so I also use the perl Date::Parse module to convert the timestamp in the received header into epoch time – the number of seconds since January 1st, 1970. I use some unix commandline magic to run this against my two spam mailboxes and dump the results in a file.

find spamassassin/ -type f -print0 | xargs -0 stamp-address.pl >> junk.txt
find junk/ -type f -print0 | xargs -0 stamp-address.pl >> junk.txt

The end result is one line per email, with the epoch time, the tagged email address and the original format of the date and time. Something like this:

1300731078 cpan-tag@addr Mon, 21 Mar 2011 11:11:18 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700
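
If you want to sanity-check that format, the epoch value and the readable date on each line should agree. Here is a minimal sketch, in Python purely for illustration and not part of the scripts above. The `to_epoch` helper name is hypothetical, and it assumes each line looks like the sample output shown here.

```python
from email.utils import parsedate_to_datetime


def to_epoch(date_text):
    """Convert an email-style date string to seconds since 1 Jan 1970 UTC."""
    # parsedate_to_datetime honours the numeric UTC offset (e.g. -0700)
    return int(parsedate_to_datetime(date_text).timestamp())


line = "1300731078 cpan-tag@addr Mon, 21 Mar 2011 11:11:18 -0700"
# Split into epoch, address and the remaining free-form date text
stamp, address, date_text = line.split(" ", 2)
print(address, int(stamp) == to_epoch(date_text))
```

A mismatch here would suggest the date in the Received: header was parsed with the wrong timezone offset.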

Next, I want to find the first occurrence of each tagged address.

#!/usr/bin/perl
use strict;
my %seen;
while(<>) {
    chomp;
    my ($stamp, $address) = split / /;
    # Only print the first line seen for each address
    unless(exists $seen{$address}) {
        print "$_\n";
        $seen{$address} = 1;
    }
}

I sort the list numerically by timestamp, then use this script to display the first time each email address received spam:

sort -n 

That reduces the amount of data enough that I can look at it by hand. What did I find? Several interesting things, but I'm just going to mention one here.

1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700 

Casemate and Codefast have only ever mailed me via iContact, so given iContact's history it seems likely that those leaks were via iContact.
Dell, on the other hand, have mailed me directly and through several ESPs - and I don't recall them using iContact. Looking at the timestamps (and the content of the spams) it's clear that the Dell and Codefast tagged addresses were both sent spam for the first time as part of the same spam run - so it's almost certain that they leaked at the same time.
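
The "same spam run" reasoning can be sketched roughly as grouping first-seen times that fall close together. This is an illustrative Python sketch, not the method used here (that was done by hand); the function name `group_spam_runs` and the 300 second window are assumptions you would want to tune.

```python
def group_spam_runs(first_seen, window=300):
    """Group (epoch, address) pairs into runs.

    A pair joins the current run when its timestamp is within
    `window` seconds of the previous pair's timestamp.
    """
    runs = []
    for stamp, address in sorted(first_seen):
        if runs and stamp - runs[-1][-1][0] <= window:
            runs[-1].append((stamp, address))
        else:
            runs.append([(stamp, address)])
    return runs


# First-seen timestamps taken from the output above
first_seen = [
    (1299111914, "casemate-tag@addr"),
    (1307104954, "dell-tag@addr"),
    (1307104986, "codefast-tag@addr"),
]
for run in group_spam_runs(first_seen):
    print([address for _, address in run])
```

With the three addresses above, Casemate ends up on its own, while Dell and Codefast land in the same group, 32 seconds apart.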
Looking for iContact's bounce domain (icpbounce.com) in my mailbox, I do find that Dell used them briefly, on May 4th. So that's pretty compelling evidence that iContact leaked all three addresses. (Which means my previous theory about Dell customer addresses leaking, based on misleading statements from Intervision, was wrong.)
There's another thing that's interesting... iContact has had a history of email breaches. The data I have here (and it's matched by a couple of older data points, if I recall correctly) shows spam being sent to newly leaked addresses on the 2nd or 3rd of the month.
I wonder if iContact does a batch export to a subcontractor, or an offsite backup or something similar on the first of each month?
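
One way to look for that kind of monthly pattern is to count first-seen dates by day of the month. A minimal Python sketch follows, with the hypothetical helper name `day_of_month_counts`; it uses the date as recorded by my server (local offset included) rather than converting to UTC, which is an assumption about which "day" matters.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def day_of_month_counts(date_texts):
    """Count how many dates fall on each day of the month."""
    # .day is taken in the offset written in the string, not in UTC
    return Counter(parsedate_to_datetime(text).day for text in date_texts)


# Date strings from the first-seen output above
dates = [
    "Wed, 2 Mar 2011 16:25:14 -0800",
    "Fri, 3 Jun 2011 05:42:34 -0700",
    "Fri, 3 Jun 2011 05:43:06 -0700",
]
for day, count in sorted(day_of_month_counts(dates).items()):
    print(day, count)
```

For the three lines above it reports one first-seen on the 2nd and two on the 3rd. Three data points is far too few to prove anything, but the same count over a longer history would show whether the pattern holds.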
