Who leaked my address, and when?

Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, and also see where and when it has leaked to spammers.
I’d really like to know who leaked an email address, and when.
All my inbound mail is sorted into “spam” and “not-spam” by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my “recent spam”. That’s a huge amount of data, though, something like 40,000 pieces of spam a month.
Finding the needle of interesting data in that haystack is going to take some automation. As I’ve mentioned before, you can do quite a lot of useful work with a mix of some little perl scripts and some command-line tools.
I’m interested in the first time a tagged address started receiving spam, so I start off with a perl script that will take a directory full of emails, one per file, find the ones that were sent to a tagged address and print out that address and the time I received the email. I can’t rely on the Date: header, as that’s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email – and it records that in the first Received: header in the message.

#!/usr/bin/perl
use strict;
use Date::Parse;
foreach my $file (@ARGV) {
    open IF, $file or die "Failed to open '$file': $!\n";
    my @headers;
    # Read the header block, unfolding continuation lines
    while(<IF>) {
        s/[\r\n]//g;
        last if /^$/;
        push @headers, '' unless /^\s/;
        $headers[$#headers] .= $_;
    }
    close IF;
    my $date;
    my $timestamp;
    # The topmost Received: header is the one my server added
    foreach my $header (@headers) {
        if($header =~ /^Received:.*;([^(]+)/) {
            $date = $1;
            $timestamp = str2time($1);
            last;
        }
    }
    # Replace this regex with something that
    # matches your tagged addresses
    if(join(' ', @headers) =~
                 /(foo\+[a-z0-9]+@[a-z.-]+)/) {
        print "$timestamp $1 $date\n";
    }
}

Dates and times are annoying to work with on the command line, so I also use the perl Date::Parse module to convert the timestamp in the received header into epoch time – the number of seconds since January 1st, 1970. I use some unix commandline magic to run this against my two spam mailboxes and dump the results in a file.

find spamassassin/ -type f | xargs stamp-address.pl >> junk.txt
find junk/ -type f | xargs stamp-address.pl >> junk.txt

The end result is one line per email, with the epoch time, the tagged email address and the original format of the date and time. Something like this:

1300731078 cpan-tag@addr  Mon, 21 Mar 2011 11:11:18 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700
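
To show how the first column relates to the date at the end of each line, here is a minimal sketch of the conversion using only core Perl modules. It is not what the script above does (that uses str2time from Date::Parse); the function name received_date_to_epoch is made up for illustration, and it assumes the date carries a numeric zone offset such as -0700.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %month = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Convert a date such as "Mon, 21 Mar 2011 11:11:18 -0700" to seconds
# since the epoch. Returns undef if the string doesn't look like that.
# (Hypothetical helper; assumes a numeric zone offset is present.)
sub received_date_to_epoch {
    my ($date) = @_;
    return unless $date =~
        /(\d{1,2})\s+([A-Z][a-z]{2})\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([+-])(\d{2})(\d{2})/;
    my ($mday, $mon, $year, $h, $m, $s, $sign, $oh, $om) =
        ($1, $2, $3, $4, $5, $6, $7, $8, $9);
    return unless exists $month{$mon};
    # Treat the wall-clock fields as UTC, then remove the zone offset.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return timegm($s, $m, $h, $mday, $month{$mon}, $year) - $offset;
}

print received_date_to_epoch('Mon, 21 Mar 2011 11:11:18 -0700'), "\n";
```

Run against the first sample line above, this prints 1300731078, matching the first column.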

Next, I want to find the first occurrence of each tagged address.

#!/usr/bin/perl
use strict;
my %seen;
while(<>) {
    chomp;
    my ($stamp, $address) = split / /;
    unless(exists $seen{$address}) {
        print "$_\n";
        $seen{$address} = 1;
    }
}

I sort the list numerically, so the earliest timestamps come first, then use this script to display the first time each email address received spam:

sort -n 
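
As a variation on the sort-then-filter approach, here is a sketch that keeps the smallest timestamp seen for each address, so the input doesn't need to be sorted first. The function name first_seen is hypothetical, and the sample lines are taken from the output shown earlier, deliberately shuffled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given lines of the form "<epoch> <address> <date...>", return the
# earliest line for each address, ordered by timestamp.
# (Hypothetical helper, not one of the scripts in this post.)
sub first_seen {
    my @lines = @_;
    my %earliest;    # address => [epoch, original line]
    foreach my $line (@lines) {
        chomp $line;
        my ($stamp, $address) = split ' ', $line;
        next unless defined $address && $stamp =~ /^\d+$/;
        if (!exists $earliest{$address} || $stamp < $earliest{$address}[0]) {
            $earliest{$address} = [$stamp, $line];
        }
    }
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] }
           values %earliest;
}

# Sample lines from the output above, out of order on purpose.
my @sample = (
    '1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700',
    '1300731078 cpan-tag@addr Mon, 21 Mar 2011 11:11:18 -0700',
    '1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700',
    '1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700',
);
print "$_\n" foreach first_seen(@sample);
```

The trade-off is memory: this holds one line per distinct address, which is fine for a few hundred tagged addresses.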

That reduces the amount of data enough that I can look at it by hand. What did I find? Several interesting things, but I'm just going to mention one here.

1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700 

Casemate and Codefast have only ever mailed me via iContact, so given iContact's history it seems likely that those leaks were via iContact.
Dell, on the other hand, have mailed me directly and through several ESPs - and I don't recall them using iContact. Looking at the timestamps (and the content of the spams) it's clear that the Dell and Codefast tagged addresses were both sent spam for the first time as part of the same spamrun - so it's almost certain that they leaked at the same time.
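
Spotting addresses that were first hit in the same run can be automated, at least roughly. This sketch groups first-seen entries whose timestamps fall close together. The function name group_into_runs and the 300 second gap are assumptions chosen for illustration; timing alone is only a hint, and the content of the spam still needs checking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group [epoch, address] pairs into runs: an entry joins the current run
# if it arrived within $max_gap seconds of the previous entry.
# $max_gap is an arbitrary assumption, not a measured value.
sub group_into_runs {
    my ($max_gap, @entries) = @_;
    my @runs;
    my $previous;
    foreach my $entry (sort { $a->[0] <=> $b->[0] } @entries) {
        if (!defined $previous || $entry->[0] - $previous > $max_gap) {
            push @runs, [];    # start a new run
        }
        push @{ $runs[-1] }, $entry;
        $previous = $entry->[0];
    }
    return @runs;
}

# First-seen data from the three lines above.
my @first_seen = (
    [1299111914, 'casemate-tag@addr'],
    [1307104954, 'dell-tag@addr'],
    [1307104986, 'codefast-tag@addr'],
);

foreach my $run (group_into_runs(300, @first_seen)) {
    print join(' ', map { $_->[1] } @$run), "\n";
}
```

With the data above, the Dell and Codefast addresses, first seen 32 seconds apart, land in one group and Casemate in another.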
Looking for iContact's bounce domain (icpbounce.com) in my mailbox I do find that Dell used them briefly, on May 4th. So that's pretty compelling evidence that iContact leaked all three addresses. (Which means my previous theory about Dell customer addresses leaking, based on misleading statements from Intervision, was wrong.)
There's another thing that's interesting... iContact has had a history of email breaches. The data I have here (and it's matched by a couple of older data points, if I recall correctly) shows spam being sent to newly leaked addresses on the 2nd or 3rd of the month.
I wonder if iContact does a batch export to a subcontractor, or an offsite backup or something similar on the first of each month?
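
One way to check the day-of-month pattern as more data comes in is to tally the first-seen lines by day. This is only a sketch: the function name tally_day_of_month is hypothetical, it takes the day from the date column as recorded in the server's local time zone, and three data points are far too few to prove anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many first-seen lines fall on each day of the month, using
# the day recorded in the date column ("Wed, 2 Mar 2011 ...").
# (Hypothetical helper; lines without a recognisable date are skipped.)
sub tally_day_of_month {
    my %count;
    foreach my $line (@_) {
        next unless $line =~ /,\s+(\d{1,2})\s+[A-Z][a-z]{2}\s+\d{4}\s/;
        $count{ $1 + 0 }++;
    }
    return %count;
}

my %days = tally_day_of_month(
    '1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800',
    '1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700',
    '1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700',
);
printf "day %2d: %d\n", $_, $days{$_} foreach sort { $a <=> $b } keys %days;
```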
