Who leaked my address, and when?

Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, but also to see where and when they’ve leaked to spammers.
I’d really like to know who leaked an email address, and when.
All my inbound mail is sorted into “spam” and “not-spam” by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my “recent spam”. That’s a huge amount of data, though, something like 40,000 pieces of spam a month.
Finding the needle of interesting data in that haystack is going to take some automation. As I’ve mentioned before you can do quite a lot of useful work with a mix of some little perl scripts and some commandline tools.
I’m interested in the first time a tagged address started receiving spam, so I start off with a perl script that will take a directory full of emails, one per file, find the ones that were sent to a tagged address and print out that address and the time I received the email. I can’t rely on the Date: header, as that’s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email – and it records that in the first Received: header in the message.

#!/usr/bin/perl
use strict;
use Date::Parse;
foreach my $file (@ARGV) {
    open IF, $file or die "Failed to open '$file': $!n";
    my @headers;
    while() {
        s/[rn]//g;
        last if /^$/;
	push @headers, '' unless /^s/;
	$headers[$#headers] .= $_;
    }
    my $date;
    my $timestamp;
    foreach my $header (@headers) {
	if($header =~ /^Received:.*;([^(]+)/) {
	    $date = $1;
	    $timestamp = str2time($1);
	    last;
	}
    }
    # Replace this regex with something that
    # matches your tagged addresses
    if(join(' ', @headers) =~
                 /(foo+[a-z0-9]+@[a-z.-]+)/) {
	print "$timestamp $1 $daten";
    }
}

Dates and times are annoying to work with on the command line, so I also use the perl Date::Parse module to convert the timestamp in the received header into epoch time – the number of seconds since January 1st, 1970. I use some unix commandline magic to run this against my two spam mailboxes and dump the results in a file.

find spamassassin/ | xargs stamp-address.pl >>junk.txt
find junk/ | xargs stamp-address.pl >> junk.txt

The end result is one line per email, with the epoch time, the tagged email address and the original format of the date and time. Something like this:

1300731078 cpan-tag@addr  Mon, 21 Mar 2011 11:11:18 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700

Next, I want to find the first occurrence of each tagged address.

#!/usr/bin/perl
use strict;
my %seen;
while(<>) {
    chomp;
    my ($stamp, $address) = split / /;
    unless(exists $seen{$address}) {
	print "$_n";
	$seen{$address} = 1;
    }
}

I sort the list of addresses numerically, then use this script to display the first time each email address received spam:

sort -n 

That reduces the amount of data enough that I can look at it by hand. What did I find? Several interesting things, but I'm just going to mention one here.

1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700 

Casemate and Codefast have only ever mailed me via iContact, so given iContact's history it seems likely that those leaks were via iContact.
Dell, on the other hand, have mailed me directly and through several ESPs - and I don't recall them using iContact. Looking at the timestamps (and the content of the spams) it's clear that the Dell and Codefast tagged addresses were both sent spam for the first time as part of the same spamrun - so it's almost certain that they leaked at the same time.
Looking for iContacts bounce domain (icpbounce.com) in my mailbox I do find that Dell used them briefly, on May 4th. So that's pretty compelling evidence that iContact leaked all three addresses. (Which means my previous theory about Dell customer addresses leaking, based on misleading statements from Intervision, was wrong.)
There's another thing that's interesting... iContact has had a history of email breaches. The data I have here (and it's matched by a couple of older data points, if I recall correctly) shows spam being sent to newly leaked addresses on the 2nd or 3rd of the month.
I wonder if iContact does a batch export to a subcontractor, or an offsite backup or something similar on the first of each month?

Related Posts

Another kind of email breach

In all the recent discussions of email address thievery I’ve not seen anyone mention stealing addresses by abusing the legal system. And, yet, there’s at least one ambulance chasing lawyer that’s using email addresses that were never given to him by the recipients. Even worse, when asked about it he said that the courts told him he could use the email address and that we recipients had no recourse.
I’m not sure the spammer is necessarily wrong, but it’s a frustrating situation for both the recipient and the company that had their address list stolen.
A few years ago, law firm of Bursor and Fisher filed a host of class action lawsuits against various wireless carriers, including AT&T. At one point during the AT&T lawsuit the judge ruled that AT&T turn over their customer list, including email addresses, to Bursor and Fisher. Bursor and Fisher were then to send notices to all the AT&T subscribers notifying them of the suit.
This is not unreasonable. Contacting consumers by email to notify them of legal action makes a certain amount of sense.
But then Bursor and Fisher took it a step further. They looked at all these valid email addresses and decided they could use this for their own purposes. They started mailing advertisements to the AT&T wireless list.

Read More

Defending against the hackers of 1995

Passwords are convenient for the end user, but it’s too easy to lose control of them. People share them with other people. People write them down, where they can be read. People send them in email, and that email is easily intercepted. People’s web browsers store the passwords, so they can log in automatically. Worst of all, perhaps, people tend to use the same username and password at many different websites. If just one of those websites is compromised (or even run as a password collecting scam) then those passwords can be used to attack accounts at all of the others.
Two factor authentication that uses an uncopyable physical device (such as a cellphone or a security token) as a second factor mitigates most of these threats very effectively. Weaker two factor authentication using digital certificates is a little easier to misuse (as the user can share the certificate with others, or have it copied without them noticing) but still a lot better than a password.
Security problems solved, then?

Read More

Analysing a data breach – CheetahMail

I often find myself having to analyze volumes of email, looking for common factors, source addresses, URLs and so on as part of some “forensics” work, analyzing leaked emails or received spam for use as evidence in a case.
For large volumes of mail where I might want to dig down in a lot of detail or generate graphical or statistical reports I tend to use Abacus to slurp in and analyze all the emails, store them in a SQL database in an easy to handle format and then do the ad-hoc work from a SQL commandline. For smaller work, though, you can get a long way with unix commandline tools and some basic perl scripting.
This morning I received Ukrainian bride spam to a tagged address that I’d only given to one vendor, RedEnvelope, so that address has leaked to criminal spammers from somewhere. Looking at a couple of RedEnvelope’s emails I see they’re sending from a number of sources, so I decided to dig a little deeper.
I started by searching for all emails to that tagged address in my mail client, then copied all the matching emails to a newly created folder. Then I took a copy of that folder and split it into one file per email using a shell one-liner:

Read More