Clicktracking link abuse


If you use redirection links in the emails you send out, where a click on the link goes to your server – so you can record that someone clicked – before redirecting to the real destination, then you’ve probably already thought about how they can be abused.
Redirection links are simple in concept – you include a link that points to your webserver in email that you send out, then when recipients click on it they end up at your webserver. Instead of displaying a page, though, your webserver sends what’s called a “302 redirect” to send the recipient’s web browser on to the real destination. How does your webserver know where to redirect to? There are several different ways, with different tradeoffs:

The simplest approach
The simplest sort of redirection link includes the final destination in the link itself – something like http://click.example.com/cnn.com/WORLD/. The webserver at click.example.com would simply strip off the first part of the link, and redirect to the remainder – cnn.com/WORLD/.
This is nice, because it’s fairly transparent to the recipient – when they hover over the link in their mail client or webmail it’ll be fairly clear where it’s going.
But it has several limitations. One is that you can’t really record very much data about the click – you know where it was redirecting to, but almost nothing else.
The bigger problem is that it’s very easy for a spammer to abuse – they can send out spam that has the link http://click.example.com/onlinepharmacy.ru/order.html, to hide their real link from spam filters, and your webserver will happily redirect recipients to go there. Or, worse, that can be used to redirect to a website hosting viruses. That can cause all sorts of problems for your reputation, up to and including having your redirection webserver blacklisted by antivirus and antiphishing organizations, meaning it’ll be blocked by many web browsers.
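To make the problem concrete, here is a minimal Python sketch of what such a redirector does. The function name naive_redirect_target is made up for illustration, and the sketch assumes destinations are plain http links and ignores query strings.

```python
from urllib.parse import urlsplit


def naive_redirect_target(click_url: str) -> str:
    """Return the URL a naive redirector would send the browser on to."""
    # The path of the click link is the destination, e.g. "/cnn.com/WORLD/".
    path = urlsplit(click_url).path
    # Nothing checks who created the link, so any destination is accepted.
    return "http://" + path.lstrip("/")


print(naive_redirect_target("http://click.example.com/cnn.com/WORLD/"))
print(naive_redirect_target("http://click.example.com/onlinepharmacy.ru/order.html"))
```

Both calls succeed, which is exactly the weakness: the server cannot tell a link you sent from one a spammer made up.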
Add some metadata
Some of the things you might want to be able to record about a click would be which customer’s mail it was found in, which mailing campaign it belonged to and which recipient it was sent to. This would let you do more sensible reporting and click-tracking, and also let you spot when a link is misused in some way (for example, thousands of clicks on a URL that was sent to just one recipient).
That might look like http://click.example.com/123/456/789/cnn.com/WORLD/. Your webserver would strip off the first four parts (the hostname and the three IDs), recording a click for customer 123, campaign 456 and recipient 789, then redirect to the remainder – cnn.com/WORLD/.
This lets you do better reporting and is still fairly transparent to the recipient, but can still be abused in the same way.
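Splitting a link like that apart could be sketched as follows in Python. The Click type and parse_click function are hypothetical names, and the sketch assumes the three IDs are always integers and always present.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Click(NamedTuple):
    customer: int
    campaign: int
    recipient: int
    destination: str


def parse_click(click_url: str) -> Click:
    """Split /customer/campaign/recipient/destination into its parts."""
    path = urlsplit(click_url).path.lstrip("/")
    # Split off only the three leading IDs; the rest is the destination.
    customer, campaign, recipient, rest = path.split("/", 3)
    return Click(int(customer), int(campaign), int(recipient), "http://" + rest)


click = parse_click("http://click.example.com/123/456/789/cnn.com/WORLD/")
print(click.customer, click.campaign, click.recipient, click.destination)
```

The destination is still taken from the link as-is, so a spammer can swap in their own and the click is simply recorded against whatever IDs they chose.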
Use a database
If you stored every link you wanted to redirect to in a database you could simply store a unique key for each link – so you might record that key 2718 means http://cnn.com/WORLD/. Then the redirection URL might look like http://click.example.com/123/456/789/2718.
This lets you do good reporting and is much more difficult for spammers to abuse (but not impossible – if the spammer signs up for a free or demo account on your system, then sends a test email to themselves, they can then reuse the links that they received in that mail).
But it’s fairly opaque to the recipient – they have no idea where the link will go. And it requires maintaining a database of every link you’ve ever used, for as long as it’s valuable (which could easily be several years if a recipient goes back to an old newsletter) and requires a database lookup for every click – which adds a fair bit of infrastructure you need to keep working 24/7 just to make links work.
Use a database and a cosmetic link
You could take the database format and add the final destination link on the end, like this: http://click.example.com/123/456/789/2718/cnn.com/WORLD/. Then you just ignore everything after the URL key (2718). That’ll work exactly the same way, but the final destination will be fairly transparent to the recipient.
This is still no easier for spammers to abuse than the plain database approach: if they try to use http://click.example.com/123/456/789/2718/mypharmacy.ru, it’ll still just redirect to http://cnn.com/WORLD/, as the only meaningful bit of the redirection link is the “2718”.
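A rough Python sketch of the lookup, covering both database variants. The LINKS dictionary stands in for a real database table and lookup_target is a hypothetical name; the sketch assumes the key is always the fourth part of the path.

```python
from typing import Optional
from urllib.parse import urlsplit

# Stand-in for the link table; 2718 is the example key used above.
LINKS = {2718: "http://cnn.com/WORLD/"}


def lookup_target(click_url: str) -> Optional[str]:
    """Return the stored destination for a click link, or None if unknown."""
    parts = urlsplit(click_url).path.lstrip("/").split("/")
    if len(parts) < 4:
        return None
    try:
        key = int(parts[3])  # customer/campaign/recipient come first, then the key
    except ValueError:
        return None
    # Anything after the key is cosmetic and never consulted.
    return LINKS.get(key)


print(lookup_target("http://click.example.com/123/456/789/2718/cnn.com/WORLD/"))
print(lookup_target("http://click.example.com/123/456/789/2718/mypharmacy.ru"))
```

Both links resolve to the stored destination, because the text after the key plays no part in the lookup.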
Cryptographically sign your links
A different approach is to record all the information you need in the link and to also add a cryptographic signature to prevent people from misusing it. This is much simpler than the word “cryptography” suggests: you just need to use a magic word (we’ll use “albatross”) and know about the md5() function.
You start off with the same destination string we used in Add some metadata – “/123/456/789/cnn.com/WORLD/”. Then you add the magic word on the end, to give “/123/456/789/cnn.com/WORLD/albatross”, and take the md5 “hash” of that. That’s some cryptographic black magic that’ll give you a string of letters and numbers that’s a “fingerprint” of that string. It’ll look something like “609a78b941bdf9f045cadcfa2e09d54c”. Then you combine that with the destination string to look like this:
http://click.example.com/609a78b941bdf9f045cadcfa2e09d54c/123/456/789/cnn.com/WORLD/
Then, when your webserver sees this link it splits it into the hash (609a78b941bdf9f045cadcfa2e09d54c) and destination string (/123/456/789/cnn.com/WORLD/). It then does exactly the same thing you did when you created the link – appends the magic word to the destination string to give “/123/456/789/cnn.com/WORLD/albatross” and takes the md5 hash of that string. If the result of that matches the hash in the link, it knows it’s a valid redirection link and it can record the click-tracking data and forward to the destination link. If the result doesn’t match it knows that the link has been tampered with, and can return an error page.
To generate the link in PHP would be something like this:

This is much cheaper to generate and validate than using a database, even a typical in-memory database.
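As an illustration, the signing and checking steps described above could be sketched in Python like this. The function names sign and verify are made up for the example, and “albatross” is only the example magic word; a real one would need to be long, random and kept private. The sketch follows the plain md5 scheme described here; Python’s hmac module provides a keyed hash that is the more usual construction for this kind of signature.

```python
import hashlib
import hmac
from typing import Optional

SECRET = "albatross"  # example magic word only; keep the real one private
BASE = "http://click.example.com"


def _fingerprint(dest: str) -> str:
    # md5 of the destination string with the magic word appended
    return hashlib.md5((dest + SECRET).encode("utf-8")).hexdigest()


def sign(dest: str) -> str:
    """Build the click link for a destination string like /123/456/789/cnn.com/WORLD/."""
    return BASE + "/" + _fingerprint(dest) + dest


def verify(path: str) -> Optional[str]:
    """Return the destination string if the hash matches, else None."""
    digest, _, rest = path.lstrip("/").partition("/")
    dest = "/" + rest
    expected = _fingerprint(dest)
    # Constant-time comparison of the hash from the link with the recomputed one
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return dest
    return None


link = sign("/123/456/789/cnn.com/WORLD/")
path = link[len(BASE):]
print(verify(path))
print(verify(path.replace("cnn.com", "mypharmacy.ru")))
```

The first check returns the destination string; the second returns None because the altered destination no longer matches the hash in the link.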
Which to use?
Don’t use the simple approach – it’ll get abused sooner or later and you’ll regret it. Any of the database or cryptographic approaches work just fine, though the cryptographic approach may be easier to scale up and maintain. The database approaches make it easier to disable a link, or direct it to somewhere else at a later point, in case of abuse or some other need.
What else is it good for?
You can use the same sort of approach to validate unsubscription links and VERP return paths for bounce handling. You can also do “open tracking” by using this sort of link for image URLs, if you find that a useful metric to offer.


7 comments


  • Steve, excellent post! I have a few tidbits to add, if you don’t mind.
    From a security standpoint, I do not think it is a good idea to expose internal ids if you can avoid it (e.g. 123/456/789).
    The strategy that I have employed in the past is to generate a randomish key for each campaign which is used to encrypt any bit of information I would like for the broadcast, including CGI params in the URL redirection. This means you do not have to do any sort of md5 checksum comparison… you simply decrypt your values with your campaign-specific key.
    To avoid unnecessary db hits it is fairly trivial to add a caching layer (memcached, temporary files, berkeleydb, etc.) for retrieving the key necessary to decrypt the link values.
    Then to continue along the db light approach, log the redirection in a non-db manner and then have background processes load those clicks in the db.
    Anyway, this process has worked well for me :).

  • Thanks!
    I don’t think it’s really a security issue, as the internal ids aren’t externally meaningful, other than having a 1:1 relationship to the customer, and the only way to avoid that is to remove that 1:1 relationship, which I’d rather not do.
    I would encode them, but to increase the data density (base 62 rather than base 10, for example) rather than to hide that 1:1 mapping. Or to hide the fact that I only had half a dozen customers, maybe (int encryptCustomerId(x) { return x + 1492; }) 🙂
    I like it when recipients, and the folks at their ISPs who read mail logs, can look at a dozen message-ids or verp strings and get the idea that these are all the same customer – it’s transparency rather than information leakage, and I prefer to be in a situation where it doesn’t benefit me to hide that sort of thing from recipients.
    I also like to have the VERP and unsub strings be somewhat human parseable, so that support and abuse folks can, by eye, see that consistent customer identifier rather than it being an opaque blob that needs software to crack it into something meaningful. For the abuse desk that makes it very easy to recognize that it’s mail about one of your top five problem children even without any ticketing automation, and it can make ad-hoc reporting easier too.
    I’ve worked with a bunch of ESPs to do data analysis, log file analysis, email leak forensics and so on – as a third party it’s really painful to require access to their internal tools to crack data from message-ids, rather than just being able to read off opaque customer and campaign id cookies from them directly.

  • Thanks for that extra insight. I will take that into consideration in future projects. I had never thought about it from a transparency to the ISP level before. A very good point.

  • I should do a post on VERP next (as, apparently, we haven’t ever done one) and that’s where the transparency and easy pattern recognition gets really important.

  • I really like the hash approach – though I think shorter hashes than MD5 might be good because long links can cause problems. It’s OK for cnn.com/WORLD but if one tries to link to a site with longer URLs (blog.wordtothewise.com/2010/10/clicktracking-link-abuse/#more-2095 or veryveryverylongname.wordtothewise.com/blog/archives/2010/10/12/much-longer-title-about-clicktracking-link-abuse/#automaticallygeneratedlonglabel345geft876hsdf00werf ) then adding the MD5 hash can easily make the url wrap in the email and thus break the redirect for some readers.
    One option would be to do a CRC32 on the url and salt and then base64 encode the output (though you will want to replace any / characters in the base64 output with something else such as _ )
    If you have control of your webserver (i.e. it isn’t hosted by someone who won’t let you tinker with the web server environment) you can almost certainly use variants of mod_rewrite and regular log analysis to remove all scripting overhead in the serving of the URL because you can use the file system and server error handling system as the way to perform the redirect and gather the required click data. If you have a busy site then this is a great way to reduce load on the server (database reads and writes are wonderful ways to slow performance for most websites).

By steve
