Clicktracking link abuse

If you use redirection links in the emails you send out, where a click on the link goes to your server – so you can record that someone clicked – before redirecting to the real destination, then you’ve probably already thought about how they can be abused.
Redirection links are simple in concept – you include a link that points to your webserver in email that you send out, then when recipients click on it they end up at your webserver. Instead of displaying a page, though, your webserver sends what’s called a “302 redirect” to send the recipients web browser on to the real destination. How does your webserver know where to redirect to? There are several different ways, with different tradeoffs:

The simplest approach
The simplest sort of redirection link includes the final destination in the link itself – something like http://click.example.com/cnn.com/WORLD/. The webserver at click.example.com would simply strip off the first part of the link, and redirect to the remainder – cnn.com/WORLD/.
This is nice, because it’s fairly transparent to the recipient – when they hover over the link in their mail client or webmail it’ll be fairly clear where it’s going.
But it has several limitations. One is that you can’t really record very much data about the click – you know where it was redirecting to, but almost nothing else.
The bigger problem is that it’s very easy for a spammer to abuse – they can send out spam that has the link http://click.example.com/onlinepharmacy.ru/order.html, to hide their real link from spam filters, and your webserver will happily redirect recipients to go there. Or, worse, that can be used to redirect to a website hosting viruses. That can cause all sorts of problems for your reputation, up to and including having your redirection webserver blacklisted by antivirus and antiphishing organizations, meaning it’ll be blocked by many web browsers.
Add some metadata
Some of the things you might want to be able to record about a click would be which customers mail it was found in, which mailing campaign and which recipient it was sent to. This would let you do more sensible reporting and click-tracking, and also let you spot when a link is misused in some way (for example, thousands of clicks on a url that was sent to just one recipient).
That might look like http://click.example.com/123/456/789/cnn.com/WORLD/. Your webserver would strip off the first four parts, recording a click for customer 123, campaign 456 and recipient 789, then redirect to the remainder – cnn.com/WORLD/
This lets you do better reporting and is still fairly transparent to the recipient, but can still be abused in the same way.
Use a database
If you stored every link you wanted to redirect to in  a database you could simply store a unique key for each link – so you might record that key 2718 means http://cnn.com/WORLD/. Then the redirection URL might look like http://click.example.com/123/456/789/2718
This lets you do good reporting and is much more difficult for spammers to abuse (but not impossible – if the spammer signs up for a free or demo account on your system, then sends a test email to themselves, they can then reuse the links that they received in that mail).
But it’s fairly opaque to the recipient – they have no idea where the link will go. And it requires maintaining a database of every link you’ve ever used, for as long as it’s valuable (which could easily be several years if a recipient goes back to an old newsletter) and requires a database lookup for every click – which adds a fair bit of infrastructure you need to keep working 24/7 just to make links work.
Use a database and a cosmetic link
You could take the database format and add the final destination link on the end – like this http://click.example.com/123/456/789/2718/cnn.com/WORLD/ – and then just ignore everything after the url key (2718). That’ll work exactly the same way, but the final destination will be fairly transparent to the recipient.
This still can’t be abused by spammers, as if they try to use http://click.example.com/123/456/789/2718/mypharmacy.ru, it’ll still just redirect to http://cnn.com/WORLD/ as the only meaningful bit of the redirection link is the “2718“.
Cryptographically sign your links
A different approach is to record all the information you need in the link and to also add a cryptographic signature to prevent people from misusing it. This is much simpler than the word “cryptography” suggests, you just need to use a magic word (we’ll use “albatross”) and know about the md5() function.
You start off with the same destination string we used in Add some metadata – “/123/456/789/cnn.com/WORLD/“. Then you add the magic word on the end, to give “/123/456/789/cnn.com/WORLD/albatross“, and take the md5 “hash” of that. That’s some cryptographic black magic that’ll give you a string of letters and numbers that’s a “fingerprint” of that string. It’ll look something like “609a78b941bdf9f045cadcfa2e09d54c“. Then you combine that with the destination string to look like this:
http://click.example.com/609a78b941bdf9f045cadcfa2e09d54c/123/456/789/cnn.com/WORLD/
Then, when your webserver sees this link it splits it into the hash (609a78b941bdf9f045cadcfa2e09d54c) and destination string (/123/456/789/cnn.com/WORLD/). It then does exactly the same thing you did when you created the link – appends the magic word to the destination string to give “/123/456/789/cnn.com/WORLD/albatross” and takes the md5 hash of that string. If the result of that matches the hash in the link, it knows it’s a valid redirection link and it can record the click-tracking data and forward to the destination link. If the result doesn’t match it knows that the link has been tampered with, and can return an error page.
To generate the link in PHP would be something like this:

$destination = "/$customerid/$campaignid/$recipientid/$link";
$clicktrack = 'http://click.example.com/' . md5($destination . 'albatross') . $destination;

This is much cheaper to generate and validate than using a database, even a typical in-memory database.
Which to use?
Don’t use the simple approach – it’ll get abuse sooner or later and you’ll regret it. Any of the database or cryptographic approaches work just fine, though the cryptographic approach may be easier to scale up and maintain. The database approaches make it easier to disable a link, or direct it to somewhere else at a later point, in case of abuse or some other need.
What else is it good for?
You can use the same sort of approach to validate unsubscription links and VERP return paths for bounce handling. And “open tracking” using these sort of links for image URLs, if you find that a useful metric to offer.

Related Posts

The secret to fixing delivery problems

There is a persistent belief among some senders that the technical part of sending email is the most important part of delivery. They think that by tweaking things around the edges, like changing their rate limiting and refining bounce handling, their email will magically end up in the inbox.
This is a gross misunderstanding of the reasons for bulk foldering and blocking by the ISPs. Yes, technical behaviour does count and senders will find it harder to deliver mail if they are doing something grossly wrong. In my experience, though, most technical issues are not sufficient to cause major delivery problems.
On the other hand, senders can do everything technically perfect, from rate limiting to bounce handling to handling feedback loops through authentication and offer wording and still have delivery problems. Why? Sending unwanted mail trumps technical perfection. If no one wants the email mail then there will be delivery problems.
Now, I’ve certainly dealt with clients who had some minor engagement issues and the bulk of their delivery problems were technical in nature. Fix the technical problems and make some adjustments to the email and mail gets to the inbox. But with senders who are sending unwanted email the only way to fix delivery problems is to figure out what recipients want and then send mail meeting those needs.
Persistent delivery problems cannot be fixed by tweaking technical settings.

Read More

Which is better UTF-8 or ISO-?

Someone asked today on a mailing list whether they should be using UTF-8 or “ISO” encoding for sending email. What’s the best choice depends on some of the details of the situation, but here’s the answer I gave:
UTF-8 will work for pretty much anything, as it’s just an 8 bit encoding scheme for Unicode (which is supposed to be the one character encoding to rule them all). It’s well supported in most languages and development environments – Windows has been native UTF-16 under the covers since the mid 90s, for instance – and typical messages that use mainstream glyphs should render well from utf-8 in most western MUAs and browsers.
There are still a very few old or broken clients out there that will not handle UTF-8 well but (outside the asian language market, where there’s still some non-ASCII, non-Unicode legacy usage) they’re typically ones that don’t really handle any character set encoding well and the only thing safe to send to them is either plain ASCII or whichever ASCII superset their OS happens to support natively (which is probably an argument for sending Windows-1252 codepage, but not a terribly strong one).
The various extended ASCIIs (such as ISO-8859-*) will only work for messages that are written solely using characters from that character set. If you have even one character in a message that cannot be expressed in ISO-8859-1, then you can’t use ISO-8859-1 to send that message.
ISO-8859-1 (aka Latin1) is fairly sloppy in some respects – it has no apostrophe, nor single quotes, for instance – but it can handle an awful lot of languages, from Kurdish to Swahili. It can’t handle Dutch, Estonian, Finnish, Hungarian and Welsh particularly well, nor can it show the Euro symbol (ISO-8859-14 or -15 are needed for some characters there).
A common problem is that many people (and the software they write) think that Windows uses Latin1. It doesn’t, it uses Windows-1252. If you accept messages written on Windows, using the Windows-1252 code page, and throw them out on the wire as ISO-8859-1 what you end up with is not quite right. It mostly works, as the two codepages overlap quite a bit, but they have different glyphs in the 0x80-0x9f range. So if you use single or double quotes (“smart quotes”), or the Euro symbol, or ellipses, or bullet, or the trademark symbol in your message they’ll be garbled. This is so common that some mail clients and web browsers will actually treat a document that claims to be ISO-8859-1 as Windows-1252, but that’s a bug workaround and not something it’s really safe to rely on.
If you’re doing personalized messages, and you’re sending one of them to Győző and one of them to Eiður then you may have to use different character sets for the two messages. If you’re talking about Győző and personalizing it for Eiður then you might find things break horribly.
Someone probably has some concrete data on mail client character set support, broken down by region and language, but my understanding is that this is a reasonable approach:

Read More

Improving the email interface

Want an improved email interface? Then build it.
There’s been an ongoing discussion about adding thumbs up / thumbs down style buttons to email clients. While I am dubious this is a useful feature or something that recipients will use, if there are others in the industry that think it would be useful then I strongly suggest they go ahead and create it.
In fact, there are a couple things that have been asked for in email interfaces that aren’t currently provided. Last October I blogged about adding an unsubscribe button to email clients.

Read More