Almost every bulk mail sent includes some sort of instrumentation to track which users click on which links and when. That’s usually done by the ESP rewriting links in the content so they point at the ESP’s tracking server, and include information about the customer, campaign and recipient. The recipient clicks on the link in the email, their web browser fetches the link from the tracking server, the tracking server records the details of that click and tells the browser to immediately open the original destination page.
(This is much the same thing as URL shortening services do, but the goal here isn’t to make the link shorter. Be very cautious about using actual link shorteners – such as bit.ly – in email.)
It’s not too difficult to build your own link redirector, perhaps a few hours’ work for a basic implementation, but there are a few operational details that might not be immediately obvious. Here, in no particular order, are some things to be aware of:
- Be aware that content-filtering will be strongly affected by the hostname of the URL you use for click-tracking. If all your customers use the same one, they’ll share reputation.
- Use a dedicated hostname for your tracking redirector, rather than having it on the same domain as your image open server, remote content server or any other part of your web infrastructure. That’ll make it easier to disentangle it if you need to as you scale.
- Make sure that only links your system creates will work so that bad actors can’t edit the URL to make it point somewhere else.
- Make sure that you can easily disable redirection for a whole mailing or a whole customer, mitigating the damage when it’s abused.
- At least one large ISP prefers it if you use click.example.com/whatever?id=my_identifier rather than, e.g., click.example.com/whatever/my_identifier
- If you’re deploying a new system, use https everywhere. It doesn’t seem critical today, but I expect it’ll be a deliverability advantage or even a hard requirement soon.
- Log the redirect traffic you get – click IP address, referer, user-agent. At some point something funky is going to happen, and if you have those logs you can work out what.
- Never reuse links
- Make it extremely difficult to guess a link, so an attacker can’t start with a link they receive and walk through your namespace to guess links sent by other customers or to other recipients. The destination the final link points to will often include PII, so protecting that against attackers is important.
- Monitor and look for spikes of high numbers of clicks on a link sent to just one recipient. That might be because someone shared a link on social media and it went viral. Or it might be that a bad actor sent a single mail through your system, then started spamming out that content via another provider.
- Record the time that the mail containing the link was sent along with the rest of the click data. You’re going to see behaviour like spam filters automatically clicking on links before the mail is even delivered, and having the send time will let you dig into that more easily.
- Consider whether you want to be able to change the destination URL after the mail was sent. Supporting that puts some limitations on the implementation that may make it more complex, but it can be useful for abuse mitigation, for time based redirects and as a customer-visible feature.
- Consider having part of the click URL that is ignored by the redirector. This lets you make the links in the email less hostile when the recipient hovers over them. For example, you can support a friendly link that looks like https://click.wttw/link-tracking-redirectors-d69403e26 instead of https://click.wttw/d69403e26. Unless you cryptographically sign that part, an attacker can change the visible part of the link, but not the final destination. But that’s unlikely to be used for anything more malicious than geek shenanigans, so …
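As a rough illustration of that last point, here is a minimal sketch of a redirector pulling the key out of a URL while ignoring any friendly slug in front of it. The URL layout (an optional slug, a hyphen, then a lowercase hex key of at least nine characters at the end of the path) and the function name `extract_key` are assumptions made for this example, not a description of any particular system.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed format: optional human-readable slug, a hyphen, then a key of at
# least 9 lowercase hex characters at the very end of the path.
_KEY_AT_END = re.compile(r"(?:^|-)([0-9a-f]{9,})$")


def extract_key(url: str) -> Optional[str]:
    """Return the trailing key from a click URL, ignoring any friendly slug."""
    path = urlsplit(url).path.lstrip("/")
    match = _KEY_AT_END.search(path)
    return match.group(1) if match else None


# Both forms resolve to the same key, so the slug is purely cosmetic.
friendly = extract_key("https://click.wttw/link-tracking-redirectors-d69403e26")
bare = extract_key("https://click.wttw/d69403e26")
```

Because only the trailing key is used for the lookup, anyone can edit the slug without affecting where the link goes, which is the tradeoff described above.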
There are two main approaches a developer might think of to support all those requirements. One would be to have each URL include an opaque key (generated from a large namespace) that the redirect server can use to look up the information about the link, including the final destination, in a database. The other would be to include all the information you need in the URL itself, potentially encrypted to hide the contents from casual eyes and cryptographically signed to prevent tampering with it.
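The first approach might look something like the sketch below. It is only an outline under some assumptions: a plain dict stands in for the real database, and the names `create_link`, `resolve` and `disable_link`, the record fields and the click.example.com hostname are all hypothetical. The one important detail is that the key comes from a cryptographically secure random source, so it can't be guessed by walking the namespace.

```python
import secrets
import time

# Stand-in for a real database table; a dict keeps the sketch self-contained.
links = {}


def create_link(destination, customer, campaign, recipient):
    """Store the link details and return a tracking URL with an opaque key."""
    key = secrets.token_urlsafe(16)  # 128 bits of randomness, URL-safe text
    links[key] = {
        "destination": destination,
        "customer": customer,
        "campaign": campaign,
        "recipient": recipient,
        "sent_at": time.time(),  # send time, recorded alongside click data
        "active": True,
    }
    return "https://click.example.com/" + key


def disable_link(key):
    """Turn off redirection for one link, e.g. as abuse mitigation."""
    if key in links:
        links[key]["active"] = False


def resolve(key):
    """Return the destination for a key, or None if unknown or disabled."""
    record = links.get(key)
    if record is None or not record["active"]:
        return None
    return record["destination"]
```

A real redirector would answer a successful `resolve` with an HTTP redirect to the destination and log the click; an unknown or disabled key would get an error page instead.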
Both are perfectly good approaches, as is a hybrid of the two, but they have different tradeoffs. The database-backed approach is simpler to implement, and makes invalidating or changing the destination URL easy – but it adds database latency to the redirect and, more importantly, means that link redirection stops working if the database is overloaded or down for maintenance.
Cryptographic signing can be really simple. Appending a secret key to the link content, taking a cryptographic hash of that and including the hash in the link is easy to do and protects against any sort of link guessing or modification. It is also very cheap to check whether a link is valid, without needing to hit a database. This takes the database out of the critical path, allowing the redirector to send back an HTTP redirect immediately, even if the database is unavailable.
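Here is a minimal sketch of the signed-link idea. Note one deliberate swap: rather than hashing the content with the secret appended, it uses HMAC-SHA256 from Python's standard library, which is the standard keyed-hash construction for this job. The token layout (base64 payload, a dot, base64 signature), the `SECRET_KEY` value and the function names are assumptions for illustration. The payload here is signed but not encrypted, so anyone can still read it.

```python
import base64
import hashlib
import hmac
from typing import Optional

# Hypothetical secret; in practice load a long random value from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, so it sits cleanly in a URL."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    """Inverse of _b64: restore the padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(body: str) -> str:
    digest = hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).digest()
    return _b64(digest)


def sign_link(payload: str) -> str:
    """Return 'body.signature' for embedding in a tracking URL."""
    body = _b64(payload.encode("utf-8"))
    return body + "." + _signature(body)


def verify_link(token: str) -> Optional[str]:
    """Return the payload if the signature checks out, else None. No database."""
    body, sep, tag = token.rpartition(".")
    if not sep:
        return None
    expected = _signature(body)
    # Constant-time comparison avoids leaking how much of the tag matched.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    return _unb64(body).decode("utf-8")
```

The redirector only needs the secret key to validate a click and recover the destination, so it can redirect first and record the click details asynchronously.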