Clicktracking 2: Electric Boogaloo

A week or so back I talked about clicktracking links, and how to put them together to avoid abuse and blocking issues.
Since then I’ve come across another issue with click tracking links that’s not terribly obvious, and that you’re not that likely to come across, but if you do get hit by it could be very painful – phishing and malware filters in web browsers.
Visiting this site may harm your computer
First, some background about how a lot of malware is distributed, what’s known as “drive-by malware”. This is where the hostile code infects the victim’s machine without them taking any action to download and run it; they just visit a hostile website, and that website silently infects their computer.
The malware authors get people to visit the hostile website in quite a few different ways – email spam, blog comment spam, web forum spam, banner ads purchased on legitimate websites and compromised legitimate websites, amongst others.
That last one, compromised legitimate websites, is the type we’re interested in. The sites compromised aren’t usually a single, high-profile website. Rather, they tend to be a whole bunch of websites that are running some vulnerable web application – if there’s a security flaw in, for example, WordPress blog software then a malware author can compromise thousands of little blog sites, and embed malware code in each of them. Anyone visiting any of those sites risks being infected, and becoming part of a botnet.
Because the vulnerable websites are all compromised mechanically in the same way, the URLs of the infected pages tend to look much the same, just with different hostnames – http://example.com/foo/bar/baz.html, http://www.somewhereelse.invalid/foo/bar/baz.html and http://a.net/foo/bar/baz.html – and they serve up just the same malware (or, just as often, redirect the user to a site in Russia or China that serves up the malware that infects their machine).
A malware filter operator might receive a report about http://example.com/foo/bar/baz.html and decide that it was infected with malware, adding example.com to a blacklist. A smart filter operator might decide that this might be just one example of a widespread compromise, and go looking for the same malware elsewhere. If it goes to http://a.net/foo/bar/baz.html and finds the exact same content, it’ll know that that’s another instance of the infection, and add a.net to the blacklist.
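As a rough sketch of that "go looking elsewhere" step, a filter could take a reported URL and rebuild the same path on other hostnames it knows about. The function name and the host list below are made up for illustration; real filters surely do a lot more than this.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(reported_url, other_hosts):
    """Rebuild the reported URL's path and query on each other hostname."""
    parts = urlsplit(reported_url)
    return [
        urlunsplit((parts.scheme, host, parts.path, parts.query, ""))
        for host in other_hosts
    ]


# Hypothetical list of hosts the filter already suspects or knows about.
probes = candidate_urls(
    "http://example.com/foo/bar/baz.html",
    ["www.somewhereelse.invalid", "a.net"],
)
```

Each URL in probes would then be fetched and compared against the content seen at the reported URL.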
What does this have to do with clickthrough links?
Well, an obvious way to implement clickthrough links is to use a custom hostname for each customer (“click.customer.com”), and have all those pointing at a single clickthrough webserver. It’s tedious to set up the webserver to respond to each hostname as you add a new customer, though, so you decide to have the webserver ignore the hostname. That’ll work fine – if you have customer1 using a clickthrough link like http://click.customer1.com/123/456/789.html you’d have the webserver ignore “click.customer1.com” and just read the information it needs from “123/456/789.html” and send the redirect.
But that means that if you also have customer2, using the hostname click.customer2.com, then the URL http://click.customer2.com/123/456/789.html will redirect to customer1’s content.
If a malware filter decides that http://click.customer1.com/123/456/789.html redirects to a phishing site or a malware download – either due to a false report, or due to the customer’s page actually being infected – then they’ll add click.customer1.com to their blacklist, meaning no http://click.customer1.com/ URLs will work. So far, this isn’t a big problem.
But if they then go and check http://click.customer2.com/123/456/789.html and find the same redirect, they’ll blacklist click.customer2.com, and so on for all the clickthrough hostnames of yours they know about. That’ll cause any click on any URL in any email a lot of your customers send out to go to a “This site may harm your computer!” warning – which will end up a nightmare even if you spot the problem and get the filter operators to remove all those hostnames from the blacklist within a few hours or a day.
Don’t let this happen to you. Make sure your clickthrough webserver pays attention to the hostname as well as the path of the URL.
Use different hostnames for different customers’ clickthrough links. And if you pick a link from mail sent by Customer A, and change the hostname of that link to the clickthrough hostname of Customer B, then that link should fail with an error rather than displaying Customer A’s content.
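Here is a minimal sketch of what "paying attention to the hostname" could look like, assuming the clickthrough server can look up which customer owns a hostname and which customer a tracked path belongs to. The tables, the target URL and the function name are all hypothetical, and a real server would also need to cope with things like a port number in the Host header.

```python
# Hypothetical data: which customer owns each clickthrough hostname,
# and which customer (and destination) each tracked path belongs to.
HOST_OWNER = {
    "click.customer1.com": "customer1",
    "click.customer2.com": "customer2",
}
LINKS = {
    "/123/456/789.html": ("customer1", "http://www.customer1.com/offer"),
}


def resolve_click(host, path):
    """Return (status, location) for a clickthrough request."""
    owner = HOST_OWNER.get(host.lower())
    link = LINKS.get(path)
    if owner is None or link is None:
        # Unknown hostname or unknown link.
        return 404, None
    link_owner, target = link
    if link_owner != owner:
        # Valid path, but requested via another customer's hostname:
        # fail with an error rather than redirecting.
        return 404, None
    return 302, target


ok = resolve_click("click.customer1.com", "/123/456/789.html")
swapped = resolve_click("click.customer2.com", "/123/456/789.html")
```

With this check in place, ok is a redirect to customer1's page, while swapped is an error, so a filter probing click.customer2.com with customer1's path doesn't see the same redirect.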

Related Posts

Which is better UTF-8 or ISO-?

Someone asked today on a mailing list whether they should be using UTF-8 or “ISO” encoding for sending email. What’s the best choice depends on some of the details of the situation, but here’s the answer I gave:
UTF-8 will work for pretty much anything, as it’s just an 8 bit encoding scheme for Unicode (which is supposed to be the one character encoding to rule them all). It’s well supported in most languages and development environments – Windows has been native UTF-16 under the covers since the mid 90s, for instance – and typical messages that use mainstream glyphs should render well from utf-8 in most western MUAs and browsers.
There are still a very few old or broken clients out there that will not handle UTF-8 well but (outside the asian language market, where there’s still some non-ASCII, non-Unicode legacy usage) they’re typically ones that don’t really handle any character set encoding well and the only thing safe to send to them is either plain ASCII or whichever ASCII superset their OS happens to support natively (which is probably an argument for sending Windows-1252 codepage, but not a terribly strong one).
The various extended ASCIIs (such as ISO-8859-*) will only work for messages that are written solely using characters from that character set. If you have even one character in a message that cannot be expressed in ISO-8859-1, then you can’t use ISO-8859-1 to send that message.
ISO-8859-1 (aka Latin1) is fairly sloppy in some respects – it has no typographic apostrophe or curly single quotes, for instance – but it can handle an awful lot of languages, from Kurdish to Swahili. It can’t handle Dutch, Estonian, Finnish, Hungarian and Welsh particularly well, nor can it show the Euro symbol (ISO-8859-14 or -15 are needed for some characters there).
A common problem is that many people (and the software they write) think that Windows uses Latin1. It doesn’t, it uses Windows-1252. If you accept messages written on Windows, using the Windows-1252 code page, and throw them out on the wire as ISO-8859-1 what you end up with is not quite right. It mostly works, as the two codepages overlap quite a bit, but they have different glyphs in the 0x80-0x9f range. So if you use single or double quotes (“smart quotes”), or the Euro symbol, or ellipses, or bullet, or the trademark symbol in your message they’ll be garbled. This is so common that some mail clients and web browsers will actually treat a document that claims to be ISO-8859-1 as Windows-1252, but that’s a bug workaround and not something it’s really safe to rely on.
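A small Python illustration of that mismatch, using the standard cp1252 and iso-8859-1 codecs. The sample text is invented; it just packs in a few of the characters that live in the 0x80-0x9f range of Windows-1252.

```python
# Smart quotes, an en dash, the Euro sign and an ellipsis.
text = "\u201cQuoted\u201d \u2013 \u20ac5\u2026"

# What a Windows application using code page 1252 would put on the wire.
wire_bytes = text.encode("cp1252")

# A receiver that trusts an ISO-8859-1 label gets C1 control characters
# where the quotes, dash, Euro sign and ellipsis should be.
mislabelled = wire_bytes.decode("iso-8859-1")

# Decoding with the charset that was actually used recovers the text.
correct = wire_bytes.decode("cp1252")

# Encoding the same text as real ISO-8859-1 is not possible at all.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

The letters and digits survive the mislabelling, which is why the problem "mostly works" and often goes unnoticed until someone pastes in text with smart quotes.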
If you’re doing personalized messages, and you’re sending one of them to Győző and one of them to Eiður then you may have to use different character sets for the two messages. If you’re talking about Győző and personalizing it for Eiður then you might find things break horribly.
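To make the Győző and Eiður point concrete, here is one possible sketch: try a list of charsets in order and use the first one that can encode the whole message. The function name and the candidate list are assumptions for illustration, not a recommendation of which charsets to prefer.

```python
def pick_charset(text, candidates=("us-ascii", "iso-8859-1", "iso-8859-2", "utf-8")):
    """Return the first candidate charset that can encode every character."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can encode this text")


for_gyozo = pick_charset("Dear Győző")           # ő is not in ISO-8859-1
for_eidur = pick_charset("Dear Eiður")           # ð is not in ISO-8859-2
for_both = pick_charset("Eiður, meet Győző")     # needs both, so Unicode
```

The first two messages end up in different ISO-8859 character sets, and the one that mentions both names only fits in UTF-8.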
Someone probably has some concrete data on mail client character set support, broken down by region and language, but my understanding is that this is a reasonable approach:


Poor delivery can't be fixed with technical perfection

There are a number of different things delivery experts can do to help senders improve their own delivery. Yes, I said it: senders are responsible for their delivery. ESPs, delivery consultants and deliverability experts can’t fix delivery for senders, they can only advise.
In my own work with clients, I usually start with making sure all the technical issues are correct. Almost all spam filtering is score based, and the minor scores given to things like broken authentication, header issues and formatting issues can make the difference between an email that lands in the inbox and one that doesn’t get delivered.
I don’t think I’m alone in this approach, as many of my clients come to me for help with their technical settings. In some cases, though, fixing the technical problems doesn’t fix the delivery issues. No matter how much my clients tweak their settings and attempt to avoid spamfilters by avoiding FREE!! in the subject line, or changing the background, they still can’t get mail in the inbox.
Why not? Because they’re sending mail that the recipients don’t really want, for whatever reason. There are so many ways a sender can collect an email address without actually collecting consent to send mail to that recipient. Many of the “list building” strategies mentioned by a number of experts involve getting a fig leaf of permission from recipients without actually having the recipient agree to receive mail.
Is there really any difference in permission between purchasing a list of “qualified leads” and automatically adding anyone who makes a purchase at a website to marketing lists? From the recipient’s perspective they’re still getting mail they don’t want, and all the technical perfection in the world can’t overcome the negative reputation associated with spamming.
The secret to inbox delivery: don’t send mail that looks like spam. That includes not sending mail to people who have not expressly consented to receive mail.


Abuse Reporting Format

J.D. has a great post digging into ARF, the abuse reporting format used by most feedback loops.
If you’re interested in following along, you might find this annotated example ARF report handy.
