Useful bits of Cryptography – Hashes
More than just PGP
Cryptography is the science of securing communication from adversaries. In the email world its most obvious use is tools like PGP or S/MIME that are used to encrypt a message so that it can only be read by the intended recipient, or to sign a message so that the recipient can be sure of who it came from. There are quite a few other aspects of sending email where a little cryptography is useful or essential, though – bounce management, suppression lists, unsubscription handling, DKIM and DMARC, amongst others.
A hash function converts any text you give it into an opaque string of gobbledygook called a digest. As an example, one commonly used hash function is “md5” and the md5 hash of “Word to the Wise” gives “a1606a9079b1c15a521c6d04344dfb62” as a digest. Using the same hash function on the same string will always give the same digest and, for practical purposes, different strings will give different digests. They’re like a fingerprint. And you can’t “decode” a digest back into the original string.
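If you want to try that yourself, here is a minimal sketch using Python's standard hashlib module. The helper name md5_digest is just something I made up for the illustration.

```python
import hashlib

def md5_digest(text: str) -> str:
    """Return the md5 digest of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

first = md5_digest("Word to the Wise")
second = md5_digest("Word to the Wise")
other = md5_digest("Word to the wise")  # one letter changed

print(first)
print(first == second)  # True: same string, same digest
print(first == other)   # False: a tiny change gives a different digest
```

The digest is always 32 characters long, however long or short the input is.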
Hash functions are used for all sorts of things, from checking when a file has changed to securely storing passwords. Because of that they’re easily available from pretty much all programming languages and from a Windows or Unix command line.
One thing hashes are useful for is sharing information when you don’t entirely trust the person you’re sharing it with. Say you have a suppression list of email addresses and you want someone else to remove them from their database, but you don’t want to actually share the list of email addresses with them. You can go through your suppression list and generate the digest for each email address, then send the other party the list of digests. They can go through their database and generate the digest for each of their email addresses. If that digest is in the list you sent them they know that email address needs to be suppressed. But they can’t find out the other email addresses on your suppression list. (The advanced version of this would have you share a salt with the other party.)
More generally, they allow two people to identify which email addresses they have in common, without revealing to the other the email addresses they don’t possess.
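The suppression list exchange might be sketched like this. The addresses are invented, and lowercasing each address before hashing is my assumption: both sides have to agree on some normalisation or their digests won't match.

```python
import hashlib

def address_digest(address: str) -> str:
    # Normalise first so both parties hash exactly the same string.
    # (Assumption: the two sides have agreed on this normalisation.)
    normalised = address.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

# Your side: turn the suppression list into digests and send only those.
suppression_list = ["alice@example.com", "bob@example.net"]
shared_digests = {address_digest(a) for a in suppression_list}

# Their side: hash each address they hold and look it up in your digests.
their_database = ["Bob@example.net", "carol@example.org"]
to_suppress = [a for a in their_database if address_digest(a) in shared_digests]

print(to_suppress)  # ['Bob@example.net']
```

They learn that Bob is on your list because they already had his address. They learn nothing useful about Alice, whose address they never had.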
Another thing they’re useful for is making something resistant to tampering. Say you’re using a return path with a local part like “bounce-547-steve=wordtothewise.com” as part of VERP-based bounce management, where the number represents the customer and the rest of it represents my email address. This makes handling bounces fairly simple, as when you receive mail to that address you just need to go into your database and note that my email address is bouncing mail for customer 547 and may need to be suppressed.
But what if I’m having a fight with someone else on that mailing list, and I send a fake bounce to “email@example.com”? As far as your bounce management automation is concerned, that means that mail to Laura is bouncing and should be suppressed.
If you’d rather not be open to that sort of attack then you could use a hash function to cheaply “sign” the bounce address, so that it can’t be faked in that way. You choose a secret word, perhaps “12marmalade”. Each time you send an email you take the local part of your old-style return path and your secret word and mush them together to give a string like “bounce-547-steve=wordtothewise.com12marmalade” and then you use the md5 hash function to get the digest “a782eac364cc5a1dcbab2705495fe7a7”. You use the digest to create a new return path “firstname.lastname@example.org”.
Now when you get a bounce message sent to that address you can take the customer ID and email address part of it, add your secret word and find the md5 digest of that string. If it matches the one in the address you know it was a valid bounce, and you should suppress mail to the address. If it doesn’t match, it’s a forgery and you shouldn’t. Now I can’t fake up bounces from Laura and get her bounced off of mailing lists. If the full 32 character digest seems a bit excessive you can just use a substring of it – even 8 characters is plenty to protect against attacks.
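Here is a rough sketch of signing and checking a bounce address along those lines. The layout of the signed return path (local part, a hyphen, then the first 8 characters of the digest), the bounces.example.com domain and the function names are all assumptions of mine for illustration, not a standard format.

```python
import hashlib
import hmac

SECRET = "12marmalade"                 # the secret word; keep yours private
BOUNCE_DOMAIN = "bounces.example.com"  # hypothetical bounce-handling domain
DIGEST_CHARS = 8                       # how much of the digest to keep

def sign(local_part: str) -> str:
    """Mush the local part and the secret together and hash the result."""
    digest = hashlib.md5((local_part + SECRET).encode("utf-8")).hexdigest()
    return digest[:DIGEST_CHARS]

def make_return_path(customer_id: int, recipient: str) -> str:
    # VERP-style local part, e.g. bounce-547-steve=wordtothewise.com
    local_part = f"bounce-{customer_id}-{recipient.replace('@', '=')}"
    return f"{local_part}-{sign(local_part)}@{BOUNCE_DOMAIN}"

def verify_return_path(return_path: str) -> bool:
    local, _, _domain = return_path.rpartition("@")
    # The signature is whatever follows the last hyphen in the local part.
    local_part, sep, signature = local.rpartition("-")
    if not sep:
        return False
    # compare_digest compares in constant time; plain == would also work here.
    return hmac.compare_digest(sign(local_part).encode("utf-8"),
                               signature.encode("utf-8"))

genuine = make_return_path(547, "steve@wordtothewise.com")
forged = genuine.replace("steve", "laura")  # same signature, different address

print(verify_return_path(genuine))  # True
print(verify_return_path(forged))   # False
```

Without the secret word I have no way to work out the signature that goes with Laura's address, so the forged bounce is rejected.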
You can use the same sort of approach to add some basic security to things like unsubscription links, opt-in confirmation links and so on. Using a hash-based signature to protect those URLs not only allows you to be sure that the person clicking the confirmation link has the confirmation mail you sent out, it also makes that easier to demonstrate if you later need evidence of it.
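Applied to a confirmation link, the same trick might look something like this. The URL, the email and sig parameter names and the function names are hypothetical, so adapt them to whatever your own links look like.

```python
import hashlib
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = "12marmalade"                    # the secret word; keep yours private
BASE_URL = "https://example.com/confirm"  # hypothetical confirmation endpoint

def sign(value: str) -> str:
    """Hash the value together with the secret word."""
    return hashlib.md5((value + SECRET).encode("utf-8")).hexdigest()

def make_confirmation_link(address: str) -> str:
    query = urlencode({"email": address, "sig": sign(address)})
    return f"{BASE_URL}?{query}"

def is_valid_link(url: str) -> bool:
    params = parse_qs(urlparse(url).query)
    address = params.get("email", [""])[0]
    signature = params.get("sig", [""])[0]
    return signature == sign(address)

link = make_confirmation_link("steve@wordtothewise.com")
tampered = link.replace("steve", "laura")  # address changed, signature not

print(is_valid_link(link))      # True
print(is_valid_link(tampered))  # False
```

Only someone who received the mail has a link with a valid signature, so a valid click is decent evidence that the mail reached them.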
 Yes, I’m simplifying slightly. Read Applied Cryptography if you want the longer version.