Character encoding

This morning, someone asked an interesting question.

Last time I worked with the actual HTML design of emails (a long time ago), <head> was not really needed. Is this still true for the most part? Any reason why you still want to include <head> + meta, title tags in emails nowadays?

There are several bits of information in the <head> part of an HTML document that can affect the rendering of it – there’s the doctype, which will control the html rendering model, there’s often some css which will control the styling, and there’s often a meta tag that states what character set is used in the document.
That last one is interesting in the case of a piece of HTML that’s being sent as part of a MIME email – as MIME already has a perfectly good way of specifying the character set a message has, as part of the Content-Type header. I looked at a few bulk messages I’d received recently and, sure enough, most of them include the <head> section, and have a meta tag in there that defines the character set. All of them have a character set defined in the Content-Type header. Sometimes those character sets didn’t match:

Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7Bit
<html>
<head>
<title></title>
<meta http-equiv=”Content-Type” content=”text/html; charset=windows-1252″>
<meta name=”title” content=”New CS5.5 Web Premium” />a snippet from this mornings email

What happens when they don’t match? I don’t think it’s defined anywhere. Time for some empirical testing.
Testing! For Science!
I needed to create some test emails which would be visibly different depending on which character set the mail client decided to use. I picked out two character sets – ISO-8859-15 and ISO-8859-16, as they differ from each other and from ISO-8859-1 enough that I could differentiate them just by the way two characters were rendered.
The byte 0xfd renders as e-with-a-tail (ę) in ISO-8859-16 and as y-acute (ý) in the other two character sets, while 0xa4 renders as the generic currency symbol (¤) in ISO-8859-1 and as a euro symbol (€) in the other two. I included the characters in two different ways in each test message – once as a raw character in the body of the message (=a4 or =fd in quoted-printable format), and once as a numeric HTML entity (&#164; or &#253;).
This is what I found:

Mail clientMime charsetHTML meta charsetRaw characterHTML entity
Mail.app-15-16-15-1
Gmail-15-1
Mail.app-16-15-16-1
Gmail-1-1
Mail.app-15none-15-1
Gmail-1-1
Mail.appnone-16broke-1
Gmail-1-1
Mail.appus-ascii-16broke-1
Gmail-1-1

 
There are several things to see from this data. The simple one first – regardless of which character set I declared, and where I declared it, both mail clients rendered characters written as HTML numeric entities (“&#164;”) consistently in ISO-8859-1. (This isn’t really a surprise, as it’s how the HTML specs define them.)
Raw characters were much less consistent. Mail.app consistently used the character set declared in the MIME Content-Type header when it was set to something reasonable, and ignored the encoding in the HTML meta tag. Giving it an unreasonable character set in the Content-Type header caused it to render 0xfd as a double dagger (‡), which makes no sense at all in any character set I can find. Gmail managed to render the raw character in ISO-8859-15 correctly, but gave up and fell back to using ISO-8859-1 for everything else.
Conclusions
There are a few things we can conclude from this, I think, even though it really needs some comparisons with different mail clients, and some testing with other character sets (including unicode and some of the asian sets).

  1. Don’t bother with putting HTML meta content-type tags in your HTML
  2. Send your text/html parts as plain 7 bit ascii, using HTML entities for non-ascii characters
  3. It might be less confusing to use named entities such as &copy; rather than numeric ones such as &#169;
  4. If you’re generating numeric entities from user-generated input, be wary of input that’s not ISO-8859-1 or Windows-1252
  5. Character set conversion is hard, lets go unicode

I’ve made the test emails I used available for download. From a unix prompt, with swaks installed, you can send them like this:
for i in charset*.eml ; do swaks –to your@email.address –from your@email.address –server your.email.server –data – <$i; done
 
 

Related Posts

Yes, we have no IP addresses, we have no addresses today

We’ve just about run out of the Internet equivalent of a natural resource – IP addresses.

Read More

Poor delivery can't be fixed with technical perfection

There are a number of different things delivery experts can do help senders improve their own delivery. Yes, I said it: senders are responsible for their delivery. ESPs, delivery consultants and deliverability experts can’t fix delivery for senders, they can only advise.
In my own work with clients, I usually start with making sure all the technical issues are correct. As almost all spam filtering is score based, and the minor scores given to things like broken authentication and header issues and formatting issues can make the difference between an email that lands in the inbox and one that doesn’t get delivered.
I don’t think I’m alone in this approach, as many of my clients come to me for help with their technical settings. In some cases, though, fixing the technical problems doesn’t fix the delivery issues. No matter how much my clients tweak their settings and attempt to avoid spamfilters by avoiding FREE!! in the subject line, or changing the background, they still can’t get mail in the inbox.
Why not? Because they’re sending mail that the recipients don’t really want, for whatever reason. There are so many ways a sender can collect an email address without actually collecting consent to send mail to that recipient. Many of the “list building” strategies mentioned by a number of experts involve getting a fig leaf of permission from recipients without actually having the recipient agree to receive mail.
Is there really any difference in permission between purchasing a list of “qualified leads” and automatically adding anyone who makes a purchase at a website to marketing lists? From the recipient’s perspective they’re still getting mail they don’t want, and all the technical perfection in the world can’t overcome the negative reputation associated with spamming.
The secret to inbox delivery: don’t send mail that looks like spam. That includes not sending mail to people who have not expressly consented to receive mail.

Read More

What is Two Factor Authentication?

Two factor authentication, or the snappy acronym 2FA, is something that you’re going to be hearing a lot about over the next year or so, both for use by ESP employees (in an attempt to reduce the risks of data theft) and by ESP customers (attempting to reduce the chance of an account being misused to send spam). What is Authentication?
In computer security terms authentication is proving who you are – when you enter a username and a password to access your email account you’re authenticating yourself to the system using a password that only you know.
Authentication (“who you are”) is the most visible part of computer access control, but it’s usually combined with two other A’s – authorization (“what you are allowed to do”) and accounting (“who did what”) to form an access control system.
And what are the two factors?
Two factor authentication means using two independent sources of evidence to demonstrate who you are. The idea behind it is that it means an attacker need to steal two quite different bits of information, with different weaknesses and attack vectors, in order to gain access. This makes the attack scenario much more complex and difficult for an attacker to carry out.
It’s important that the different factors are independent – requiring two passwords doesn’t count as 2FA, as an attack that can get the first password can just as easily get the second password. Generally 2FA requires the user to demonstrate their identity via two out of three broad ways:

Read More