Dodgy PDF handling at Gmail

We sent out some W-9s this week. For non-Americans, and those lucky enough not to have to deal with IRS paperwork, those are tax forms.
They’re simple single-page forms with the company name, address and tax ID numbers on them. Because this is the 21st century we don’t fill them in with typewriters and snail-mail them out. Instead we fill in a form online at the IRS website, which gives us PDFs to download that we then send out via email.

We started to get replies from recipients saying that we hadn’t included the tax ID number. Which was odd, because it was definitely there in the PDFs we’d sent.
The reports of missing numbers came from Google Apps users, so we sent a copy to one of our Gmail addresses to see for ourselves. Sure enough, when you click on the attachment to preview it, most of the form is there, but some of the digits of the tax ID number are missing.

And all the spaces have been stripped from our address.

The rest of the form looked fine, but the information we’d entered was scrambled. When we downloaded the PDF from Gmail and opened it outside Gmail, everything was there, and in the right place.
Weird. After a brief “Is Gmail hiding things that look like social security numbers?” detour I realized that the IRS website was probably generating the customized forms using PDF annotations.
PDF is a very powerful, but very complex, file format. It’s not just an image, it’s a combination of different elements – images, lines, vector artwork, text, interactive forms, all sorts of things – bundled together into a single file. And you can add elements to an existing PDF file to, for example, overlay text onto it. These “annotations” are a common way to fill in a PDF form, by adding text in the right place over the top of an existing template PDF.
I cracked the PDF open with some forensics tools and sure enough, the IRS had generated the PDF form using annotations.
 

<< /Type /Annot /DV (Palo Alto, CA) /T (topmostSubform[0].Page1[0].Address[0].f1_8[0])
/Rect [ 57.6 539.968 388.8 553.969 ] /AP 81 0 R /FT /Tx /DA (/Helvetica-Bold 9 Tf 0 g)

And the Gmail PDF viewer isn’t rendering that annotated text correctly.
I’ve sent feedback to Google, so hopefully it’ll be fixed. Meanwhile, if you’re sending customized content to recipients using PDF you should probably check that it renders correctly when previewed in Gmail.
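If you want a rough idea of whether a PDF you’re about to send relies on annotations like the one above, something along these lines might help. This is only a sketch in Python, not a real PDF parser: the function names and the list of markers are my own assumptions, and it simply scans the raw bytes of the file. Many PDFs store their objects in compressed streams, where a scan like this will find nothing, and it says nothing about how any particular viewer will render the file. It’s a hint that a manual preview check is worth doing, no more.

```python
import re

# Markers that suggest a PDF carries form fields or annotations.
# This list is an assumption chosen for illustration, not an exhaustive one.
ANNOTATION_MARKERS = (b"/AcroForm", b"/Annot", b"/Widget")

# Matches simple literal-string entries such as "/DV (Palo Alto, CA)".
# It deliberately ignores strings with nested or escaped parentheses.
FIELD_VALUE = re.compile(rb"/(T|V|DV)\s*\(([^()\\]*)\)")


def find_annotation_markers(pdf_bytes):
    """Return the markers that appear in the raw (uncompressed) PDF bytes."""
    return [m.decode("ascii") for m in ANNOTATION_MARKERS if m in pdf_bytes]


def extract_field_strings(pdf_bytes):
    """Return (key, value) pairs for simple /T, /V and /DV string entries."""
    return [
        (key.decode("ascii"), value.decode("latin-1"))
        for key, value in FIELD_VALUE.findall(pdf_bytes)
    ]


def looks_annotated(path):
    """Read a file and report whether any annotation marker is visible."""
    with open(path, "rb") as handle:
        return bool(find_annotation_markers(handle.read()))
```

Running `extract_field_strings` over the fragment shown earlier picks out the `/DV` and `/T` entries, which is the filled-in text and the field name. If `looks_annotated` returns true for a file, that’s a good prompt to send yourself a copy and look at it in the Gmail preview before it goes out to anyone else.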
 
