Why is my PDF Producer showing up in Chinese (or the adventure of the wrongly encoded text string)

Mark Stephens

14 years ago

I was recently sent a PDF file where some of the metadata appeared to be wrong. In particular, the PRODUCER field was appearing in Chinese.

Oh dear, I thought (or slightly less poetic words to that effect), and opened the file in Acrobat to see where I had gone wrong. That is where it became interesting, because Acrobat 9.0 also showed the Producer in Chinese.

But some versions of Acrobat and Foxit actually show the PRODUCER as "ScanSoft PDF Create! 5; modified using iText 2.1.7 by 1T3XT". So we have a bit of a mystery here…

I opened the PDF file in a text editor to look at the raw data, and here is the document information dictionary:

20 0 obj<<
/CreationDate(D:20101207222118+01'00')
/Title<feff004d006900630072006f0073006f0066007400200057006f007200640020002d00200044006f006b0075006d0065006e00740031>
/Producer(˛ˇScanSoft PDF Create! 5; modified using iText 2.1.7 by 1T3XT)
/Author<feff006d0067>
/Creator<feff004d006900630072006f0073006f0066007400200057006f007200640020002d00200044006f006b0075006d0065006e00740031>
/ModDate(D:20101210103136+01'00')
>>endobj
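For comparison, the Title, Author and Creator entries are hex strings (between angle brackets) whose first two bytes are FE FF and which really do continue as 2-byte Unicode. A minimal Python sketch, assuming the hex digits are copied exactly as they appear in the file, decodes them; the helper names are made up for this example:

```python
def hex_string_bytes(hex_digits: str) -> bytes:
    """Convert the digits inside a PDF hex string <...> into raw bytes."""
    # Whitespace inside a hex string is ignored by PDF readers.
    digits = "".join(hex_digits.split())
    # An odd number of digits means the final digit is taken to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


def decode_utf16_text_string(raw: bytes) -> str:
    """Decode a text string that starts with the FE FF marker as UTF-16BE."""
    if not raw.startswith(b"\xfe\xff"):
        raise ValueError("no FE FF marker: not a UTF-16BE text string")
    return raw[2:].decode("utf-16-be")


# Copied from the info dictionary, deliberately split across two lines:
# whitespace inside a hex string does not change its value.
title_hex = """feff004d006900630072006f0073006f0066007400200057006f007200640020002
d00200044006f006b0075006d0065006e00740031"""
author_hex = "feff006d0067"

print(decode_utf16_text_string(hex_string_bytes(title_hex)))   # Microsoft Word - Dokument1
print(decode_utf16_text_string(hex_string_bytes(author_hex)))  # mg
```

These entries are consistent: the marker promises 2 bytes per character and every character after it really is 2 bytes.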

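The Producer value is a literal string (between parentheses), and although it looks like ordinary text, it is really a sequence of bytes, which can be encoded either as 2-byte Unicode (UTF-16BE) or as PDFDocEncoding (essentially ASCII for ordinary characters, so it looks like readable text in a text editor). The key to the mystery is the 2 funny characters ˛ˇ at the start, which are actually the byte values 254 and 255 (0xFE 0xFF). This marker indicates that the rest of the string is 2-byte Unicode. As you can see, it is not: every character after the marker is a single byte. A reader that trusts the marker takes those bytes two at a time, so each pair of ASCII characters becomes one 16-bit character, and pairs of letters mostly land in the CJK ideograph ranges of Unicode. Hence the Chinese.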
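A minimal Python sketch of what goes wrong, assuming the Producer bytes are exactly as the text editor shows them (0xFE 0xFF followed by plain single-byte text). It decodes the body the way a reader that trusts the marker would, then as single-byte text. Latin-1 stands in for PDFDocEncoding here, which is only safe because every byte is printable ASCII:

```python
import unicodedata

# Assumed raw bytes of the /Producer literal string: the FE FF marker, then
# plain single-byte text instead of the 2-byte Unicode the marker promises.
producer = b"\xfe\xff" + b"ScanSoft PDF Create! 5; modified using iText 2.1.7 by 1T3XT"
body = producer[2:]

print(len(body))  # 59: an odd byte count, so this cannot be valid UTF-16BE

# A reader that trusts the marker pairs the bytes up: b"Sc" -> U+5363, b"an" -> U+616E, ...
# The single leftover byte at the end becomes U+FFFD with errors="replace".
as_utf16 = body.decode("utf-16-be", errors="replace")
for ch in as_utf16[:4]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")

# Read one byte per character instead (Latin-1 matches PDFDocEncoding for printable ASCII).
print(body.decode("latin-1"))  # ScanSoft PDF Create! 5; modified using iText 2.1.7 by 1T3XT
```

The first four characters print as CJK UNIFIED IDEOGRAPH entries, which is the Chinese that showed up in the Producer field.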

So the problem is that the string is wrongly encoded: it starts with the 2-byte Unicode marker but contains single-byte text. Tools that show the correct Producer either assume it must be PDFDocEncoding (which happens to be right in this case) or have their own strategy for spotting the mistake.
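Nothing here says which strategy Acrobat or Foxit actually use. As one hypothetical example, a reader could refuse to trust the FE FF marker when the bytes after it do not look like UTF-16BE. A minimal Python sketch, with made-up function names and Latin-1 again standing in for PDFDocEncoding:

```python
def is_printable_ascii(data: bytes) -> bool:
    """True if every byte is printable ASCII (0x20 to 0x7E)."""
    return all(0x20 <= b <= 0x7E for b in data)


def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string, second-guessing a misleading FE FF marker.

    Hypothetical heuristic for illustration only. Latin-1 stands in for
    PDFDocEncoding; the two agree on printable ASCII but not everywhere.
    """
    if raw.startswith(b"\xfe\xff"):
        body = raw[2:]
        # Genuine UTF-16BE has an even byte count, and Latin text in UTF-16BE
        # contains 0x00 high bytes, so all-printable-ASCII bytes look suspicious.
        if len(body) % 2 == 0 and not is_printable_ascii(body):
            try:
                return body.decode("utf-16-be")
            except UnicodeDecodeError:
                pass  # e.g. an unpaired surrogate: fall back to single bytes
        return body.decode("latin-1")
    # No marker: treat as single-byte text.
    return raw.decode("latin-1")


producer = b"\xfe\xff" + b"ScanSoft PDF Create! 5; modified using iText 2.1.7 by 1T3XT"
print(decode_text_string(producer))               # ScanSoft PDF Create! 5; modified using iText 2.1.7 by 1T3XT
print(decode_text_string(b"\xfe\xff\x00m\x00g"))  # mg
```

The printable-ASCII check is a guess: as the Producer shows, pairs of printable ASCII bytes are also valid CJK characters, so genuine UTF-16BE text in Chinese could be misread as Latin text. The odd-length check on its own never misfires, but it only catches some broken strings.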

There are quite a few cases where PDF files deviate from the specification, such as the "All TrueType fonts are Mac encoded (unless they are not)" issue I wrote about in another post.

Have you found any oddities in your PDF files?