Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Mystery of the PDF file and the missing Euro Character

I had an intriguing mystery to investigate this morning (while listening to Hercule Poirot, naturally enough). It shows off some interesting features of the PDF file format and Acrobat, so let’s recreate the crime and give you the solution… (you will need to provide your own soundtrack).

The problem was that our PDF viewer was not displaying the Euro symbol correctly, although it appeared correctly in Acrobat. So the first step was to find out some more about the PDF. I used our PDF viewer to find out what the PDF was created with and to get more details about the fonts (you can get the same information in most PDF viewers). The PDF file was created with PDFlib and contains embedded TrueType fonts.

So the next stop is to open the PDF file in Acrobat and see how the fonts are set up. The value in the PDF file for the character is the standard Euro value (decimal 128). This index value is usually mapped onto a glyph number via the CMAP table in the font. Acrobat has a nifty feature to show us these tables.
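As a rough illustration of that two-step lookup, here is a minimal sketch. It assumes the font uses a Windows-1252 style encoding (where byte 128 is the Euro) and models the CMAP as a plain `Map` from Unicode value to glyph number. The class and method names are made up for this example and are not taken from our viewer.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class EuroCodeLookup {

    // Step 1: turn the single-byte character code from the PDF into a Unicode value,
    // assuming a Windows-1252 style encoding.
    static int toUnicode(int charCode) {
        byte[] raw = {(byte) charCode};
        return new String(raw, Charset.forName("windows-1252")).codePointAt(0);
    }

    // Step 2: look the Unicode value up in a (simplified) CMAP; 0 stands for "no glyph".
    static int toGlyph(Map<Integer, Integer> cmap, int charCode) {
        return cmap.getOrDefault(toUnicode(charCode), 0);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put(0x20AC, 57); // Euro mapped to glyph 57, the glyph number found in this font

        System.out.printf("code 128 is U+%04X, glyph %d%n", toUnicode(128), toGlyph(cmap, 128));
    }
}
```

In a real font the CMAP is a binary table with several possible subtable formats, so the `Map` here only stands in for the result of parsing it.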

There are two tables mapping characters onto the actual glyph numbers in the PDF font data. In theory they should give identical results. The solution is actually in the second table. The CMAP lists the glyphs (usually by name) and the corresponding glyph number in the file. It also shows a preview of the glyph.

Here is our answer. Glyph 57 is the Euro symbol, but it is identified not by the glyph name but by 20ac, the Unicode value for the Euro (U+20AC). Hm… not sure I could find that in the PDF specification, but Acrobat accepts it, and adding support for it fixes the display in our PDF viewer.
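A fix along these lines can be sketched as a lookup that tries the usual glyph name first and then falls back to names built from the Unicode value. This is a simplified illustration and not the actual code in our viewer: the name table is modelled as a `Map`, the class and method names are invented, and the `uniXXXX` form is included only because it is a common glyph-naming convention.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
class GlyphNameLookup {

    // Returns the glyph number for a character, or 0 if no candidate name matches.
    static int findGlyph(Map<String, Integer> namesToGlyphs, String glyphName, int unicode) {
        String upperHex = String.format("%04X", unicode);      // e.g. "20AC"
        String lowerHex = upperHex.toLowerCase(Locale.ROOT);   // e.g. "20ac"

        // Try the standard name first, then the Unicode-based spellings.
        String[] candidates = {glyphName, "uni" + upperHex, lowerHex, upperHex};
        for (String candidate : candidates) {
            Integer glyph = namesToGlyphs.get(candidate);
            if (glyph != null) {
                return glyph;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Map<String, Integer> names = new HashMap<>();
        names.put("20ac", 57); // the odd entry found in this font

        System.out.println(findGlyph(names, "Euro", 0x20AC));
    }
}
```

Trying the standard name first means well-behaved fonts are unaffected, and the fallback only comes into play for files like this one.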

There are several lessons to be learnt from this little mystery. Having the right tools is essential to track down issues, and expect to find some slightly odd things going on inside some PDF files. Oh, and have something appropriate to listen to in the background…

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips in our series index.

This post is also part of our “Fonts Articles Index”, in which we explore fonts.

IDRsolutions Ltd 2021. All rights reserved.