Glyph names- what is in a name?

I came across a rather intriguing problem while debugging a font issue in a PDF created by a tool called A-PDF.
The PDF specification has a great deal of flexibility, and one of the many tricks you can do is to redefine how character values are mapped onto glyphs. You can create a custom encoding to map values onto glyph names. This is really useful if you want to embed a font with just a few characters to save space. This is done with a Differences object and would look something like this if viewed in the raw file:
427 0 obj
<<
/Differences [ 2 /A/B/euro ]
So value 2 maps onto glyph ‘A’, value 3 maps onto ‘B’ and value 4 maps onto the euro character.
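The numbering rule described above (a number sets the starting value, and each name that follows takes the next value in turn) can be sketched in a few lines of Python. This is only an illustration: the function name `expand_differences` is made up, and it assumes the array has already been tokenised into a list of integers and glyph-name strings with the leading slash removed.

```python
def expand_differences(items):
    """Expand a tokenised Differences array into a {value: glyph name} map.

    Assumption: `items` is a list in which integers are starting values and
    strings are glyph names (without the leading '/').
    """
    mapping = {}
    code = None
    for item in items:
        if isinstance(item, int):
            # A number resets the running value for the names that follow.
            code = item
        else:
            if code is None:
                raise ValueError("glyph name found before any starting value")
            mapping[code] = item
            code += 1
    return mapping


print(expand_differences([2, "A", "B", "euro"]))
# {2: 'A', 3: 'B', 4: 'euro'}
```

A Differences array may contain several numbers, each starting a new run, which is why the sketch resets the running value whenever it meets an integer.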
There is a list of all the standard glyph values, and the standard mappings used if you do not create your own. Appendix D of the PDF Reference lists the standard encodings and all the glyph names. Most of the time, you do not need to define your own values and can just use the already prepared tables – StandardEncoding, MacRomanEncoding, WinAnsiEncoding and so forth.
Where it gets slightly messy is that not only can you define your own Encoding, but you can also create your own glyph names. The glyph name is just a key used to look up font data in other tables. So long as you are consistent, any value should be possible. So you could have a Differences object along the lines of
427 0 obj
<<
/Differences [ 2 /AnyName/SillyName/MyNewGlyph ]
Most of the time, this works fine, but what about this value, taken from the problem file?
427 0 obj
<<
/Differences [ 2 /#23#234CH2eb0c8ba15de4cce8fa3c169622f8e93 /#23#2346H7a539460a8268e5915c0973dbb05dce1
/;#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323#2323
/g47
Usually, the # character indicates that the next 2 characters are a hexadecimal value giving a character code. In theory, so long as we are consistent, it should not matter how we treat them. But this is what it looks like in Acrobat.
 
So for the first two names, we need to strip out the number values after each # so that #23#234CH2eb0c8ba15de4cce8fa3c169622f8e93 becomes ##4CH2eb0c8ba15de4cce8fa3c169622f8e93
But for the third name, we strip only the first 2 digits after each #. Why do we need to strip some of the values?
I can only guess that the presence of non-numeric values makes the numbers invalid, or that we strip the first 2 values, but I can’t find any clear rules and I am just guessing. So if you see a # in a PDF name, be careful…
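One rule that does reproduce the first result above is the one the PDF Reference gives for name objects from PDF 1.2 onward: a # followed by two hexadecimal digits stands for the single character with that code, so #23 is a literal #. The Python sketch below applies that rule in one left-to-right pass. The function name `decode_pdf_name` is made up for illustration, and whether Acrobat does exactly this internally is an assumption.

```python
import re

# '#' followed by exactly two hexadecimal digits.
_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")


def decode_pdf_name(raw):
    """Replace each #xx escape with the character whose code is hex xx.

    The substitution is a single pass, so a '#' produced by decoding '#23'
    is never treated as the start of another escape.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


print(decode_pdf_name("#23#234CH2eb0c8ba15de4cce8fa3c169622f8e93"))
# ##4CH2eb0c8ba15de4cce8fa3c169622f8e93
```

Under the same rule, each #2323 in the third name would decode to #23 (the first two digits are consumed by the escape and the remaining 23 is left as ordinary text), and a plain name such as g47 is left unchanged.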
And if you know the exact rules which should be applied here, why not post a comment and explain what’s really going on?
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years’ worth of PDF knowledge and tips, so visit our series index!


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

One thought on “Glyph names- what is in a name?”

  1. Dave Kriewall

    Mark,

    I think that #23 is supposed to represent the ASCII character 0x23, which is “#”. It’s just confusing because the escape character is also #.

    If you preprocess the “#23#23CH2eb0” replacing every #nn with its ASCII equivalent, you’d get “##CH2eb0”. But somebody stuck the un-preprocessed string in the Differences dictionary.
