TrueType fonts (which are used in many places, including PDF files) consist of a set of tables. These include the Glyf table, which defines the actual shapes of the characters, and the CMAP table, which defines which glyph maps onto each character. Because TrueType fonts are designed to work on multiple platforms and with different languages, it is possible to have several CMAP tables in different formats. So far so good. We can see the tables disassembled in lots of tools. Here is a view of a TrueType font in Acrobat 9.0:
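If you do not have a font viewer to hand, the same information can be pulled out of the raw font bytes. The following is only a rough sketch, assuming the standard TrueType layout (a 12-byte header, 16-byte table directory entries, and a CMAP header followed by 8-byte encoding records). The function name list_cmap_subtables is my own invention for illustration and not part of any library.

```python
import struct


def list_cmap_subtables(font: bytes):
    """Return (platformID, encodingID, format) for each CMAP subtable.

    Hypothetical helper: assumes `font` holds a single TrueType font
    (not a collection) and does no validation beyond finding the table.
    """
    # Header: sfnt version (4 bytes), then numTables as a big-endian uint16.
    (num_tables,) = struct.unpack_from(">H", font, 4)

    # Table directory: 16-byte records of tag, checksum, offset, length.
    cmap_offset = None
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i
        )
        if tag == b"cmap":
            cmap_offset = offset
            break
    if cmap_offset is None:
        raise ValueError("font has no cmap table")

    # CMAP header: version, number of encoding records.
    _version, num_subtables = struct.unpack_from(">HH", font, cmap_offset)

    found = []
    for i in range(num_subtables):
        # Each record: platformID, encodingID, offset from start of CMAP.
        platform_id, encoding_id, sub_offset = struct.unpack_from(
            ">HHI", font, cmap_offset + 4 + 8 * i
        )
        # Every subtable starts with its format number.
        (fmt,) = struct.unpack_from(">H", font, cmap_offset + sub_offset)
        found.append((platform_id, encoding_id, fmt))
    return found
```

Calling it on the bytes of a font file (for example open(path, "rb").read(), where path is whatever font you are investigating) gives one tuple per CMAP table, so a font like the one above would show two entries.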
There are 2 CMAP tables present. As I had a problem with character 224, I checked the tables and found that they gave different results. In theory they should return the same answer.
In this case the CMAP tables give inconsistent results. In PDF files, there is an order of preference for using CMAP tables and in general format 0 should be used in preference to format 4.
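The kind of check described above can be sketched in a few lines. This is a minimal illustration rather than production code: it assumes you have already sliced out the raw bytes of a format 0 and a format 4 subtable, that both are keyed by the same character codes (which is the premise here), and the function names are hypothetical.

```python
import struct


def glyph_from_format0(sub: bytes, code: int) -> int:
    """Format 0: 6-byte header, then 256 single-byte glyph indices."""
    if struct.unpack_from(">H", sub, 0)[0] != 0:
        raise ValueError("not a format 0 subtable")
    return sub[6 + code] if 0 <= code <= 255 else 0


def glyph_from_format4(sub: bytes, code: int) -> int:
    """Format 4: segment arrays (endCode, startCode, idDelta, idRangeOffset)."""
    fmt, _length, _language, seg_x2 = struct.unpack_from(">HHHH", sub, 0)
    if fmt != 4:
        raise ValueError("not a format 4 subtable")
    end_pos = 14                         # endCode[] follows the 14-byte header
    start_pos = end_pos + seg_x2 + 2     # skip the 2-byte reservedPad
    delta_pos = start_pos + seg_x2
    range_pos = delta_pos + seg_x2

    def u16(pos: int) -> int:
        return struct.unpack_from(">H", sub, pos)[0]

    for i in range(seg_x2 // 2):
        if u16(end_pos + 2 * i) < code:
            continue                     # segment ends before this code
        start = u16(start_pos + 2 * i)
        if start > code:
            return 0                     # code falls in a gap: missing glyph
        delta = u16(delta_pos + 2 * i)   # read unsigned; arithmetic is mod 65536
        slot = range_pos + 2 * i
        range_offset = u16(slot)
        if range_offset == 0:
            return (code + delta) & 0xFFFF
        # Offset is relative to the idRangeOffset entry itself.
        glyph = u16(slot + range_offset + 2 * (code - start))
        return (glyph + delta) & 0xFFFF if glyph else 0
    return 0


def disagreements(sub0: bytes, sub4: bytes):
    """Character codes 0..255 where the two subtables pick different glyphs."""
    return [
        c for c in range(256)
        if glyph_from_format0(sub0, c) != glyph_from_format4(sub4, c)
    ]
```

An empty list from disagreements means the two tables agree for every single-byte code. Anything else (such as character 224 in my case) tells you which codes need a closer look, though not which table is the correct one.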
So I did some tests on my PDF file collection (after many years working with PDF files I have a ‘small selection’ which I use for regression tests) and found that some contained broken format 0 and some contained broken format 4. The PDF user has to work out which table is correct.
So do not always assume that you can use all the CMAP tables in a font reliably (and make sure you have some decent tools available if you need to investigate). Do you trust your CMAP tables?