Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

Are your Truetype CMAP tables lying to you?


Truetype fonts (which are used in many places, including PDF files) consist of a set of tables. These include the Glyf table, which defines the actual shapes of the characters, and the CMAP table, which defines which glyph maps onto each character. Because Truetype fonts are designed to work on multiple platforms and with different languages, it is possible to have several CMAP tables in different formats. So far so good. We can see the tables disassembled in lots of tools – here is a view of a Truetype font in Acrobat 9.0:

[Image: truetype font tables]
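
As a rough illustration of what such a tool is doing, here is a minimal Python sketch that walks the table directory of a Truetype file, finds the cmap table and lists the platform ID, encoding ID and format of each subtable. The function names (find_table, list_cmap_subtables) are made up for this example, and it assumes a plain single-font file held in memory, not a font collection or a stream that still needs extracting from a PDF.

```python
import struct


def find_table(font: bytes, tag: bytes) -> int:
    """Return the file offset of the table with the given 4-byte tag."""
    # The offset table is 12 bytes; numTables is the uint16 at byte 4.
    num_tables = struct.unpack_from(">H", font, 4)[0]
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        rec_tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i
        )
        if rec_tag == tag:
            return offset
    raise KeyError(tag)


def list_cmap_subtables(font: bytes):
    """Return a list of (platform_id, encoding_id, format) tuples."""
    cmap = find_table(font, b"cmap")
    _version, count = struct.unpack_from(">HH", font, cmap)
    subtables = []
    for i in range(count):
        # Encoding records follow the 4-byte cmap header, 8 bytes each.
        platform_id, encoding_id, sub_offset = struct.unpack_from(
            ">HHI", font, cmap + 4 + 8 * i
        )
        # Subtable offsets are relative to the start of the cmap table,
        # and every subtable begins with its uint16 format number.
        fmt = struct.unpack_from(">H", font, cmap + sub_offset)[0]
        subtables.append((platform_id, encoding_id, fmt))
    return subtables
```

Called as list_cmap_subtables(open("font.ttf", "rb").read()) (the file name is a placeholder), a font with one Macintosh format 0 subtable and one Windows Unicode format 4 subtable would be expected to report (1, 0, 0) and (3, 1, 4).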

There are 2 CMAP tables present. As I had a problem with character 224, I checked the tables and found that they gave different results. In theory they should return the same answer.

It is a Euro symbol
Or is it an agrave?

In this case the CMAP tables give inconsistent results. In PDF files, there is an order of preference for using CMAP tables and in general format 0 should be used in preference to format 4.
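
To make that kind of check concrete, here is a hedged Python sketch that looks up the same character code in a format 0 subtable and a format 4 subtable and reports where the glyph indices differ. The helper names are hypothetical, and the inputs are assumed to be the raw bytes of each subtable starting at its format field. It compares raw codes directly, as in the test described above. The two subtables may be keyed by different encodings, so a mismatch found this way is a prompt to investigate, not proof that the font is broken.

```python
import struct


def lookup_format0(sub: bytes, code: int) -> int:
    """Glyph index for a code in a format 0 (byte encoding) subtable."""
    if not 0 <= code <= 255:
        return 0
    # A 6-byte header is followed by 256 one-byte glyph indices.
    return sub[6 + code]


def lookup_format4(sub: bytes, code: int) -> int:
    """Glyph index for a code in a format 4 (segment mapping) subtable."""
    seg_count = struct.unpack_from(">H", sub, 6)[0] // 2
    end_at = 14
    start_at = end_at + 2 * seg_count + 2  # skip the reservedPad field
    delta_at = start_at + 2 * seg_count
    range_at = delta_at + 2 * seg_count
    for i in range(seg_count):
        end = struct.unpack_from(">H", sub, end_at + 2 * i)[0]
        if code > end:
            continue
        start = struct.unpack_from(">H", sub, start_at + 2 * i)[0]
        if code < start:
            return 0  # code falls in a gap between segments
        delta = struct.unpack_from(">H", sub, delta_at + 2 * i)[0]
        range_offset = struct.unpack_from(">H", sub, range_at + 2 * i)[0]
        if range_offset == 0:
            # idDelta arithmetic is modulo 65536.
            return (code + delta) & 0xFFFF
        # idRangeOffset is measured from its own position in the subtable.
        pos = range_at + 2 * i + range_offset + 2 * (code - start)
        glyph = struct.unpack_from(">H", sub, pos)[0]
        return (glyph + delta) & 0xFFFF if glyph else 0
    return 0


def disagreements(sub0: bytes, sub4: bytes, codes=range(256)):
    """Return (code, glyph_from_format0, glyph_from_format4) mismatches."""
    result = []
    for code in codes:
        g0 = lookup_format0(sub0, code)
        g4 = lookup_format4(sub4, code)
        if g0 != g4:
            result.append((code, g0, g4))
    return result
```

An empty list from disagreements means the two subtables agree on every code tested. Any entries returned are the codes worth inspecting by hand.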

So I did some tests on my PDF file collection (after 10 years working with PDF files I have a ‘small selection’ which I use for regression tests) and found that some files contained a broken format 0 table and some contained a broken format 4 table. The PDF user has to work out which table is correct (maybe how you do that will be something for another article).

So do not always assume that you can use all the CMAP tables in a font reliably (and make sure you have some decent tools available if you need to investigate). Do you trust your CMAP tables?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip.

This post is part of our “Fonts Articles Index”. In these articles we explore fonts.


2 Replies to “Are your Truetype CMAP tables lying to you?”

  1. It is incorrect to assume you can go through just any “cmap” in a TrueType font (not to be mixed up with “CMap”, and not to be written as “CMAP”) with a character code from your PDF in your hands.

    Before you go to a “cmap” you need to check the encoding information in your font dictionary. Then you have to find the Unicode code point for that character, and then you look inside the font at any of the “cmap” tables to see whether one (or several) of them can map your Unicode code point to a glyph.

    This is slightly different for symbolic TrueType fonts, as here there should only be either a single “cmap” or a pair of MacRomanEncoding and Microsoft symbolic encoding (which have to agree on the mapping), and only in this case do you go from character code to the index value in either of the “cmap” tables.

    Now, I won’t claim there never could be a TrueType font that still gives different results even based on the lookup I described – but then that indeed would be a bug in the font (and I’d like to see sample files for that….).


    PS: The text inside the PDF standard is pretty clear about all this, though admittedly it is not an easy read… (check out )

    PS: On the PDF Association’s website there is now a brand new discussion forum for topics like this one, and a wide range of excellent experts are watching the space (and tend to come up with their take often within hours…) – check out .


IDRsolutions Ltd 2020. All rights reserved.