Understanding the PDF file format – Standard font information

One of the reasons the PDF file format is so popular is that it can embed a large amount of font information in the file itself, so the document displays as intended on any machine. It will not turn your beautifully crafted 12-page document into a horribly misformatted 14-page version, as Microsoft Word can, if it cannot find all the fonts.

To help reduce the size of PDF files, the PDF file format specifies a set of standard fonts which all PDF readers should know about: the "Standard 14", made up of Courier, Helvetica and Times in four styles each, plus Symbol and ZapfDingbats. For these fonts, no font data is included in the PDF beyond the font name (e.g. Helvetica).

This is fantastic for PDF file size, but where do you get information about these fonts if you need it? With an embedded font, the width and bounding box of each glyph are in the file. For a standard font, that data must be somewhere, because Adobe's software has access to it, but it is not in the PDF file. Where is the information stored?

It is actually stored in a set of separate text files, known as AFM (Adobe Font Metrics) files, which are built into PDF readers. Each file contains a header with information about the font, followed by a line describing each glyph. This is part of what one looks like…

StartFontMetrics 4.1
Comment Copyright (c) 1989, 1990, 1991, 1992, 1993, 1997 Adobe Systems Incorporated.  All Rights Reserved.
Comment Creation Date: Thu May  1 17:27:09 1997
Comment UniqueID 43050
Comment VMusage 39754 50779
FontName Courier
FullName Courier
FamilyName Courier
Weight Medium
ItalicAngle 0
IsFixedPitch true
CharacterSet ExtendedRoman
FontBBox -23 -250 715 805
UnderlinePosition -100
UnderlineThickness 50
Version 003.000
Notice Copyright (c) 1989, 1990, 1991, 1992, 1993, 1997 Adobe Systems Incorporated.  All Rights Reserved.
EncodingScheme AdobeStandardEncoding
CapHeight 562
XHeight 426
Ascender 629
Descender -157
StdHW 51
StdVW 51
StartCharMetrics 315
C 32 ; WX 600 ; N space ; B 0 0 0 0 ;
C 33 ; WX 600 ; N exclam ; B 236 -15 364 572 ;
C 34 ; WX 600 ; N quotedbl ; B 187 328 413 562 ;
C 35 ; WX 600 ; N numbersign ; B 93 -32 507 639 ;
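
The header portion of a file like this is just a list of "Key value" lines, so it can be read with very little code. The following is a minimal sketch in Python, not how any particular PDF reader does it: the function name parse_afm_header and the shortened sample string are assumptions made for illustration.

```python
def parse_afm_header(text):
    """Collect the 'Key value' lines that come before StartCharMetrics."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and free-form Comment lines.
        if not line or line.startswith("Comment"):
            continue
        # The per-glyph section starts here, so the header is finished.
        if line.startswith("StartCharMetrics"):
            break
        key, _, value = line.partition(" ")
        header[key] = value.strip()
    return header


# A shortened copy of the sample above (assumption: enough lines to show the idea).
sample = """StartFontMetrics 4.1
Comment Creation Date: Thu May  1 17:27:09 1997
FontName Courier
IsFixedPitch true
FontBBox -23 -250 715 805
CapHeight 562
StartCharMetrics 315
C 32 ; WX 600 ; N space ; B 0 0 0 0 ;
"""

header = parse_afm_header(sample)
# FontBBox holds four integers: lower-left x, lower-left y, upper-right x, upper-right y.
font_bbox = [int(n) for n in header["FontBBox"].split()]
print(header["FontName"], font_bbox)
```

Values are kept as strings and converted only where needed, since different keys hold different kinds of data (names, booleans, single numbers, or four numbers for the bounding box).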

For each glyph, there is a line giving the character code (33), the width of the glyph (600 units out of 1000), the name of the glyph (exclam) and the bounding box which can be drawn around it (236 -15 364 572, the lower-left and upper-right corners).
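
As a rough illustration of how one of these glyph lines breaks down, here is a small Python sketch. The function name parse_char_metric and the dictionary keys are assumptions chosen for this example, and it only handles the four fields shown in the sample.

```python
def parse_char_metric(line):
    """Split one 'C ... ; WX ... ; N ... ; B ... ;' line into its fields."""
    metric = {}
    for field in line.split(";"):
        parts = field.split()
        if not parts:
            continue  # the trailing ';' leaves an empty field
        key, values = parts[0], parts[1:]
        if key == "C":
            metric["code"] = int(values[0])
        elif key == "WX":
            metric["width"] = int(values[0])
        elif key == "N":
            metric["name"] = values[0]
        elif key == "B":
            # Lower-left x, lower-left y, upper-right x, upper-right y.
            metric["bbox"] = tuple(int(v) for v in values)
    return metric


exclam = parse_char_metric("C 33 ; WX 600 ; N exclam ; B 236 -15 364 572 ;")

# Widths are in units of 1/1000 of the font size, so scale by the size in points.
font_size = 12
width_in_points = exclam["width"] * font_size / 1000
print(exclam["name"], exclam["bbox"], width_in_points)
```

Because the values are expressed per 1000 units, the same line serves any font size: multiply by the size and divide by 1000.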

This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.