One of the reasons the PDF file format is so popular is that it embeds a large amount of font information in the file, so that the document can be reproduced exactly as intended on any machine. It will not turn your beautifully crafted 12-page document into a horribly misformatted 14-page version, as Microsoft Word can do if it cannot find all the fonts.
To help reduce the size of PDF files, the PDF specification defines a set of standard fonts (originally 14, now reduced to 8) that every PDF reader should know about. For these fonts, the PDF contains no information beyond the font name (e.g. Helvetica).
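In the file itself, a standard font can be referenced by a font dictionary as short as `<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>`. The Python sketch below shows one way a reader might decide that it has to supply the metrics itself. It assumes the font dictionary has already been parsed into a plain Python dict (a hypothetical representation, not any particular library's API) and checks the name against the original 14 standard font names. If a `/Widths` array is present, the widths are in the file after all.

```python
# The original "standard 14" font names defined by the PDF specification.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def needs_builtin_metrics(font_dict: dict) -> bool:
    """Return True if the reader must supply this font's metrics itself.

    font_dict is a hypothetical plain-dict stand-in for a parsed PDF font
    dictionary, e.g. {"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica"}.
    """
    is_standard = font_dict.get("BaseFont") in STANDARD_14
    # A /Widths array in the dictionary means the widths are in the file after all.
    return is_standard and "Widths" not in font_dict


print(needs_builtin_metrics({"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica"}))  # True
```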
Leaving the font data out is fantastic for file size, but where do you get information about these fonts if you need it? With an embedded font, you have the width and bounding box of each glyph. That information must exist somewhere, because Adobe has access to it, but it is not in the PDF file. So where is it stored?
It is stored in a set of separate text files, called Adobe Font Metrics (AFM) files, which are built into PDF readers. Each file contains a header with information about the font, followed by one line describing each glyph. Here is part of one:
Comment Copyright (c) 1989, 1990, 1991, 1992, 1993, 1997 Adobe Systems Incorporated. All Rights Reserved.
Comment Creation Date: Thu May 1 17:27:09 1997
Comment UniqueID 43050
Comment VMusage 39754 50779
FontBBox -23 -250 715 805
Notice Copyright (c) 1989, 1990, 1991, 1992, 1993, 1997 Adobe Systems Incorporated. All Rights Reserved.
C 32 ; WX 600 ; N space ; B 0 0 0 0 ;
C 33 ; WX 600 ; N exclam ; B 236 -15 364 572 ;
C 34 ; WX 600 ; N quotedbl ; B 187 328 413 562 ;
C 35 ; WX 600 ; N numbersign ; B 93 -32 507 639 ;
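For each glyph, there is a line giving the character code (C 33), the glyph's advance width (WX 600, in units of 1/1000 of the font size), the glyph name (N exclam) and the bounding box of the glyph (B), written as the lower-left x, lower-left y, upper-right x and upper-right y of the rectangle that encloses it.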
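As a rough illustration, the glyph lines above can be read with a few lines of Python. This is a minimal sketch that only handles the four keys shown in the excerpt (C, WX, N and B); it is not a complete AFM parser, and the names `GlyphMetrics`, `parse_char_metrics` and `string_width` are hypothetical. Because widths are given in thousandths of the font size, the width of a string in points is the sum of its WX values multiplied by the font size and divided by 1000. The sketch also assumes each character's code point matches its C value, which holds for the ASCII characters in this excerpt.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class GlyphMetrics:
    code: int                        # C: character code (-1 means unencoded)
    width: int                       # WX: advance width, in 1/1000 of the font size
    name: str                        # N: PostScript glyph name
    bbox: Tuple[int, int, int, int]  # B: llx lly urx ury


def parse_char_metrics(line: str) -> GlyphMetrics:
    """Parse one 'C ... ; WX ... ; N ... ; B ... ;' line from an AFM file."""
    fields = {}
    for part in line.split(";"):
        tokens = part.split()
        if tokens:  # skip the empty chunk after the trailing ';'
            fields[tokens[0]] = tokens[1:]
    llx, lly, urx, ury = (int(v) for v in fields["B"])
    return GlyphMetrics(
        code=int(fields["C"][0]),
        width=int(fields["WX"][0]),
        name=fields["N"][0],
        bbox=(llx, lly, urx, ury),
    )


def string_width(text: str, metrics: Dict[int, GlyphMetrics], font_size: float) -> float:
    """Width of text in points: sum of WX values * font_size / 1000."""
    units = sum(metrics[ord(ch)].width for ch in text)
    return units * font_size / 1000


lines = [
    "C 32 ; WX 600 ; N space ; B 0 0 0 0 ;",
    "C 33 ; WX 600 ; N exclam ; B 236 -15 364 572 ;",
    "C 34 ; WX 600 ; N quotedbl ; B 187 328 413 562 ;",
    "C 35 ; WX 600 ; N numbersign ; B 93 -32 507 639 ;",
]
metrics = {m.code: m for m in map(parse_char_metrics, lines)}
print(metrics[33].name, metrics[33].bbox)  # exclam (236, -15, 364, 572)
print(string_width("#! ", metrics, 12))    # 3 glyphs * 600 * 12 / 1000 = 21.6
```

Keying the dictionary on the character code mirrors how a simple font maps bytes in a content stream to glyphs, so looking up a width is a single dictionary access per character.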
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, which you can find in our series index.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team posts about anything interesting they learn.