Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

What are subsetted fonts in PDF files?


What are subsetted fonts?

Subsetted fonts are fonts which include only the glyphs a document actually uses. If you look at the fonts on your computer you will see that they are often very large, because they contain every glyph the font supports. In a PDF file, we know exactly which glyphs are used, and it might be just one or two. So we can create a version of the font for our PDF file containing just these glyphs, which results in much smaller files.
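As a rough illustration of the first step, here is a minimal Python sketch that works out which character codes a piece of text uses. The function name used_codes and the sample text are made up for this example, and it assumes the codes are simply the Unicode code points of the characters; a real subsetter works on the glyphs selected through the font's encoding.

```python
from typing import List


def used_codes(text: str) -> List[int]:
    """Return the sorted, de-duplicated character codes that appear in text."""
    return sorted({ord(ch) for ch in text})


# Hypothetical page text: only these codes would need a glyph in the subset.
codes = used_codes("this is a test")
print(len(codes), "distinct codes:", codes)
```

Everything else in the font can be left out of the embedded copy.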

How do they work?

We can achieve this by making use of a powerful feature of the PDF file specification: the ability to create custom font encodings. This means that for each font you can choose exactly which glyph each character code used in the Tj command maps onto. This has a number of advantages, including:

1. Making it very easy to map font values with subsetted fonts (especially CID fonts). If you are only using a few glyphs in a font, this can substantially reduce the PDF file size and improve its loading speed.

2. Making it simple to map values from any system/platform so that they display correctly on multiple platforms.

Custom encoding is set up in the /Differences value of the Encoding dictionary. Here is an example:

246 0 obj
<<
/Type /Encoding
/BaseEncoding /MacRomanEncoding
/Differences [32/space 97/a 99/c/d/e/f 104/h/i 108/l 
110/n/o 115/s/t/u 121/y]
>>
endobj

The /BaseEncoding defines the general encoding to use and then the /Differences key lists our changes. Each run is a number followed by one or more names: the number is the code for the first name, and the code goes up by one for each further name. So [32/space 97/a 99/c/d/e/f 104/h/i 108/l 110/n/o 115/s/t/u 121/y] would define 32 as space, 97 as a, 99 as c, 100 as d, 101 as e, etc. So you can mix and match standard and non-standard values.
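The counting rule can be sketched in a few lines of Python. This is a simplified illustration, not a full PDF parser: the function name parse_differences is made up for this example, and the regular expression only treats whitespace, / and square brackets as delimiters.

```python
import re
from typing import Dict

# A token is either a /name or an integer code. Whitespace between
# names is optional, as in "99/c/d/e/f".
TOKEN = re.compile(r"/[^\s/\[\]]+|\d+")


def parse_differences(array_text: str) -> Dict[int, str]:
    """Map each character code to the glyph name assigned in /Differences."""
    mapping: Dict[int, str] = {}
    code = None
    for token in TOKEN.findall(array_text):
        if token.startswith("/"):
            if code is None:
                raise ValueError("glyph name found before any starting code")
            mapping[code] = token[1:]
            code += 1  # each further name takes the next code
        else:
            code = int(token)  # a bare number restarts the counter
    return mapping


example = "[32/space 97/a 99/c/d/e/f 104/h/i 108/l 110/n/o 115/s/t/u 121/y]"
for code, name in parse_differences(example).items():
    print(code, name)
```

Codes that are not listed, such as 98, keep whatever the /BaseEncoding assigns to them.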

In this case, all the values are glyph names, but the names can also be octal, hex or decimal values. Here is an example I found this week in a PDF file:

obj
<<
/Differences[2/71/105/108/76/111/98/101/
73/117/115/116/114/97/100/121/82/99/104
/87/110 32/space]
/Type/Encoding
>>

If the number is preceded by a / it is a glyph name; otherwise it is the code to use for the next glyph name, which is rather confusing.
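To make the slash rule concrete, here is a small Python sketch that labels each token as a code or a name. The function name classify is made up for this example, and the pattern is a simplification that only handles whitespace, / and square brackets as delimiters.

```python
import re
from typing import Iterator, Tuple, Union

# The leading slash is the only thing that separates the name "71"
# from the code 71.
TOKEN = re.compile(r"/(?P<name>[^\s/\[\]]+)|(?P<code>\d+)")


def classify(array_text: str) -> Iterator[Tuple[str, Union[int, str]]]:
    """Yield ('name', str) or ('code', int) for each token, in order."""
    for match in TOKEN.finditer(array_text):
        if match.group("name") is not None:
            yield ("name", match.group("name"))
        else:
            yield ("code", int(match.group("code")))


for kind, value in classify("[2/71/105 32/space]"):
    print(kind, value)
```

In this shortened version of the array above, 2 and 32 are codes, while 71 and 105 are names even though they look like numbers.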

How are names defined in Differences?

You can actually use any name for a glyph (like missing_glyph), but if it is a non-standard one then you need to define it in your font.

In theory, it should be very simple: the glyph number followed by one or more glyph names. Here are two examples from the PDF Reference.

9 0 obj

/Differences [ 97 /square /triangle ]

...

 /square 11 0 R
/triangle 12 0 R

endobj

25 0 obj
<< /Type /Encoding
/Differences
[
39 /quotesingle
96 /grave
128 /Adieresis /Aring /Ccedilla /Eacute /Ntilde /Odieresis /Udieresis
/aacute /agrave /acircumflex /adieresis /atilde /aring /ccedilla
/eacute /egrave /ecircumflex /edieresis /iacute /igrave /icircumflex
/idieresis /ntilde /oacute /ograve /ocircumflex /odieresis /otilde
/uacute /ugrave /ucircumflex /udieresis /dagger /degree /cent
/sterling /section /bullet /paragraph /germandbls /registered
/copyright /trademark /acute /dieresis
174 /AE /Oslash
177 /plusminus
180 /yen /mu
187 /ordfeminine /ordmasculine

So far so good. It turns out that you can use any character name, so long as you use the same reference elsewhere (as in the first example).
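A simple consistency check follows from this: every name used in Differences should also be a name the font defines. The Python sketch below assumes you already have the Differences mapping and the set of glyph names the font provides; the function name undefined_names and the sample data are made up for this example.

```python
from typing import Dict, Set


def undefined_names(differences: Dict[int, str], font_glyph_names: Set[str]) -> Dict[int, str]:
    """Return the code -> name entries whose name the font does not define."""
    return {
        code: name
        for code, name in differences.items()
        if name not in font_glyph_names
    }


# Mirrors the first example: the font itself defines /square and /triangle.
defined = {"square", "triangle"}
print(undefined_names({97: "square", 98: "triangle"}, defined))

# A misspelt name has nothing in the font to point at.
print(undefined_names({97: "square", 98: "triangel"}, defined))
```

An empty result means every custom name has a matching definition.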

So if you think you have it figured out, here is an interesting example for you…

<i255/5/6/7/8/9/10
 /11/12/13/14/15/16/17/18/19/20
/21/22/23/24/25/26/27/28/29/30
/31/32/33/34/35/36/37/38/39/40
/41/42/43/44/45/46/47/49/50
/51/52/53/54/55/56/57/58/59/60
/61/62/63/64/65/66/67/68/69/70
/71/72/73/74/75/76/77/78/79/80
/81/82/83/84/85/86/87/88/89/90
/91/92/93/94/95/96/97/98/99/100
/101/102/103/104/105/106/107/108/109/110
/111/112/113/114/115/116/117/118/119/120
/121/122/123/124/125/126/127/128/129/130
/131/132/133]/Type/Encoding

This is very confusing because numbers are being used as character names (and the number is the WIN encoding value for the actual character). So you have to be very exact about whether a value is preceded by a slash (/), which shows it is a name rather than a number. There is also a rather odd i255 value. This PDF file looks like it was created with Ghostscript.
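If the numeric names really are WIN encoding values, they can be turned back into characters. The Python sketch below assumes that "WIN encoding" means WinAnsiEncoding and approximates it with Python's cp1252 codec, which is close but not guaranteed to match in every slot. The function name char_for_numeric_name is made up for this example.

```python
from typing import Optional


def char_for_numeric_name(name: str) -> Optional[str]:
    """Guess the character for a glyph name made only of digits.

    Assumption: the digits are a WinAnsi code, approximated here by cp1252.
    Returns None for ordinary names, out-of-range values and unassigned slots.
    """
    if not (name.isascii() and name.isdigit()):
        return None
    value = int(name)
    if value > 255:
        return None
    try:
        return bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        return None  # cp1252 leaves a few slots, such as 129, unassigned


for name in ["71", "105", "space", "129"]:
    print(name, repr(char_for_numeric_name(name)))
```

Ordinary names such as space return None here, so they can still be looked up as normal glyph names.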

Final thoughts

So Differences gives you a very powerful way to define custom glyph settings and only embed the values you use in a PDF file.
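Going the other way, a writer can build a compact Differences array from the codes it actually uses. This Python sketch is only an illustration: the function name build_differences is made up, and it assumes you already have a code to glyph name mapping.

```python
from typing import Dict


def build_differences(mapping: Dict[int, str]) -> str:
    """Serialise a code -> glyph name mapping as a /Differences array."""
    parts = []
    previous = None
    for code in sorted(mapping):
        if previous is None or code != previous + 1:
            # Start a new run: write the code, separated from the last run.
            parts.append((" " if parts else "") + str(code))
        parts.append("/" + mapping[code])
        previous = code
    return "[" + "".join(parts) + "]"


print(build_differences({32: "space", 97: "a", 99: "c", 100: "d"}))
```

Consecutive codes share one starting number, which keeps the array short.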


