Problems you can meet in implementing a PDF library – ArrayIndexOutOfBoundsException in a Font

In theory there is a PDF File Specification (which every PDF creation tool implements and we just have to follow)…

People often ask about the sort of problems you can find when trying to develop a PDF library. I came across this example earlier this week and thought it would make a good illustration of the kind of issue you run into.

We were sent a PDF file which did not work. Our Java code was throwing an ArrayIndexOutOfBoundsException on the file, so we opened it up to have a look…

Non-CID (simple) fonts can address up to 256 character codes (0-255), and you can also define exactly which glyph each code maps onto. Here is the font object from the PDF file. It is a simple PDF object containing a set of key/value pairs. It has a /Type of /Font and tells us it is a TrueType font covering character codes 0 to 255 (/FirstChar and /LastChar). The /Widths and /Encoding entries give the width and glyph name for each code, and /FontDescriptor points to another object with the actual font drawing details.

6 0 obj
<<
/Type /Font
/Subtype /TrueType
/Name /F00
/BaseFont /Arial
/FirstChar 0
/LastChar 255
/Widths [
768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 768 285 285 364 570 570 911 683 196 341 341 399 598 285 341 285 285 570 570 570 570 570 570 570 570 570 570 285 285 598 598 598 570 1040 683 683 740 740 683 626 797 740 285 512 683 570 853 740 797 683 797 740 683 626 740 683 967 683 683 626 285 285 285 481 570 341 570 570 512 570 570 285 570 570 228 228 512 228 853 570 570 570 570 341 512 285 570 512 740 512 512 512 342 266 342 598 768 570 768 228 570 341 1024 570 570 341 1024 683 341 1024 768 626 768 768 228 228 341 341 359 570 1024 341 1024 512 341 967 768 512 683 285 341 570 570 570 570 266 570 341 755 379 570 598 341 755 566 410 562 341 341 341 590 550 341 341 341 374 570 854 854 854 626 683 683 683 683 683 683 1024 740 683 683 683 683 285 285 285 285 740 740 797 797 797 797 797 598 797 740 740 740 740 683 683 626 570 570 570 570 570 570 911 512 570 570 570 570 285 285 285 285 570 570 570 570 570 570 570 562 626 570 570 570 570 512 570 512 ]
/Encoding quotedbl/numbersign/dollar/percent/ampersand/quotesingle/parenleft/parenright/asterisk/plus/comma/hyphen/period/slash/zero/one/two/three/four/five/six/seven/eight/nine/colon/semicolon/less/equal/greater/question/at/A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R/S/T/U/V/W/X/Y/Z/bracketleft/backslash/bracketright/asciicircum/underscore/grave/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/braceleft/bar/braceright/asciitilde/space/Euro/space/quotesinglbase/florin/quotedblbase/ellipsis/dagger/daggerdbl/circumflex/perthousand/Scaron/guilsinglleft/OE/space/Zcaron/space/space/quoteleft/quoteright/quotedblleft/quotedblright/bullet/endash/emdash/tilde/trademark/scaron/guilsinglright/oe/space/zcaron/Ydieresis/space/exclamdown/cent/sterling/currency/yen/brokenbar/section/dieresis/copyright/ordfeminine/guillemotleft/logicalnot/hyphen/registered/macron/degree/plusminus/twosuperior/threesuperior/acute/mu/paragraph/periodcentered/cedilla/onesuperior/ordmasculine/guillemotright/onequarter/onehalf/threequarters/questiondown/Agrave/Aacute/Acircumflex/Atilde/Adieresis/Aring/AE/Ccedilla/Egrave/Eacute/Ecircumflex/Edieresis/Igrave/Iacute/Icircumflex/Idieresis/Eth/Ntilde/Ograve/Oacute/Ocircumflex/Otilde/Odieresis/multiply/Oslash/Ugrave/Uacute/Ucircumflex/Udieresis/Yacute/Thorn/germandbls/agrave/aacute/acircumflex/atilde/adieresis/aring/ae/ccedilla/egrave/eacute/ecircumflex/edieresis/igrave/iacute/icircumflex/idieresis/eth/ntilde/ograve/oacute/ocircumflex/otilde/odieresis/divide/oslash/ugrave/uacute/ucircumflex/udieresis/yacute/thorn/ydieresis/space]>>
/FontDescriptor 7 0 R
>>
endobj
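The /FirstChar, /LastChar and /Widths entries work together: the width of character code c is the entry at index c - FirstChar, and the array should hold LastChar - FirstChar + 1 values. Codes outside that range take the /MissingWidth value from the font descriptor (0 if it is absent). The following Java sketch shows a bounds-checked lookup that never indexes past the array; the class and method names are hypothetical, not taken from our library or any other.

```java
/**
 * Minimal sketch of width lookup for a simple (non-CID) font.
 * SimpleFontWidths and widthOf are hypothetical names used for illustration.
 */
public final class SimpleFontWidths {
    private final int firstChar;
    private final int lastChar;
    private final float[] widths;
    private final float missingWidth;

    public SimpleFontWidths(int firstChar, int lastChar, float[] widths, float missingWidth) {
        this.firstChar = firstChar;
        this.lastChar = lastChar;
        this.widths = widths.clone();
        this.missingWidth = missingWidth;
    }

    /** Returns the width for a character code in glyph-space units (1/1000 of text space). */
    public float widthOf(int code) {
        int index = code - firstChar;
        // Fall back to MissingWidth for codes outside FirstChar..LastChar, and also
        // when the /Widths array is shorter than LastChar - FirstChar + 1 claims.
        if (code < firstChar || code > lastChar || index >= widths.length) {
            return missingWidth;
        }
        return widths[index];
    }

    public static void main(String[] args) {
        float[] sample = new float[256];
        sample[65] = 683f; // sample width for code 65 ('A')
        SimpleFontWidths font = new SimpleFontWidths(0, 255, sample, 0f);
        System.out.println(font.widthOf(65));  // prints 683.0
        System.out.println(font.widthOf(300)); // prints 0.0 (MissingWidth fallback)
    }
}
```

Checking both the declared range and the real array length matters because, as this file shows, the declared values and the actual data do not always agree.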

The problem with this file is the /Encoding value: it defines one entry too many. At the end, you can see ydieresis at code 255 followed by a spurious space at code 256, which is outside the 0-255 range a simple font can address. This is what caused our ArrayIndexOutOfBoundsException. The fix is easy in this case – we just ignore any such values and it all works. The annoying thing is that if the spec were enforced, this would not happen. As far as the Producer tool is concerned, the file opens in Acrobat, so it is okay. And if it does not work in our code, it must be our bug…
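Custom glyph mappings like this are normally expressed as a /Differences array in an encoding dictionary: each integer sets the code for the names that follow, and each name advances the code by one, so a single extra name is enough to push past 255. Here is a minimal sketch of the "ignore it" fix, assuming the array has already been tokenised into Integer codes and String glyph names (without the leading /). The class name and token representation are assumptions for illustration, not our actual implementation.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: apply a /Differences array to a 256-entry simple-font encoding,
 * skipping codes outside 0-255 instead of throwing.
 * DifferencesApplier is a hypothetical name.
 */
public final class DifferencesApplier {

    public static String[] apply(String[] baseEncoding, List<Object> differences) {
        // Copy into exactly 256 slots (padding with null if the base is shorter).
        String[] encoding = Arrays.copyOf(baseEncoding, 256);
        int code = 0;
        for (Object token : differences) {
            if (token instanceof Integer) {
                // An integer resets the starting code for the names that follow.
                code = (Integer) token;
            } else if (token instanceof String) {
                // Only codes 0-255 exist in a simple font; drop anything else,
                // such as a stray name landing on code 256.
                if (code >= 0 && code < encoding.length) {
                    encoding[code] = (String) token;
                }
                code++;
            }
        }
        return encoding;
    }

    public static void main(String[] args) {
        List<Object> diffs = List.<Object>of(254, "thorn", "ydieresis", "space");
        String[] enc = apply(new String[256], diffs);
        System.out.println(enc[254] + " " + enc[255]); // prints "thorn ydieresis"
    }
}
```

Skipping rather than throwing keeps every valid mapping from 0 to 255 and simply drops the out-of-range name, which is the behaviour we settled on for this file.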

Do you have any favourite nasty ways in which PDF creation tools ignore the specification?

This post is part of our “Fonts Articles Index”, in which we explore fonts.


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.
