Update: JPDF2HTML5 has been rebranded as BuildVu and JPDFForms has been rebranded as FormVu.

PDF to HTML5 conversion – Where are my hyphens?

The devil is always in the detail with the PDF spec. I have been working on a PDF file where the hyphen character was not appearing in the converted HTML5 output. This was odd, as hyphens appear correctly in loads of other samples. So we drilled down to see what was going on…

There are several ways to map glyph indices onto the actual characters that are displayed. One of these involves a set of character mapping tables (Appendix D in the PDF spec if you want to look it up). There are then a whole load of exceptions to these tables, and one of them had not been correctly coded by me. The one missing was:

The hyphen character is also encoded as 255 in WinAnsiEncoding. The meaning of this duplicate code is “soft hyphen,” but it is typographically the same as hyphen.
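To make that exception concrete, here is a minimal Java sketch of the idea. It is not the actual converter code, and the class and method names are made up for illustration. It assumes the encoding tables in Appendix D list character codes in octal, so 255 means 0xAD (173 decimal), and it leans on the JDK's windows-1252 charset as a rough stand-in for the rest of the WinAnsiEncoding table.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: decode a single WinAnsiEncoding character code,
// applying the "soft hyphen is drawn as a hyphen" exception first.
class WinAnsiHyphenSketch {

    // Assumption: windows-1252 approximates the base WinAnsiEncoding table.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Octal 255 from the spec note, i.e. 0xAD or 173 decimal.
    static final int SOFT_HYPHEN_CODE = 0255;

    static char decode(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("Not a single-byte code: " + code);
        }
        // The exception: the duplicate code is typographically a hyphen,
        // so emit a visible U+002D rather than the invisible U+00AD.
        if (code == SOFT_HYPHEN_CODE) {
            return '-';
        }
        // Everything else falls through to the plain table lookup.
        return new String(new byte[] {(byte) code}, CP1252).charAt(0);
    }

    public static void main(String[] args) {
        // Both the ordinary hyphen code and the duplicate code decode to '-'.
        System.out.println(decode(0x2D) == '-');             // prints true
        System.out.println(decode(SOFT_HYPHEN_CODE) == '-'); // prints true
    }
}
```

Without the special case, code 0xAD would come out as U+00AD, which browsers normally do not draw unless the line breaks at that point. That would fit the symptom of hyphens silently vanishing from the HTML5 output.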

A quick fix, a regression test, and a reset of the regression test baseline to lock in the fix, and it is all resolved. But it is a really good example of the complexity of the PDF specification. Do you have any favourite gotchas in PDF?


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.