PDF to HTML5 conversion – Using Chrome to enhance text layout

I have been working on mapping text as closely as possible to the PDF layout, and thought an article on progress would be of interest, as it shows both how you can debug HTML5 and what we are up to.

I have been working on some sample newspaper pages. These are good samples to work with because the multi-column format makes them particularly challenging.

The first improvement was actually to spot that there was a bug in our code (it does sadly happen!). If the space occupied by the text was exactly the same size as the slot on the PDF page, our code still reduced the font size by one point. That is now fixed!
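As an illustration of that kind of off-by-one, here is a minimal sketch in Python. It is not our converter code: the function name `fit_font_size` and the `width_at` callback (which returns the text width at a given point size) are hypothetical, and it assumes whole-point font sizes. The point is only that an exact fit must count as a fit.

```python
from typing import Callable


def fit_font_size(start_size: int, slot_width: float,
                  width_at: Callable[[int], float]) -> int:
    """Return the largest whole-point size <= start_size whose text fits the slot.

    width_at is a hypothetical callback giving the rendered text width
    at a given font size, in the same units as slot_width.
    """
    size = start_size
    # Strict comparison: text exactly as wide as the slot is a fit.
    # Writing >= here would shrink an exact fit by one point,
    # which is the kind of bug described above.
    while size > 1 and width_at(size) > slot_width:
        size -= 1
    return size
```

With a toy measure such as `lambda s: s * 10.0` and a slot 90 units wide, a starting size of 9 is kept as it is, while a starting size of 10 is reduced to 9.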

When investigating HTML5 issues, Chrome has a neat debugger which allows you to inspect any element on the page (right-click over the element and choose the Inspect menu item). This makes it very easy to examine the size and position of each piece of text.

I was originally rounding the font size up or down, whichever gave the better fit, in a simple symmetrical manner. Using the debugger, it became clear that when adjusting the font size for best fit, it was better to stick to the lower size value unless the gap was over half the font size. This gives a better representation.
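A rough sketch of that rule in Python follows. It rests on one loud assumption: that "the gap" means the space left in the slot when the text is drawn at the lower whole-point size. The names `choose_font_size` and `width_at` are hypothetical and used only for illustration.

```python
import math
from typing import Callable


def choose_font_size(ideal_size: float, slot_width: float,
                     width_at: Callable[[int], float]) -> int:
    """Pick a whole-point size for text whose ideal size is fractional.

    Assumption (not confirmed by the article): the gap is the unused
    width in the slot when the text is drawn at the lower size.
    """
    lower = math.floor(ideal_size)
    gap = slot_width - width_at(lower)
    # Prefer the lower size; only step up when the leftover gap
    # is more than half the font size.
    if gap > lower / 2:
        return lower + 1
    return lower
```

Compared with symmetrical rounding, this biases the choice towards the smaller size, so text is less likely to spill out of its column.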

newspaper page

The big issue with text from a PDF file is that quite often the best font size would be 8.5pt, but I have to choose between 8pt and 9pt. This means that I cannot reproduce the hard right-aligned margin on the columns of text (but it is a reasonably good representation of the page). As the page uses Type1 embedded fonts, it will look even better when we add in our Type1 to OTF font support and include the actual fonts as web fonts!

How do you like to debug HTML5?

This post is part of our “Fonts Articles Index”; in these articles we explore fonts.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.