Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX.

He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

PDF to HTML5 conversion – Newspaper page layout


I started IDRsolutions while working for the Times Newspaper group in the 1990s, so I know that the complex layout of newspaper pages tends to raise a whole load of special issues. Those pages also provide some really good case studies for honing our technology. Here is one example I would like to share.

In our PDF to HTML5 conversion process there is a trade-off. We can position every glyph on its own, in which case we get accurate but very large HTML files. Or we can roll the text together into lines, losing a little accuracy but producing much smaller files. This grouping also matters because our JavaScript will attempt to auto-fit the text blocks into their correct spaces, and one long line will look much better than two separate blocks.
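To make the size side of that trade-off concrete, here is a minimal Java sketch. It is not the converter's real output format: the `Glyph` record, the `layout`, `perGlyph` and `perLine` helpers, the fixed 6.5px advance and the absolutely positioned `<span>` markup are all assumptions for illustration. It emits the same line of text once as one span per glyph and once as a single span anchored at the first glyph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class GlyphVsLineHtml {

    // Hypothetical glyph record: one character and its absolute position in pixels.
    record Glyph(char ch, double x, double y) {}

    // Places the characters of a string left to right at a fixed advance (a demo assumption).
    static List<Glyph> layout(String text, double startX, double y) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), startX + 6.5 * i, y));
        }
        return glyphs;
    }

    // Escapes the characters that would otherwise break the generated markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds one absolutely positioned span at (x, y); Locale.ROOT keeps '.' as the decimal mark.
    static String span(double x, double y, String text) {
        return String.format(Locale.ROOT,
                "<span style=\"position:absolute;left:%.1fpx;top:%.1fpx\">%s</span>%n",
                x, y, escape(text));
    }

    // Option 1: every glyph gets its own span. Each character lands exactly, but the output grows fast.
    static String perGlyph(List<Glyph> glyphs) {
        StringBuilder html = new StringBuilder();
        for (Glyph g : glyphs) {
            html.append(span(g.x(), g.y(), String.valueOf(g.ch())));
        }
        return html.toString();
    }

    // Option 2: the whole line becomes one span anchored at its first glyph.
    // The browser now places the remaining characters, so some accuracy is lost.
    static String perLine(List<Glyph> line) {
        StringBuilder text = new StringBuilder();
        for (Glyph g : line) {
            text.append(g.ch());
        }
        Glyph first = line.get(0);
        return span(first.x(), first.y(), text.toString());
    }

    public static void main(String[] args) {
        List<Glyph> line = layout("Newspaper page layout", 10, 100);
        System.out.println("per glyph: " + perGlyph(line).length() + " chars");
        System.out.println("per line:  " + perLine(line).length() + " chars");
    }
}
```

Every per-glyph span repeats the full positioning markup for a single character, so the per-line version is many times smaller; the price is that the browser, not the PDF, now decides where the characters after the first one fall.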

Here is an example with one line highlighted. It comes from a live newspaper page (reproduced with permission) and shows the issue: there are big spaces between the words on the highlighted line.

[Image: the original newspaper page in the PDF]

If we split out the individual words, we get this, which does not look very good.

[Image: first HTML5 attempt]

So let us be more fussy about which breaks we allow and try to keep the text as a single block.
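One way to be more fussy is to raise the gap that must appear before a break is allowed. The Java sketch below is an illustration under stated assumptions, not the rule our converter actually uses: the `Glyph` record, the `group` and `layoutLine` helpers, the em-based `maxGapEm` threshold and the assumption that the PDF contains no explicit space glyphs (word spacing done purely by position) are all hypothetical. A new block starts only when the baseline changes or the gap to the previous glyph exceeds `maxGapEm` times the font size; smaller visible gaps become a single space inside the block.

```java
import java.util.ArrayList;
import java.util.List;

public class FussyLineBreaks {

    // Hypothetical glyph: character, left edge x, advance width, baseline y, font size (PDF units).
    record Glyph(char ch, double x, double width, double y, double fontSize) {}

    // Groups glyphs (assumed already in reading order) into text blocks.
    static List<String> group(List<Glyph> glyphs, double maxGapEm) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                double gap = g.x() - (prev.x() + prev.width());
                boolean newBaseline = Math.abs(g.y() - prev.y()) > 0.5 * g.fontSize();
                if (newBaseline || gap > maxGapEm * g.fontSize()) {
                    // Gap too large (or a new line): close the current block.
                    blocks.add(current.toString());
                    current.setLength(0);
                } else if (gap > 0.1 * g.fontSize()) {
                    // Visible but tolerated gap: treat it as a word space.
                    current.append(' ');
                }
            }
            current.append(g.ch());
            prev = g;
        }
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    // Lays out words with a fixed, wide inter-word gap to mimic a justified newspaper line.
    static List<Glyph> layoutLine(String[] words, double wordGap, double y) {
        List<Glyph> glyphs = new ArrayList<>();
        double x = 0;
        for (String w : words) {
            for (char c : w.toCharArray()) {
                glyphs.add(new Glyph(c, x, 5, y, 10));
                x += 5;
            }
            x += wordGap;
        }
        return glyphs;
    }

    public static void main(String[] args) {
        List<Glyph> line = layoutLine(new String[] {"Big", "gaps", "here"}, 12, 100);
        System.out.println(group(line, 0.5)); // [Big, gaps, here]
        System.out.println(group(line, 2.0)); // [Big gaps here]
    }
}
```

With a 0.5 em threshold, the 1.2 em word spaces of this justified line split it into three blocks, much like the first HTML5 attempt; at 2 em the same line stays together. A real converter would still need some limit so that neighbouring columns are not merged, which is why the threshold cannot simply be made arbitrarily large.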

 

It still needs some more work and tuning, but it is definitely a step in the right direction. What do you think?


