PDF to HTML5 conversion – How it works

A potential customer asked how PDF to HTML conversion works, so here is an explanation…

A PDF file is more like a traditional computer program than a traditional file format. Executing the instructions in the PDF file writes out shapes, text and images to a display, and the finished result is your page. PDF was developed from PostScript, the programming language which revolutionised printing by letting computers and cheap printers produce beautiful copy (in the right hands; you still need some design talent to produce great design).
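To make the "program" idea concrete, here is a minimal sketch of an interpreter for a handful of PDF path operators (m, l, re, S and f). The operators themselves are real PDF ones, but the class name MiniContentStream and the logged call names are invented for illustration. A real parser handles many more operators and operand types, so treat this as a toy rather than how any production parser works.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy interpreter (hypothetical, for illustration only).
// PDF content streams are postfix: operands come first, then the operator.
class MiniContentStream {

    /** Executes the stream and returns a log of the drawing calls it made. */
    static List<String> run(String stream) {
        Deque<Double> operands = new ArrayDeque<>();
        List<String> calls = new ArrayList<>();
        for (String token : stream.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input
            }
            switch (token) {
                case "m": { // x y m : start a new subpath
                    double y = operands.pop();
                    double x = operands.pop();
                    calls.add("moveTo " + x + " " + y);
                    break;
                }
                case "l": { // x y l : straight line to the point
                    double y = operands.pop();
                    double x = operands.pop();
                    calls.add("lineTo " + x + " " + y);
                    break;
                }
                case "re": { // x y w h re : rectangle
                    double h = operands.pop();
                    double w = operands.pop();
                    double y = operands.pop();
                    double x = operands.pop();
                    calls.add("rect " + x + " " + y + " " + w + " " + h);
                    break;
                }
                case "S": // stroke the current path
                    calls.add("stroke");
                    break;
                case "f": // fill the current path
                    calls.add("fill");
                    break;
                default: // anything else is assumed to be a number
                    operands.push(Double.parseDouble(token));
            }
        }
        return calls;
    }
}
```

Calling MiniContentStream.run("10 20 m 30 20 l S") returns the three calls moveTo 10.0 20.0, lineTo 30.0 20.0 and stroke, which is the sequence a renderer would then draw.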

So the first thing you need is a PDF parser. Luckily we happen to have one of those lying around, which we have been developing for the last 11 years, so it is robust, powerful and tested. We added hooks at the points where it would write out the text, shapes and images, and altered those to generate the required HTML5 instead. Usefully, because the code was designed to write to Java (which works in sRGB), we also had all the colour conversions in place, so we could use RGB as the display format whatever was in the PDF.
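The hook idea can be sketched roughly as follows. This is not the actual parser API: PageOutput, Html5Output and every method name here are hypothetical, and the sketch assumes one PDF point maps to one CSS pixel. The CMYK helper uses the naive textbook formula, not the colour-managed conversion a real converter would use.

```java
import java.util.Locale;

// Hypothetical hook interface: the parser would call these at the points
// where it previously drew to the screen.
interface PageOutput {
    void drawText(String text, double x, double y, double fontSize);
    void drawImage(String src, double x, double y, double width, double height);
}

// One possible implementation that emits absolutely positioned HTML5 elements.
class Html5Output implements PageOutput {
    private final StringBuilder html = new StringBuilder();
    private final double pageHeight; // page height in PDF points

    Html5Output(double pageHeight) {
        this.pageHeight = pageHeight;
    }

    @Override
    public void drawText(String text, double x, double y, double fontSize) {
        // PDF's default origin is bottom-left with y pointing up; CSS uses
        // top-left with y pointing down. Subtracting the font size is only a
        // rough way of turning a baseline position into a box top.
        double top = pageHeight - y - fontSize;
        html.append(String.format(Locale.ROOT,
            "<div style=\"position:absolute;left:%.1fpx;top:%.1fpx;font-size:%.1fpx\">%s</div>\n",
            x, top, fontSize, escape(text)));
    }

    @Override
    public void drawImage(String src, double x, double y, double width, double height) {
        double top = pageHeight - y - height; // y is the image's bottom edge
        html.append(String.format(Locale.ROOT,
            "<img src=\"%s\" style=\"position:absolute;left:%.1fpx;top:%.1fpx;width:%.1fpx;height:%.1fpx\">\n",
            escape(src), x, top, width, height));
    }

    /** Naive CMYK (components 0..1) to CSS rgb(); ignores colour profiles. */
    static String cmykToCss(double c, double m, double y, double k) {
        long r = Math.round(255 * (1 - c) * (1 - k));
        long g = Math.round(255 * (1 - m) * (1 - k));
        long b = Math.round(255 * (1 - y) * (1 - k));
        return "rgb(" + r + "," + g + "," + b + ")";
    }

    /** Escapes the characters that are unsafe in HTML text and attributes. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    String toHtml() {
        return html.toString();
    }
}
```

The point of the interface is that the parser does not need to know what it is drawing to: the same calls could feed a screen renderer or an HTML5 writer.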

Sometimes you need to make changes to the HTML code to allow for differences between the way HTML and PDF work. For example, PDF has a clip but HTML does not, so images need to be preclipped. That’s where we are now: testing a lot of files and improving the output. And there are lots of features we are adding; I am currently working on TrueType fonts. Hope that helps explain it. Give the converter a try and let us know what you think… There are instructions on using it here.
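As an illustration of the preclipping step mentioned above, here is one way it could be done with standard Java2D classes. The class name PreClip is made up for this sketch, and it is an assumption about the general approach rather than the converter's actual code. The image is redrawn through the clip shape onto a transparent canvas, so the HTML side only has to place an ordinary image.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.image.BufferedImage;

// Hypothetical helper: bakes a clip shape into the image pixels.
class PreClip {

    /** Returns a copy of the image in which pixels outside the clip are transparent. */
    static BufferedImage apply(BufferedImage source, Shape clip) {
        // A new ARGB image starts out fully transparent.
        BufferedImage out = new BufferedImage(
            source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setClip(clip);                 // clip is in image pixel coordinates
            g.drawImage(source, 0, 0, null); // only pixels inside the clip are copied
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The result would need to be saved in a format with an alpha channel, such as PNG, for the transparent region to survive.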


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

2 thoughts on “PDF to HTML5 conversion – How it works”

  1. Manjini

    Hi,

    I have downloaded the pdf2html pro version.

    pls. help me with the instructions, how to convert the pdf to html5.

    regards
    Manjini

  2. There is sample Java code at http://www.jpedal.org/html_support.php – does that help?
