PDF to HTML5 conversion – Mapping PDF features onto HTML

When we decided to investigate PDF to HTML conversion, we looked at HTML5 to see whether it could represent the sort of rich content found in PDF files. We had rejected earlier versions of HTML and XHTML because they lacked the market support and the features we needed. There were four main areas of concern:

1. Support

Unlike previous versions of HTML, HTML5 is broadly and fairly well supported: it works in Chrome, the latest IE, Firefox and Safari, and on most mobile platforms. We did find a couple of interesting ‘features’ where HTML files worked in Safari on the Mac but not on the iPad, but that’s what makes development fun 😉

Overall, we were impressed with HTML5 adoption in the marketplace.

2. Text capabilities

PDF can do all sorts of fancy tricks with text. HTML5 offers two ways of displaying text. You can use CSS to control text (adding fonts and fancy effects), or you can draw it onto a canvas (a bitmap drawing surface). The canvas offers more flexibility, but it is bitmapped, so text looks pixellated if you zoom in.
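To make the CSS route concrete, here is a minimal Java sketch that turns one run of PDF text into an absolutely positioned span. The class and method names are hypothetical, the font family is assumed to be a plain CSS name, and using the font size as the line height is a rough stand-in for real font metrics.

```java
import java.util.Locale;

class TextRunToHtml {

    /** Escape the characters that are significant in HTML text and attributes. */
    public static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Emit one absolutely positioned span for a run of text.
     * x and baselineY are in PDF units, with the origin at the bottom-left of the page.
     */
    public static String toSpan(String text, double x, double baselineY, double fontSize,
                                String fontFamily, double pageHeight) {
        // HTML measures from the top-left, so flip the y axis.
        // Assumption: the font size approximates the height of the line.
        double top = pageHeight - baselineY - fontSize;
        return String.format(Locale.ROOT,
                "<span style=\"position:absolute;left:%.2fpx;top:%.2fpx;"
                        + "font-family:%s;font-size:%.2fpx;white-space:pre\">%s</span>",
                x, top, escapeHtml(fontFamily), fontSize, escapeHtml(text));
    }

    public static void main(String[] args) {
        // A 12pt Helvetica run, one inch in from the left of a US Letter page.
        System.out.println(toSpan("Hello & <PDF>", 72, 700, 12, "Helvetica", 792));
    }
}
```

Because the output is ordinary HTML text, it stays selectable and scales cleanly when the user zooms, which is the trade-off against the canvas described above.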

The biggest issue with PDF files is matching embedded fonts (which can be different sizes) so that spacing and appearance look correct in HTML. For a later version we might explore writing out the font data as a TrueType font (although this raises lots of legal and technical issues). Overall, we felt HTML5 offers enough support to do a decent job.

3. Images

PDF files contain two types of image. Bitmapped images are easy: convert to PNG and add to the HTML page. Just check for any clip first and apply it to the image.
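As a rough illustration of the clip-then-convert step, the sketch below uses standard Java2D and ImageIO calls. The class and method names are hypothetical, and embedding the PNG as a base64 data URI is just one assumed way of adding it to the page; writing a separate .png file would work equally well.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

class ClippedImageToPng {

    /** Draw the source image through the clip onto a transparent image of the same size. */
    public static BufferedImage applyClip(BufferedImage source, Shape clip) {
        BufferedImage out = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setClip(clip);                 // pixels outside the clip stay transparent
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    /** Encode the image as PNG, which keeps the transparency left by the clip. */
    public static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Assumption: the image is inlined as a data URI rather than saved as a file. */
    public static String toImgTag(byte[] png) {
        return "<img src=\"data:image/png;base64,"
                + Base64.getEncoder().encodeToString(png) + "\">";
    }

    public static void main(String[] args) {
        // Stand-in for an image decoded from the PDF: a solid red square.
        BufferedImage source = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = source.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 100, 100);
        g.dispose();

        BufferedImage clipped = applyClip(source, new Ellipse2D.Double(0, 0, 100, 100));
        System.out.println(toImgTag(toPng(clipped)).length() + " characters of HTML");
    }
}
```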

PDF files also contain vector graphics. The Canvas object is very similar to the Java Graphics2D object, providing lots of drawing primitives.
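The similarity shows up clearly if you walk a Java2D shape and write out the matching canvas calls. The following is only a sketch under our own naming (ShapeToCanvas and toCanvasScript are hypothetical), but the canvas methods it emits (beginPath, moveTo, lineTo, quadraticCurveTo, bezierCurveTo, closePath) are the standard 2D context ones, and each lines up with a PathIterator segment type.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.Locale;

class ShapeToCanvas {

    /** Format one canvas call such as ctx.lineTo(10.00, 0.00); */
    private static String call(String ctx, String method, double[] c, int count) {
        StringBuilder args = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) args.append(", ");
            args.append(String.format(Locale.ROOT, "%.2f", c[i]));
        }
        return ctx + "." + method + "(" + args + ");\n";
    }

    /**
     * Translate a Java2D shape into JavaScript that builds the same path on a canvas.
     * The transform may be null; it could be used to flip PDF's y-up coordinates.
     */
    public static String toCanvasScript(Shape shape, AffineTransform transform, String ctx) {
        StringBuilder js = new StringBuilder(ctx + ".beginPath();\n");
        double[] c = new double[6];
        for (PathIterator it = shape.getPathIterator(transform); !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:
                    js.append(call(ctx, "moveTo", c, 2));
                    break;
                case PathIterator.SEG_LINETO:
                    js.append(call(ctx, "lineTo", c, 2));
                    break;
                case PathIterator.SEG_QUADTO:   // control point, then end point
                    js.append(call(ctx, "quadraticCurveTo", c, 4));
                    break;
                case PathIterator.SEG_CUBICTO:  // two control points, then end point
                    js.append(call(ctx, "bezierCurveTo", c, 6));
                    break;
                case PathIterator.SEG_CLOSE:
                    js.append(call(ctx, "closePath", c, 0));
                    break;
                default:
                    throw new IllegalStateException("Unknown path segment type");
            }
        }
        return js.toString();
    }

    public static void main(String[] args) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(0, 0);
        path.lineTo(10, 0);
        path.quadTo(15, 5, 10, 10);
        path.closePath();
        // The caller is expected to follow the path with ctx.fill() or ctx.stroke().
        System.out.print(toCanvasScript(path, null, "ctx"));
    }
}
```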

4. Forms

PDF files contain interactive form elements. HTML5 offers lots of support for comparable forms, including JavaScript.
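For the simplest field types the mapping is almost one-to-one. The sketch below is illustrative only: FormFieldToHtml and its methods are hypothetical names, and it assumes the field name, value and maximum length have already been read from the PDF.

```java
class FormFieldToHtml {

    /** Escape the characters that are significant inside an HTML attribute value. */
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** A PDF text field becomes a text input; maxLen <= 0 is treated here as "no limit". */
    public static String textInput(String name, String value, int maxLen) {
        StringBuilder html = new StringBuilder("<input type=\"text\"");
        html.append(" name=\"").append(escape(name)).append('"');
        html.append(" value=\"").append(escape(value)).append('"');
        if (maxLen > 0) {
            html.append(" maxlength=\"").append(maxLen).append('"');
        }
        return html.append('>').toString();
    }

    /** A PDF check box becomes a checkbox input. */
    public static String checkbox(String name, boolean checked) {
        return "<input type=\"checkbox\" name=\"" + escape(name) + "\""
                + (checked ? " checked" : "") + ">";
    }

    public static void main(String[] args) {
        System.out.println(textInput("surname", "Stephens", 40));
        System.out.println(checkbox("subscribe", true));
    }
}
```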

Having looked at HTML5 in detail, we were satisfied that it offered the level of support we needed. In later articles, we will go into the details.

See all the articles in the PDF to HTML5 conversion series.

This post is part of our “HTML5 Article index”. In these articles, we aim to help you understand the world of HTML5.


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.