Mark Stephens

Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

PDF to HTML5 conversion – Mapping PDF features onto HTML


When we decided to investigate the possibility of PDF to HTML conversion, we looked at HTML5 to see whether it could represent the sort of rich content found in PDF files. We had rejected previous versions of HTML and XHTML because they did not have the market support or the features. There were four main areas of concern:

1. Support

Unlike previous versions of HTML, support for HTML5 is broad and fairly good. HTML5 is supported by Chrome, the latest IE, Firefox and Safari, and by most mobile platforms. We have found a couple of interesting ‘features’ where HTML files worked in Safari on the Mac but not on the iPad, but that’s what makes development fun 😉

Overall we were impressed with HTML5 adoption in the marketplace.

2. Text capabilities

PDF can do all sorts of fancy tricks with text. HTML5 offers two ways of displaying text. You can use CSS to control text (adding fonts and fancy effects), or you can draw it onto a canvas (a bitmap drawing surface). The canvas offers more flexibility, but it is bitmapped, so text looks pixelated when you zoom in.

The biggest issue with PDF files is matching embedded fonts (which can be different sizes) so that spacing and appearance look correct in HTML. For a later version we might explore writing out the font data as a TrueType font (although this raises lots of legal and technical issues). Overall we felt HTML5 offers enough support to do a decent job.
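To make the CSS route a little more concrete, here is a minimal sketch of writing one run of text as an absolutely positioned span. The class name TextSpanSketch, the method names and the choice of inline styles are all assumptions made for illustration, not a description of our converter. It also assumes the position and font size have already been converted to CSS pixels and that the font family is a plain name without quote characters.

```java
import java.util.Locale;

// Hypothetical helper: none of these names come from a real library.
class TextSpanSketch {

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a span placed at (leftPx, topPx). All measurements are assumed
    // to be in CSS pixels, converted from PDF units by the caller.
    static String toSpan(String text, String fontFamily, double fontSizePx,
                         double leftPx, double topPx) {
        return String.format(Locale.ROOT,
                "<span style=\"position:absolute; left:%.2fpx; top:%.2fpx; "
                        + "font-family:'%s'; font-size:%.2fpx; white-space:pre\">%s</span>",
                leftPx, topPx, fontFamily, fontSizePx, escapeHtml(text));
    }

    public static void main(String[] args) {
        System.out.println(toSpan("Hello PDF", "Helvetica", 12, 72, 100.5));
    }
}
```

Locale.ROOT keeps the decimal separator as a dot whatever the machine's locale, which matters because CSS does not accept a comma there.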

3. Images

PDF files contain two types of image. Bitmapped images are easy: convert them to PNG and add them to the HTML page. Just check for a clip first and, if there is one, apply it to the image before writing it out.
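The clip-then-convert step can be sketched with the standard Java imaging classes. This is an illustration under assumptions, not our actual code: the class ClippedPngSketch and its methods are hypothetical, and the clip is assumed to be already expressed in the image's own pixel coordinates.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: the names are for illustration only.
class ClippedPngSketch {

    // Draws the source through the clip onto a transparent image,
    // so pixels outside the clip stay fully transparent.
    static BufferedImage applyClip(BufferedImage source, Shape clip) {
        BufferedImage out = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setClip(clip);
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    // Encodes the image as PNG bytes, ready to be saved next to the HTML page.
    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        BufferedImage source = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage clipped = applyClip(source, new Ellipse2D.Double(10, 10, 80, 80));
        System.out.println("PNG bytes: " + toPng(clipped).length);
    }
}
```

PNG suits this because it keeps the alpha channel, so the clipped-away area stays see-through when the image is placed on the page.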

PDF files also contain vector graphics. The Canvas object is very similar to the Java Graphics2D object, providing lots of drawing primitives, so vector content has a natural home there.
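The similarity between the two APIs shows up when you walk a Java2D shape segment by segment: each segment type has a matching canvas path call. The sketch below is an assumption-laden illustration rather than our implementation. The class CanvasPathSketch is hypothetical, "ctx" is assumed to be a JavaScript variable holding the canvas 2D context, and the flip between PDF's bottom-left origin and the canvas's top-left origin is left out.

```java
import java.awt.Shape;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.Locale;

// Hypothetical helper: the names are for illustration only.
class CanvasPathSketch {

    // Walks a Java2D shape and emits the matching canvas 2D context calls.
    static String toCanvasScript(Shape shape) {
        StringBuilder js = new StringBuilder("ctx.beginPath();\n");
        double[] c = new double[6];
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:
                    js.append(call("moveTo", c, 2));
                    break;
                case PathIterator.SEG_LINETO:
                    js.append(call("lineTo", c, 2));
                    break;
                case PathIterator.SEG_QUADTO:
                    js.append(call("quadraticCurveTo", c, 4));
                    break;
                case PathIterator.SEG_CUBICTO:
                    js.append(call("bezierCurveTo", c, 6));
                    break;
                case PathIterator.SEG_CLOSE:
                    js.append("ctx.closePath();\n");
                    break;
                default:
                    throw new IllegalStateException("Unknown segment type");
            }
        }
        return js.append("ctx.stroke();\n").toString();
    }

    // Formats one call such as ctx.lineTo(10.00, 0.00); using the first
    // 'count' coordinates of the segment.
    private static String call(String name, double[] coords, int count) {
        StringBuilder sb = new StringBuilder("ctx.").append(name).append('(');
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format(Locale.ROOT, "%.2f", coords[i]));
        }
        return sb.append(");\n").toString();
    }

    public static void main(String[] args) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(0, 0);
        path.lineTo(10, 0);
        path.curveTo(12, 2, 12, 8, 10, 10);
        path.closePath();
        System.out.print(toCanvasScript(path));
    }
}
```

The argument order lines up without any shuffling: both APIs give control points first and the end point last for quadratic and cubic curves.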

4. Forms

PDF files contain interactive form elements. HTML5 offers lots of support for comparable forms, including JavaScript.
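As a rough picture of what "comparable" means here, the sketch below maps a few simplified field kinds onto HTML form elements. Everything in it is an assumption for illustration: the class FormFieldSketch, the FieldKind enum and the flat name/value model are hypothetical, and real PDF fields carry far more state (flags, appearance, actions) than this shows.

```java
// Hypothetical helper: the names and the simplified field model are
// assumptions, not taken from any real library.
class FormFieldSketch {

    enum FieldKind { TEXT, CHECKBOX, RADIO, CHOICE }

    // Escapes a value so it is safe inside a double-quoted HTML attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Returns an HTML element roughly equivalent to the given field.
    static String toHtml(FieldKind kind, String name, String value) {
        String n = escape(name);
        String v = escape(value);
        switch (kind) {
            case TEXT:
                return "<input type=\"text\" name=\"" + n + "\" value=\"" + v + "\">";
            case CHECKBOX:
                return "<input type=\"checkbox\" name=\"" + n + "\" value=\"" + v + "\">";
            case RADIO:
                return "<input type=\"radio\" name=\"" + n + "\" value=\"" + v + "\">";
            case CHOICE:
                return "<select name=\"" + n + "\"><option selected>" + v + "</option></select>";
            default:
                throw new IllegalArgumentException("Unsupported field kind: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(FieldKind.TEXT, "email", "someone@example.com"));
    }
}
```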

Having looked at HTML5 in detail, we were satisfied that it offered the level of support we needed. In later articles, we will go into the details.


IDRsolutions Ltd 2022. All rights reserved.