Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX.

He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

PDF to HTML5 conversion – Mapping PDF features onto HTML


When we decided to investigate the possibility of PDF to HTML conversion, we looked at HTML5 to see whether it could be used to represent the sort of rich content found in PDF files. We had rejected previous versions of HTML and XHTML because they lacked either the market support or the features. There were four main areas of concern:

1. Support

Unlike previous versions of HTML, support for HTML5 is broad and fairly good. HTML5 is supported by Chrome, the latest IE, Firefox and Safari, and by most mobile platforms. We have found a couple of interesting ‘features’ where HTML files worked on Safari on the Mac but not on the iPad, but that’s what makes development fun 😉

Overall we were impressed with HTML5 adoption in the marketplace.

2. Text capabilities

PDF can do all sorts of fancy tricks with text. HTML5 offers two ways of displaying text. You can use CSS to control text (adding fonts and fancy effects), or you can draw it onto a canvas (a bitmap drawing surface). The canvas offers more flexibility, but it is bitmapped, so text looks pixellated if you zoom in.
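As a rough illustration of the CSS route, the Java sketch below writes one run of text as an absolutely positioned span. The class name, the method names and the pixel-based layout are all assumptions made for this example, not a description of how any particular converter works.

```java
import java.util.Locale;

// Hypothetical helper: every name here is invented for illustration.
final class CssTextSketch {

    /** Escapes the characters that are significant in HTML text and attribute values. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Emits one absolutely positioned span for a run of text.
     * Assumes the font family name contains no single quotes.
     */
    static String textSpan(String text, double x, double y,
                           String fontFamily, double fontSizePx) {
        return String.format(Locale.ROOT,
                "<span style=\"position:absolute;left:%.1fpx;top:%.1fpx;"
                        + "font-family:'%s';font-size:%.1fpx\">%s</span>",
                x, y, escapeHtml(fontFamily), fontSizePx, escapeHtml(text));
    }

    public static void main(String[] args) {
        System.out.println(textSpan("Fish & chips", 72, 100, "Helvetica", 12));
    }
}
```

Because the text stays as real text in the page, it remains selectable and scales cleanly, which is the main trade-off against drawing it on a canvas.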

The biggest issue with PDF files is matching embedded fonts (which can be different sizes) so that spacing and appearance look correct in HTML. For a later version we might explore writing out the font data as a TrueType font (although this raises lots of legal and technical issues). Overall we felt HTML5 offers enough support to do a decent job.

3. Images

PDF files contain two types of image. Bitmapped images are easy: convert to PNG and add to the HTML page. Just check for any clip first and apply it to the image before writing it out.
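The clip-then-convert step can be sketched with the standard Java imaging classes. This is a minimal example under stated assumptions: the class and method names are hypothetical, the clip is assumed to be already expressed in the image's own pixel coordinates, and clipped-away areas are simply left transparent.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: names are invented for illustration.
final class ClippedPngSketch {

    /** Draws the source through the clip onto a transparent image of the same size. */
    static BufferedImage applyClip(BufferedImage source, Shape clip) {
        BufferedImage out = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setClip(clip);                 // pixels outside the clip stay transparent
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    /** Encodes the image as PNG bytes, ready to be written next to the HTML page. */
    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", bytes)) {
                throw new IllegalStateException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

PNG is a convenient target here because it supports an alpha channel, so the clipped-away region does not hide whatever sits underneath the image on the page.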

PDF files also contain vector graphics. The Canvas object is very similar to the Java Graphics2D object, providing lots of draw primitives, which makes it a natural target for PDF vector content.
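To show how closely the two models line up, the sketch below walks a Java2D Shape with a PathIterator and emits the matching canvas 2D context calls as JavaScript text. The class name, method names and the context variable name are assumptions for this example. It ignores winding rules, fills, strokes and the flipped y-axis of PDF coordinates, all of which a real converter would need to handle.

```java
import java.awt.Shape;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.Locale;

// Hypothetical helper: names are invented for illustration.
final class ShapeToCanvasSketch {

    /** Emits canvas path calls on the JavaScript variable named by ctx. */
    static String toCanvasScript(Shape shape, String ctx) {
        StringBuilder js = new StringBuilder(ctx).append(".beginPath();\n");
        double[] c = new double[6];
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:  js.append(call(ctx, "moveTo", c, 2)); break;
                case PathIterator.SEG_LINETO:  js.append(call(ctx, "lineTo", c, 2)); break;
                case PathIterator.SEG_QUADTO:  js.append(call(ctx, "quadraticCurveTo", c, 4)); break;
                case PathIterator.SEG_CUBICTO: js.append(call(ctx, "bezierCurveTo", c, 6)); break;
                case PathIterator.SEG_CLOSE:   js.append(call(ctx, "closePath", c, 0)); break;
                default: throw new IllegalStateException("Unknown segment type");
            }
        }
        return js.toString();
    }

    /** Formats one call using the first n coordinates of c. */
    private static String call(String ctx, String name, double[] c, int n) {
        StringBuilder sb = new StringBuilder(ctx).append('.').append(name).append('(');
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(", ");
            sb.append(String.format(Locale.ROOT, "%.2f", c[i]));
        }
        return sb.append(");\n").toString();
    }

    public static void main(String[] args) {
        Path2D.Double p = new Path2D.Double();
        p.moveTo(0, 0);
        p.lineTo(10, 0);
        p.quadTo(10, 10, 0, 10);
        p.closePath();
        System.out.print(toCanvasScript(p, "ctx"));
    }
}
```

Each Java2D segment type has a direct canvas counterpart with the same argument order (control points first, end point last), so the mapping is a one-to-one switch.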

4. Forms

PDF files contain interactive form elements. HTML5 offers good support for comparable form elements, including scripting with JavaScript.
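As a small example of the mapping, a PDF text field could be written out as an HTML text input. The sketch below is illustrative only: the class, the method names and the choice of attributes are assumptions, and real form fields carry far more state (appearance, validation, actions) than is shown here.

```java
// Hypothetical helper: names are invented for illustration.
final class FormFieldSketch {

    /** Escapes the characters that are significant inside a quoted HTML attribute. */
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Emits an HTML text input carrying the field's name, current value and read-only flag. */
    static String textInput(String name, String value, boolean readOnly) {
        StringBuilder html = new StringBuilder("<input type=\"text\"");
        html.append(" name=\"").append(escapeAttr(name)).append('"');
        html.append(" value=\"").append(escapeAttr(value)).append('"');
        if (readOnly) {
            html.append(" readonly");
        }
        return html.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(textInput("surname", "Stephens", false));
    }
}
```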

Having looked at HTML5 in detail, we were satisfied that it offered the level of support we needed. In later articles, we will go into the details.

Click here to see all the articles in the PDF to HTML5 conversion series.

This post is part of our “HTML5 Article index”. In these articles, we aim to help you understand the world of HTML5.


