When we decided to investigate the possibility of PDF to HTML conversion, we looked at HTML5 to see whether it could be used to represent the sort of rich content found in PDF files. We had rejected previous versions of HTML and XHTML because they did not have the market support or the features. There were 4 main areas of concern:
Unlike previous versions of HTML, support for HTML5 is broad and fairly good. HTML5 is supported by Chrome, the latest IE, Firefox and Safari, and by most mobile platforms. We have found a couple of interesting ‘features’ where HTML files worked on Safari on the Mac but not on the iPad, but that’s what makes development fun 😉
Overall we were impressed with HTML5 adoption in the marketplace.
2. Text capabilities
PDF can do all sorts of fancy tricks with text. HTML5 offers 2 ways of displaying text. You can use CSS to control text (adding fonts and fancy effects), or you can draw it onto a canvas (a bitmap drawing surface). The canvas offers more flexibility, but it is bitmapped, so text looks pixelated when you zoom in.
The biggest issue with PDF files is matching embedded fonts (which can be different sizes) so that spacing and appearance look correct in HTML. For a later version we might explore writing out the font data as a TrueType font (although this raises lots of legal and technical issues). Overall we felt HTML5 offers enough support to do a decent job.
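As a rough illustration of the CSS route, here is a minimal Java sketch that writes one run of text as an absolutely positioned span. The class `HtmlTextRun`, the method `toSpan` and its parameters are hypothetical names made up for this example, not part of our converter, and the font family is assumed to be a plain name with no quotes in it.

```java
import java.util.Locale;

/** Hypothetical helper: turns one run of text into a CSS-positioned HTML span. */
class HtmlTextRun {

    /** Builds a span placed at (leftPx, topPx) with the given font size and family. */
    static String toSpan(String text, double leftPx, double topPx,
                         double fontSizePx, String fontFamily) {
        // Locale.ROOT keeps the decimal separator as '.' whatever the machine's locale.
        return String.format(Locale.ROOT,
                "<span style=\"position:absolute; left:%.2fpx; top:%.2fpx; "
                + "font-family:'%s', sans-serif; font-size:%.2fpx; white-space:pre\">%s</span>",
                leftPx, topPx, fontFamily, fontSizePx, escape(text));
    }

    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toSpan("Hello & welcome", 72, 100, 12, "Arial"));
    }
}
```

Because the browser lays out the text itself, it stays sharp at any zoom level, but the spacing depends on which font the browser actually ends up using.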
PDF files contain 2 types of image. Bitmapped images are easy: convert them to PNG and add them to the HTML page, applying any clip to the image first.
PDF files also contain vector graphics. The HTML5 canvas object is very similar to the Java Graphics2D object, providing lots of drawing primitives.
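The clip-then-save step for a bitmapped image could look something like the sketch below, which uses only standard Java 2D and ImageIO calls. The class and method names (`ClippedImageWriter`, `applyClip`, `writePng`) are invented for illustration, and the clip is assumed to be already expressed in the image's own pixel coordinates.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: applies a clip to a bitmap and saves the result as PNG. */
class ClippedImageWriter {

    /** Returns a copy of the image in which everything outside the clip is transparent. */
    static BufferedImage applyClip(BufferedImage source, Shape clip) {
        // A new ARGB image starts fully transparent, so unclipped pixels stay see-through.
        BufferedImage result = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2 = result.createGraphics();
        try {
            g2.setClip(clip);                 // only pixels inside the clip get drawn
            g2.drawImage(source, 0, 0, null);
        } finally {
            g2.dispose();
        }
        return result;
    }

    /** Writes the image as a PNG file that the HTML page can reference. */
    static void writePng(BufferedImage image, File target) throws IOException {
        if (!ImageIO.write(image, "png", target)) {
            throw new IOException("No PNG writer available for " + target);
        }
    }
}
```

PNG is a convenient target here because it keeps the alpha channel, so the clipped-away area remains transparent on the HTML page.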
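To show how closely the two models line up, the sketch below walks a Java 2D `Shape` with a `PathIterator` and emits the matching canvas path calls as JavaScript text. `CanvasPathWriter` and `toCanvasScript` are hypothetical names for this example, and `ctx` is assumed to be the name of the canvas 2D context variable in the generated page. Winding rules, strokes and fills are left out to keep it short.

```java
import java.awt.Shape;
import java.awt.geom.PathIterator;
import java.util.Locale;

/** Hypothetical helper: converts a Java 2D shape into HTML5 canvas path calls. */
class CanvasPathWriter {

    /** Returns JavaScript that rebuilds the shape's outline on the context named ctx. */
    static String toCanvasScript(Shape shape, String ctx) {
        StringBuilder js = new StringBuilder(ctx).append(".beginPath();\n");
        double[] c = new double[6];
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:  js.append(call(ctx, "moveTo", c, 2)); break;
                case PathIterator.SEG_LINETO:  js.append(call(ctx, "lineTo", c, 2)); break;
                case PathIterator.SEG_QUADTO:  js.append(call(ctx, "quadraticCurveTo", c, 4)); break;
                case PathIterator.SEG_CUBICTO: js.append(call(ctx, "bezierCurveTo", c, 6)); break;
                case PathIterator.SEG_CLOSE:   js.append(call(ctx, "closePath", c, 0)); break;
                default: throw new IllegalStateException("Unknown path segment type");
            }
        }
        return js.toString();
    }

    /** Formats one call such as ctx.lineTo(3.00, 4.00); using the first n coordinates. */
    private static String call(String ctx, String name, double[] coords, int n) {
        StringBuilder sb = new StringBuilder(ctx).append('.').append(name).append('(');
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(", ");
            sb.append(String.format(Locale.ROOT, "%.2f", coords[i]));
        }
        return sb.append(");\n").toString();
    }
}
```

Each Java 2D segment type has a direct canvas counterpart, and the control points come out of the iterator in the same order the canvas methods expect them, which is why the mapping is a simple switch.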
Having looked at HTML5 in detail, we were satisfied that it offered the level of support we needed. In later articles, we will go into the details.
This post is part of our “HTML5 Article index”. In these articles, we aim to help you understand the world of HTML5.
IDRsolutions develops a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team posts about anything interesting they learn.