When converting a PDF file into HTML5 content, there are different ways to go about it depending on your priorities. IDRsolutions have been working on this thorny issue for over 4 years now. We have found that HTML5 and PDF offer different features, so there is not always a direct match.
For example, the PDF file format allows individual control over the spacing between each text character. You can emulate this in HTML5 by putting each character in its own div tag, but this can create large files. So in this case, is an exact layout or a smaller file size more important to you?
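To make the trade-off concrete, here is a minimal sketch in Python that builds both kinds of markup for the same short string and compares their sizes. The function names and the glyph positions are made up for illustration; a real converter would read the positions from the PDF.

```python
from html import escape


def per_character_html(text, x_positions, y, tag="div"):
    """One absolutely positioned element per character: exact spacing, verbose output."""
    parts = []
    for ch, x in zip(text, x_positions):
        parts.append(
            f'<{tag} style="position:absolute;left:{x:.2f}px;top:{y:.2f}px">'
            f"{escape(ch)}</{tag}>"
        )
    return "".join(parts)


def single_run_html(text, x, y, tag="div"):
    """One element for the whole run: compact, but the browser decides the spacing."""
    return (
        f'<{tag} style="position:absolute;left:{x:.2f}px;top:{y:.2f}px">'
        f"{escape(text)}</{tag}>"
    )


text = "Hello PDF"
# Hypothetical per-glyph x offsets in pixels, standing in for values from a PDF.
xs = [10.0, 17.3, 23.1, 26.0, 28.9, 38.0, 44.6, 52.2, 60.1]

exact = per_character_html(text, xs, 20.0)
compact = single_run_html(text, xs[0], 20.0)

# The per-character version repeats the tag and style once per glyph.
print(len(exact), len(compact))
```

Even for nine characters the per-character markup is several times larger, and the gap grows with every character on the page.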
1. Make an image of the page and display it as an image in HTML5
Advantages: looks identical.
Disadvantages: Large file size, does not scale, text not selectable.
2. Make an image of the page, display it as an image in HTML5 and hide text behind it
Advantages: looks identical and gives text selection.
Disadvantages: Large file size, does not scale, text may be ignored by search engines.
3. Convert all content into its closest HTML5 equivalent. Text becomes text, images stay as images, and vector content can be put on a canvas or turned into an image
Advantages: Smaller file size, perfect zoom on text, searchable.
Disadvantages: PDF layout not exactly reproducible, lots of work with font conversions. Are you actually allowed to use the fonts?
4. Convert content into SVG (which can be shown in HTML5)
Advantages: SVG often looks better than HTML5 for text and images.
Disadvantages: SVG does not offer key HTML5 features like forms.
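As a rough illustration of options 2 and 4, the sketch below builds the markup for one page in each style from the same list of text runs. It is written in Python for brevity and is not how any particular converter works: the function names, the `page1.png` file name and the run data are hypothetical, and the text layer uses transparent text positioned over the image, which is one common way to get selectable text on top of a page image.

```python
from html import escape


def image_with_text_layer(image_url, width, height, runs):
    """Option 2 sketch: page image plus a transparent, selectable text layer.

    runs is a list of (text, left_px, top_px, font_size_px) tuples.
    """
    spans = "".join(
        f'<span style="position:absolute;left:{x}px;top:{y}px;'
        f'font-size:{size}px;color:transparent;white-space:pre">{escape(t)}</span>'
        for t, x, y, size in runs
    )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">'
        f'<img src="{escape(image_url)}" width="{width}" height="{height}" alt="">'
        f"{spans}</div>"
    )


def svg_page(width, height, runs):
    """Option 4 sketch: the same runs as SVG <text> elements.

    Note: in SVG, y is the text baseline, whereas CSS top is the top of the
    box, so a real converter would need to adjust the coordinates.
    """
    texts = "".join(
        f'<text x="{x}" y="{y}" font-size="{size}">{escape(t)}</text>'
        for t, x, y, size in runs
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">{texts}</svg>'
    )


# Hypothetical page data: one text run at (10, 20) in a 12px font.
runs = [("Hello PDF", 10, 20, 12)]
overlay = image_with_text_layer("page1.png", 600, 800, runs)
svg = svg_page(600, 800, runs)
print(overlay)
print(svg)
```

The SVG output can be embedded inline in an HTML5 page or saved as a standalone file, while the overlay version still needs the page image to be generated separately.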
In practice we find that a combination of the above is the best general case (with the option to use other modes if more appropriate).
What mode works best for you?
This post is part of our “HTML5 Article Index” where you can learn more about HTML5.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team post about anything interesting they learn.