When converting a PDF file into HTML5 content, there are different ways to go about it depending on your priorities. IDRsolutions have been working on this thorny issue for over 4 years now. We have found that HTML5 and PDF offer different features, so there is not always a direct match.
For example, the PDF file format allows individual control over the spacing between each text character. You can emulate this in HTML5 by putting each character in its own div tag, but this can create large files. So in this case, is an exact layout or a smaller file size more important to you?
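To make that trade-off concrete, here is a minimal Python sketch that writes the same short run of text both ways: one absolutely positioned div per character, and a single div with a uniform letter-spacing. The function names, the sample x positions and the 0.5px spacing are all made up for illustration; this is not how any particular converter does it.

```python
from html import escape


def per_char_html(text, x_positions, y=0):
    """One absolutely positioned div per character: exact spacing, bulky markup."""
    if len(text) != len(x_positions):
        raise ValueError("need one x position per character")
    return "".join(
        f'<div style="position:absolute;left:{x}px;top:{y}px">{escape(ch)}</div>'
        for ch, x in zip(text, x_positions)
    )


def single_run_html(text, x=0, y=0, letter_spacing=0):
    """One div for the whole run: compact, but spacing is only approximated."""
    return (
        f'<div style="position:absolute;left:{x}px;top:{y}px;'
        f'letter-spacing:{letter_spacing}px">{escape(text)}</div>'
    )


# Hypothetical glyph positions, as a PDF might specify them (illustrative values).
text = "Hello PDF"
xs = [0, 7.2, 13.1, 16.0, 18.9, 25.5, 29.0, 36.4, 44.0]

exact = per_char_html(text, xs)
compact = single_run_html(text, letter_spacing=0.5)

# Compare the markup size of the two approaches.
print(len(exact), len(compact))
```

Even for nine characters the per-character version is several times larger, which is the file size cost you pay for exact spacing.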
1. Make an image of the page and display it as an image in HTML5
Advantages: looks identical.
Disadvantages: Large file size, does not scale, text not selectable.
2. Make an image of the page, display it as an image in HTML5 and hide the text behind it
Advantages: looks identical and gives text selection.
Disadvantages: Large file size, does not scale, text may be ignored by search engines.
3. Convert all content into its closest HTML5 equivalent. Text becomes text, images stay as images, and vector content can be put on a canvas or converted to an image
Advantages: Smaller file size, perfect zoom on text, searchable.
Disadvantages: PDF layout not exactly reproducible, lots of work with font conversions, and you need to check whether you are allowed to use the fonts.
4. Convert content into SVG (which can be shown in HTML5)
Advantages: SVG often looks better than HTML5 for text and images.
Disadvantages: SVG does not offer key HTML5 features like forms.
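As a rough illustration of options 1 and 2, the Python sketch below builds the markup for a single page: an image on its own, or an image with selectable text layered over it. It assumes the text layer is made invisible with transparent text placed on top of the image, which is one common way to do it and not necessarily what any given converter does. The function name, the page size and the file name page1.png are hypothetical.

```python
from html import escape


def image_page_html(image_url, width, height, text_runs=()):
    """Build the markup for one page shown as an image (option 1).

    If text_runs is given, each (x, y, font_size, text) tuple is added as
    transparent, absolutely positioned text over the image (option 2).
    """
    parts = [
        f'<div style="position:relative;width:{width}px;height:{height}px">',
        f'<img src="{escape(image_url, quote=True)}" '
        f'width="{width}" height="{height}" alt="">',
    ]
    for x, y, size, text in text_runs:
        parts.append(
            f'<div style="position:absolute;left:{x}px;top:{y}px;'
            f'font-size:{size}px;color:transparent;white-space:pre">'
            f'{escape(text)}</div>'
        )
    parts.append('</div>')
    return "\n".join(parts)


# Option 1: image only. Option 2: image plus an invisible, selectable text layer.
image_only = image_page_html("page1.png", 595, 842)
page = image_page_html("page1.png", 595, 842, [(72, 90, 12, "Chapter 1 <Intro>")])
print(page)
```

The image still dominates the file size in both cases; the text layer only adds selection on top of it.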
In practice we find that a combination of the above works best in the general case (with the option to use other modes if more appropriate).
What mode works best for you?
This post is part of our “HTML5 Article Index” where you can learn more about HTML5.