4 ways to convert PDF to HTML5

When converting a PDF file into HTML5 content, there are several ways to go about it, depending on your priorities. IDRsolutions have been working on this thorny issue for over 4 years now. We have found that HTML5 and PDF offer different features, so there is not always a direct match.

For example, the PDF file format allows individual control over the spacing between each text character. You can emulate this in HTML5 by putting each character in its own div tag, but this can create large files. So in this case, which matters more to you: an exact layout or a smaller file size?
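To make the trade-off concrete, here is a minimal sketch that emits the same run of text in two ways: one absolutely positioned div per character, and a single div that approximates the spacing with the CSS letter-spacing property. The function names, coordinates and spacing values are made up for illustration and are not taken from any particular converter.

```python
from html import escape


def per_char_html(text, x_positions, y, font_size=12):
    """Emit one absolutely positioned <div> per character (exact spacing)."""
    if len(text) != len(x_positions):
        raise ValueError("need one x position per character")
    return "".join(
        f'<div style="position:absolute;left:{x}px;top:{y}px;'
        f'font-size:{font_size}px">{escape(ch)}</div>'
        for ch, x in zip(text, x_positions)
    )


def single_run_html(text, x, y, font_size=12, letter_spacing=0.0):
    """Emit the whole run as one <div>; spacing is only approximated."""
    return (
        f'<div style="position:absolute;left:{x}px;top:{y}px;'
        f'font-size:{font_size}px;letter-spacing:{letter_spacing}px">'
        f'{escape(text)}</div>'
    )


# Hypothetical sample data: nine characters with hand-picked x offsets.
text = "Hello PDF"
xs = [round(10 + 7.3 * i, 1) for i in range(len(text))]

exact = per_char_html(text, xs, y=20)
compact = single_run_html(text, x=10, y=20, letter_spacing=0.5)

# The per-character markup is several times larger for the same text.
print(len(exact), len(compact))
```

The per-character version keeps every glyph exactly where the PDF put it, while the single-run version leaves the final positions to the browser's font metrics.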


1. Make an image of the page and display it as an image in HTML5

Advantages: looks identical.
Disadvantages: Large file size, does not scale, text not selectable.

2. Make an image of the page, display it as an image in HTML5, and hide the text behind it

Advantages: looks identical and gives text selection.
Disadvantages: Large file size, does not scale, text may be ignored by search engines.
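As a rough illustration of option 2, the sketch below wraps a page image and a text layer in one container. It assumes the page has already been rendered to an image by some other tool and that word positions are known in CSS pixels. Note that it uses a common variant of the idea, transparent text layered over the image rather than literally behind it, so that the browser can still select it. The function name, file name and sample words are hypothetical.

```python
from html import escape


def image_with_text_layer(image_src, width, height, words):
    """Build a page <div> holding the page image plus invisible, selectable text.

    words is an iterable of (text, x, y, font_size) tuples in CSS pixels.
    """
    layer = "\n".join(
        f'<span style="position:absolute;left:{x}px;top:{y}px;'
        f'font-size:{size}px;color:transparent">{escape(text)}</span>'
        for text, x, y, size in words
    )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">\n'
        f'<img src="{escape(image_src)}" width="{width}" height="{height}" alt="">\n'
        f"{layer}\n"
        "</div>"
    )


# Hypothetical page: a 600x800 image with two words on it.
page = image_with_text_layer(
    "page1.png", 600, 800,
    [("Hello", 50, 40, 12), ("A & B", 50, 60, 12)],
)
print(page)
```

The reader sees only the image, so the page looks identical to the PDF, but the text sits in the markup where it can be selected and copied.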

3. Convert all content into its closest HTML5 equivalent. Text becomes text, images stay as images, and vector content can be drawn on a canvas or turned into an image

Advantages: Smaller file size, perfect zoom on text, searchable.
Disadvantages: PDF layout not exactly reproducible, lots of work with font conversions, and you need to check that you are allowed to use the fonts.
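A minimal sketch of option 3 might map a small page model onto HTML5 elements: text becomes a positioned div, images become img tags, and vector paths become drawing calls on a canvas. The Text, Image and Path classes below are an invented page model for illustration only, and the sketch ignores fonts entirely.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Text:
    content: str
    x: float
    y: float
    size: float


@dataclass
class Image:
    src: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class Path:
    points: list  # [(x, y), ...] joined by straight lines


def to_html(items, width, height):
    """Map each page item to its closest HTML5 equivalent."""
    body, draw = [], []
    for item in items:
        if isinstance(item, Text):
            body.append(
                f'<div style="position:absolute;left:{item.x}px;top:{item.y}px;'
                f'font-size:{item.size}px">{escape(item.content)}</div>'
            )
        elif isinstance(item, Image):
            body.append(
                f'<img src="{escape(item.src)}" style="position:absolute;'
                f'left:{item.x}px;top:{item.y}px" '
                f'width="{item.width}" height="{item.height}" alt="">'
            )
        elif isinstance(item, Path):
            if not item.points:
                continue  # nothing to draw
            (x0, y0), *rest = item.points
            draw.append("ctx.beginPath();")
            draw.append(f"ctx.moveTo({x0},{y0});")
            draw.extend(f"ctx.lineTo({x},{y});" for x, y in rest)
            draw.append("ctx.stroke();")
        else:
            raise TypeError(f"unsupported item: {item!r}")
    canvas = ""
    if draw:
        # All vector content shares one canvas the size of the page.
        canvas = (
            f'<canvas id="vectors" width="{width}" height="{height}"></canvas>\n'
            '<script>const ctx = document.getElementById("vectors")'
            '.getContext("2d");' + "".join(draw) + "</script>\n"
        )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">\n'
        + canvas + "\n".join(body) + "\n</div>"
    )


# Hypothetical page with one item of each kind.
html_page = to_html(
    [Text("Hi <b>", 10, 20, 12),
     Image("logo.png", 0, 0, 50, 50),
     Path([(0, 0), (10, 10)])],
    600, 800,
)
print(html_page)
```

Because the text is real text, it scales and can be searched, but how it looks now depends on the fonts and metrics available in the browser, which is where the layout differences come from.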

4. Convert content into SVG (which can be displayed inside an HTML5 page)

Advantages: SVG often looks better than HTML5 for text and images.
Disadvantages: SVG does not offer key HTML5 features like forms.
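For option 4, a hedged sketch: the function below writes a page as a standalone SVG document that can be pasted inline into HTML5 or referenced from an img tag. The tuple layouts and file names are assumptions made for this example. It uses the plain href attribute on the image element, which current browsers accept; older SVG viewers expect xlink:href instead.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def page_to_svg(width, height, texts, images=()):
    """Build an SVG document from positioned text and images.

    texts:  iterable of (content, x, y, font_size)
    images: iterable of (href, x, y, width, height)
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
    ]
    for href, x, y, w, h in images:
        # quoteattr escapes the value and adds the surrounding quotes.
        parts.append(
            f'<image href={quoteattr(href)} x="{x}" y="{y}" '
            f'width="{w}" height="{h}"/>'
        )
    for content, x, y, size in texts:
        parts.append(
            f'<text x="{x}" y="{y}" font-size="{size}">{escape(content)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical page: one image and one line of text.
svg = page_to_svg(
    600, 800,
    texts=[("Fish & chips", 50, 40, 12)],
    images=[("logo.png", 0, 0, 50, 50)],
)

# Parsing the result is a cheap check that the markup is well formed.
root = ET.fromstring(svg)
print(root.tag, len(root))
```

Text and images sit in one coordinate system that scales cleanly, but interactive content such as form fields has no SVG counterpart and would need HTML alongside it.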

In practice, we find that a combination of the above works best in the general case (with the option to use other modes if more appropriate).

You can see some good examples of HTML5 conversions on the IDRsolutions HTML5 examples page and experiment with the different methods using our free online converter.

What mode works best for you?

This post is part of our “HTML5 Article Index” where you can learn more about HTML5.


Mark Stephens

System Architect and Lead Developer at IDRsolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.
