When people convert PDF files into HTML files, they are often disappointed with the results. The main reason is that a straight conversion is not possible. PDF files can contain many structures that have no direct equivalent in HTML (even in the new HTML5). PDF was designed as a format to be viewed: the file is painted onto the page and the user sees the end result. Many PDFs are built from strips of images or overlapping layers that need to fit together exactly.
People also expect the text in an HTML file to be in the correct order. Because a PDF describes how to paint a ‘picture’, this will not always happen. Some PDF creation tools draw the text in very odd ways; I explained this in more detail in a previous article, PDF text. The text looks correct only because your brain sees the finished output and interprets it.
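To illustrate why a converter has to reconstruct reading order rather than simply copy it, here is a minimal sketch. It assumes each piece of text has already been extracted as a fragment with a baseline position in PDF user space (origin at the bottom left, y increasing upwards). The `Fragment` class, the `reading_order` function and the `line_tolerance` value are hypothetical names chosen for this example; real converters need many more heuristics (columns, rotated text, overlapping runs).

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text run: baseline origin in PDF user space plus its text."""
    x: float
    y: float
    text: str


def reading_order(fragments, line_tolerance=2.0):
    """Rebuild lines of text from fragments drawn in arbitrary order.

    Fragments whose baselines are within line_tolerance units of the first
    fragment of the current line are grouped into that line. Lines run top
    to bottom (larger y first) and fragments within a line run left to right.
    """
    # Higher on the page means a larger y in PDF user space.
    ordered = sorted(fragments, key=lambda f: (-f.y, f.x))
    lines = []
    for frag in ordered:
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Re-sort each line by x, since small baseline jitter can disturb order.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x))
            for line in lines]


# The drawing order below is scrambled, as some creation tools produce it.
drawn = [
    Fragment(60, 700, "world"),
    Fragment(10, 680, "Second"),
    Fragment(10, 700, "Hello"),
    Fragment(55, 680, "line"),
]
print(reading_order(drawn))  # ['Hello world', 'Second line']
```

Sorting by position only recovers the order a reader sees; the order in which the PDF actually drew the text is lost in the process, which is exactly the mismatch described above.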
If image quality in the HTML is important, you could convert the PDF into an image and display that, but then all interaction is lost and you need large files for high resolution.
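To put a rough number on "large files", the sketch below works out the uncompressed size of one page rendered as a bitmap. The assumptions are ours, not from the article: an A4 page (210 × 297 mm), 24-bit RGB colour (3 bytes per pixel) and no compression. PNG or JPEG will shrink the stored file, but the pixel count still grows with the square of the resolution.

```python
MM_PER_INCH = 25.4


def raster_bytes(dpi, width_mm=210.0, height_mm=297.0, bytes_per_pixel=3):
    """Uncompressed size in bytes of one page rendered as a bitmap.

    Defaults assume an A4 page and 24-bit RGB colour.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


for dpi in (72, 150, 300):
    print(f"{dpi} dpi: {raster_bytes(dpi) / 1_000_000:.1f} MB")
# 72 dpi: 1.5 MB
# 150 dpi: 6.5 MB
# 300 dpi: 26.1 MB
```

Going from screen resolution (72 dpi) to print resolution (300 dpi) multiplies the raw data per page by roughly 17, before any text is even selectable.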
Updated 2012: since I wrote this, I have indeed had a look at HTML5, and you can read the results in other blog articles.
This post is part of our “HTML5 Article index”. In these articles, we aim to help you understand the world of HTML5.