One of our key considerations with PDF to HTML conversion was the quality of the text. We have seen too many attempts which just generate a bitmap screenshot of the page and display that. We wanted proper text so that you can zoom in properly and the HTML files stay compact. It turns out that there are two ways to show text in HTML5.
1. HTML tags and CSS
The way which provides the best quality is to put the text in an HTML element with its own id (here the id is t1 and the text is "Some").
The element is then positioned using this CSS:
#t1 {
  position: absolute;
  left: 67.4999px;
  top: 47.60309px;
  font-size: 38px;
  font-family: 'Times New Roman', Times, serif;
  color: rgb(0,85,149);
}
This works very nicely for most text and the text is scaled as you zoom in so it is what we use by default.
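For illustration only, the same properties can also be applied from script through an element's style object. This is a minimal sketch, not what the converter emits: the helper name positionText and its options object are assumptions made up for this example, while the values are the ones from the CSS rule above.

```javascript
// Hypothetical helper: applies the same properties as the #t1 rule
// through the element's style object instead of a stylesheet.
function positionText(element, opts) {
  const style = element.style;
  style.position = "absolute";
  style.left = opts.left + "px";         // horizontal offset in CSS pixels
  style.top = opts.top + "px";           // vertical offset in CSS pixels
  style.fontSize = opts.fontSize + "px";
  style.fontFamily = opts.fontFamily;
  style.color = opts.color;
  return element;
}

// Browser-only usage; skipped when no DOM is available.
if (typeof document !== "undefined") {
  const el = document.getElementById("t1");
  if (el) {
    positionText(el, {
      left: 67.4999,
      top: 47.60309,
      fontSize: 38,
      fontFamily: "'Times New Roman', Times, serif",
      color: "rgb(0,85,149)"
    });
  }
}
```

Because the text stays real HTML text, the browser still scales it when you zoom in.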
2. Draw the text onto the context
The PDF file format allows very fine control over text, which can be rotated and distorted by a matrix. CSS does not really support this, but a Canvas translates very easily from Java’s Graphics2D. Here is how to show rotated text in HTML via a Canvas object. First you set up the Canvas in your JavaScript.
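A minimal setup might look like the sketch below. Only the variable name pdf_context comes from the drawing code in this post; the canvas id pdf_canvas, the helper getPdfContext and the font and colour values reused from the CSS example are assumptions for illustration.

```javascript
// Hypothetical helper: looks up a <canvas> element by id and returns
// its 2D drawing context.
function getPdfContext(doc, canvasId) {
  const canvas = doc.getElementById(canvasId);
  if (!canvas || typeof canvas.getContext !== "function") {
    throw new Error("No canvas element with id " + canvasId);
  }
  return canvas.getContext("2d");
}

// Browser-only usage; "pdf_canvas" is an assumed id.
let pdf_context = null;
if (typeof document !== "undefined" && document.getElementById("pdf_canvas")) {
  pdf_context = getPdfContext(document, "pdf_canvas");
  // The font and fill colour are set once on the context before drawing.
  pdf_context.font = "38px 'Times New Roman', Times, serif";
  pdf_context.fillStyle = "rgb(0,85,149)";
}
```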
Then the text can be drawn using code very similar to what you might use in Java. It is a good idea to save the context state first and restore it afterwards, so that each piece of text starts from the default values.
pdf_context.save();
pdf_context.translate(545.065,276.28992);
pdf_context.rotate(1.5707964);
pdf_context.fillText("R",0.0,0.0);
pdf_context.restore();
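The save, translate, rotate, fillText and restore sequence above could be wrapped in a small function so it is repeated the same way for every rotated string. This is only a sketch: drawRotatedText and degreesToRadians are hypothetical names, not part of the converter. Note that rotate takes radians, and the value 1.5707964 used above is approximately a quarter turn (90 degrees).

```javascript
// Converts degrees to the radians that the canvas rotate() call expects.
function degreesToRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Hypothetical helper: draws text at (x, y) rotated by angleRadians,
// leaving the context's transform unchanged afterwards.
function drawRotatedText(ctx, text, x, y, angleRadians) {
  ctx.save();                 // remember the current transform and styles
  ctx.translate(x, y);        // move the origin to the text position
  ctx.rotate(angleRadians);   // rotate around the new origin
  ctx.fillText(text, 0, 0);   // draw at the (moved, rotated) origin
  ctx.restore();              // put everything back to how it was
}
```

With a context set up as pdf_context, a call such as drawRotatedText(pdf_context, "R", 545.065, 276.28992, degreesToRadians(90)) would issue the same calls as the snippet above, with a very slightly more precise angle.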
Because a Canvas is a bitmap, this text does not rescale as you zoom in, but it does allow more control, so we use it only when we need this fine accuracy. Next time I will talk about drawing vector graphics from PDF files in HTML.
IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.