PDF to HTML5 conversion – 2 ways to display text in HTML

One of our key considerations with PDF to HTML conversion was the quality of the text. We have seen too many attempts which just generate a bitmap screenshot of the page and display that. We wanted proper text so you could zoom in properly and have compact HTML files. It turns out that there are two ways to show text in HTML5.

1. HTML tags and CSS

The way which provides the best quality is to put the text in its own tag and then position it via CSS. The CSS can either be inline in the HTML file or in a separate file referenced in the HTML header. You can also use the CSS to set the display features of the text (font, size and color). Here is an example fragment:
t1">Some

This is positioned using this CSS:

#t1 {
    position: absolute;
    left: 67.4999px;
    top: 47.60309px;
    font-size: 38px;
    font-family: 'Times New Roman', Times, serif;
    color: rgb(0,85,149);
}

This works very nicely for most text, and the text is scaled as you zoom in, so it is what we use by default.
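
As a rough illustration of this approach, here is a minimal sketch of how a converter might emit one positioned piece of text together with its CSS rule. The function name positionedText, the fields of the style object and the choice of a div tag are all assumptions made for the example, not the code our converter actually uses.

```javascript
// Hypothetical helper: builds the markup and the CSS rule for one positioned text run.
// The <div> tag and all names here are assumptions for illustration only.
function positionedText(id, text, style) {
  // Escape the characters that would otherwise be read as markup.
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  const html = `<div id="${id}">${escaped}</div>`;

  // One CSS rule per text run, keyed on the element id.
  const css = [
    `#${id} {`,
    "    position: absolute;",
    `    left: ${style.left}px;`,
    `    top: ${style.top}px;`,
    `    font-size: ${style.fontSize}px;`,
    `    font-family: ${style.fontFamily};`,
    `    color: ${style.color};`,
    "}",
  ].join("\n");

  return { html, css };
}

// Example values taken from the CSS shown above; "Some text" is placeholder content.
const run = positionedText("t1", "Some text", {
  left: 67.4999,
  top: 47.60309,
  fontSize: 38,
  fontFamily: "'Times New Roman', Times, serif",
  color: "rgb(0,85,149)",
});
console.log(run.html);
console.log(run.css);
```

The two strings can go into the same HTML file (with the rule inside a style block) or the CSS can be written to a separate file.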

2. Draw the text onto a Canvas context

The PDF file format allows very fine control over text, which can be rotated and distorted by a matrix. CSS does not really support this, but a Canvas translates very easily from Java’s Graphics2D. Here is how to make rotated text appear in HTML via a Canvas object. First you set up the Canvas in your JavaScript.
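
The setup step might look something like the sketch below. It assumes the page contains a canvas element with the id "pdf"; that id, the helper name getPdfContext and the font and color values are assumptions for the example. The document object is passed in as a parameter so the helper can also be exercised outside a browser.

```javascript
// Hypothetical helper: looks up a <canvas> by id and returns its 2D drawing context.
// "doc" is the page's document object; the id and the style values are assumptions.
function getPdfContext(doc, canvasId) {
  const canvas = doc.getElementById(canvasId);
  if (!canvas || typeof canvas.getContext !== "function") {
    throw new Error("No canvas element with id '" + canvasId + "'");
  }
  const context = canvas.getContext("2d");
  // On a canvas the font and fill color are set on the context, not via CSS.
  context.font = "38px 'Times New Roman', Times, serif";
  context.fillStyle = "rgb(0,85,149)";
  return context;
}

// In a browser page containing <canvas id="pdf" width="600" height="800"></canvas>:
// const pdf_context = getPdfContext(document, "pdf");
```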

Then the text can be drawn onto it using code very similar to what you might use in Java. It is a good idea to save the context state first and restore it afterwards, so each piece of text starts from the default values.

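pdf_context.save();                          // remember the current drawing state
pdf_context.translate(545.065,276.28992);    // move the origin to the text position
pdf_context.rotate(1.5707964);               // rotate by pi/2 radians (90 degrees)
pdf_context.fillText("R",0.0,0.0);           // draw the text at the new origin
pdf_context.restore();                       // put the saved state back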
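
The same idea extends from a plain rotation to a general matrix. The sketch below is one possible way to wrap it, using the standard Canvas transform(a, b, c, d, e, f) call. The helper names drawTextWithMatrix and rotationAt are hypothetical, and the matrix is assumed to already be in canvas coordinates (y pointing down); converting from PDF's coordinate space is not shown.

```javascript
// Hypothetical helper: draws text at the origin of a coordinate system defined by a
// 2D affine matrix [a, b, c, d, e, f], given in the canvas's own coordinate space.
function drawTextWithMatrix(context, text, matrix) {
  const [a, b, c, d, e, f] = matrix;
  context.save();                      // remember the current drawing state
  context.transform(a, b, c, d, e, f); // rotation, scale, shear and translation in one step
  context.fillText(text, 0, 0);
  context.restore();                   // put the saved state back
}

// Builds the matrix equivalent to translate(x, y) followed by rotate(angle).
function rotationAt(x, y, angle) {
  const cos = Math.cos(angle);
  const sin = Math.sin(angle);
  return [cos, sin, -sin, cos, x, y];
}

// Usage in a browser, assuming pdf_context is a 2D canvas context:
// drawTextWithMatrix(pdf_context, "R", rotationAt(545.065, 276.28992, Math.PI / 2));
```

With a rotation matrix this should draw the same result as the translate and rotate calls above; other matrices allow skewed or stretched text.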

This does not rescale as you zoom in but does allow more control, so we use it only when we need this fine accuracy. Next time I will talk about drawing vector graphics from PDF files in HTML.

See all the articles in the PDF to HTML5 conversion series.

This post is part of our “HTML5 Article index”. In these articles, we aim to help you understand the world of HTML5.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.