Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX.

He also enjoys speaking at conferences and has been a speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

PDF to HTML5 conversion – Tradeoff of precision versus filesize


When I was at school we used to have endless arguments in our Maths lessons about whether 3.000000000 was a more correct answer than 3 on its own. The answer is that it depends on the scenario.

We have a similar issue in PDF to HTML conversion, where we have to decide how precise the output should be. Internally we can work to about 6-8 decimal places with a reasonable degree of confidence, but what should we put into the HTML output? Consider these 2 versions of the HTML generated from the same PDF file. The first is arguably less accurate, but because an HTML file is a text file, every extra digit is an extra character, so the second produces a much larger file. We have some large sample PDF files where it can make a big difference.

#t105 {
position:absolute;
left:213px;
top:506px;
FONT-SIZE: 12px;
FONT-FAMILY: 'Times New Roman', Times, serif;
color:rgb(0,0,0);
}

pdf_context.moveTo(90,703);
pdf_context.lineTo(234,703);
pdf_context.lineTo(234,702);
pdf_context.lineTo(90,702);
pdf_context.lineTo(90,703);

#t105 {
position:absolute;
left:213.07166px;
top:506.77997px;
FONT-SIZE: 12px;
FONT-FAMILY: 'Times New Roman', Times, serif;
color:rgb(0,0,0);
}

pdf_context.moveTo(90.0,703.199996948);
pdf_context.lineTo(234.0,703.199996948);
pdf_context.lineTo(234.0,702.599998474);
pdf_context.lineTo(90.0,702.599998474);
pdf_context.lineTo(90.0,703.199996948);
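
To get a feel for the numbers, here is a minimal sketch in plain Java that writes the same outline at two precisions and counts the characters. It is not the converter's own code: the class CoordinatePrecision and its format and path helpers are made up for illustration. The whole-number sample above looks as if the fraction is simply dropped (506.77997 becomes 506, 702.599998474 becomes 702), so the sketch assumes truncation and uses RoundingMode.DOWN; the real library may round differently.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class CoordinatePrecision {

    // Hypothetical helper: keep at most maxDecimalPlaces digits after the
    // decimal point. RoundingMode.DOWN drops the extra digits, which matches
    // the samples above (506.77997 -> 506); the real converter may differ.
    static String format(double value, int maxDecimalPlaces) {
        return BigDecimal.valueOf(value)
                .setScale(maxDecimalPlaces, RoundingMode.DOWN)
                .stripTrailingZeros()   // 90.000 -> 90, so no padding is written
                .toPlainString();       // avoid scientific notation such as 9E+1
    }

    // Hypothetical helper: emit a moveTo for the first point and a lineTo
    // for every following point, in the style of the samples above.
    static String path(double[][] points, int maxDecimalPlaces) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < points.length; i++) {
            String command = (i == 0) ? "moveTo" : "lineTo";
            sb.append("pdf_context.").append(command).append('(')
              .append(format(points[i][0], maxDecimalPlaces)).append(',')
              .append(format(points[i][1], maxDecimalPlaces)).append(");\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The thin rectangle from the samples above.
        double[][] box = {
            {90.0, 703.199996948}, {234.0, 703.199996948},
            {234.0, 702.599998474}, {90.0, 702.599998474},
            {90.0, 703.199996948}
        };
        String coarse = path(box, 0);
        String precise = path(box, 9);
        System.out.println(coarse);
        System.out.println("0 decimal places: " + coarse.length() + " characters");
        System.out.println("9 decimal places: " + precise.length() + " characters");
    }
}
```

The saving on one small rectangle is only a handful of characters, but a page can contain thousands of coordinates, which is where the difference in file size comes from.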

So which is better? The answer depends on the PDF and the tradeoffs that the user is prepared to make. In cases like this, we set a default and allow the user to choose. The latest release of our PDF to HTML conversion software adds a new line to the example code (the setMaxNumberOfDecimalPlaces call) so you can set it as you wish:

DynamicVectorRenderer HTMLoutput=new HTMLDisplay(page, cropBox ,false,100, new ObjectStore());
HTMLoutput.setMaxNumberOfDecimalPlaces(0); //let the user select the max number of decimal places
HTMLoutput.setOutputDir(output_dir,outputName); //root for output
FormFactory HTMLFormFactory=new HTMLFormFactory(HTMLoutput, decode_pdf.getPdfPageData().getMediaBoxHeight(page));
HTMLFormFactory.setDecoder(decode_pdf);
decode_pdf.addExternalHandler(HTMLoutput, Options.CustomOutput); //custom object to draw PDF
decode_pdf.addExternalHandler(HTMLFormFactory, Options.FormFactory); //custom object to draw Forms

What do you think is the best tradeoff?

