Cleaning up our HTML5 Output – Text Spacing Adjustment without JavaScript

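If you are familiar with the PDF file format specification, you will know just how powerful its text handling capabilities are. Nine parameters can be set on the Text State, giving very fine control over how text is displayed:

1. Character spacing (Tc)
2. Word spacing (Tw)
3. Horizontal scaling (Tz)
4. Leading (TL)
5. Text font (Tf)
6. Text font size (Tf)
7. Text rendering mode (Tr)
8. Text rise (Ts)
9. Text knockout (the TK entry of a graphics state parameter dictionary)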

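In addition to the above parameters, it’s also possible to adjust kerning, providing individual glyph positioning in a very concise way. The TJ operator takes an array that interleaves strings with numbers, for example [(A) 120 (V)] TJ, where each number moves the following glyph left (positive) or right (negative) by that many thousandths of a text space unit, scaled by the font size.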
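As a rough illustration of how these parameters and kerning numbers combine, here is a minimal Python sketch of the horizontal displacement rule from the PDF specification, tx = ((w0 - Tj/1000) × Tfs + Tc + Tw) × Th, applied to a TJ array. It assumes a simple single-byte font and horizontal writing, and it ignores the text matrix, the CTM and text rise. The names (TextState, tj_positions) and the widths passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextState:
    """The subset of PDF text state parameters that affect horizontal glyph placement."""
    char_spacing: float = 0.0          # Tc, in unscaled text space units
    word_spacing: float = 0.0          # Tw, in unscaled text space units
    horizontal_scaling: float = 100.0  # Tz operand, a percentage
    font_size: float = 12.0            # Tfs, set by the Tf operator


def tj_positions(ts, tj_array, widths):
    """Lay out a TJ array along the x axis of text space.

    tj_array mixes strings and numbers, as in [(A) 120 (V)] TJ.
    widths maps each character to its glyph width in thousandths of a
    text space unit (as in a simple font's Widths array).
    Returns ([(char, x), ...], x_after_last_glyph).
    """
    th = ts.horizontal_scaling / 100.0
    x = 0.0
    placed = []
    for item in tj_array:
        if isinstance(item, (int, float)):
            # A number shifts the next glyph by -item/1000 text space units,
            # scaled by font size and horizontal scaling (Tc and Tw do not apply).
            x -= item / 1000.0 * ts.font_size * th
        else:
            for ch in item:
                placed.append((ch, x))
                # tx = (w0 * Tfs + Tc + Tw) * Th; Tw only applies to the space character.
                tw = ts.word_spacing if ch == " " else 0.0
                x += (widths[ch] / 1000.0 * ts.font_size + ts.char_spacing + tw) * th
    return placed, x
```

For example, with a 10pt font and both glyphs 500 units wide, tj_positions(TextState(font_size=10), ["A", 100, "V"], {"A": 500, "V": 500}) places A at x = 0 and V at x = 4 rather than 5: the 100 in the array pulls V one text space unit closer.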

PDF is an end display file format, so it has no concept of “justify this line of text”. In PDF, a justified line of text is the result of carefully setting the Text State and kerning for that line. This removes the possibility of applications having differing definitions of what “justify this line of text” means, so regardless of where or how you view the PDF, the line of text will appear exactly as intended.
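Because justification is baked into the file rather than described, a PDF producer has to do the arithmetic up front. The sketch below shows one way it might do so, assuming only word spacing (Tw) is used to stretch the line: each space contributes an extra Tw × Th, so the slack is shared evenly between the spaces. The function names and glyph widths are hypothetical, and real producers may instead combine Tc, Tw and TJ adjustments.

```python
def natural_width(text, widths, font_size, char_spacing=0.0, h_scale=100.0):
    """Width of a line in text space with word spacing (Tw) set to zero.

    widths maps each character to its glyph width in thousandths of a unit.
    """
    th = h_scale / 100.0
    return sum((widths[ch] / 1000.0 * font_size + char_spacing) * th for ch in text)


def justify_word_spacing(text, widths, font_size, target_width,
                         char_spacing=0.0, h_scale=100.0):
    """Word spacing (Tw) that stretches `text` to exactly `target_width`.

    Each space adds Tw * Th to the line width, so the remaining slack
    is divided evenly across the spaces.
    """
    spaces = text.count(" ")
    if spaces == 0:
        raise ValueError("line has no spaces to stretch")
    th = h_scale / 100.0
    slack = target_width - natural_width(text, widths, font_size, char_spacing, h_scale)
    return slack / (spaces * th)


def justified_line(text, tw):
    """Content stream fragment: set Tw, then show the line.

    String escaping of '(', ')' and '\\' is omitted for brevity.
    """
    return f"{tw:.3f} Tw ({text}) Tj"
```

The PDF only records the resulting Tw value, so every viewer applies the same displacement rule and the line ends at the same place.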

Unfortunately, if you are familiar with the HTML spec, you will know just how powerful its text handling capabilities are not. Nor is HTML known for rigid rules that every application follows, guaranteeing that your document appears exactly as intended however you view it.

If you are aiming to exactly replicate a PDF in HTML where the PDF has complex custom spacing, you are in for a hard time. This is why the vast majority of applications that claim to convert PDF to HTML are actually fooling you: they rasterize the text to an image and place invisible text on top, positioned somewhere near where it needs to be.

We offer a text mode that does this too, but we are proud to also offer a text mode that outputs real text (with converted PDF fonts) whilst still maintaining a very close representation of the original PDF. We also have an option that positions each glyph individually, as a PDF would, but unfortunately HTML is considerably more verbose than PDF, so this results in impractically large HTML files.
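To see why per-glyph positioning is so costly in HTML, compare the markup for a single line when every glyph gets its own absolutely positioned element with the markup when the browser lays out the line itself. This is a hypothetical sketch, not JPDF2HTML5's actual output; in a PDF content stream the same glyph costs a byte or two inside a string, plus a short kerning number where needed.

```python
import html


def per_glyph_html(glyphs, top):
    """One absolutely positioned <span> per glyph; glyphs is [(char, x_px), ...]."""
    return "".join(
        f'<span style="position:absolute;left:{x:.2f}px;top:{top:.2f}px">'
        f"{html.escape(ch)}</span>"
        for ch, x in glyphs
    )


def per_line_html(text, left, top):
    """A single <span> for the whole line, leaving glyph placement to the browser."""
    return (
        f'<span style="position:absolute;left:{left:.2f}px;top:{top:.2f}px">'
        f"{html.escape(text)}</span>"
    )


text = "Hello world"
# Hypothetical x positions: a fixed 7.5px advance per glyph.
glyphs = [(ch, 10 + i * 7.5) for i, ch in enumerate(text)]
glyph_markup = per_glyph_html(glyphs, 20)
line_markup = per_line_html(text, 10, 20)
print(len(text), len(line_markup), len(glyph_markup))
```

Even in this tiny example the per-glyph version is several times larger, and the overhead grows with every character on the page.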

Previously we used some slightly hacky JavaScript to adjust the text’s spacing to compensate for HTML’s shortcomings, but this was not a popular solution, as it increased file complexity and made our converted files more difficult to integrate. There were workarounds for some cases, but no elegant global solution.

We are pleased to say that we have found a workaround: from Friday’s release, our converted files will no longer require JavaScript to be executed when viewed. Our solution is now CSS-based and offers several advantages over the previous one:

1. It provides a more accurate representation of the PDF
2. It’s instant – no waiting around while the JavaScript updates the spacing
3. Our files no longer require any JavaScript to be executed
4. It even works in IE6!
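The post doesn't describe how the CSS-based solution works, so the following is only an assumption about one possible building block: converting PDF character and word spacing into CSS letter-spacing and word-spacing declarations, which can be emitted once per text style rather than once per glyph. It assumes PDF units of 1/72 inch mapped to CSS pixels of 1/96 inch, and a hypothetical text_scale parameter standing in for the combined horizontal scale of the text matrix and CTM.

```python
PX_PER_PT = 96 / 72  # CSS px per PDF unit (1/72 inch) at 100% zoom


def spacing_css(char_spacing, word_spacing, h_scale=100.0,
                text_scale=1.0, zoom=1.0):
    """Translate PDF Tc/Tw into CSS letter-spacing/word-spacing declarations.

    char_spacing and word_spacing are Tc and Tw in unscaled text space units.
    text_scale is an assumed combined horizontal scale of the text matrix and
    CTM (1.0 when text space maps 1:1 onto the page).
    """
    factor = (h_scale / 100.0) * text_scale * PX_PER_PT * zoom
    return (f"letter-spacing:{char_spacing * factor:.2f}px;"
            f"word-spacing:{word_spacing * factor:.2f}px")


# Hypothetical: one CSS class per distinct text state, shared by every line using it.
rule = ".ts1{" + spacing_css(0.75, 1.5) + "}"
print(rule)
```

One caveat with this mapping: PDF applies Tw only to the single-byte character code 32, whereas CSS word-spacing applies to word-separator characters, and the browser's glyph advances may not exactly match the widths in the PDF, so on its own it would not guarantee an exact match.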

This update will be available in Friday’s release.

This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.

Leon Atherton
Leon is a developer at IDRsolutions and product manager for JPDF2HTML5. He is responsible for managing the JPDF2HTML5 product strategy and roadmap, and also spends a lot of his time writing code to build new features, improve functionality, fix bugs, and improve the testing for JPDF2HTML5.
