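If you are familiar with the PDF file format specification, you will know just how powerful its text handling capabilities are. A range of parameters (9 in total) can be set on the Text State, giving very fine control over text display. Here is a list of those parameters:
1. Character spacing (Tc)
2. Word spacing (Tw)
3. Horizontal scaling (Th)
4. Leading (Tl)
5. Text font (Tf)
6. Text font size (Tfs)
7. Text rendering mode (Tmode)
8. Text rise (Trise)
9. Text knockout (Tk)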
In addition to the above parameters, it’s also possible to adjust kerning, providing individual glyph positioning in a very concise way:
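As a rough illustration, kerning in a PDF content stream is typically written with the `TJ` operator, which takes an array of strings interleaved with numeric adjustments, for example `[(A) 120 (W) 120 (A) 95 (Y)] TJ`. Each number is expressed in thousandths of a text space unit and is subtracted from the current horizontal position. The sketch below works out where each glyph lands under that rule. The `TextState` class, the `glyph_x_positions` function and the glyph widths are hypothetical names and values chosen for the example, not part of any real library or font.

```python
from dataclasses import dataclass


@dataclass
class TextState:
    """A subset of the PDF Text State that affects horizontal advance."""
    font_size: float = 12.0           # Tfs
    char_spacing: float = 0.0         # Tc, added after every glyph
    word_spacing: float = 0.0         # Tw, added after the space character only
    horizontal_scaling: float = 1.0   # Th, where 1.0 means 100%


def glyph_x_positions(tj_array, widths, state):
    """Return (glyph, x) pairs for a TJ-style array.

    tj_array mixes strings with numeric adjustments.
    widths maps a character to its advance width in thousandths of a
    text space unit (hypothetical values, normally read from the font).
    """
    x = 0.0
    placed = []
    for item in tj_array:
        if isinstance(item, (int, float)):
            # A positive number pulls the next glyph to the left.
            x -= item / 1000.0 * state.font_size * state.horizontal_scaling
            continue
        for ch in item:
            placed.append((ch, x))
            advance = widths[ch] / 1000.0 * state.font_size + state.char_spacing
            if ch == " ":
                advance += state.word_spacing
            x += advance * state.horizontal_scaling
    return placed


# Hypothetical widths for a made-up font.
example_widths = {"A": 700, "W": 900, "Y": 700}
example = glyph_x_positions(
    ["A", 120, "W", 120, "A", 95, "Y"],
    example_widths,
    TextState(font_size=10.0),
)
for glyph, x in example:
    print(f"{glyph} at x = {x:.2f}")
```

Four glyphs and three adjustments fit into a single short operator in the PDF, which is the conciseness referred to above.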
Because PDF is an end display file format, it has no concept of “justify this line of text”. In PDF, a justified line of text is the result of carefully setting the Text State and kerning for that line. This removes the potential for applications to have differing definitions of what “justify this line of text” means, so regardless of where or how you are viewing the PDF, the line of text will appear exactly as intended.
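To make this concrete, here is a minimal sketch of one way a PDF producer could justify a line using word spacing alone: measure the natural width of the line, then spread the shortfall evenly across the spaces and write the result as the `Tw` value. The function name and the glyph widths are hypothetical, and the sketch assumes no character spacing and 100% horizontal scaling. A real producer may equally use character spacing or kerning adjustments.

```python
def justify_word_spacing(line, widths, font_size, target_width):
    """Return the word spacing (Tw) that stretches `line` to `target_width`.

    widths maps a character to its advance width in thousandths of a
    text space unit (hypothetical values, normally read from the font).
    Assumes character spacing of 0 and horizontal scaling of 100%.
    """
    natural_width = sum(widths[ch] for ch in line) / 1000.0 * font_size
    gaps = line.count(" ")
    if gaps == 0:
        # Nothing to stretch: word spacing only affects space characters.
        return 0.0
    return (target_width - natural_width) / gaps


# Hypothetical widths for a made-up font.
example_widths = {"a": 500, "b": 500, "c": 500, " ": 250}
tw = justify_word_spacing("a b c", example_widths, font_size=10.0, target_width=26.0)
print(f"{tw:g} Tw")
```

The viewer never has to decide how to justify anything. It simply applies the spacing value it is given, which is why the result is the same everywhere.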
Unfortunately, if you are familiar with the HTML spec, you will know just how powerful its text handling capabilities are not. On top of this, HTML is not known for rigid guidelines that every application follows, so there is no guarantee that your document will appear exactly as intended however you view it.
If you are aiming to exactly replicate a PDF in HTML where the PDF has complex custom spacing, you are in for a hard time. This is why the vast majority of the applications that claim to convert PDF to HTML are actually fooling you by rasterizing the text to an image and placing invisible text on top, positioned somewhere near where it needs to be.
We offer a text mode that does this too, but we are also proud to offer a text mode that outputs real text (with converted PDF fonts) whilst still maintaining a very close representation of the original PDF. We also have an option that positions each glyph individually, as a PDF would, but unfortunately HTML is quite a bit more verbose than PDF, so this results in impractically large HTML files.
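The size difference is easy to see with a small sketch. The two functions below are hypothetical and are not the markup our converter actually produces. One emits an absolutely positioned `span` per glyph, the other emits a single `span` for the whole run and leans on the CSS `word-spacing` property.

```python
from html import escape


def per_glyph_html(placed, top_px):
    """One absolutely positioned span per glyph; `placed` is a list of (char, x) pairs."""
    return "".join(
        f'<span style="position:absolute;left:{x:.2f}px;top:{top_px:.2f}px">'
        f"{escape(ch)}</span>"
        for ch, x in placed
    )


def single_span_html(text, left_px, top_px, word_spacing_px):
    """One span for the whole run, with spacing delegated to CSS."""
    return (
        f'<span style="position:absolute;left:{left_px:.2f}px;top:{top_px:.2f}px;'
        f'word-spacing:{word_spacing_px:.2f}px">{escape(text)}</span>'
    )


text = "Hello world"
# Hypothetical glyph positions, 6px apart, purely for illustration.
placed = [(ch, i * 6.0) for i, ch in enumerate(text)]

verbose = per_glyph_html(placed, top_px=100.0)
compact = single_span_html(text, left_px=0.0, top_px=100.0, word_spacing_px=2.0)
print(len(verbose), "characters per glyph vs", len(compact), "for a single span")
```

The per-glyph markup grows with every character on the page, while the single span stays close to the length of the text itself, at the cost of trusting the browser to space the glyphs.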
1. It provides a more accurate representation of the PDF
4. It even works in IE6!
This update will be available in Friday’s release.
This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.