Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX.

He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

PDF to HTML5 conversion – ‘Creative’ use of space by Corel


I was sent an interesting PDF file to investigate this week. The issue was that spaces were appearing in the middle of words when the file was converted to HTML5. Intrigued, I dived in to see what was going on…

The PDF file was created using Corel’s PDF engine, which often has its own unique way of doing things (to put it politely). I drilled down and found the word in question, which was appearing with a space in it. When I copied it from Acrobat, it also had a space in it! I looked at the internal PDF commands and the text was encoded as a single word with a space in the middle. But in the viewer (both our PDF viewer and Acrobat) no space is visible.

The reason for this is a PDF text operator called Tw (word spacing). It defines an additional amount of space (positive or negative) to be added whenever a space character is drawn. In this case, the amount is set to cancel out the width of the space exactly, so the space is there in the text but does not appear as a gap when the PDF is viewed. I have altered our code to ignore these cancelled-out spaces when converting the PDF to HTML5 (and when extracting text).
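To make the arithmetic concrete, here is a minimal Java sketch of how a space can end up with no visible width. It is not our actual converter code: the class `WordSpacingSketch`, its method names and the 5% tolerance are all made up for illustration. The formula itself is the glyph displacement calculation from the PDF specification, tx = (w0 × Tfs + Tc + Tw) × Th, where Tw is only added for the space character (I have left out the TJ adjustment term to keep it short).

```java
/** Illustrative helper only; the names here are hypothetical, not a real library API. */
final class WordSpacingSketch {

    /**
     * Horizontal advance of one glyph in text space units:
     * tx = (w0 * Tfs + Tc + Tw) * Th, with Tw applied only to a space.
     *
     * @param glyphWidth1000  glyph width in 1/1000 text units (as in a font's Widths array)
     * @param fontSize        Tfs, the font size
     * @param charSpacing     Tc, character spacing
     * @param wordSpacing     Tw, word spacing
     * @param horizontalScale Th, horizontal scaling as a fraction (1.0 = 100%)
     * @param isSpace         true if the glyph is the single-byte space (code 32)
     */
    static double advance(double glyphWidth1000, double fontSize, double charSpacing,
                          double wordSpacing, double horizontalScale, boolean isSpace) {
        double w0 = glyphWidth1000 / 1000.0;
        double tw = isSpace ? wordSpacing : 0.0;
        return (w0 * fontSize + charSpacing + tw) * horizontalScale;
    }

    /**
     * True if a space advances by (almost) nothing, so it should not be
     * emitted as a visible gap. The 5% of font size tolerance is an
     * assumption chosen for this sketch.
     */
    static boolean isCancelledSpace(double spaceWidth1000, double fontSize, double charSpacing,
                                    double wordSpacing, double horizontalScale) {
        double tx = advance(spaceWidth1000, fontSize, charSpacing,
                            wordSpacing, horizontalScale, true);
        return Math.abs(tx) < 0.05 * fontSize;
    }

    public static void main(String[] args) {
        // A 250-unit space at 12pt is 3 units wide; Tw = -3 cancels it exactly.
        System.out.println(advance(250, 12, 0, -3, 1.0, true));
        System.out.println(isCancelledSpace(250, 12, 0, -3, 1.0));
        System.out.println(isCancelledSpace(250, 12, 0, 0, 1.0));
    }
}
```

With made-up numbers like these, the first space would be dropped from the extracted text and the second (with Tw at zero) would be kept as a normal word gap.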

So if you are using our PDF to HTML5 converter and seeing odd spaces, try today’s release. The bigger mystery is why Corel needs to add spaces in the middle of words and then move the position back to ignore them – any ideas?


