Daniel When not delving into obscure PDF or Java bugs, Daniel is exploring the new features in JavaFX.

Custom HTML Font Mapping in PDF to HTML conversion


I’ve been trying to find a decent way for users of our PDF to HTML5 converter to adjust the fonts to their taste. At the moment we have to replace the PDF’s fonts with ones available in HTML. In the future we plan to extract the font data, but for the time being the text and font size are extracted from the PDF and a reasonable approximation is made. However, when it encounters an unknown font, the software ultimately has to make a ‘best guess’.
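To make the idea of an approximation with a ‘best guess’ fallback concrete, here is a minimal Java sketch. It is not the converter’s actual logic: the class name FontGuess, the lookup table and the choice of fallback are all assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: choose a CSS font-family stack for a PDF font name. */
final class FontGuess {

    // Illustrative table of name fragments and CSS stacks (assumed, not the converter's real data).
    private static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("times", "'Times New Roman', Times, serif");
        KNOWN.put("arial", "Arial, Helvetica, sans-serif");
        KNOWN.put("helvetica", "Arial, Helvetica, sans-serif");
        KNOWN.put("courier", "'Courier New', Courier, monospace");
    }

    // Used when the font name matches nothing in the table (the 'best guess').
    static final String BEST_GUESS = "'Times New Roman', Times, serif";

    static String cssFamilyFor(String pdfFontName) {
        String name = pdfFontName == null ? "" : pdfFontName.toLowerCase(Locale.ROOT);
        // Return the stack of the first known fragment contained in the font name.
        for (Map.Entry<String, String> entry : KNOWN.entrySet()) {
            if (name.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return BEST_GUESS;
    }

    public static void main(String[] args) {
        System.out.println(cssFamilyFor("ABCDEF+Helvetica-Bold")); // matches "helvetica"
        System.out.println(cssFamilyFor("F_tst_0"));               // no match, falls back
    }
}
```

A name such as F_tst_0 carries no hint about the typeface, which is exactly the case where a user-editable mapping becomes useful.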

I decided to base the default font information on an XML file. This gives users the option to adjust the HTML version of their fonts, and the various shapes and sizes they can come in, to whatever they feel is best. If you are using our converter and you find that some of the fonts are squeezed too close together or that the choice of HTML font is unsuitable, you can do the following.

When you run ExtractPagesAsHTML, add a VM flag: -Dorg.jpedal.saveXML="path/to/my/newXMLfile". When the extraction takes place you will find an XML file with a list of the fonts found in the given PDF. Like so:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
F_tst_0  3  'Times New Roman', Times, serif  heavy
F_tst_2  -1  'Times New Roman', Times, serif

As you can see, the names given to the fonts within the PDF file can be pretty useless! RawFont is effectively a unique identifier for a font. It is followed by a size adjust, which adjusts the value of the font size. MappedTo is the CSS that will be referred to when selecting the font, and weight and style are the HTML attributes of the font. The elements can all be adjusted, and the file can then be reloaded and used for the conversion. Just use -Dorg.jpedal.loadXML="path/to/my/newXMLfile" and the PDF will be converted with the new adjustments taken into account.
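As a rough illustration of how such a mapping file could be read and applied, here is a small Java sketch using the JDK’s built-in XML parser. RawFont and MappedTo are taken from the description above. The root element FontMappings, the Font wrapper, the SizeAdjust tag name and the idea that the adjustment is simply added to the font size are assumptions, and may not match the file the converter writes or how it uses the value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: read a font mapping file and turn one entry into CSS. */
final class FontMappingSketch {

    static final class Mapping {
        final String cssFamily;
        final int sizeAdjust;

        Mapping(String cssFamily, int sizeAdjust) {
            this.cssFamily = cssFamily;
            this.sizeAdjust = sizeAdjust;
        }

        // Assumption: the size adjust is added to the font size taken from the PDF.
        String cssFor(int pdfFontSize) {
            return "font-family: " + cssFamily + "; font-size: " + (pdfFontSize + sizeAdjust) + "px;";
        }
    }

    // Assumed file layout; only RawFont and MappedTo are names mentioned in the post.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>"
        + "<FontMappings>"
        + "<Font><RawFont>F_tst_0</RawFont><SizeAdjust>3</SizeAdjust>"
        + "<MappedTo>'Times New Roman', Times, serif</MappedTo></Font>"
        + "<Font><RawFont>F_tst_2</RawFont><SizeAdjust>-1</SizeAdjust>"
        + "<MappedTo>'Times New Roman', Times, serif</MappedTo></Font>"
        + "</FontMappings>";

    /** Parses the XML text into a map keyed by the raw PDF font name. */
    static Map<String, Mapping> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Mapping> result = new LinkedHashMap<>();
            NodeList fonts = doc.getElementsByTagName("Font");
            for (int i = 0; i < fonts.getLength(); i++) {
                Element font = (Element) fonts.item(i);
                int adjust = Integer.parseInt(text(font, "SizeAdjust"));
                result.put(text(font, "RawFont"), new Mapping(text(font, "MappedTo"), adjust));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse font mapping XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Map<String, Mapping> mappings = parse(SAMPLE);
        System.out.println(mappings.get("F_tst_0").cssFor(12));
    }
}
```

Keying the map on the raw font name mirrors the role of RawFont as a unique identifier: editing one entry changes how every piece of text in that PDF font is rendered.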

This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.

 
