Daniel When not delving into obscure PDF or Java bugs, Daniel is exploring the new features in JavaFX.

Custom HTML Font Mapping in PDF to HTML conversion


I’ve been trying to find a decent way for users of our PDF to HTML5 converter to adjust the fonts to their taste. At the moment we have to replace the fonts with ones available in HTML. In the future we plan to extract the font data, but for the time being the text and font size are extracted from the PDF and a reasonable approximation is made. However, when it encounters an unknown font, the software ultimately has to make a ‘best guess’.

I decided to base the default font information on an XML file, which gives users the option to adjust the HTML versions of their fonts, in the various shapes and sizes they can come in, to whatever they feel is best. If you are using our converter and you find that some of the fonts are squeezed too close together, or that the choice of HTML font is unsuitable, you can do the following.

When you run ExtractPagesAsHTML, add a VM flag: -Dorg.jpedal.saveXML="path/to/my/newXMLfile". When the extraction takes place, you will find an XML file with a list of the fonts found in the given PDF, like so:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
F_tst_03    'Times New Roman', Times, serif    heavy
F_tst_2-1    'Times New Roman', Times, serif

As you can see, the names given to fonts within a PDF file can be pretty useless! RawFont is effectively a unique identifier for a font. It is followed by a size adjust, which adjusts the value of the font size. MappedTo is the CSS that will be referred to when selecting the font, and weight and style are the HTML attributes of the font. All of the elements can be adjusted, and the file can then be reloaded and used for the conversion. Just use -Dorg.jpedal.loadXML="path/to/my/newXMLfile" and the PDF will be converted with the new adjustments taken into account.
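
If you would rather script the adjustment than edit the file by hand, something along these lines might work. This is only a rough sketch using the standard Java DOM API, not code from the converter itself. RawFont and MappedTo are the names described above; the surrounding fonts/font wrapper elements, the weight element name, and the FontMappingEditor class are assumptions made purely for illustration, so check them against the file the converter actually writes for you.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FontMappingEditor {

    // HYPOTHETICAL layout: only RawFont and MappedTo are names taken from the post.
    // The <fonts>/<font> wrappers and the <weight> element name are guesses for this sketch.
    static final String SAMPLE =
            "<fonts>"
          + "<font><RawFont>F_tst_03</RawFont>"
          + "<MappedTo>'Times New Roman', Times, serif</MappedTo>"
          + "<weight>heavy</weight></font>"
          + "<font><RawFont>F_tst_2-1</RawFont>"
          + "<MappedTo>'Times New Roman', Times, serif</MappedTo></font>"
          + "</fonts>";

    /** Returns the XML with MappedTo replaced for every entry whose RawFont equals rawFont. */
    static String remap(String xml, String rawFont, String newMapping) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList rawFonts = doc.getElementsByTagName("RawFont");
            for (int i = 0; i < rawFonts.getLength(); i++) {
                Node raw = rawFonts.item(i);
                Node parent = raw.getParentNode();
                // Skip entries for other fonts (and a RawFont with no enclosing element).
                if (!(parent instanceof Element)
                        || !rawFont.equals(raw.getTextContent().trim())) {
                    continue;
                }
                // Update the MappedTo element that sits alongside the matching RawFont.
                NodeList mapped = ((Element) parent).getElementsByTagName("MappedTo");
                if (mapped.getLength() > 0) {
                    mapped.item(0).setTextContent(newMapping);
                }
            }

            // Serialize the modified document back to a string.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit font mapping XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(remap(SAMPLE, "F_tst_03", "Arial, Helvetica, sans-serif"));
    }
}
```

You would then save the result to a file and point -Dorg.jpedal.loadXML at it, as described above. Matching on RawFont rather than on position means the change still lands on the right entry if the list of fonts comes out in a different order.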

This post is part of our “Fonts Articles Index”; in these articles we explore fonts.

 




IDRsolutions Ltd 2022. All rights reserved.