I’ve been trying to find a decent way for users of our PDF to HTML5 converter to adjust the fonts to their taste. At the moment we have to replace the fonts with ones available in HTML. In the future we plan to extract the font data, but for the time being the text and font size are extracted from the PDF and a reasonable approximation is made. However, when it encounters an unknown font, the software ultimately has to make a ‘best guess’.
I decided to base the default font information on an XML file, which gives users the option to adjust the HTML version of their fonts, and the various shapes and sizes they can come in, to whatever they feel is best. If you are using our converter and find that some of the fonts are squeezed too close together, or that the choice of HTML font is unsuitable, you can do the following.
When you run ExtractPagesAsHTML, add a VM flag: -Dorg.jpedal.saveXML="path/to/my/newXMLfile". When the extraction takes place, you will find an XML file with a list of the fonts found in the given PDF. Like so:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
F_tst_0 3 'Times New Roman', Times, serif heavy
F_tst_2 -1 'Times New Roman', Times, serif
As you can see, the names given to the fonts within the PDF file can be pretty useless! RawFont is effectively a unique identifier for a font. It is followed by a size adjust, which adjusts the value of the font size. MappedTo is the CSS that will be referred to when selecting the font, and weight and style are the HTML attributes of the font. All of the elements can be adjusted, and the file can then be reloaded and used for the conversion. Just use -Dorg.jpedal.loadXML="path/to/my/newXMLfile" and the PDF will be converted with the new adjustments taken into account.
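If you would rather script the adjustments than edit the file by hand, something along these lines might work. This is only a rough sketch: the element names fonts, font and SizeAdjust are assumptions made for illustration (only RawFont, MappedTo, weight and style are named above), and the adjust_font helper is hypothetical, so check the names in the file you actually saved before relying on it.

```python
import xml.etree.ElementTree as ET

# ASSUMED layout: the <fonts>, <font> and <SizeAdjust> element names are
# guesses for illustration only. Compare them with your own saved file.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fonts>
  <font>
    <RawFont>F_tst_0</RawFont>
    <SizeAdjust>3</SizeAdjust>
    <MappedTo>'Times New Roman', Times, serif</MappedTo>
    <weight>heavy</weight>
    <style></style>
  </font>
  <font>
    <RawFont>F_tst_2</RawFont>
    <SizeAdjust>-1</SizeAdjust>
    <MappedTo>'Times New Roman', Times, serif</MappedTo>
    <weight></weight>
    <style></style>
  </font>
</fonts>
"""


def adjust_font(root, raw_font, size_adjust=None, mapped_to=None):
    """Update the entry whose RawFont matches; return True if one was found."""
    for font in root.iter("font"):
        if font.findtext("RawFont") == raw_font:
            if size_adjust is not None:
                font.find("SizeAdjust").text = str(size_adjust)
            if mapped_to is not None:
                font.find("MappedTo").text = mapped_to
            return True
    return False


root = ET.fromstring(SAMPLE)
# Map F_tst_2 to a different CSS font stack and remove its size adjustment.
found = adjust_font(root, "F_tst_2", size_adjust=0, mapped_to="Georgia, serif")
updated = ET.tostring(root, encoding="unicode")
```

For a real file you would load it with ET.parse(path), apply the same changes to tree.getroot(), write it back with tree.write(path, encoding="UTF-8", xml_declaration=True), and then point -Dorg.jpedal.loadXML at the result.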
This post is part of our “Fonts Articles Index”, a series of articles in which we explore fonts.