Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

PDF to HTML conversion – tradeoffs on adjusting font size


When you convert a PDF to HTML, you move from an environment where text can be set at any floating-point size to one where you have to work with integer point sizes. This causes problems: if the PDF font size is 8.5pt, should we use 8 or 9 in the HTML? (8.5 is not allowed, and both 8 and 9 are strictly wrong because the text will not fit exactly.)

To improve the positioning of the text in HTML, we alter the CharSpacing (moving the characters closer together or further apart) and also check whether adjusting the font size would produce a better fit. This is especially important when substituting fonts in the HTML: some fonts I have seen have very different widths from the replacement HTML fonts, so 9pt in font X is actually more like 18pt in the font we use to display the HTML.
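As a rough illustration of the fitting idea, here is a minimal sketch. It is not the converter's actual code: the class and method names are hypothetical, and it assumes that text width scales linearly with font size and that the width of the text in the replacement font at 1pt is already known.

```java
// Hypothetical helper, for illustration only (not part of any real API).
final class FontFit {

    /** Integer font size whose rendered width is closest to the target width. */
    static int bestIntegerSize(double targetWidth, double widthAtOnePoint) {
        // Assumption: width grows linearly with size, so the ideal size is a ratio.
        double ideal = targetWidth / widthAtOnePoint;
        // HTML output uses whole point sizes here, so round to the nearest one.
        return (int) Math.max(1, Math.round(ideal));
    }

    /** Extra space per gap between characters needed to hit the target width. */
    static double charSpacing(double targetWidth, double widthAtOnePoint,
                              int size, int charCount) {
        if (charCount < 2) {
            return 0.0; // no gaps between characters to adjust
        }
        double actualWidth = widthAtOnePoint * size;
        // Negative values pull characters together, positive values push them apart.
        return (targetWidth - actualWidth) / (charCount - 1);
    }
}
```

With a target width of 85 and a width of 10 per point, the ideal size is 8.5, which rounds to 9; the text is then 5 units too wide, so an 11-character run would need a spacing of -0.5 per gap to fit.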

The drawback with this approach is that it can result in blocks of text with slightly changing font sizes (8, 9, 8, 9, 9, 8pt, for example). This is more accurate but looks odd. So in our latest release we have added a compromise: we ignore small changes in HTML font size (so that line of text will now be 8, 8, 8, 8, 8, 8pt), but the 9pt is still changed to 18pt. You can adjust the threshold value using this call:

//only adjust font if change bigger than 5pt
HTMLoutput.setValue(HTMLDisplay.UseFontResizing, 5);

Changing the value from 5 to 0 would apply any change, however tiny, while 10 would only alter the font size if the change was greater than 10pt. 5 seems a sensible default, and you can experiment to find the best compromise for your files when converting PDF to HTML. What value works best for you?
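To make the threshold behaviour concrete, here is a minimal sketch of the rule as described above. It is an assumption about the logic, not the library's implementation, and FontSizeSmoother and its method are hypothetical names: the original size is kept unless the fitted size differs from it by more than the threshold.

```java
// Hypothetical illustration of the threshold rule (not the library's code).
final class FontSizeSmoother {

    /**
     * Returns the size to use for each piece of text: the fitted size if it
     * differs from the original by more than the threshold (in points),
     * otherwise the original size.
     */
    static int[] smooth(int originalSize, int[] fittedSizes, int threshold) {
        int[] result = new int[fittedSizes.length];
        for (int i = 0; i < fittedSizes.length; i++) {
            int change = Math.abs(fittedSizes[i] - originalSize);
            // Small changes are ignored; large ones are kept.
            result[i] = change > threshold ? fittedSizes[i] : originalSize;
        }
        return result;
    }
}
```

With a threshold of 5, fitted sizes of 8, 9, 8, 9, 9, 8 against an original of 8 all come out as 8, while a fitted size of 18 against an original of 9 stays at 18. With a threshold of 0 the 9s are kept, and with a threshold of 10 the 18 falls back to 9.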


