Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.


PDF to HTML conversion – tradeoffs on adjusting font size


When you convert a PDF to HTML, you move from an environment where text can be any floating-point size to one where you have to work with integer pt sizes. This causes a problem: if the PDF font size is 8.5pt, should we use 8 or 9 in the HTML? 8.5 is not allowed, and both 8 and 9 are strictly wrong because the text will not fit exactly.

To improve the positioning of the text in HTML, we alter the CharSpacing (moving the characters closer together or further apart) and also check whether adjusting the font size would produce a better fit. This is especially important when substituting fonts in the HTML. Some fonts I have seen are much thinner than the replacement HTML fonts, so 9pt in font X is actually more like 18pt in the font we use to display the HTML.
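As a rough illustration of the fitting idea, here is a minimal sketch in Java. It is not our converter's code: the class and method names are hypothetical, and it assumes the width of a run of text scales linearly with font size, so the substitute font can be described by how wide the run is per 1pt of font size. It picks the integer size closest to the width the text had in the PDF, then works out the spacing per character gap needed to make up the difference.

```java
// Hypothetical helper for illustration only; not part of any IDRSolutions API.
final class TextFitSketch {

    /** Integer font size whose natural width is closest to targetWidth (assumes linear scaling). */
    static int bestIntegerSize(double targetWidth, double widthPerPt) {
        int size = (int) Math.round(targetWidth / widthPerPt);
        return Math.max(1, size); // never return a zero or negative size
    }

    /** Spacing (in pt) to add to each gap between characters so the run fills targetWidth. */
    static double charSpacing(double targetWidth, double widthPerPt, int fontSize, int charCount) {
        if (charCount < 2) {
            return 0.0; // a single character has no gaps to adjust
        }
        double naturalWidth = widthPerPt * fontSize;
        return (targetWidth - naturalWidth) / (charCount - 1);
    }

    public static void main(String[] args) {
        // A 21-character run that is 85pt wide in the PDF (8.5pt text),
        // shown in a substitute font that is 10pt wide per 1pt of font size.
        int size = bestIntegerSize(85.0, 10.0);
        System.out.println(size);                               // 9
        System.out.println(charSpacing(85.0, 10.0, size, 21));  // -0.25 (squeeze characters together)
        System.out.println(charSpacing(85.0, 10.0, 8, 21));     // 0.25 (spread characters apart)

        // A run that is 180pt wide in the PDF needs 18pt in the same substitute font.
        System.out.println(bestIntegerSize(180.0, 10.0));       // 18
    }
}
```

Either 8 or 9 can be made to fit the 8.5pt run by nudging the spacing in opposite directions, which is why neither choice is obviously right.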

The drawback of this approach is that it can produce blocks of text with slightly varying font sizes (8, 9, 8, 9, 9, 8 pt, for example). This is more accurate but looks odd. So in our latest release we have added a compromise in the HTML: we ignore small changes in HTML font size (so that line of text will now be 8, 8, 8, 8, 8, 8 pt), but the 9pt is still changed to 18pt. You can adjust the threshold value using the code call

//only adjust font size if the change is bigger than 5pt
HTMLoutput.setValue(HTMLDisplay.UseFontResizing, 5);

Changing the value from 5 to zero would apply any change, however tiny, while 10 would only alter the font size if the change was greater than 10pt. 5 seems a sensible default, and you can experiment to find the best compromise for your files when converting PDF to HTML. What value works best for you?
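If it helps to picture the threshold rule, here is a minimal sketch in Java. It is an illustration rather than the converter's actual implementation: the class and method names are hypothetical, and it assumes the rule is a strict "greater than" comparison between the size declared in the PDF and the size that fits best.

```java
import java.util.Arrays;

// Hypothetical sketch of the threshold rule; not the library's real code.
final class FontResizeThresholdSketch {

    /** Keep the declared size unless the fitted size differs by more than threshold pt. */
    static int resolveSize(int declaredSize, int fittedSize, int threshold) {
        int change = Math.abs(fittedSize - declaredSize);
        return change > threshold ? fittedSize : declaredSize;
    }

    /** Apply the same rule to every fragment in a block of text. */
    static int[] resolveAll(int declaredSize, int[] fittedSizes, int threshold) {
        int[] resolved = new int[fittedSizes.length];
        for (int i = 0; i < fittedSizes.length; i++) {
            resolved[i] = resolveSize(declaredSize, fittedSizes[i], threshold);
        }
        return resolved;
    }

    public static void main(String[] args) {
        int[] fitted = {8, 9, 8, 9, 9, 8};

        // Threshold 5: the 1pt wobbles are ignored.
        System.out.println(Arrays.toString(resolveAll(8, fitted, 5))); // [8, 8, 8, 8, 8, 8]
        // Threshold 0: every change is applied.
        System.out.println(Arrays.toString(resolveAll(8, fitted, 0))); // [8, 9, 8, 9, 9, 8]

        // A 9pt to 18pt change is 9pt, so it survives a threshold of 5 but not 10.
        System.out.println(resolveSize(9, 18, 5));  // 18
        System.out.println(resolveSize(9, 18, 10)); // 9
    }
}
```

Running the same block of sizes through different thresholds like this is a quick way to see how much smoothing a given value would apply before trying it on real files.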


