When you convert a PDF to HTML, you move from a format where font sizes can be any floating-point value to one where you have to work with integer point sizes. This causes a problem: if the PDF font size is 8.5pt, should the HTML use 8pt or 9pt? 8.5pt is not allowed, and both 8pt and 9pt are slightly wrong because neither will make the text fit its original width exactly.
To improve the positioning of text in the HTML, we alter the CharSpacing (moving the characters closer together or further apart) and also check whether adjusting the font size would give a better fit. This matters most when fonts are substituted in the HTML: some fonts I have seen differ so much in width from the replacement HTML fonts that 9pt in font X is closer to 18pt in the font we use to display the HTML.
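To make the fitting step concrete, here is a minimal sketch, not BuildVu's actual code. It assumes the rendered width of a text run scales linearly with font size, that `widthPerPt` (the run's width in the HTML font at 1pt) has already been measured, and that CharSpacing maps to extra space added after each character (as CSS `letter-spacing` does). The class and method names are hypothetical. It tries the two integer sizes either side of the exact fit and keeps the one needing the smaller spacing correction.

```java
// Hypothetical sketch: choose an integer font size plus per-character spacing
// so a text run spans the same width in HTML as it did in the PDF.
// Assumption: rendered width scales linearly with font size.
class FitSketch {

    /** An integer point size and the extra spacing (pt) to add after each character. */
    record Fit(int sizePt, double charSpacingPt) {}

    /** Spacing needed so charCount characters at candidateSize fill pdfWidthPt. */
    static double spacingFor(double pdfWidthPt, double widthPerPt, int charCount, int candidateSize) {
        double naturalWidth = widthPerPt * candidateSize;
        // Spread the leftover (positive) or excess (negative) width across the characters.
        return (pdfWidthPt - naturalWidth) / charCount;
    }

    /** Tries the integer sizes either side of the exact fit and keeps the smaller correction. */
    static Fit bestFit(double pdfWidthPt, double widthPerPt, int charCount) {
        if (widthPerPt <= 0 || charCount <= 0) {
            throw new IllegalArgumentException("widthPerPt and charCount must be positive");
        }
        double ideal = pdfWidthPt / widthPerPt;          // size that would fit with no spacing
        int lo = Math.max(1, (int) Math.floor(ideal));
        int hi = Math.max(1, (int) Math.ceil(ideal));
        double sLo = spacingFor(pdfWidthPt, widthPerPt, charCount, lo);
        double sHi = spacingFor(pdfWidthPt, widthPerPt, charCount, hi);
        return Math.abs(sLo) <= Math.abs(sHi) ? new Fit(lo, sLo) : new Fit(hi, sHi);
    }

    public static void main(String[] args) {
        // A 20-character run that is 85pt wide in the PDF; the substitute font
        // renders it 10pt wide at 1pt, so the exact fit would be 8.5pt.
        Fit f = bestFit(85.0, 10.0, 20);
        System.out.println(f.sizePt() + "pt, letter-spacing " + f.charSpacingPt() + "pt");
    }
}
```

For the 8.5pt case both 8pt and 9pt need the same 0.25pt correction in opposite directions, so the sketch keeps 8pt with 0.25pt of extra spacing. A very narrow or wide substitute font simply moves `ideal` far from the nominal size, which is how a 9pt run can end up at 18pt.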
The drawback of this approach is that it can produce blocks of text with slightly varying font sizes (for example 8, 9, 8, 9, 9, 8pt). This is more accurate but looks odd. So in our latest release we have added a compromise in the HTML: we now ignore small changes in font size (so that line of text becomes 8, 8, 8, 8, 8, 8pt), while the 9pt text is still changed to 18pt. You can adjust the threshold value with the following call:
//only adjust font if change bigger than 5pt
HTMLoutput.setValue(HTMLDisplay.UseFontResizing, 5);
Setting the value to 0 would apply every change, however small, while 10 would only alter the font size if the change were greater than 10pt. 5 seems a sensible default, and you can experiment to find the best compromise for your own files when converting PDF to HTML. What value works best for you?
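The threshold rule can be sketched as a single comparison. This is a hypothetical illustration, not the code behind `HTMLDisplay.UseFontResizing`: it assumes each run has a "baseline" integer size it would get without resizing and a "fitted" size from the fitting step, and it keeps the baseline unless the two differ by more than the threshold.

```java
// Hypothetical sketch of the font-resizing threshold; names are illustrative only.
class FontResizeThreshold {

    /**
     * Returns the fitted size only when it differs from the baseline by more
     * than thresholdPt; otherwise keeps the baseline so the block stays uniform.
     */
    static int chooseSize(int baselinePt, int fittedPt, int thresholdPt) {
        return Math.abs(fittedPt - baselinePt) > thresholdPt ? fittedPt : baselinePt;
    }

    public static void main(String[] args) {
        int threshold = 5;
        int[] fitted = {8, 9, 8, 9, 9, 8};   // per-run best fits for an 8pt line
        StringBuilder out = new StringBuilder();
        for (int size : fitted) {
            out.append(chooseSize(8, size, threshold)).append(' ');
        }
        System.out.println(out.toString().trim());          // 8 8 8 8 8 8
        System.out.println(chooseSize(9, 18, threshold));   // 18
    }
}
```

With a threshold of 0 the 1pt changes are applied again (8 becomes 9), and with 10 the 9pt to 18pt change is ignored because a 9pt difference is not greater than 10.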