A popular trick in PDF files is to draw the same text twice, with the second copy offset slightly, to create a bold effect.
This does not carry over to HTML5, so all you get is overlapping double text. How ugly!
So we add some ‘intelligence’ into the conversion to ignore these duplicate characters. It needs to be smart enough to work correctly when we get genuine double characters, as in "following" or "moon", so we look at the position of the letters and the gap between them.
This gives a much better representation of the text 🙂
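To make the idea concrete, here is a minimal sketch of a position-and-gap check. It is not our converter's actual code: the `Glyph` class, the `removeFakeBold` method and the `maxOffsetRatio` threshold are all hypothetical names chosen for illustration, and a real decoder would also need to account for font size, rotation and text on different lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FakeBoldFilter {

    /** Hypothetical positioned character, standing in for whatever the PDF decoder emits. */
    static final class Glyph {
        final char ch;
        final double x, y, width;

        Glyph(char ch, double x, double y, double width) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Keeps a glyph unless an already kept glyph has the same character and sits
     * almost on top of it. maxOffsetRatio is an assumed tuning value: the largest
     * allowed shift, as a fraction of the glyph width, for two glyphs to count as
     * the same character drawn twice.
     */
    static List<Glyph> removeFakeBold(List<Glyph> glyphs, double maxOffsetRatio) {
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            boolean duplicate = false;
            for (Glyph k : kept) {
                double limit = maxOffsetRatio * k.width;
                if (g.ch == k.ch
                        && Math.abs(g.x - k.x) <= limit
                        && Math.abs(g.y - k.y) <= limit) {
                    duplicate = true; // same letter, nearly same place: fake bold
                    break;
                }
            }
            if (!duplicate) {
                kept.add(g);
            }
        }
        return kept;
    }

    static String toText(List<Glyph> glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Genuine double letter: the second 'o' is a full glyph width further along.
        List<Glyph> moon = Arrays.asList(
                new Glyph('m', 0, 0, 8), new Glyph('o', 8, 0, 6),
                new Glyph('o', 14, 0, 6), new Glyph('n', 20, 0, 6));
        // Fake bold: each letter is drawn again 0.3 units to the right.
        List<Glyph> boldHi = Arrays.asList(
                new Glyph('H', 0, 0, 8), new Glyph('H', 0.3, 0, 8),
                new Glyph('i', 8, 0, 3), new Glyph('i', 8.3, 0, 3));

        System.out.println(toText(removeFakeBold(moon, 0.2)));   // prints "moon"
        System.out.println(toText(removeFakeBold(boldHi, 0.2))); // prints "Hi"
    }
}
```

Comparing against every kept glyph, rather than only the previous one, means the sketch copes both with files that repeat each letter in turn and with files that draw the whole string a second time. The price is a quadratic loop, which is fine for a demonstration but would want a spatial index on large pages.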
The PDF file format uses lots of tricks which work very well for PDF but need care when being translated into HTML5.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team post about anything interesting they learn.