A popular trick in PDF files is to print some text twice (with the second copy moved slightly) to create a bold effect.
This does not carry over to HTML5, where all you get is overlapping double text. How ugly!
So we add some ‘intelligence’ to the conversion to ignore these duplicate characters. It needs to be smart enough to work correctly when we get genuine double characters, like the ‘ll’ in following or the ‘oo’ in moon, so we look at the position of the letters and the gap between them.
This gives a much better representation of the text 🙂
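The position-and-gap check described above can be sketched roughly as follows. This is only an illustration in Python, not the converter's actual code: the `Glyph` class, the `drop_fake_bold_duplicates` function and the `tolerance` value are all made-up names and numbers chosen for the example. The idea is that a character counts as a fake-bold copy only if the same character has already been drawn at almost exactly the same spot, while a genuine double letter sits a full character width further along.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one drawn character (not a real library type)."""
    char: str
    x: float      # left edge of the character on the page
    y: float      # baseline position
    width: float  # how far the character advances the text position


def drop_fake_bold_duplicates(glyphs, tolerance=0.25):
    """Drop characters that overprint an identical, already-kept character.

    `tolerance` is an assumed fraction of the character width: a repeat
    closer than this is treated as a fake-bold copy, anything further away
    is treated as genuine text.
    """
    kept = []
    for g in glyphs:
        is_copy = any(
            g.char == k.char
            and abs(g.x - k.x) <= tolerance * k.width
            and abs(g.y - k.y) <= tolerance * k.width
            for k in kept
        )
        if not is_copy:
            kept.append(g)
    return kept


# "moon" drawn once, then drawn again shifted by 0.3 units to fake bold.
word = [Glyph(c, x, 0.0, 10.0) for c, x in zip("moon", (0.0, 10.0, 20.0, 30.0))]
shifted = [Glyph(g.char, g.x + 0.3, g.y, g.width) for g in word]

cleaned = drop_fake_bold_duplicates(word + shifted)
print("".join(g.char for g in cleaned))  # moon
```

The shifted copies are dropped because they land within a fraction of a character width of the originals, while the two o's in moon survive because they are a whole character width apart. Comparing each character against everything already kept is slow on a big page, so a real converter would want something smarter, but it keeps the sketch short.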
The PDF file format uses lots of tricks which work very well for PDF but need care when being translated into HTML5.