The devil is always in the detail with the PDF spec. I have been working on a PDF file where the hyphen character was not appearing in the converted HTML5 output. This was odd, as I have seen it appear in loads of other samples. So we drilled down to see what was going on…
When you map glyph indices onto the actual characters that are displayed, there are several ways to do it. One of these involves a set of character mapping tables (Appendix D in the PDF spec if you want to look it up). There are then a whole load of exceptions to these tables, and I had not coded one of them correctly. The one missing was:
The hyphen character is also encoded as 255 in WinAnsiEncoding. The meaning of this duplicate code is “soft hyphen,” but it is typographically the same as hyphen.
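To make the exception concrete, here is a minimal Python sketch of how such an override might sit on top of a base mapping table. It is not our actual converter code: the names `win_ansi_to_unicode`, `decode_win_ansi` and `WIN_ANSI_OVERRIDES` are made up for illustration, Python's `cp1252` codec is used only as a stand-in for the WinAnsiEncoding table, and the sketch assumes the 255 in the note is octal (decimal 173, hex AD), in line with the octal codes used by the Appendix D tables.

```python
# Illustrative sketch only: cp1252 is an assumed approximation of WinAnsiEncoding.
SOFT_HYPHEN_CODE = 0o255  # octal 255 == decimal 173 == 0xAD

# Exceptions applied on top of the base table (only the one from this post).
WIN_ANSI_OVERRIDES = {SOFT_HYPHEN_CODE: "-"}


def win_ansi_to_unicode(code: int) -> str:
    """Map a single WinAnsiEncoding code (0-255) to a displayable character."""
    if code in WIN_ANSI_OVERRIDES:
        return WIN_ANSI_OVERRIDES[code]
    # errors="replace" yields U+FFFD for the few bytes cp1252 leaves undefined.
    return bytes([code]).decode("cp1252", errors="replace")


def decode_win_ansi(data: bytes) -> str:
    """Decode a byte string code by code, applying the overrides."""
    return "".join(win_ansi_to_unicode(b) for b in data)


if __name__ == "__main__":
    # Without the override, 0xAD decodes to U+00AD (soft hyphen),
    # which many renderers do not display.
    print(decode_win_ansi(b"well\xadknown"))
```

Keeping the exceptions in a separate dictionary means the base table stays untouched and each special case from the spec can be added (and regression tested) on its own.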
A quick fix, a regression test, and a reset of the baseline on the regression tests to lock in the fix, and it is all resolved. But it is a really good example of the complexity of the PDF specification. Do you have any favourite gotchas in PDF?