Sam Howard

Sam is a developer at IDRsolutions who mostly specialises in font support and conversion. He’s also enjoyed working with Java 3D, JavaFX and Swing. His other interests include music and game design.

Font Conversion for PDF2HTML – dotsection

We recently released support for converting Type1C (otherwise known as CFF) fonts to OpenType for use within our PDF to HTML converter. I thought it would make an interesting blog article because it gives an insight into the world of fonts and also highlights some of the continuing issues with font compatibility in different browsers.

OpenType fonts consist of a set of tables, each containing different data about the font. Luckily for us, the format also extends two previous formats, TrueType and CFF. This means that you can create an OpenType font either by using a number of tables from an existing TrueType font, or by including a CFF font as a table and then adding a number of other required tables.
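To make the table layout concrete, here is a minimal sketch of wrapping raw CFF data in an OpenType container. It is an illustration under stated assumptions rather than our converter's code: the function names `build_otf` and `table_checksum` are hypothetical, the other required tables (cmap, head, hhea, hmtx, maxp, name, OS/2 and post) are assumed to have been built elsewhere, and the `checkSumAdjustment` field in the head table is left untouched.

```python
import struct


def table_checksum(data: bytes) -> int:
    """Sum the table as big-endian uint32 values, zero-padded to 4 bytes."""
    padded = data + b"\0" * (-len(data) % 4)
    total = sum(struct.unpack(f">{len(padded) // 4}I", padded))
    return total & 0xFFFFFFFF


def build_otf(tables: dict[str, bytes]) -> bytes:
    """Assemble an sfnt file with the 'OTTO' tag used for CFF outlines.

    `tables` maps four-character tags (e.g. "CFF ", "head") to raw table
    data and must contain at least one entry.
    """
    tags = sorted(tables)                  # records are ordered by tag
    n = len(tags)
    entry_selector = n.bit_length() - 1    # floor(log2(n))
    search_range = (1 << entry_selector) * 16
    range_shift = n * 16 - search_range
    header = struct.pack(">4sHHHH", b"OTTO", n,
                         search_range, entry_selector, range_shift)

    offset = 12 + 16 * n                   # first table follows the directory
    records, body = b"", b""
    for tag in tags:
        data = tables[tag]
        records += struct.pack(">4sIII", tag.encode("ascii"),
                               table_checksum(data), offset, len(data))
        padded = data + b"\0" * (-len(data) % 4)
        body += padded
        offset += len(padded)
    return header + records + body
```

Called as `build_otf({"CFF ": cff_bytes, "head": head_bytes, ...})`, the CFF bytes go into the file unchanged, which is what makes the approach attractive.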

Because the CFF data is embedded as a table, a large part of the new font, including the glyph outlines, can be generated simply by copying the binary straight out of the PDF. Great, less work for us! Or so you’d assume…

Well, most of the time, yes. Unfortunately, though, it’s not always that simple.

CFF glyph outlines are made up of a series of instructions for drawing the glyph. Early versions of the specification included an instruction called ‘dotsection’ (encoded as the two-byte operator 12 0), which marked the start of a new section of a glyph, such as the dot on an i. It had no technical usage, and was completely ignored by all parsers. At some point it was removed from the specification, and it continues to be completely ignored by the vast majority of parsers.
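As an illustration of what finding the instruction involves, the sketch below walks a single glyph's instruction stream and reports where dotsection occurs. It is a simplified example, not our production code: the function names are hypothetical, and it assumes the stream does not rely on subroutines to declare hints (subroutine calls are not followed). The stream has to be tokenised because the bytes 12 0 can also appear inside a number or a hint mask, so a plain byte search would give false matches.

```python
ESCAPE = 12                  # first byte of every two-byte operator
DOTSECTION = 0               # second byte of the dotsection operator
STEM_OPS = {1, 3, 18, 23}    # hstem, vstem, hstemhm, vstemhm
MASK_OPS = {19, 20}          # hintmask, cntrmask


def find_dotsections(charstring: bytes) -> list[int]:
    """Return the byte offset of every dotsection operator."""
    hits = []
    operands = 0   # numbers seen since the last operator
    stems = 0      # stem hints declared so far (sets the mask length)
    i = 0
    while i < len(charstring):
        b0 = charstring[i]
        if b0 == 28:                   # 16-bit integer operand
            i += 3
            operands += 1
        elif b0 == 255:                # 16.16 fixed-point operand
            i += 5
            operands += 1
        elif b0 >= 247:                # two-byte integer operand
            i += 2
            operands += 1
        elif b0 >= 32:                 # one-byte integer operand
            i += 1
            operands += 1
        elif b0 == ESCAPE:             # two-byte operator
            if i + 1 < len(charstring) and charstring[i + 1] == DOTSECTION:
                hits.append(i)
            i += 2
            operands = 0
        elif b0 in STEM_OPS:
            stems += operands // 2     # each stem takes two operands
            i += 1
            operands = 0
        elif b0 in MASK_OPS:
            stems += operands // 2     # implicit vstem before the mask
            i += 1 + (stems + 7) // 8  # skip the mask bytes as well
            operands = 0
        else:                          # any other one-byte operator
            i += 1
            operands = 0
    return hits


def strip_dotsections(charstring: bytes) -> bytes:
    """Return a copy of the charstring with every dotsection removed."""
    out = bytearray(charstring)
    for offset in reversed(find_dotsections(charstring)):
        del out[offset:offset + 2]
    return bytes(out)


# 100 200 rmoveto, dotsection, 50 0 rlineto, endchar
sample = bytes([239, 247, 92, 21, 12, 0, 189, 139, 5, 14])
print(find_dotsections(sample))        # [4]
```

Each removal shortens the glyph's data by two bytes, and that change in length is what makes the fix more involved than it first appears.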

Unfortunately, Google Chrome isn’t one of them! Chrome includes a component called OTS (the OpenType Sanitizer), which goes through the OpenType fonts Chrome is trying to use and checks them for potential problems that could cause the font engine being used (which varies by platform) to crash. When it finds a problem, it quite often fixes it, but in the case of dotsection commands it simply rejects the font outright.

As a result, we have no way of ensuring a font is accepted by Chrome except by either completely rewriting the CFF data from scratch, or going through the CFF data and stripping out dotsection commands, keeping track of a large number of offsets and updating them accordingly. We chose the latter option, and now every CFF font for conversion is quickly scanned for potential issues, and just as quickly fixed if any rogue commands are found.
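To show where the offset bookkeeping comes from, here is a rough sketch of reading and rebuilding a CFF INDEX, the structure that stores the glyph data. The function names `parse_index` and `build_index` are hypothetical and this is not our actual implementation. An INDEX holds an object count, an offset size, an array of 1-based offsets and then the object data, so shortening one glyph shifts every offset after it.

```python
import struct


def parse_index(data: bytes, pos: int = 0) -> tuple[list[bytes], int]:
    """Read a CFF INDEX at `pos`; return its objects and the end position."""
    (count,) = struct.unpack_from(">H", data, pos)
    if count == 0:
        return [], pos + 2             # an empty INDEX is just the count
    off_size = data[pos + 2]
    array_start = pos + 3
    offsets = [
        int.from_bytes(
            data[array_start + k * off_size:array_start + (k + 1) * off_size],
            "big")
        for k in range(count + 1)
    ]
    # Offsets are 1-based, relative to the byte before the object data.
    base = array_start + (count + 1) * off_size - 1
    objects = [data[base + offsets[k]:base + offsets[k + 1]]
               for k in range(count)]
    return objects, base + offsets[count]


def build_index(objects: list[bytes]) -> bytes:
    """Serialise objects as a CFF INDEX with a freshly computed offset array."""
    if not objects:
        return struct.pack(">H", 0)
    offsets = [1]
    for obj in objects:
        offsets.append(offsets[-1] + len(obj))
    off_size = (offsets[-1].bit_length() + 7) // 8   # smallest size that fits
    header = struct.pack(">HB", len(objects), off_size)
    array = b"".join(o.to_bytes(off_size, "big") for o in offsets)
    return header + array + b"".join(objects)
```

Rebuilding the INDEX after cleaning each object takes care of the offsets inside it. The INDEX as a whole may also change size, so absolute offsets held elsewhere in the font, such as the Top DICT entries pointing at the CharStrings and Private data, may need adjusting too.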

All of this could easily be avoided by proper backwards compatibility with fonts among browsers, but until that happens strange quirks like this are sure to keep popping up. Do you think browsers should be strict (like Chrome) or just try to make things work (like Adobe Acrobat)?

This post is part of our “Fonts Articles Index”. In these articles we explore fonts.
