When we developed our original product for the Times newspaper 12 years ago, to extract newspaper content from PDF files, I spent a lot of time looking at pages and working out heuristics to achieve the best conversion. This was a good training ground for PDF to HTML5 conversion, as I now spend a lot of time working out the best way to handle various structures in HTML5.
Today, I have been looking at this text in a PDF file:
The first letter of each word is made bigger to emphasize it. This works fine in PDF, but HTML5 does not give us that much control over positioning. If we try to reproduce it, the result looks horrible :-(
We are better off keeping the text the same size like this.
This looks much closer to the original version and produces smaller files.
The final step is to run the change against our baseline of HTML5 output and review any differences. It turns out that it works really well on large font sizes but is best avoided on smaller sizes. After a little bit of tweaking, the baseline is reset. Then it is on to the next possible enhancement…
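For anyone who wants to experiment with the idea, here is a minimal sketch of the heuristic in Python. It is not our converter code: the `Run` class, the `flatten_enlarged_initials` function and the 14pt cut-off are all made up for illustration, and it assumes the text has already been split into runs that each carry one font size.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A piece of text drawn at a single font size (hypothetical structure)."""
    text: str
    font_size: float  # in points


# Assumed cut-off: the rule is only applied when the surrounding text is at
# least this large. The value is a guess for illustration, not a measured one.
MIN_BODY_SIZE = 14.0


def flatten_enlarged_initials(runs, min_body_size=MIN_BODY_SIZE):
    """Merge an enlarged single initial letter into the run that follows it.

    The merged run uses the font size of the following (smaller) text, so the
    whole word ends up at one size.
    """
    out = []
    i = 0
    while i < len(runs):
        cur = runs[i]
        nxt = runs[i + 1] if i + 1 < len(runs) else None
        is_enlarged_initial = (
            nxt is not None
            and len(cur.text) == 1              # a single character...
            and cur.text.isalpha()              # ...that is a letter
            and cur.font_size > nxt.font_size   # drawn bigger than what follows
            and nxt.font_size >= min_body_size  # skip small text
            and nxt.text[:1].isalpha()          # and it continues the same word
        )
        if is_enlarged_initial:
            out.append(Run(cur.text + nxt.text, nxt.font_size))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


if __name__ == "__main__":
    heading = [Run("T", 24.0), Run("HE ", 18.0), Run("Q", 24.0), Run("UICK", 18.0)]
    print(flatten_enlarged_initials(heading))
```

Merging the runs also means fewer separately positioned pieces of text in the output, which is where the smaller files come from.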