When we began writing our PDF to HTML5 converter a little over 2 years ago, we chose the HTML5 canvas as the way to present a PDF file as HTML5. It allows us to output PDF vector graphics and images as JavaScript commands that draw onto the canvas when the file is loaded. We can then add selectable text and form components on top. At the time this made sense: the canvas is well supported and works on nearly all mobile devices, giving us good compatibility.
But since then, we have discovered many ‘features’ of the canvas that have forced us to get creative with how we convert PDF files into HTML5. For example, the HTML5 canvas does not currently support filling shapes with the EvenOdd rule or specifying settings for dashed lines. To work around these issues, we have had to output those shapes as images instead. Unfortunately, this can bloat the output, with some pages containing many images.
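To make that workaround concrete, here is a minimal TypeScript sketch of the kind of decision involved. It is not taken from our converter: the `Shape` type, the `emitShape` function and the `.png` naming are all hypothetical, and it assumes a canvas without even-odd fills or dashed strokes, as described above.

```typescript
// Hypothetical, simplified model of a PDF shape (illustrative names only).
type FillRule = "nonzero" | "evenodd";

interface Shape {
  id: string;
  points: Array<[number, number]>; // vertices of one closed polygon
  fillRule: FillRule;
  dash: number[]; // PDF dash pattern; empty means a solid line
}

type Output =
  | { kind: "canvas"; commands: string[] }
  | { kind: "image"; src: string };

// True when the shape relies on a feature we assume the canvas lacks.
function needsImageFallback(shape: Shape): boolean {
  return shape.fillRule === "evenodd" || shape.dash.length > 0;
}

// Emits canvas draw commands where possible, otherwise a reference to a
// pre-rendered image of the shape (assumed to be produced elsewhere).
function emitShape(shape: Shape): Output {
  if (needsImageFallback(shape)) {
    return { kind: "image", src: `${shape.id}.png` };
  }
  const commands: string[] = ["ctx.beginPath();"];
  shape.points.forEach(([x, y], i) => {
    commands.push(i === 0 ? `ctx.moveTo(${x}, ${y});` : `ctx.lineTo(${x}, ${y});`);
  });
  commands.push("ctx.closePath();", "ctx.fill();");
  return { kind: "canvas", commands };
}
```

Every shape that takes the image branch adds another file to the page, which is where the bloat comes from.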
There are also other interesting issues – for example, using Save and Restore in Chrome on Android results in a shape being incorrectly repeated, and using a scale CSS transform in Safari on Mac rasterizes text when you scale rather than redrawing it at the correct size. These are all things I will go into in more detail in the coming weeks.
But perhaps the biggest flaw with the canvas is that it’s a raster format. If you draw shapes to the canvas, they get rasterized and do not scale well. In many cases, we could actually get a better result by simply providing an image of the page, and we already offer this as an option in our converter.
There are several advantages to providing an image instead:
1. Lower file size – The file size of the image representation of the page can actually be smaller than the draw commands.
2. Visual feedback when loading – Browsers display images as they load, whereas with the canvas you don’t see anything until everything has loaded.
3. Faster load times – As the page is pre-rasterized there is no longer the overhead of having to rasterize the page to canvas each time it is loaded.
4. Everything is simplified – Currently we have some not-so-nice JavaScript to load and draw the page; we can replace all of this with a simple HTML image tag. It also greatly tidies up our conversion code.
5. Better support for older versions of IE (even IE6).
Outputting content as an image is a very nice compromise if you want fast-loading files at the cost of poorer zoom quality, and we will continue to offer this as an output option.
However, PDF is a vector file format, and rasterizing the output is a very poor way to convert it: it doesn’t make full use of HTML5 features and it does not scale well. We are planning to replace Canvas with inline SVG to produce vector HTML5 representations of PDF pages. SVG support in all mainstream browsers has improved vastly over the last 3 years. It is now a viable (even superior) alternative to Canvas.
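As an illustration of why SVG suits this job, SVG has standard `fill-rule` and `stroke-dasharray` attributes, so even-odd fills and dashed lines can be expressed directly. The TypeScript sketch below shows one way a path might be serialized. The `VectorPath` type and `toSvgPath` function are hypothetical names made up for this example and are not our converter's API. Colours and stroke widths are left out for brevity.

```typescript
// Hypothetical, simplified model of a PDF vector path (illustrative only).
interface VectorPath {
  points: Array<[number, number]>; // vertices of one closed polygon (at least one)
  evenOdd: boolean; // true if the PDF fills this path with the even-odd rule
  dash: number[]; // PDF dash pattern; empty means a solid stroke
}

// Serializes the path as an SVG <path> element, using standard SVG
// attributes for the fill rule and the dash pattern.
function toSvgPath(path: VectorPath): string {
  const d =
    path.points
      .map(([x, y], i) => `${i === 0 ? "M" : "L"}${x} ${y}`)
      .join(" ") + " Z";
  const attrs: string[] = [`d="${d}"`];
  if (path.evenOdd) {
    attrs.push('fill-rule="evenodd"');
  }
  if (path.dash.length > 0) {
    attrs.push(`stroke-dasharray="${path.dash.join(" ")}"`);
  }
  return `<path ${attrs.join(" ")}/>`;
}

// Example: a triangle filled with the even-odd rule and a 4-on, 2-off dash.
const example: string = toSvgPath({
  points: [[0, 0], [10, 0], [10, 10]],
  evenOdd: true,
  dash: [4, 2],
});
```

Because the result stays a vector description, the browser redraws it at whatever zoom level is needed.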
This means that if you choose the SVG conversion option, instead of an image tag you will get an object tag that displays the content of an SVG file. This has the significant advantage of offering flawless zooming, as you would expect from a PDF file. Like images, SVG also displays content as it loads, making for an improved user experience.
In fact, what we will actually output is both an image and an SVG representation of the page. If SVG is supported, the SVG will be used; otherwise the image will be used. This means that even when using the SVG mode, the output can use the fallback image and will even work on Internet Explorer 6!
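One standard way to express this kind of fallback in HTML is to nest an image tag inside the object tag: a browser that cannot display the object's data renders the nested content instead. The TypeScript sketch below builds such markup. It is only an illustration of the idea, not necessarily the markup our converter will emit, and the `pageMarkup` function and the file names are hypothetical.

```typescript
// Escapes characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Builds an <object> that points at the SVG page, with an <img> nested
// inside as the fallback content for browsers that cannot display the SVG.
function pageMarkup(svgUrl: string, imageUrl: string, alt: string): string {
  return (
    `<object data="${escapeAttr(svgUrl)}" type="image/svg+xml">` +
    `<img src="${escapeAttr(imageUrl)}" alt="${escapeAttr(alt)}">` +
    `</object>`
  );
}

// Example with made-up file names for the first page of a document.
const firstPage: string = pageMarkup("page1.svg", "page1.png", "Page 1");
```

The appeal of this approach is that the choice between SVG and image is made by the browser itself, so no JavaScript feature detection is needed.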
We see this as a huge improvement over our current modes, a significant advantage over other available conversion tools, and very deserving of being announced as part of our version 5 release, in line with version 5 of the Java PDF Library that we also produce.
As this is quite a major change, we would like to take the opportunity to request your feedback. Do you think we are wrong to drop the canvas? Please let us know!
If you are curious about how our output may look in the future, we have created a preliminary example to preview. Please do zoom in to the map in the bottom left!