One of the things I have missed most in moving PDF content to HTML5 is PDF's clipping functionality.
In a PDF file you can set a clip, which can be any irregular shape. The idea is simple: only content inside the clip is drawn, and the rest is not. HTML5 has nothing like this, which means we have to emulate it. Otherwise, content that should be invisible (such as crop marks or hidden lines) starts to appear.
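Deciding whether something is "inside the clip" comes down to a point-in-polygon test. PDF clipping paths use either the nonzero winding rule (the `W` operator) or the even-odd rule (`W*`). The sketch below illustrates the even-odd rule by ray casting. It assumes the clip is a simple polygon given as x/y coordinate arrays. The class and method names are hypothetical and are not part of any IDRsolutions API.

```java
/**
 * Hypothetical sketch: even-odd point-in-polygon test by ray casting.
 * The clip is a simple polygon given as parallel x/y coordinate arrays.
 */
final class EvenOddClip {

    private EvenOddClip() {}

    /** Returns true if (x, y) is inside the polygon; the last vertex joins back to the first. */
    static boolean inside(double[] xs, double[] ys, double x, double y) {
        boolean isInside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Does edge j -> i straddle the horizontal line through y?
            if ((ys[i] > y) != (ys[j] > y)) {
                // x coordinate where the edge meets that horizontal line
                double crossX = xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (x < crossX) {
                    isInside = !isInside; // each crossing of the rightward ray toggles inside/outside
                }
            }
        }
        return isInside;
    }

    public static void main(String[] args) {
        // An L-shaped clip: the top-right corner is cut away
        double[] xs = {0, 10, 10, 5, 5, 0};
        double[] ys = {0, 0, 5, 5, 10, 10};
        System.out.println(inside(xs, ys, 2, 8)); // true
        System.out.println(inside(xs, ys, 8, 8)); // false
    }
}
```

Supporting the nonzero rule (`W`) instead would mean summing signed edge directions rather than toggling a flag.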
This turned out to be quite a complex task. Eliminating anything outside the clipped area was easy; the tricky part is handling items that intersect the clip (that is, items that are drawn but only partly visible). Images can simply be clipped, but shapes have to be altered, and shapes were the hardest items to handle.
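The split between the easy case and the tricky case can be sketched as a three-way classification of each item's bounding box against the clip. This is a hypothetical illustration, not our converter's code. It assumes the clip has already been turned into a `java.awt.geom.Area` and that each item has a bounding rectangle. Because only the bounding box is tested, `PARTIAL` means "needs a closer look" rather than "definitely crosses the clip".

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical sketch: classifies a drawn item by testing its bounding box
 * against the clip, so that only PARTIAL items need the expensive treatment.
 */
final class ClipClassifier {

    enum Relation { OUTSIDE, INSIDE, PARTIAL }

    private ClipClassifier() {}

    static Relation classify(Area clip, Rectangle2D itemBounds) {
        if (!clip.intersects(itemBounds)) {
            return Relation.OUTSIDE; // box misses the clip: the item can simply be dropped
        }
        if (clip.contains(itemBounds)) {
            return Relation.INSIDE;  // box lies wholly in the clip: draw the item unchanged
        }
        return Relation.PARTIAL;     // may cross the clip edge: clip the image or alter the shape
    }

    public static void main(String[] args) {
        // An irregular (L-shaped) clip
        Path2D.Double path = new Path2D.Double();
        path.moveTo(0, 0);
        path.lineTo(10, 0);
        path.lineTo(10, 5);
        path.lineTo(5, 5);
        path.lineTo(5, 10);
        path.lineTo(0, 10);
        path.closePath();
        Area clip = new Area(path);

        System.out.println(classify(clip, new Rectangle2D.Double(20, 20, 3, 3))); // OUTSIDE
        System.out.println(classify(clip, new Rectangle2D.Double(1, 1, 3, 3)));   // INSIDE
        System.out.println(classify(clip, new Rectangle2D.Double(4, 4, 3, 3)));   // PARTIAL
    }
}
```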
In the PDF file format, a shape can be stroked (only its outline is drawn), filled (its interior is coloured in) or both, so you have to work out how the shape interacts with the clip. For example, if the clip lies totally inside the shape, we can ignore a stroked shape, since its outline lies outside the clip, but for a filled shape we need to fill in the clipped area. We had to dig out our old Maths notes on trigonometry to calculate the points where the lines enter and leave the clip!
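For stroked shapes, the job comes down to finding where each line segment enters and leaves the clip. The sketch below does this with parametric segment/edge intersection, which uses 2D cross products rather than trigonometry and so is not necessarily the method we used. It collects every crossing parameter `t` along the segment, sorts them, and keeps each piece whose midpoint lies inside the clip. The class and method names are hypothetical and the clip is assumed to be a polygon. Edges parallel to the segment are skipped, so collinear overlaps are not handled.

```java
import java.awt.geom.Path2D;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: splits a line segment into the pieces that are visible
 * inside a polygonal clip (even-odd rule). Not taken from any real converter.
 */
final class SegmentClipper {

    private SegmentClipper() {}

    /** 2D cross product (z component) of vectors (ax, ay) and (bx, by). */
    private static double cross(double ax, double ay, double bx, double by) {
        return ax * by - ay * bx;
    }

    /**
     * Returns visible pieces of the segment (x0,y0)-(x1,y1) as {tStart, tEnd}
     * pairs, where t = 0 is the start point and t = 1 the end point.
     */
    static List<double[]> visiblePieces(double[] xs, double[] ys,
                                        double x0, double y0, double x1, double y1) {
        int n = xs.length;

        // Clip polygon as a Java2D path, used only for the midpoint inside test
        Path2D.Double clip = new Path2D.Double(Path2D.WIND_EVEN_ODD);
        clip.moveTo(xs[0], ys[0]);
        for (int i = 1; i < n; i++) {
            clip.lineTo(xs[i], ys[i]);
        }
        clip.closePath();

        double dx = x1 - x0, dy = y1 - y0;
        List<Double> ts = new ArrayList<>();
        ts.add(0.0);
        ts.add(1.0);

        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;
            double ex = xs[j] - xs[i], ey = ys[j] - ys[i];
            double denom = cross(dx, dy, ex, ey);
            if (denom == 0) {
                continue; // edge parallel to the segment: skipped in this sketch
            }
            double wx = xs[i] - x0, wy = ys[i] - y0;
            double t = cross(wx, wy, ex, ey) / denom; // position along the segment
            double u = cross(wx, wy, dx, dy) / denom; // position along the clip edge
            if (t > 0 && t < 1 && u >= 0 && u <= 1) {
                ts.add(t); // the line may enter or leave the clip here
            }
        }
        Collections.sort(ts);

        List<double[]> pieces = new ArrayList<>();
        for (int k = 0; k + 1 < ts.size(); k++) {
            double ta = ts.get(k), tb = ts.get(k + 1);
            if (tb - ta < 1e-12) {
                continue; // duplicate crossing, e.g. at a polygon vertex
            }
            // No recorded crossing lies between ta and tb, so the midpoint decides the whole piece
            double tm = (ta + tb) / 2;
            if (clip.contains(x0 + tm * dx, y0 + tm * dy)) {
                pieces.add(new double[] {ta, tb});
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // L-shaped clip; a horizontal line at y = 7 crosses only the vertical arm
        double[] xs = {0, 10, 10, 5, 5, 0};
        double[] ys = {0, 0, 5, 5, 10, 10};
        double x0 = -2, x1 = 12, y = 7;
        for (double[] p : visiblePieces(xs, ys, x0, y, x1, y)) {
            System.out.printf("visible from x=%.3f to x=%.3f%n",
                    x0 + p[0] * (x1 - x0), x0 + p[1] * (x1 - x0));
        }
    }
}
```

For filled shapes, Java2D's `Area.intersect` computes the clipped fill region directly. When the clip sits entirely inside a filled shape, the result is simply the clip area.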
I am sure we will find additional cases that we have not yet covered, so please try the latest version and let us know what you think.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team post about anything interesting they learn.