Whilst developing our PDF to SVG conversion tool we added an option to extract all the pages from the PDF into a single file. This was achieved using a single SVG object and displacing the coordinates of each component on a page by the total height of all the pages that came before it. Although this worked at the time, further development caused page content to be displaced incorrectly as more and more special cases began to appear.
To solve this we have altered how this code works so that each page is extracted as its own SVG object within a larger SVG object. In this way we need only calculate the content's position on the page as if we were extracting a single page. We then just need to calculate the page's position in the single file and use these values to set an x and y coordinate for each page. The positioning of the page is then handled by the viewer or browser you are using, and once an improvement enters single page mode it will also work in single file mode.
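As a rough illustration of the idea, here is a minimal Python sketch that stacks pages vertically by wrapping each one in its own nested SVG element. It is not the converter's actual code: the function name build_single_file and the (width, height, markup) tuple format are assumptions made up for this example. The point it shows is that the page markup is left untouched and only the x and y attributes on each nested element change.

```python
def build_single_file(pages):
    """Stack pages vertically inside one outer SVG.

    `pages` is a list of (width, height, markup) tuples, where `markup`
    is the page content positioned as if it were a standalone page.
    (Hypothetical input format, chosen for this sketch only.)
    """
    total_height = sum(height for _, height, _ in pages)
    max_width = max((width for width, _, _ in pages), default=0)

    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{max_width}" height="{total_height}">'
    ]
    y = 0
    for width, height, markup in pages:
        # Only the nested element's x/y change; the content is not displaced.
        parts.append(f'  <svg x="0" y="{y}" width="{width}" height="{height}">')
        parts.append(f'    {markup}')
        parts.append('  </svg>')
        y += height  # the next page starts below this one
    parts.append('</svg>')
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [
        (100, 200, '<rect width="10" height="10"/>'),
        (100, 150, '<circle cx="5" cy="5" r="5"/>'),
    ]
    print(build_single_file(demo))
```

Because each nested element establishes its own coordinate system, the rectangle and circle above keep the coordinates they would have in single page mode.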
This has opened up several options for the SVG output. In the future we could add new output modes, such as a mode where we have multiple files each containing two pages, which would be useful when extracting newspapers, magazines, books or anything else that has a cross page layout.
We could also do this in single file mode, displaying two columns of pages side by side, or allow single file mode to display a single row of pages. I should say now, before anybody gets excited, that the above are only possibilities at present and some may not make their way into the build.
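To show why these layouts become cheap once each page carries its own x and y, here is a small Python sketch that computes page positions for a given number of columns. The function page_positions and its arguments are hypothetical and only illustrate the arithmetic; they do not describe an existing option in the converter.

```python
def page_positions(sizes, columns):
    """Return an (x, y) origin for each page, filling rows left to right.

    `sizes` is a list of (width, height) tuples, one per page.
    `columns` is how many pages sit side by side in each row.
    (Both names are assumptions for this sketch.)
    """
    if columns < 1:
        raise ValueError("columns must be at least 1")

    positions = []
    y = 0
    for start in range(0, len(sizes), columns):
        row = sizes[start:start + columns]
        x = 0
        for width, _ in row:
            positions.append((x, y))
            x += width  # the next page in the row sits to the right
        # The next row starts below the tallest page in this row.
        y += max(height for _, height in row)
    return positions


if __name__ == "__main__":
    sizes = [(100, 200), (100, 200), (100, 150)]
    print(page_positions(sizes, 1))           # one column, pages stacked
    print(page_positions(sizes, 2))           # two columns side by side
    print(page_positions(sizes, len(sizes)))  # a single row of pages
```

With one column this reproduces the stacked layout, two columns gives the side by side view, and setting the column count to the number of pages gives a single row.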
These changes do not only benefit us on the development side. Since they were implemented, the browsers and viewers we have tested our output with have loaded the SVG files much faster than they loaded our previous output. The output is also clearer when viewed in a text area, as it is now easier to distinguish between the end of one page and the start of the next.