Mark Stephens Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Converting Microsoft Office documents to PDF, HTML5 or SVG

As this is a question we get asked a lot at IDRsolutions, I decided to write a blog article on the topic, which may well develop into a series…

Microsoft Office files are an industry standard, and lots of people want to convert them into PDF, HTML5 or SVG. One option is to use Microsoft Office itself, but there is an alternative which is cross-platform and free: LibreOffice. It is a fork of the open source office suite OpenOffice and has excellent support for Word, PowerPoint and other Office file formats. The two are very similar, with slightly different strengths and weaknesses (and both are free, so try both yourself and choose).

LibreOffice has two very useful features. Firstly, it is cross-platform, so it runs on Linux and OS X boxes and not just Windows. Secondly, it does not need a user to run it: the software can run headless (with no user interface) from the command line, so it can be called from your own programs. This is really easy to do. So

libreoffice --headless --convert-to pdf myFile.docx

will turn the Word file myFile.docx into a PDF file. We get to see a lot of PDF files and the PDF files created by LibreOffice are generally very good.

LibreOffice has several APIs (including one for Java), or you can just call it as an external process with this Java code.

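// fileName and userInputDirPath are supplied by the calling code
// Get an instance of shell; commands are written to its standard input
Process pqShell = Runtime.getRuntime().exec("sh");

// Quote the file name so that names containing spaces are passed intact
String shellCommand = "libreoffice --headless --convert-to pdf \"" + fileName + "\"";
try {
    java.io.DataOutputStream dos = new java.io.DataOutputStream(pqShell.getOutputStream());
    // Run the conversion from the directory that holds the input file
    dos.writeBytes("cd \"" + userInputDirPath + "\"\n");
    dos.writeBytes(shellCommand + "\n");
    dos.writeBytes("exit\n");
    dos.flush();
    dos.close();
    // Block until the shell (and therefore the conversion) has finished
    pqShell.waitFor();
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    pqShell.destroy();
}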

The --convert-to parameter takes the target file type as its value (e.g. txt for Office to text, html for Office to HTML), and accepts any output type LibreOffice supports. There are lots of additional features, which we may document in later articles…
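
As a rough illustration of trying several target formats, here is a minimal sketch that calls LibreOffice through ProcessBuilder rather than a shell, so file names do not need quoting. It assumes the libreoffice binary is on the PATH (on some installations it is called soffice) and uses the --outdir option to collect the results in one folder. The class OfficeConverter, its method names, the "converted" folder and the two minute timeout are all made up for this example, and the predicted output name simply assumes LibreOffice keeps the base name and swaps the extension.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of LibreOffice or any IDRsolutions API.
public class OfficeConverter {

    // Builds the argument list. Each argument is passed separately, so no shell quoting is needed.
    static List<String> buildCommand(Path input, String format, Path outDir) {
        return List.of("libreoffice", "--headless",
                "--convert-to", format,
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Assumption: the output keeps the input's base name and takes the format as its extension.
    static Path expectedOutput(Path input, String format, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + "." + format);
    }

    // Runs one conversion; returns true if LibreOffice exits with code 0 within the timeout.
    static boolean convert(Path input, String format, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, format, outDir))
                .inheritIO() // show LibreOffice's own messages on our console
                .start();
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        Path input = Paths.get(args.length > 0 ? args[0] : "myFile.docx");
        Path outDir = Paths.get("converted");
        for (String format : new String[] {"pdf", "html", "txt"}) {
            try {
                boolean ok = convert(input, format, outDir);
                System.out.println(format + ": " + (ok ? expectedOutput(input, format, outDir) : "failed"));
            } catch (IOException ex) {
                // Typically means the libreoffice binary could not be started
                System.out.println(format + ": could not run LibreOffice (" + ex.getMessage() + ")");
            }
        }
    }
}
```

Running it with a file name as the only argument attempts all three conversions in turn. Check the exit status and the output folder rather than trusting the predicted file name, as that part is only a guess at LibreOffice's naming.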

The HTML output produced directly by LibreOffice is quite simple, so we have been feeding the PDF files created via LibreOffice into our PDF to HTML5 converter and testing this for several months now. We (and our test customers) have been very pleased with the results, and we know of lots of companies using LibreOffice internally for file conversion.

So we have added LibreOffice to our free online converter, which now allows people to convert not just PDF files but also Office documents to HTML5: Word documents, Excel documents and PowerPoint presentations.

We recommend this additional functionality to our commercial clients who want to process a wider range of documents with our PDF to HTML5 converter.

We are very impressed with the possibilities of LibreOffice as part of a two-stage conversion process to turn Office documents into HTML5 via PDF. I am less enthusiastic about direct Office to HTML conversion. I hope that if you are doing anything with Office documents on server or desktop, you will have a look and experiment with it as part of your solution.
