Having had my Raspberry Pi for quite some time and played with it extensively at home (making it read news articles out loud and act as a media server), I’ve long wanted to try something a little more serious with it, involving what we do at work. So this week I decided to sit down and get our PDF to HTML5 converter to convert the Raspberry Pi magazine known as the MagPi.
Our PDF software is written in Java, so we produce a standalone Java jar file, but we also have an online web services version. That gave me the option of offloading all the conversion work to our main server, but that’s cheating in my eyes, since the Pi would only be acting as a relay to download and upload the files. So I decided that I would endeavour to get everything done on the Pi itself! Since Java is supposed to be a “write once, run anywhere” affair, I figured it would be easy…
(I will be working predominantly on the terminal/command line but will also be using the GUI from time to time, so the screenshots come from both.)
In order to get it working I listed a few prerequisites:
- We will need a Raspberry Pi running either the Soft-float Debian “wheezy” version of Raspbian or the regular version of Raspbian.
- Since our PDF2HTML5 converter is built using Java, we will need to install a JVM on the Pi. (If you are using the regular version of Raspbian, you will need to install the Early Access version of Oracle’s Java 8.)
- We will need a PDF file to test the conversion with.
- If we want to view the output on another device or use some of the program’s advanced features, we will also need a local web server or a way of accessing one.
Not too many steps! Below I detail how I got it all working in turn.
Installing a JVM
The first step is to check whether Java is already on our Raspberry Pi. To do this we can simply type:
java -version
into the command line to see if it exists and is installed (if it isn’t, you will get an appropriate message).
If it is installed, you will need to make sure it’s the Oracle JVM: while writing this article I discovered that there are currently some issues in the OpenJDK version on the Pi that cause problems with our library (these will likely be fixed in future versions of the OpenJDK).
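If you would rather check from Java itself which JVM is active, a small sketch like the one below prints the standard java.version, java.vendor, java.vm.name and os.arch system properties. The class name JvmCheck is just for illustration, and the Oracle check is only a heuristic I’m assuming from the VM name Oracle’s JDK prints (the “Java HotSpot(TM) Client VM” line in the java -version output further down), not an official test.

```java
// Save as JvmCheck.java, then run: javac JvmCheck.java && java JvmCheck
final class JvmCheck {

    // Heuristic (an assumption, not an official test): Oracle's JVM reports a
    // VM name such as "Java HotSpot(TM) Client VM", while OpenJDK builds
    // report a different name, typically starting with "OpenJDK".
    static boolean looksLikeOracleVm(String vmName) {
        return vmName != null && vmName.startsWith("Java HotSpot");
    }

    public static void main(String[] args) {
        // Standard system properties that every JVM provides.
        String vmName = System.getProperty("java.vm.name");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.vm.name = " + vmName);
        System.out.println("os.arch      = " + System.getProperty("os.arch"));

        if (looksLikeOracleVm(vmName)) {
            System.out.println("This looks like Oracle's JVM.");
        } else {
            System.out.println("This does not look like Oracle's JVM.");
        }
    }
}
```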
There are quite a few JVMs available. Initially I stuck with the OpenJDK, as it was the easiest to install; all you have to do is type the following into the command line:
sudo apt-get update
sudo apt-get install openjdk-7-jdk
However, as previously mentioned, the OpenJDK has some issues that affect our code negatively (at a low level that is difficult to fix) which are not present in Oracle’s JVM, so we will need to install that instead!
There are some good guides around on how to do just that, so I won’t go into a lot of detail; instead I will link to the two guides that I referred to when writing this article.
Essentially it boils down to which version of Raspbian you are running. If, like me, you assumed the OpenJDK would work and are using the normal version of Raspbian (the one that does not use soft-float, i.e. software floating point), then you will need to use Oracle’s Early Access version of Java 8, as it runs on the Raspberry Pi just fine. You can download it here; select the version labelled Linux ARMv6/7 VFP, HardFP ABI.
If you are using the Soft-float Debian “wheezy” version of Raspbian, you can still use the Early Access JDK if you wish, or you can use the current version of the Oracle JVM from their website here: click Java Platform (JDK), scroll down the page and download the version labelled Linux ARM v6/v7 Soft Float ABI.
Either way, you will end up with a tar.gz file. This file contains the JDK you chose and will need to be unpacked onto the Pi to continue.
To do this run the command:
tar xvzf TARFILE HERE
Replace TARFILE HERE with the name of the tar.gz file you downloaded; in my case it looked like this:
tar xvzf jdk-8-ea-b99-linux-arm-vfp-hflt-17_jul_2013.tar.gz jdk1.8.0/
This will unpack the file into your current directory, which may take anywhere from a few seconds to a few minutes on the Pi. Once it is extracted you will be left with a new folder in your current directory named something along the lines of jdk1.8.0 or jdk1.7.0_10.
The next thing you will need to do is install the JDK and register it with the Pi so we can make use of it easily. The guide I followed when I first did this suggested moving the JDK to a better location, /opt/java to be precise, which does not exist by default, so I created it using:
sudo mkdir -p -v /opt/java
And moved it using:
sudo mv -v jdk1.8.0 /opt/java/
(We need to use sudo here, as we are dealing with a directory that requires root permission.)
Then we run the following commands to install the JDK (replacing jdk1.8.0 with whatever your directory was named if you used JDK 7):
sudo update-alternatives --install "/usr/bin/java" "java" "/opt/java/jdk1.8.0/bin/java" 1
sudo update-alternatives --set java /opt/java/jdk1.8.0/bin/java
These commands register the Oracle JDK and set it as the version of Java to use on the Pi.
Now when you run java -version you should get output similar to this:
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b99)
Java HotSpot(TM) Client VM (build 25.0-b41, mixed mode)
These two blog posts from Savage Home Automation detail how to install the Oracle JDKs and were what I referred to when writing this article. You should have a look, as they’re much more detailed than my quick guide:
http://www.savagehomeautomation.com/raspi-jdk7 – Adding the Java 7 JDK to a Soft-Float version of Raspbian
http://www.savagehomeautomation.com/raspi-jdk8 – Adding the Early Access JDK to any version of Raspbian
Getting the Jar and Making Sure it Runs
Now that we have the JVM installed, we will want to fetch the jpdf2html.jar file. If you’re a customer you will likely already have it, but we offer a free 14-day trial jar at http://www.idrsolutions.com/java-pdf-converter/ for those interested, and I will be using that during this blog post.
The easiest way I found to get the 14-day trial is to start up the GUI (using the command startx from the terminal) and open a web browser to download it.
Now that we have downloaded the jar file, it’s time to see if it runs; keep the GUI running, open a terminal and type in the following:
java -jar jpdf2html.jar
Replace jpdf2html.jar with the path to where you downloaded the jar file (I placed it in my home directory). You can also double-click the jar file in your GUI file browser. Either way, you should get a message (or something similar) describing the parameters the converter accepts.
That’s perfectly normal, since we didn’t give it any files to convert. Upon clicking OK (or doing nothing if you are on the terminal), you will then be greeted by a message telling you how long you have left on your trial.
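If java -jar reports an error instead of the parameters message, one quick way to rule out a broken or incomplete download is to open the jar and read its manifest, since java -jar relies on the Main-Class entry there. The sketch below uses the standard java.util.jar API; the class name JarCheck is hypothetical, and I’m not assuming anything about which Main-Class the converter jar actually declares.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Save as JarCheck.java, then run: java JarCheck ~/jpdf2html.jar
final class JarCheck {

    // Returns the Main-Class declared in the jar's manifest, or null if the jar
    // has no manifest or no Main-Class entry. Throws IOException if the file is
    // missing or is not a valid jar (for example, a truncated download).
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args.length > 0 ? args[0] : "jpdf2html.jar");
        String mainClass = mainClassOf(jar);
        if (mainClass == null) {
            System.out.println(jar + " has no Main-Class, so java -jar cannot run it.");
        } else {
            System.out.println(jar + " looks runnable; Main-Class = " + mainClass);
        }
    }
}
```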
Converting a PDF
Now that we have a JVM installed and the PDF2HTML5 converter jar ready and working, it’s time to try it out on some PDF files! I know which one I want to try it on, but first we shall try it on a simple two-page test PDF to make sure everything works.
Again I will use the GUI browser to navigate to their site and download the PDF.
(You are welcome to use any PDF; I am just using this one as an example.)
With the PDF downloaded (I placed mine in my home directory) we can now try running the converter on it using the terminal command:
java -jar jpdf2html.jar example.pdf ~/
This runs the converter with the default settings on example.pdf and places the output in our home directory (inside a subfolder named after the PDF).
If all went well, we will have an HTML version of that PDF document! We can check this in our terminal by listing the contents of the new output folder (for example with ls ~/example).
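If you want to run the same check from Java (for example as part of a larger script), a minimal sketch using java.nio.file.Files.walk is below. It simply counts and lists every .html file under a folder; I’m assuming the output folder is ~/example (named after example.pdf, as described above) and making no assumptions about the individual file names the converter produces. ListHtml is a hypothetical class name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Save as ListHtml.java, then run: java ListHtml ~/example
final class ListHtml {

    // Recursively collects every regular .html file under outputDir, sorted by path.
    static List<Path> htmlFilesUnder(Path outputDir) throws IOException {
        try (Stream<Path> paths = Files.walk(outputDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> pages = htmlFilesUnder(outputDir);
        System.out.println(pages.size() + " HTML file(s) under " + outputDir);
        for (Path page : pages) {
            System.out.println("  " + outputDir.relativize(page));
        }
    }
}
```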
But what does it look like? We can open it directly in one of the Pi’s browsers to see our default conversion mode.
Now we shall try it on the MagPi magazine and make use of some of the library’s HTML5 features to make it look great.
These features are documented on our site here. For this magazine we will use our page-turning mode, which uses Ajax to load the pages (so we will also need a small web server set up to view this output), and we will also make the text use shapes so that we always get an almost identical look to the original PDF:
java -Dorg.jpedal.pdf2html.scaling="1024x768" -Dorg.jpedal.pdf2html.textMode=image_shapetext_selectable -Dorg.jpedal.pdf2html.viewMode=pageturning -jar jpdf2html.jar The-MagPi-issue-14-en.pdf ~/
This will take a while on the Pi, since the MagPi magazine is long and complicated and the Pi has far less memory and processing power than a desktop or server.
Once it’s converted, you can open the individual pages locally on the Pi to check that they work, but index.html will not work correctly because of its use of Ajax, so you will need to set up a web server on your Pi to view it from there, or put the output onto a website.
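If you want to run the MagPi conversion from your own Java code rather than typing the command each time, the same command line can be launched with ProcessBuilder. This is a minimal sketch, not part of our library’s API: it just rebuilds the java -D… -jar jpdf2html.jar command shown above using the three settings from that command, and it assumes the jar is in your home directory and the PDF is in the current directory. ConvertMagPi is a hypothetical class name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Save as ConvertMagPi.java, then run: java ConvertMagPi
final class ConvertMagPi {

    // Builds: java -Dkey=value ... -jar <jar> <pdf> <outputDir>
    // The -D settings must come before -jar, as in the terminal command.
    static List<String> buildCommand(String jar, String pdf, String outputDir,
                                     Map<String, String> settings) {
        List<String> command = new ArrayList<>();
        command.add("java");
        for (Map.Entry<String, String> setting : settings.entrySet()) {
            command.add("-D" + setting.getKey() + "=" + setting.getValue());
        }
        command.add("-jar");
        command.add(jar);
        command.add(pdf);
        command.add(outputDir);
        return command;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // LinkedHashMap keeps the settings in the order they were added.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("org.jpedal.pdf2html.scaling", "1024x768");
        settings.put("org.jpedal.pdf2html.textMode", "image_shapetext_selectable");
        settings.put("org.jpedal.pdf2html.viewMode", "pageturning");

        // ProcessBuilder does not expand "~", so use the user.home property instead.
        String home = System.getProperty("user.home");
        List<String> command = buildCommand(home + "/jpdf2html.jar",
                "The-MagPi-issue-14-en.pdf", home + "/", settings);

        Process process = new ProcessBuilder(command)
                .inheritIO() // show the converter's own output in this terminal
                .start();
        System.out.println("Converter exited with code " + process.waitFor());
    }
}
```

Keeping the settings in a map makes it easy to try other values from the documentation on our site without retyping the whole command.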
Installing an HTTP Web Server to Serve Your Files
There are a few web servers available, and you may already have one installed, but I stuck with Lighttpd as it’s lightweight, easy to install and quick to configure.
sudo apt-get -y install lighttpd
This installs the web server, and the following commands will allow the default pi user to edit the /var/www folder:
sudo chown www-data:www-data /var/www
sudo chmod 775 /var/www
sudo usermod -a -G www-data pi
Then reboot the Pi. You will now be able to add all the HTML files to the /var/www/ directory on the Pi and access them from the Pi itself (and from elsewhere on your network). Because of the Pi’s limited memory and processing power, the magazine mode’s page turning may be slightly choppy, and it will likely only work in a few of the browsers available on the Pi (Chromium and Iceweasel among them).
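Lighttpd is what I used, but if you only want a quick preview while testing, the Oracle JDK installed earlier also ships a small built-in HTTP server in the com.sun.net.httpserver package. The sketch below is an alternative I’m suggesting rather than part of the setup above: the class name, port 8080, the output folder name and the short list of content types are all assumptions, and it is only meant for a trusted home network, not as a replacement for a proper web server.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Save as PreviewServer.java, then run: java PreviewServer ~/The-MagPi-issue-14-en 8080
final class PreviewServer {

    // Maps a request path onto a file under root ("/" becomes index.html), or
    // returns null if the normalised path would escape root (e.g. "/../x").
    static Path resolve(Path root, String requestPath) {
        String relative = requestPath == null ? "" : requestPath.replaceFirst("^/+", "");
        if (relative.isEmpty()) {
            relative = "index.html";
        }
        Path file = root.resolve(relative).normalize();
        return file.startsWith(root) ? file : null;
    }

    // A few content types the converted output is likely to need (assumed list).
    static String contentType(Path file) {
        String name = file.getFileName().toString().toLowerCase();
        if (name.endsWith(".html")) return "text/html; charset=utf-8";
        if (name.endsWith(".css")) return "text/css";
        if (name.endsWith(".js")) return "application/javascript";
        if (name.endsWith(".png")) return "image/png";
        if (name.endsWith(".svg")) return "image/svg+xml";
        return "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        final Path root = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath().normalize();
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            Path file = resolve(root, exchange.getRequestURI().getPath());
            if (file == null || !Files.isRegularFile(file)) {
                exchange.sendResponseHeaders(404, -1); // -1 means no response body
                exchange.close();
                return;
            }
            byte[] body = Files.readAllBytes(file);
            exchange.getResponseHeaders().set("Content-Type", contentType(file));
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body); // closing the body stream also ends the exchange
            }
        });
        server.start();
        System.out.println("Serving " + root + " on port " + port);
    }
}
```

With it running, browsing to port 8080 on the Pi’s address should let index.html load its pages over HTTP instead of from the file system.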
But essentially you have now given your Pi the ability to convert PDF files to HTML5 files!
I have only explained how to set up your Pi to convert PDF to HTML and host the results for you, and potentially the world, to see. With these ingredients you could make all manner of things, for example:
- Does your company file reports as PDF? You could write a script that converts the ones sent to the Pi into HTML, so you could view them on your mobile devices.
- You could convert free e-books you own to a format that works on any device with a web browser, and make them accessible via your Pi from anywhere with an internet connection.
- We use our jar file with a GlassFish server to provide our online PDF to HTML converter. The Pi wouldn’t handle this as well as a dedicated server, but it could be useful in a small internal environment; you can install GlassFish on the Pi by installing the Oracle Java Embedded Suite, which includes it along with other useful Java software.
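For the first idea above, a minimal sketch of such a “script”, written in Java rather than as a shell script, could watch a folder with the standard WatchService API and run the converter on each new PDF using the same java -jar jpdf2html.jar command line as earlier. The class name PdfInbox, the inbox and output folders, and the jar location in the home directory are all assumptions; a real setup would also need to wait until a file has finished copying before converting it.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Arrays;
import java.util.List;

// Save as PdfInbox.java, then run: java PdfInbox ~/inbox /var/www/reports
// (hypothetical folder names; the jar is assumed to be ~/jpdf2html.jar)
final class PdfInbox {

    static boolean isPdf(Path file) {
        return file.getFileName().toString().toLowerCase().endsWith(".pdf");
    }

    // Same shape as the terminal command: java -jar jpdf2html.jar <pdf> <outputDir>/
    static List<String> convertCommand(String jar, Path pdf, Path outputDir) {
        return Arrays.asList("java", "-jar", jar, pdf.toString(), outputDir + "/");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("Usage: java PdfInbox <inbox folder> <output folder>");
            return;
        }
        Path inbox = Paths.get(args[0]);
        Path outputDir = Paths.get(args[1]);
        String jar = System.getProperty("user.home") + "/jpdf2html.jar";

        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            // Only newly created files are of interest; edits and deletions are ignored.
            inbox.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            System.out.println("Waiting for PDFs in " + inbox);

            while (true) {
                WatchKey key = watcher.take(); // blocks until an event arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // some events were lost; nothing to convert here
                    }
                    Path pdf = inbox.resolve((Path) event.context());
                    if (!isPdf(pdf)) {
                        continue;
                    }
                    // Converts one file at a time, which suits the Pi's limited memory.
                    Process process = new ProcessBuilder(convertCommand(jar, pdf, outputDir))
                            .inheritIO()
                            .start();
                    System.out.println(pdf + " -> exit code " + process.waitFor());
                }
                if (!key.reset()) {
                    break; // the inbox folder is no longer accessible
                }
            }
        }
    }
}
```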