PDF is a very common (and very elegant) file format that many Java developers will come across. However, it is not entirely straightforward to understand, especially if you expect it to work like Microsoft Word or HTML/XML.
We have spent over 20 years (so far) working with PDF, so we thought a quick guide to the PDF file format would be useful. Below we have assembled our top tips for you.
What is a PDF file?
A PDF file is a structured, largely binary file whose pages are described using drawing operators derived from PostScript. You can think of it as a program which a PDF parser will ‘run’ to produce the pages. PDF is an open ISO standard (ISO 32000), and it is popular because PDF files are cross-platform.
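To make the ‘program’ idea a little more concrete: a well-formed PDF file starts with a `%PDF-` header giving the version, and ends with the `startxref` keyword, the byte offset of the cross-reference data, and an `%%EOF` marker. The minimal sketch below reads just those two landmarks. The class and method names (`PdfStructureCheck`, `pdfVersion`, `startXrefOffset`) are hypothetical, and this is not a parser: it simply takes the last `startxref` in the file and does not check that the offset is valid.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfStructureCheck {

    /** Returns the version from a leading "%PDF-x.y" header, or null if there is no header. */
    static String pdfVersion(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String head = new String(data, 0, Math.min(data.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            return null;
        }
        int end = 5;
        while (end < head.length() && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return head.substring(5, end);
    }

    /** Returns the byte offset written after the last "startxref" keyword, or -1 if none is found. */
    static long startXrefOffset(byte[] data) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf("startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++; // skip the end-of-line after the keyword
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        return start == i ? -1 : Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfStructureCheck file.pdf");
            return;
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        System.out.println("PDF version: " + pdfVersion(data));
        System.out.println("Cross-reference data at byte: " + startXrefOffset(data));
    }
}
```

A real PDF library follows that offset to the cross-reference data and from there to every object in the file, which is why the offsets must stay exact.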
Does Java have any support for the PDF file format?
No, not in the standard Java libraries. But there are plenty of Java libraries which make it very easy to work with PDF files in Java. We have been developing one of the most popular commercial ones since 1999.
Working with PDF files
As a Java developer, you will probably interact with PDF files in one of these ways:
1. Store a PDF file
A PDF file should always be treated as a BLOB and stored as a binary object. It contains its own internal table of byte offsets (the cross-reference table), which will be broken if you start trying to edit the file. Be careful not to add extra bytes to the start or end of the BLOB when you store the data, as this can make the file unusable.
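A quick way to see why the BLOB advice matters is to push some PDF-like bytes through a text conversion, which is roughly what happens if a PDF ends up in a text column instead of a binary one. This is a minimal sketch: `PdfBlobCheck`, `roundTripAsText` and `looksIntact` are hypothetical names, the sample bytes are a stand-in rather than a complete PDF, and `looksIntact` only checks the header and the `%%EOF` marker, not the offsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfBlobCheck {

    /** Simulates the mistake of storing PDF bytes in a text column as UTF-8. */
    static byte[] roundTripAsText(byte[] data) {
        // Byte sequences that are not valid UTF-8 are silently replaced with U+FFFD here.
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    /** Rough check: "%PDF-" at byte 0 and "%%EOF" at the end, ignoring trailing whitespace. */
    static boolean looksIntact(byte[] data) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.startsWith("%PDF-") && text.stripTrailing().endsWith("%%EOF");
    }

    public static void main(String[] args) {
        // A tiny stand-in for real PDF bytes; line 2 is a comment containing high-bit bytes.
        byte[] original = "%PDF-1.7\n%\u00e2\u00e3\u00cf\u00d3\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);

        byte[] viaText = roundTripAsText(original);
        System.out.println("Bytes preserved through text: " + Arrays.equals(original, viaText)); // false

        byte[] prefixed = new byte[original.length + 1];
        prefixed[0] = ' '; // a single stray byte shifts every offset in the file
        System.arraycopy(original, 0, prefixed, 1, original.length);
        System.out.println("Original looks intact: " + looksIntact(original)); // true
        System.out.println("Prefixed copy looks intact: " + looksIntact(prefixed)); // false
    }
}
```

In a real application you would keep the data as a `byte[]` or stream from end to end, for example by passing it to JDBC's `PreparedStatement.setBytes` or `setBinaryStream` for a BLOB column.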
2. Create a PDF file
There are two ways to create a PDF file. You can print an existing document to PDF with a PDF printer driver or Ghostscript. Alternatively, you can generate a PDF from XML (with Apache FOP) or programmatically with a library such as iText.
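To see what ‘programmatically’ means at the lowest level, here is a sketch that writes a one-page PDF byte by byte, recording each object's offset for the internal offset table. It is for illustration only; in practice a library such as iText or Apache FOP handles fonts, encodings, compression and layout for you. The class and method names are hypothetical, the text must be Latin-1, and only the standard Helvetica font is used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MinimalPdfWriter {

    /** Builds a one-page PDF that draws a single line of text in Helvetica. */
    static byte[] buildPdf(String text) {
        // Escape the characters that are special inside PDF literal strings.
        String safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        String content = "BT /F1 24 Tf 72 720 Td (" + safe + ") Tj ET";
        byte[] contentBytes = content.getBytes(StandardCharsets.ISO_8859_1);

        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                + "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + contentBytes.length + " >>\nstream\n" + content + "\nendstream"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset where object (i + 1) starts
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        int xrefOffset = out.size();
        write(out, "xref\n0 " + (objects.length + 1) + "\n");
        write(out, "0000000000 65535 f \n"); // entry 0 is always the head of the free list
        for (int offset : offsets) {
            // Each entry must be exactly 20 bytes, including the two-byte " \n" ending.
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n");
        write(out, "startxref\n" + xrefOffset + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        Files.write(Path.of("hello.pdf"), buildPdf("Hello from Java"));
    }
}
```

Running `main` writes `hello.pdf` to the working directory. Note that `startxref` holds the byte position of the `xref` keyword, so inserting even one byte earlier in the file would make every recorded offset wrong.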
3. Edit a PDF file
Our recommendation is to use an external library (there is a good selection of both open source and commercial options available). Editing PDF files is a very complex problem, and other developers have already spent years solving it for you.
4. Convert a PDF file
PDF files are often converted to images or HTML5 for display purposes. This is currently the most popular use case for our JPedal and BuildVu libraries. Conversion can make the content easier to display, but may lose some advantages of the PDF file format (vector graphics, annotations, etc).
Learn more about PDF files
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have over 20 years’ worth of PDF knowledge and tips. Click here to visit the series index.
Did you know...
IDRsolutions offers a whole range of online file converters to convert PDF and Microsoft Excel, Word and Office documents to HTML5, SVG or image formats?
They are free to use for single-file conversions and also include developer links if you want to use our commercial software for bulk conversions. Find out more on this page.