The PDF file format is very common (and very elegant), and many Java developers will come across it. But it is not entirely straightforward to understand, especially if you expect it to work like Microsoft Word or HTML/XML.
We have been working with PDF files since 1999 and still learn new things each day. So we thought that a quick starter guide to the PDF file format would be useful. Below we have assembled our top tips for you.
What is a PDF file?
A PDF file is a structured, largely binary file whose pages are described by drawing operators derived from PostScript. You can think of it as a program which a PDF parser will ‘run’ to produce the pages. PDF is an open ISO standard (ISO 32000), and it is popular because PDF files look the same on any platform.
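To make the ‘program’ idea concrete: the drawing instructions for a page live in a content stream, written in postfix form, so operands come first and each operator consumes the operands before it. For example, `BT` and `ET` begin and end a text object, `Tf` selects a font and size, `Td` moves the text position and `Tj` shows a string. The Java sketch below splits a simple content stream into operations. `ContentStreamDemo` and `Op` are hypothetical names for illustration, not part of any PDF library, and the tokenizer deliberately ignores arrays, dictionaries, hex strings, string escapes and inline images, all of which a real parser must handle.

```java
import java.util.ArrayList;
import java.util.List;

public final class ContentStreamDemo {

    /** One drawing instruction: an operator plus the operands that preceded it. */
    public static final class Op {
        public final String operator;
        public final List<String> operands;

        Op(String operator, List<String> operands) {
            this.operator = operator;
            this.operands = operands;
        }

        @Override
        public String toString() {
            return operator + " " + operands;
        }
    }

    private ContentStreamDemo() {}

    /**
     * Splits a simple content stream into operations. Sketch only: handles
     * numbers, /Names and (literal strings) without escapes or nesting.
     * Arrays, dictionaries, hex strings and inline images are not supported.
     */
    public static List<Op> parse(String stream) {
        List<Op> ops = new ArrayList<>();
        List<String> pending = new ArrayList<>();   // operands waiting for their operator
        int i = 0;
        while (i < stream.length()) {
            char c = stream.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {                          // literal string operand
                int end = stream.indexOf(')', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated string at " + i);
                }
                i = end + 1;
                pending.add(stream.substring(start, i));
                continue;
            }
            while (i < stream.length()
                    && !Character.isWhitespace(stream.charAt(i))
                    && stream.charAt(i) != '(') {
                i++;
            }
            String token = stream.substring(start, i);
            if (c == '/' || c == '-' || c == '+' || c == '.' || Character.isDigit(c)) {
                pending.add(token);                  // name or number operand
            } else {
                ops.add(new Op(token, List.copyOf(pending)));  // operator consumes operands
                pending.clear();
            }
        }
        return ops;
    }
}
```

Calling `ContentStreamDemo.parse("BT /F1 24 Tf 72 720 Td (Hello) Tj ET")` yields five operations: `BT []`, `Tf [/F1, 24]`, `Td [72, 720]`, `Tj [(Hello)]` and `ET []`. A real renderer would then execute each operation against a graphics state instead of just collecting them.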
Does Java have any support for the PDF file format?
No. The standard Java libraries have no built-in support for reading or writing PDF files. But there are plenty of Java libraries which make it very easy to work with PDF files in Java. We have been developing one of the most popular commercial ones since 1999.
Working with PDF files
As a Java developer, you will probably interact with PDF files in one of these ways:
1. Store a PDF file
A PDF file should always be treated as a BLOB and stored as a binary object. It contains its own internal table of byte offsets (the cross-reference table), which breaks if you start editing the bytes of the file directly. Be careful not to add extra bytes to the start or end of the BLOB when you store the data, as this can make the file unusable.
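To see why direct edits are dangerous, here is a minimal Java sketch that checks whether each entry in a classic cross-reference table still points at the object it names. It assumes an uncompressed `xref` table with single spaces in each `N G obj` header, reads only the last `startxref`, and ignores incremental updates and the cross-reference streams that PDF 1.5+ files may use instead. `XrefCheck` is a hypothetical helper, not a library API.

```java
import java.nio.charset.StandardCharsets;

public final class XrefCheck {

    private XrefCheck() {}

    /**
     * Returns true if every in-use entry of the last classic xref table
     * points at "N G obj" for its own object number N and generation G.
     * Sketch only: no xref streams, no /Prev chains, single spaces assumed.
     */
    public static boolean offsetsValid(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string index == byte offset.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        try {
            int sx = s.lastIndexOf("startxref");
            if (sx < 0) {
                return false;
            }
            String[] tail = s.substring(sx + "startxref".length()).trim().split("\\s+");
            int xref = Integer.parseInt(tail[0]);
            if (!s.startsWith("xref", xref)) {
                return false;                          // the table itself has moved
            }
            String table = s.substring(xref + 4, s.indexOf("trailer", xref)).trim();
            String[] tok = table.split("\\s+");
            int i = 0;
            while (i < tok.length) {
                int first = Integer.parseInt(tok[i++]); // subsection: first object number
                int count = Integer.parseInt(tok[i++]); // subsection: number of entries
                for (int k = 0; k < count; k++) {
                    int offset = Integer.parseInt(tok[i++]);
                    int gen = Integer.parseInt(tok[i++]);
                    String type = tok[i++];             // "n" = in use, "f" = free
                    if (type.equals("n")
                            && !s.startsWith((first + k) + " " + gen + " obj", offset)) {
                        return false;                  // entry no longer points at its object
                    }
                }
            }
            return true;
        } catch (RuntimeException e) {                 // malformed numbers or truncated table
            return false;
        }
    }
}
```

Prepending even a single byte to an otherwise valid file shifts every object, so all the recorded offsets point one byte too early and `offsetsValid` returns `false`.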
2. Create a PDF file
There are two ways to create a PDF file. You can print an existing document as a PDF file, using a PDF printer driver or Ghostscript. Or you can generate a PDF, either from XML (XSL-FO with Apache FOP) or programmatically with a library such as iText.
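Libraries such as iText take care of a lot of bookkeeping for you. As a rough illustration of what happens under the hood (not a substitute for a library), the standard-library-only Java sketch below writes a one-page PDF by hand: five numbered objects, a cross-reference table recording each object's byte offset, and a trailer pointing at that table. The class name `MinimalPdf` is hypothetical, the text must be printable ASCII without parentheses or backslashes (no escaping is done), and it relies on the standard Helvetica font rather than embedding one.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public final class MinimalPdf {

    private MinimalPdf() {}

    /** Builds a one-page PDF that shows the given text in 24pt Helvetica. */
    public static byte[] build(String text) {
        if (!text.matches("[\\x20-\\x7E&&[^()\\\\]]*")) {
            throw new IllegalArgumentException("Sketch supports plain ASCII without ( ) \\");
        }
        // Page content: begin text, set font, move to (72, 720), show string, end text.
        String content = "BT /F1 24 Tf 72 720 Td (" + text + ") Tj ET";
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                + "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size());                  // byte offset where object i+1 starts
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        // Cross-reference table: each entry is exactly 20 bytes, as the format requires.
        int xrefOffset = out.size();
        StringBuilder xref = new StringBuilder();
        xref.append("xref\n0 ").append(objects.length + 1).append('\n');
        xref.append("0000000000 65535 f \n");         // entry 0 is always free
        for (int offset : offsets) {
            xref.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        write(out, xref.toString());
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n"
            + "startxref\n" + xrefOffset + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

For example, `Files.write(Path.of("hello.pdf"), MinimalPdf.build("Hello, PDF"))` writes a small file that common PDF viewers should open. Every offset is computed from the bytes actually written, which is exactly the bookkeeping a library does for you, and exactly what breaks when bytes are edited afterwards.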
3. Edit a PDF file
Editing a PDF file is a very complex problem, and other developers have already spent years solving it for you. So our recommendation is to use an external library (there is a good selection of both open-source and commercial options available).
4. Convert a PDF file
PDF files are often converted to images or HTML5 for display purposes. This is currently the most popular use case for our JPedal and BuildVu libraries. Conversion can make documents easier to display, but may lose some advantages of the PDF file format (vector graphics, annotations, etc).
Are you a developer working with PDF files?
Our developer guide contains a large number of technical posts to help you understand the PDF file format.