Java has no built-in support for reading PDF files. This tutorial shows how to extract form data from a PDF file in a few simple steps using the JPedal PDF library.
JPedal includes extensive support for interactive forms and their components, which it converts into Java object representations. It also gives access to the form names and to the GUI (Swing) representations of the forms. The data can be retrieved with a single call, either for one page or for the whole document.
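To make the difference between document-wide and page-level access concrete, here is a minimal sketch in plain Java. The `FormField` and `FormIndex` classes are hypothetical and are not part of JPedal; they do not reflect how JPedal stores forms internally. They only show how the same set of fields can be queried by name across the whole document or restricted to a single page. In this sketch a page number of 0 or less means "whole document".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one form field (not a JPedal type).
final class FormField {
    final String name; // field name, e.g. "Mabel"
    final String ref;  // PDF indirect reference, e.g. "25 0 R"
    final int page;    // 1-based page the field appears on

    FormField(String name, String ref, int page) {
        this.name = name;
        this.ref = ref;
        this.page = page;
    }
}

// Hypothetical index showing document-wide vs page-restricted lookups.
final class FormIndex {
    private final List<FormField> fields = new ArrayList<>();

    void add(FormField field) {
        fields.add(field);
    }

    // Distinct field names; page <= 0 means the whole document.
    Set<String> names(int page) {
        Set<String> result = new LinkedHashSet<>();
        for (FormField f : fields) {
            if (page <= 0 || f.page == page) {
                result.add(f.name);
            }
        }
        return result;
    }

    // Every field with the given name; page <= 0 means the whole document.
    List<FormField> byName(String name, int page) {
        List<FormField> result = new ArrayList<>();
        for (FormField f : fields) {
            if (f.name.equals(name) && (page <= 0 || f.page == page)) {
                result.add(f);
            }
        }
        return result;
    }
}
```

A field name can legitimately map to several objects (for example the same field shown on more than one page), which is why `byName` returns a list rather than a single field.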
Why use a third-party library to handle PDF files?
PDF files are a complex hybrid of binary and text data structures, built on the imaging model of Adobe's PostScript language. Rather than parse them by hand, in this example we use our JPedal PDF library to make the task simple.
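The text side of that hybrid is easy to see at the very start of a file: every PDF begins with the ASCII marker `%PDF-` followed by a version number such as `1.7`, and usually a second comment line containing non-ASCII bytes to signal binary content. The sketch below, in plain Java and independent of JPedal, checks that header before a file is handed to a library. The class name `PdfHeader` is hypothetical, and passing this check says nothing about whether the rest of the file is valid.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Minimal header check; it does not validate the rest of the file.
final class PdfHeader {
    // Every PDF file starts with this ASCII marker.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final int HEAD_SIZE = 16; // enough for "%PDF-x.y" plus a line ending

    // True if the bytes begin with the "%PDF-" marker.
    static boolean looksLikePdf(byte[] head) {
        return head.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    // Version text after "%PDF-" up to the end of the line, or null if there is no header.
    static String version(byte[] head) {
        if (!looksLikePdf(head)) {
            return null;
        }
        int end = MAGIC.length;
        while (end < head.length && head[end] != '\r' && head[end] != '\n') {
            end++;
        }
        return new String(head, MAGIC.length, end - MAGIC.length, StandardCharsets.US_ASCII);
    }

    // Reads only the first few bytes, so large PDFs are not loaded into memory.
    static byte[] readHead(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[HEAD_SIZE];
            int n = in.readNBytes(buf, 0, buf.length);
            return Arrays.copyOf(buf, n);
        }
    }
}
```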
How to Extract PDF Form Data in Java
Step 1 Create a File handle, InputStream or URL pointing to the PDF file
String path = "/path/to/form.pdf"; // location of the PDF file
PdfFormUtilities extract = new PdfFormUtilities(path);
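The three kinds of handle named in Step 1 can be obtained with the standard Java library as shown below. This is a general Java sketch: the tutorial's own example only passes a `path` String to `PdfFormUtilities`, so whether its constructors also accept a `File`, `InputStream` or `URL` is not confirmed here. The helper class `PdfSources` is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helpers for the three handle types mentioned in Step 1.
final class PdfSources {
    // A File handle: just a path; the file is not opened or checked here.
    static File asFile(String path) {
        return new File(path);
    }

    // An InputStream: fails immediately if the file is missing or unreadable.
    // The caller is responsible for closing it.
    static InputStream asStream(String path) throws IOException {
        return Files.newInputStream(Paths.get(path));
    }

    // A URL: a file: URL for a local file (remote PDFs would use http or https).
    static URL asUrl(String path) throws IOException {
        return new File(path).toURI().toURL();
    }
}
```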
Step 2 Include a password if the file is password protected
extract.setPassword("password");
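To decide whether Step 2 is needed, it can help to know whether a file is encrypted at all. Encrypted PDFs carry an `/Encrypt` entry in their trailer (or cross-reference stream) dictionary, which is stored as plain bytes. The sketch below is a crude heuristic in plain Java, not a JPedal feature: it simply scans the raw bytes for the name `/Encrypt` followed by whitespace, a delimiter or end of data. It can give false positives, and a file protected only by an owner password has an `/Encrypt` entry but still opens without a password. The class name `EncryptionCheck` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Heuristic encryption check; a real PDF parser is the reliable way to tell.
final class EncryptionCheck {
    private static final byte[] KEY = "/Encrypt".getBytes(StandardCharsets.US_ASCII);
    // PDF whitespace characters and delimiters that can end a name token.
    private static final String NAME_END = "\0\t\n\f\r ()<>[]{}/%";

    // True if the name /Encrypt (and not e.g. /EncryptMetadata) occurs in the bytes.
    static boolean mentionsEncrypt(byte[] pdf) {
        for (int i = 0; i + KEY.length <= pdf.length; i++) {
            if (matchesAt(pdf, i) && endsName(pdf, i + KEY.length)) {
                return true;
            }
        }
        return false;
    }

    static boolean mentionsEncrypt(Path file) throws IOException {
        return mentionsEncrypt(Files.readAllBytes(file));
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < KEY.length; j++) {
            if (data[offset + j] != KEY[j]) {
                return false;
            }
        }
        return true;
    }

    // A name ends at whitespace, a delimiter, or the end of the data.
    private static boolean endsName(byte[] data, int pos) {
        return pos >= data.length || NAME_END.indexOf((char) (data[pos] & 0xFF)) >= 0;
    }
}
```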
Step 3 Open the PDF file
if (extract.openPDFFile()) {
Step 4 Select the data type required
//each call below is an independent alternative; use the one you need
//all form names in the document
Object[] formNames = extract.getFormComponentsFromDocument(null, ReturnValues.FORM_NAMES);
//all form objects in the document named Mabel
Object[] formsNamedMabel = extract.getFormComponentsFromDocument("Mabel", ReturnValues.FORMOBJECTS_FROM_NAME);
//the form object with PDF reference 25 0 R
Object[] formByRef = extract.getFormComponentsFromDocument("25 0 R", ReturnValues.FORMOBJECTS_FROM_REF);
//Swing versions of all the form objects
Object[] swingComponents = extract.getFormComponentsFromDocument(null, ReturnValues.GUI_FORMS_FROM_NAME);
//all form names on page 5
Object[] formNamesOnPage5 = extract.getFormComponentsFromPage(null, ReturnValues.FORM_NAMES, 5);
//all form objects on page 5 named Mabel
Object[] formsNamedMabelOnPage5 = extract.getFormComponentsFromPage("Mabel", ReturnValues.FORMOBJECTS_FROM_NAME, 5);
//the form object with PDF reference 25 0 R on page 5
Object[] formByRefOnPage5 = extract.getFormComponentsFromPage("25 0 R", ReturnValues.FORMOBJECTS_FROM_REF, 5);
//Swing versions of all the form objects on page 5
Object[] swingComponentsOnPage5 = extract.getFormComponentsFromPage(null, ReturnValues.GUI_FORMS_FROM_NAME, 5);
}
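The `25 0 R` string used with `FORMOBJECTS_FROM_REF` is a PDF indirect reference: object number 25, generation number 0, and the keyword `R`. If references come from user input or configuration, it may be worth validating them first. The sketch below does this in plain Java and is independent of JPedal; the class name `PdfRef` is hypothetical, and the accepted format (two non-negative integers and `R`, separated by whitespace) is the standard indirect reference notation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for PDF indirect references such as "25 0 R".
final class PdfRef {
    // <object number> <generation number> R, separated by whitespace.
    private static final Pattern REF = Pattern.compile("\\s*(\\d+)\\s+(\\d+)\\s+R\\s*");

    final int objectNumber;
    final int generation;

    private PdfRef(int objectNumber, int generation) {
        this.objectNumber = objectNumber;
        this.generation = generation;
    }

    // Throws IllegalArgumentException (or its subclass NumberFormatException
    // for out-of-range numbers) if the text is not a valid reference.
    static PdfRef parse(String text) {
        Matcher m = REF.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF reference: " + text);
        }
        return new PdfRef(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
    }

    // Canonical form, suitable for passing back as a reference string.
    @Override
    public String toString() {
        return objectNumber + " " + generation + " R";
    }
}
```

Normalising through `toString()` means inputs such as `"  25   0 R "` end up in the same compact form as the tutorial's example.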
Step 5 Close the PDF file
extract.closePDFfile();
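In the steps above, `closePDFfile()` is reached only if nothing between Step 3 and Step 5 throws an exception. A `try`/`finally` block guarantees the close call either way. The sketch below shows the pattern against a hypothetical `PdfSession` interface that mirrors the `openPDFFile()`/`closePDFfile()` pair; it is not a JPedal interface, and with the real class you would call those two methods directly inside the same `try`/`finally` shape. Like the tutorial, it calls close even when opening fails.

```java
import java.util.function.Function;

// Hypothetical abstraction of the open/close pair from Steps 3 and 5.
interface PdfSession {
    boolean open();
    void close();
}

final class SafeExtract {
    // Runs work only if open() succeeds, and always calls close() afterwards,
    // including when open() returns false or work throws an exception.
    static <T> T withOpenFile(PdfSession session, Function<PdfSession, T> work, T ifNotOpened) {
        try {
            if (!session.open()) {
                return ifNotOpened;
            }
            return work.apply(session);
        } finally {
            session.close();
        }
    }
}
```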