Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to extract PDF file form data in Java


Java has no built-in support for PDF files. This tutorial shows you how to extract form data from a PDF file in a few simple steps using the JPedal PDF library.

JPedal includes extensive support for interactive forms and components, which it converts into Java object representations. It also gives access to the form names and to the GUI representations. The data can be retrieved with a single call, either for one page or for the whole document.

Why use a third party library to handle PDF files?

PDF is a very complex binary/text hybrid format that is derived from the even more complicated PostScript language. In this example, we will use our JPedal PDF library to make the task simple.

How to Extract PDF Form Data in Java

Step 1 Create a File handle, InputStream or URL pointing to the PDF file

PdfFormUtilities extract=new PdfFormUtilities(path);
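Before handing a path to the library, it can be worth confirming that the file is readable and actually looks like a PDF. The sketch below is a hypothetical helper (PdfPathCheck is not part of JPedal) that uses only the JDK. It relies on the fact that a well-formed PDF starts with the bytes %PDF-. Files with junk before the header would be rejected by this strict check, so treat it as a quick sanity test rather than real validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of JPedal: a quick sanity check on a path
// before it is passed to the PDF library.
final class PdfPathCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given bytes begin with the "%PDF-" header. */
    static boolean hasPdfHeader(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if the file exists, is readable and starts with the PDF header. */
    static boolean looksLikePdf(Path file) {
        if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in.readNBytes(PDF_MAGIC.length));
        } catch (IOException e) {
            return false; // unreadable files are treated as "not a PDF"
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny stand-in file so the example runs without a real PDF.
        Path sample = Files.createTempFile("sample", ".pdf");
        try {
            Files.write(sample, "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII));
            System.out.println("Looks like a PDF: " + looksLikePdf(sample));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

If the check passes, the same path can be passed to the PdfFormUtilities constructor shown in Step 1.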

Step 2 Include a password if the file is password protected

extract.setPassword("password");

Step 3 Open the PDF file

if (extract.openPDFFile()) {

Step 4 Select the data type required

      // all form names in the document
      Object[] names=extract.getFormComponentsFromDocument(null, ReturnValues.FORM_NAMES);

      // all forms in the document called Mabel
      Object[] pdfObjectsByName=extract.getFormComponentsFromDocument("Mabel", ReturnValues.FORMOBJECTS_FROM_NAME);

      // the form with PDF reference 25 0 R
      Object[] pdfObjectsByRef=extract.getFormComponentsFromDocument("25 0 R", ReturnValues.FORMOBJECTS_FROM_REF);

      // Swing versions of all the form objects
      Object[] swingComponents=extract.getFormComponentsFromDocument(null, ReturnValues.GUI_FORMS_FROM_NAME);

      // all form names on page 5
      Object[] pageNames=extract.getFormComponentsFromPage(null, ReturnValues.FORM_NAMES,5);

      // all forms called Mabel on page 5
      Object[] pagePdfObjectsByName=extract.getFormComponentsFromPage("Mabel", ReturnValues.FORMOBJECTS_FROM_NAME,5);

      // the form with PDF reference 25 0 R on page 5
      Object[] pagePdfObjectsByRef=extract.getFormComponentsFromPage("25 0 R", ReturnValues.FORMOBJECTS_FROM_REF,5);

      // Swing versions of all the form objects on page 5
      Object[] pageSwingComponents=extract.getFormComponentsFromPage(null, ReturnValues.GUI_FORMS_FROM_NAME,5);
}
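Every call in Step 4 returns a plain Object[] array, so the caller has to cast the entries before using them. The sketch below shows one cautious way to turn a result into a typed list. It assumes that the entries returned for ReturnValues.FORM_NAMES are Strings, which this tutorial does not state explicitly, and it uses a hard-coded array as a stand-in for the real call. FormNames is a hypothetical helper, not a JPedal class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JPedal.
final class FormNames {

    /** Copies the String entries of a raw result array into a typed list, skipping anything else. */
    static List<String> toStrings(Object[] raw) {
        List<String> names = new ArrayList<>();
        if (raw == null) {
            return names; // assumption: treat a missing result as "no forms"
        }
        for (Object item : raw) {
            if (item instanceof String) {
                names.add((String) item);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Stand-in for extract.getFormComponentsFromDocument(null, ReturnValues.FORM_NAMES)
        Object[] raw = {"Mabel", "Surname", null, 42};
        for (String name : toStrings(raw)) {
            System.out.println("Form field: " + name);
        }
    }
}
```

Checking with instanceof rather than casting the whole array keeps the code from failing with a ClassCastException if an entry turns out to be null or of another type.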

Step 5 Close the PDF file

 extract.closePDFfile();
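If the extraction code throws part way through, the close call in Step 5 is never reached. One common way to guard against that is a try/finally block. The sketch below illustrates the pattern with a hypothetical PdfSession interface that mirrors the openPDFFile() and closePDFfile() calls used above; it is a stand-in so the example runs without the library and is not a JPedal type. The real calls may also throw checked exceptions that need handling.

```java
// Hypothetical stand-in for the open/close calls used in this tutorial.
// It is not a JPedal type.
interface PdfSession {
    boolean openPDFFile();
    void closePDFfile();
}

final class SafeClose {

    /**
     * Runs the work only if the file opened, and always calls closePDFfile()
     * afterwards, even if the work throws.
     */
    static boolean withOpenFile(PdfSession session, Runnable work) {
        try {
            if (!session.openPDFFile()) {
                return false;
            }
            work.run();
            return true;
        } finally {
            session.closePDFfile();
        }
    }

    public static void main(String[] args) {
        // A fake session that only logs, used in place of a real PDF file.
        PdfSession fake = new PdfSession() {
            @Override
            public boolean openPDFFile() {
                System.out.println("opened");
                return true;
            }

            @Override
            public void closePDFfile() {
                System.out.println("closed");
            }
        };
        withOpenFile(fake, () -> System.out.println("extracting form data"));
    }
}
```

As in the steps above, close is called whether or not the file opened successfully.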

