Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to extract PDF file form data in Java

Java has no built-in support for PDF files. This tutorial shows you how to extract form data from a PDF file in a few simple steps using the JPedal Java PDF library.

JPedal includes extensive support for interactive forms and components, which it converts into Java object representations. It also gives access to the form names and to the GUI representations of the forms. The data can be retrieved with a single call, either for one page or for the whole document.

Why use a third party library to handle PDF files?

PDF is a very complex binary/text hybrid format that grew out of the even more complicated PostScript language. In this example, we use our JPedal Java PDF library to make the task simple.

How to Extract PDF Form Data in Java

  1. Create a File handle, InputStream or URL pointing to the PDF file
    PdfFormUtilities extract=new PdfFormUtilities(path);
  2. Include a password if the file is password protected
    extract.setPassword("password");
  3. Open the PDF file
    if (extract.openPDFFile()) {
  4. Select the data type required
          // all form names in the document
          Object[] names=extract.getFormComponentsFromDocument(null, ReturnValues.FORM_NAMES);

          // all forms in the document called Mabel
          Object[] mabelForms=extract.getFormComponentsFromDocument("Mabel", ReturnValues.FORMOBJECTS_FROM_NAME);

          // the form with PDF reference 25 0 R
          Object[] formByRef=extract.getFormComponentsFromDocument("25 0 R", ReturnValues.FORMOBJECTS_FROM_REF);

          // all Swing versions of the form objects
          Object[] swingComponents=extract.getFormComponentsFromDocument(null, ReturnValues.GUI_FORMS_FROM_NAME);

          // all form names on page 5
          Object[] namesOnPage5=extract.getFormComponentsFromPage(null, ReturnValues.FORM_NAMES,5);

          // all forms called Mabel on page 5
          Object[] mabelFormsOnPage5=extract.getFormComponentsFromPage("Mabel", ReturnValues.FORMOBJECTS_FROM_NAME,5);

          // the form with PDF reference 25 0 R on page 5
          Object[] formByRefOnPage5=extract.getFormComponentsFromPage("25 0 R", ReturnValues.FORMOBJECTS_FROM_REF,5);

          // all Swing versions of the form objects on page 5
          Object[] swingComponentsOnPage5=extract.getFormComponentsFromPage(null, ReturnValues.GUI_FORMS_FROM_NAME,5);
    }
  5. Close the PDF file
     extract.closePDFfile();
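
The calls in step 4 return plain Object[] arrays, so you will usually want to tidy the result before using it. Below is a minimal sketch in plain Java, with no JPedal dependency, that turns an array of form names into a sorted list with duplicates and nulls removed. It assumes each element is a String, or an object whose toString() gives the form name, which the steps above do not confirm. FormNames and uniqueNames are hypothetical helper names for illustration and are not part of JPedal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the JPedal API.
final class FormNames {

    private FormNames() {
        // static helper, no instances
    }

    // Converts a raw Object[] of form names into a sorted, de-duplicated list.
    // Assumption: each non-null element's toString() is the form name.
    static List<String> uniqueNames(Object[] rawNames) {
        TreeSet<String> sorted = new TreeSet<>();
        if (rawNames != null) {
            for (Object raw : rawNames) {
                if (raw != null) {
                    sorted.add(raw.toString());
                }
            }
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        // Stand-in for the array a FORM_NAMES call might return
        Object[] sample = {"Mabel", "Address", "Mabel", null};
        System.out.println(uniqueNames(sample)); // prints [Address, Mabel]
    }
}
```

In real code you would pass in the names array from step 4 instead of the sample array. Accepting null keeps the helper safe to call even when nothing came back from the extraction.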
    


The JPedal PDF library also allows you to:

Display PDF files in Java apps
View PDF files in Java
Convert PDF files to images