Java PDF Blog

How to search a PDF file in Java

jpedal

PDF files are not directly supported in Java. This tutorial shows you how to search the text content of a PDF file in a few simple steps using the JPedal Java PDF library, which provides an easy-to-use API for searching text in PDF documents from your Java code.

Why use a third-party library to handle PDF files?

PDF files use a complex hybrid binary/text format, and a file must be decoded before its textual content can be read. In this example, we use our JPedal Java PDF library to make this task simple.
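To see why decoding matters, here is a minimal sketch using only the JDK. It assumes a page content stream stored with the FlateDecode filter (a common case, though PDFs can use other filters or none); the class name FlateDemo and the sample stream are illustrative. The words "Hello World" cannot be found in the compressed bytes, so a naive byte search fails, while the inflated stream does contain them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    // A tiny, hand-written page content stream: BT/ET delimit a text object,
    // Tf selects a font, Td positions the text and Tj shows the string.
    static final String CONTENT = "BT /F1 12 Tf 72 712 Td (Hello World) Tj ET";

    // Compress bytes the way a FlateDecode stream is stored (zlib format).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a FlateDecode stream back to the raw content operators.
    static byte[] inflate(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && inflater.needsInput()) {
                break; // truncated stream: stop instead of looping forever
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // Naive search: read the bytes as ISO-8859-1 (one char per byte) and look for the text.
    static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] stored = deflate(CONTENT.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("found in stored bytes: " + containsText(stored, "Hello"));
        System.out.println("found after inflating: " + containsText(inflate(stored), "Hello"));
    }
}
```

Running main should report false for the stored bytes and true after inflating. Decompression is only part of the job: in real files, text may also be split across several operators or mapped through font encodings, which is the kind of work a PDF library handles for you.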

How to search a PDF file in Java

  1. Download the JPedal trial jar.
  2. Create a File handle, InputStream or URL pointing to the PDF file
    FindTextInRectangle extract = new FindTextInRectangle(path); // path: location of the PDF file
  3. Include a password if the file is password protected
    extract.setPassword("password");
  4. Open the PDF file
    if (extract.openPDFFile()) {
  5. Search each page for the text
      int pageCount = extract.getPageCount();
      for (int page = 1; page <= pageCount; page++) {
        float[] coords = extract.findTextOnPage(page, "textToFind",
              SearchType.MUTLI_LINE_RESULTS);
      }
    }
  6. Close the PDF file
     extract.closePDFfile();
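The steps above can be wrapped in a reusable method that records which pages contain a match and always closes the file. This is a sketch under stated assumptions: to run without the JPedal jar or a real PDF, it codes against a hypothetical PdfTextSearcher interface whose method names mirror the calls above (it is not part of JPedal; an adapter around FindTextInRectangle could implement it), and it assumes a page with no match yields null or an empty array. FakeSearcher is a hypothetical in-memory stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PdfSearchSketch {

    // Hypothetical interface mirroring the JPedal calls used in the steps above.
    interface PdfTextSearcher {
        boolean openPDFFile() throws Exception;
        int getPageCount();
        // Assumption: returns null or an empty array when the page has no match.
        float[] findTextOnPage(int page, String text) throws Exception;
        void closePDFfile();
    }

    // Returns the 1-based numbers of every page that contains the text.
    static List<Integer> pagesContaining(PdfTextSearcher searcher, String text) throws Exception {
        List<Integer> hits = new ArrayList<>();
        if (!searcher.openPDFFile()) {
            return hits; // could not open (e.g. wrong password): report no hits
        }
        try {
            int pageCount = searcher.getPageCount();
            for (int page = 1; page <= pageCount; page++) {
                float[] coords = searcher.findTextOnPage(page, text);
                if (coords != null && coords.length > 0) {
                    hits.add(page);
                }
            }
        } finally {
            searcher.closePDFfile(); // release the file even if a page search throws
        }
        return hits;
    }

    // Hypothetical in-memory stand-in so the sketch runs without a real PDF.
    static class FakeSearcher implements PdfTextSearcher {
        final Map<Integer, String> pages;
        boolean closed;

        FakeSearcher(Map<Integer, String> pages) { this.pages = pages; }

        public boolean openPDFFile() { return true; }
        public int getPageCount() { return pages.size(); }
        public float[] findTextOnPage(int page, String text) {
            // Dummy coordinates: only "match or no match" matters here.
            return pages.get(page).contains(text) ? new float[] {0f, 0f, 1f, 1f} : null;
        }
        public void closePDFfile() { closed = true; }
    }

    public static void main(String[] args) throws Exception {
        FakeSearcher fake = new FakeSearcher(Map.of(
                1, "Introduction",
                2, "Invoice total: 42",
                3, "Invoice terms"));
        System.out.println(pagesContaining(fake, "Invoice")); // prints [2, 3]
        System.out.println("closed: " + fake.closed);          // prints closed: true
    }
}
```

The try/finally is the main design choice: it guarantees the close step runs even if searching one page fails. Unlike the steps above, the sketch only closes a file it managed to open.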
    

JPedal makes searching PDF files for text simple

