Java has no built-in support for PDF files. This tutorial shows how to search the text content of a PDF file in a few simple steps using the JPedal PDF library.
Why use a third-party library to handle PDF files?
A PDF file is a complex hybrid of binary and text data, so the file must be decoded before its textual content can be read. In this example, we use our JPedal PDF library to make this task simple.
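To illustrate why decoding is needed, here is a minimal sketch using only the JDK. It assumes a page's text sits in a content stream compressed with deflate (the algorithm behind PDF's FlateDecode filter), which is common but not universal. The class name, the method names and the sample content stream are all made up for this illustration and are not part of JPedal.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RawSearchDemo {

    // A hypothetical, simplified page content stream that draws "Hello World".
    static byte[] sampleContentStream() {
        return "BT /F1 12 Tf 72 720 Td (Hello World) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Compress bytes with deflate, as a PDF writer might before storing the stream.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse the compression, which is one of the decoding steps a PDF library performs.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive search: treat the bytes as Latin-1 text and look for the term.
    static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) {
        byte[] stored = deflate(sampleContentStream());
        System.out.println("Found in stored bytes:  " + containsText(stored, "Hello World"));
        System.out.println("Found after inflating: " + containsText(inflate(stored), "Hello World"));
    }
}
```

The naive search is expected to miss the term in the stored bytes and find it only after inflating. Real PDF files add further layers, such as font encodings, text split across several drawing operators and optional encryption, which is why a dedicated library is the practical choice.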
How to search a PDF file in Java
Step 1 Download the JPedal trial jar.
Step 2 Create a File handle, InputStream or URL pointing to the PDF file
FindTextInRectangle extract = new FindTextInRectangle(path);
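Before handing path to the library, it can help to confirm that it really points at a readable PDF. The following is a minimal sketch using only the JDK (Java 9 or later). It relies on the fact that a well-formed PDF begins with the marker %PDF-. The class and its methods are hypothetical helpers written for this illustration, not part of JPedal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfPathCheck {

    private static final byte[] PDF_MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the path is a readable regular file that starts with "%PDF-".
    static boolean looksLikePdf(Path file) {
        if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[PDF_MARKER.length];
            int read = in.readNBytes(header, 0, header.length);
            return read == header.length && Arrays.equals(header, PDF_MARKER);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the given text to a temporary file and returns its path.
    static Path writeTempFile(String content) {
        try {
            Path tmp = Files.createTempFile("sample", ".pdf");
            tmp.toFile().deleteOnExit();
            return Files.write(tmp, content.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path pdfLike = writeTempFile("%PDF-1.7\n");
        Path notPdf = writeTempFile("plain text\n");
        System.out.println(pdfLike + " looks like a PDF: " + looksLikePdf(pdfLike));
        System.out.println(notPdf + " looks like a PDF: " + looksLikePdf(notPdf));
    }
}
```

This check is deliberately strict. Some PDF readers tolerate a few stray bytes before the marker, so treat a failed check as a warning rather than proof that the file is unusable.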
Step 3 Include a password if the file is password protected
extract.setPassword("password");
Step 4 Open the PDF file
if (extract.openPDFFile()) {
Step 5 Scan the pages
    int pageCount = extract.getPageCount();
    for (int page = 1; page <= pageCount; page++) {
        float[] coords = extract.findTextOnPage(page, "textToFind",
                SearchType.MUTLI_LINE_RESULTS);
    }
}
Step 6 Close the PDF file
extract.closePDFfile();
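The coords array returned in Step 5 is a flat float[], and this tutorial does not describe its layout. As a hedged sketch, the helper below splits such an array into one small array per match. It assumes each match occupies a fixed number of consecutive values (for example four: x1, y1, x2, y2). That layout is an assumption made here for illustration, so check the JPedal Javadoc for the actual format before relying on it. The class and method names are hypothetical and not part of JPedal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoordGrouper {

    // Splits a flat coordinate array into chunks of valuesPerMatch values.
    // ASSUMPTION: every match uses the same number of consecutive values.
    static List<float[]> groupMatches(float[] coords, int valuesPerMatch) {
        List<float[]> matches = new ArrayList<>();
        if (coords == null || coords.length == 0) {
            return matches; // nothing found on this page
        }
        if (valuesPerMatch <= 0 || coords.length % valuesPerMatch != 0) {
            // Fail loudly so a wrong layout assumption is noticed early.
            throw new IllegalArgumentException(
                    "Array length " + coords.length
                    + " is not a multiple of " + valuesPerMatch);
        }
        for (int i = 0; i < coords.length; i += valuesPerMatch) {
            matches.add(Arrays.copyOfRange(coords, i, i + valuesPerMatch));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a search result with two matches.
        float[] coords = {72f, 720f, 140f, 708f, 72f, 650f, 140f, 638f};
        for (float[] match : groupMatches(coords, 4)) {
            System.out.println(Arrays.toString(match));
        }
    }
}
```

Inside the loop from Step 5, the call would look like groupMatches(coords, 4), with the second argument adjusted to whatever layout the library actually documents.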