This tutorial shows you how to find words in a PDF file in a few simple steps using the JPedal Java PDF library. JPedal includes a PDF search engine that provides an easy-to-use Java API for finding words and phrases in a PDF document.
How to search a PDF file in Java
- Download the JPedal trial jar
- Create a File handle, InputStream or URL pointing to the PDF file
- Include a password if the file is password protected
- Open the PDF file
- Scan the pages
- Close the PDF file
Here is the Java code to search a PDF:
// Point at the PDF file to search
File file = new File("/path/to/document.pdf");
FindTextInRectangle extract = new FindTextInRectangle(file);
// Uncomment if the file is password protected
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // Pages are numbered from 1
    for (int page = 1; page <= pageCount; page++) {
        // Coordinates of any matches found on this page
        float[] coords = extract.findTextOnPage(page, "textToFind",
                SearchType.MUTLI_LINE_RESULTS);
    }
}
extract.closePDFfile();
Why can’t I just search the PDF file directly?
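You cannot simply search inside a PDF file because the text is not stored as plain, readable text. It is held in a binary format in which page content is usually compressed and characters may be mapped through font-specific encodings, so the raw bytes of the file rarely contain the words you see on the page.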
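To illustrate the point, here is a minimal sketch in plain Java that does not use JPedal at all. It assumes the page content was compressed with Deflate, the algorithm behind PDF's FlateDecode filter, which is a common case but not the only one. The class name RawSearchDemo, its methods and the sample content stream are hypothetical and exist only for this illustration. A real PDF adds further layers, such as font encodings and object structure, that this sketch ignores.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RawSearchDemo {

    // Hypothetical page content: three lines of text drawn with PDF text operators
    public static String sampleContentStream() {
        return "BT /F1 12 Tf 72 720 Td (Searching a PDF for words) Tj ET\n"
             + "BT /F1 12 Tf 72 700 Td (is harder than it looks) Tj ET\n"
             + "BT /F1 12 Tf 72 680 Td (because streams are compressed) Tj ET\n";
    }

    // Compress bytes with Deflate (assumed here to stand in for FlateDecode)
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse the compression so the text operators become readable again
    public static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported data: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Not valid Deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive search: treat the bytes as single-byte characters and look for the text
    public static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream().getBytes(StandardCharsets.ISO_8859_1);
        byte[] stored = deflate(raw); // roughly what would sit inside the file

        System.out.println("Found in stored bytes:   " + containsText(stored, "words"));
        System.out.println("Found after decompress:  " + containsText(inflate(stored), "words"));
    }
}
```

The naive search misses the word in the stored bytes and only finds it once the stream has been decompressed. A PDF library handles this decoding, plus the font and layout work needed to turn glyphs back into words, before any search takes place.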
Related tutorials
If you are looking to search PDF files in JPedal, we recommend you start with these tutorials:
The JPedal PDF library allows you to solve these problems in Java
Viewer viewer = new Viewer();
viewer.setupViewer();
viewer.executeCommand(ViewerCommands.OPENFILE, "pdfFile.pdf");
//Convenience static method (see class for additional options)
ExtractClippedImages.writeAllClippedImagesToDir("inputFileOrDirectory", "outputDir", "outputImageFormat", new String[] {"imageHeightAsFloat", "subDirectoryForHeight"});
//Convenience static method (see class for additional options)
ExtractTextAsWordList.writeAllWordlistsToDir("inputFileOrDirectory", "outputDir", -1);
//Convenience static method (see class for additional options)
ArrayList resultsForPages = FindTextInRectangle.findTextOnAllPages("/path/to/file.pdf", "textToFind");
PrintPdfPages print = new PrintPdfPages("C:/pdfs/mypdf.pdf");
if (print.openPDFFile()) {
    print.printAllPages("Printer Name");
}
Why do developers choose JPedal over alternatives?
- Actively developed commercial library with full support and no third-party dependencies.
- Simple licensing options and source code access for OEM users.
- Process PDF files up to 3x faster than alternative Java PDF libraries.