JPedal: Java PDF Parser


Why do we need to parse PDF files?

PDF files are unusual in that they do not contain the actual content you see displayed when you view the file. Instead, a PDF contains a program which draws the text, lines, shapes and images to create that display. This code needs to be ‘executed’ in order to create the actual output.

We find it helpful to think of the PDF file as a map to the treasure, not the treasure itself.
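To make the "program" idea concrete, here is a minimal sketch in plain Java (it does not use JPedal) that "executes" a hand-written fragment of page content just far enough to recover the text it draws. `BT`, `ET`, `Tf`, `Td` and `Tj` are real PDF operators; the class name, the method name and the simplified matching are assumptions made for illustration. Real page content is usually compressed and uses many more operators, fonts and encodings than this handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of JPedal.
final class ContentStreamSketch {

    // Matches a literal string operand followed by the Tj (show text) operator,
    // e.g. "(Hello) Tj". Escaped or nested parentheses are not handled here.
    private static final Pattern SHOW_TEXT =
            Pattern.compile("\\(([^()]*)\\)\\s*Tj\\b");

    /** Returns the strings drawn by Tj operators, in the order they appear. */
    static List<String> extractShownText(String contentStream) {
        List<String> shown = new ArrayList<>();
        Matcher m = SHOW_TEXT.matcher(contentStream);
        while (m.find()) {
            shown.add(m.group(1));
        }
        return shown;
    }

    public static void main(String[] args) {
        // An uncompressed fragment of page content: select a font, move, draw.
        String stream = String.join("\n",
                "BT",            // begin text object
                "/F1 12 Tf",     // use font F1 at 12 points
                "72 720 Td",     // move to x=72, y=720
                "(Hello) Tj",    // draw "Hello"
                "0 -14 Td",      // move down one line
                "(World) Tj",    // draw "World"
                "ET");           // end text object
        System.out.println(extractShownText(stream)); // prints [Hello, World]
    }
}
```

Note that the words only exist as operands to drawing instructions, with positions set separately by `Td`. That is why recovering readable text, reading order or tables takes a parser rather than a simple file read.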

What is PDF Parsing?

PDF parsing is the process of extracting and interpreting data from PDF files. The data can include text, images, tables and metadata. Developers can then use the data for further processing or analysis.

PDF parsing involves analysing the internal structure of a PDF document to identify and retrieve specific elements. It is necessary because PDF files are designed for consistent display across different devices, not for easy data extraction.
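As a rough idea of what "analysing the internal structure" means at the lowest level, the sketch below uses only the JDK (no JPedal) to read two structural markers that every PDF file carries: the `%PDF-` version header at the start and the `startxref` offset near the end, which points to the cross-reference data that locates each object. The class and method names are hypothetical, and the sample bytes in `main` are a made-up stub rather than a valid PDF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of JPedal.
final class PdfStructureSketch {

    /** Reads the version from the "%PDF-x.y" header at the start of the file. */
    static String readVersion(byte[] pdf) {
        String head = new String(pdf, 0, Math.min(pdf.length, 16),
                StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            throw new IllegalArgumentException("Missing %PDF- header");
        }
        int end = 5;
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return head.substring(5, end);
    }

    /** Returns the byte offset written after the last "startxref", or -1 if absent. */
    static long readStartXref(byte[] pdf) {
        // The marker sits near the end of the file, so only scan the last 1 KB.
        int from = Math.max(0, pdf.length - 1024);
        String tail = new String(pdf, from, pdf.length - from,
                StandardCharsets.ISO_8859_1);
        int idx = tail.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        String rest = tail.substring(idx + "startxref".length()).trim();
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
    }

    public static void main(String[] args) {
        // Made-up stub with the right markers; not a complete, valid PDF.
        byte[] sample = ("%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
                + "startxref\n9\n%%EOF\n").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readVersion(sample));   // prints 1.7
        System.out.println(readStartXref(sample)); // prints 9
    }
}
```

A full parser goes on from here to read the cross-reference data, resolve objects, decompress streams and interpret page content, which is the work a library such as JPedal takes on.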

Why Parse PDFs?

Parsing PDFs is essential for industries that rely on bulk document management, since it turns static visual data into data that software can act on. Uses across different industries include:

Compliance & Auditing: Cross-verifies regulatory, tax, or legal data for audit trails and compliance reviews.
Inventory & Order Management: Parses shipping manifests, inventory logs, and confirmations to sync with ERP or retail systems.
AI & NLP Data Preparation: Converts PDF text into datasets for retrieval-augmented generation and machine learning pipelines.

JPedal: PDF Parsing in Java

JPedal is a Java PDF parser that lets you parse text, images, metadata, marked/structured content or even raw data from PDF documents.

With JPedal you can parse PDF files to:

Extract Form data
Extract Embedded files
Extract Structured text as YAML, JSON, EPUB and XML
Extract Unstructured and Structured Text
Extract Images and Clipped Images
Translate text
Inspect Raw Data
Read PDF Metadata

and the Java PDF library offers much more…

Why JPedal?

JPedal was designed as a 100% pure Java library aimed specifically at Java developers who work with PDFs. Its simple API makes PDF parsing tasks achievable in only a few lines of code.

The Java PDF parser also has 25 years of development behind it and is regularly updated with new features and improvements. It has no third-party dependencies.

JPedal is designed for both on-premises and cloud use cases, and is specifically tailored to global companies which regularly process millions of documents.

Conclusion

For Java-centric teams that need high-performance, full-featured PDF parsing, JPedal checks every box: a long technical pedigree, rich functionality, fast processing, and a developer experience rooted in real-world needs.

Whether for document workflows, archiving, or integration, JPedal empowers Java developers to do more with PDF, quickly and reliably.

The JPedal PDF library allows you to solve these problems in Java

//Convenience static method (see class for additional options)
ExtractClippedImages.writeAllClippedImagesToDir("inputFileOrDirectory", "outputDir", "outputImageFormat", new String[] {"imageHeightAsFloat", "subDirectoryForHeight"});

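//Edit an existing PDF: add a page, text and an image, rotate a page, then save
final PdfManipulator pdf = new PdfManipulator();
pdf.loadDocument(new File("inputFile.pdf"));
pdf.addPage(1, PaperSize.A4_LANDSCAPE);
pdf.addText(1, "Hello World", 10, 10, BaseFont.HelveticaBold, 12, 1, 0.3f, 0.2f);
//BufferedImage has no no-argument constructor, so give it a size and type
pdf.addImage(1, new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB), new float[] {0, 0, 100, 100});
pdf.rotatePage(1, 90);
pdf.apply();
pdf.writeDocument(new File("outputFile.pdf"));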

//Open a PDF file in the JPedal viewer
Viewer viewer = new Viewer();
viewer.setupViewer();
viewer.executeCommand(ViewerCommands.OPENFILE, "pdfFile.pdf");

//Convenience static method (see class for additional options)
ExtractTextAsWordList.writeAllWordlistsToDir("inputFileOrDirectory", "outputDir", -1);

//Merge two PDF files into one
PdfMerge.mergeFiles(new File("inputFile1.pdf"), new File("inputFile2.pdf"), new File("outputFile.pdf"));

//Split a PDF in two; pageToSplitAt is a page number you supply
PdfManipulator.splitInHalf(new File("inputFile.pdf"), new File("outputFolder"), pageToSplitAt);

//Print every page of a PDF to a named printer
PrintPdfPages print = new PrintPdfPages("C:/pdfs/mypdf.pdf");

if (print.openPDFFile()) {
    print.printAllPages("Printer Name");
}

//Convenience static method (see class for additional options)
ArrayList resultsForPages = FindTextInRectangle.findTextOnAllPages("/path/to/file.pdf", "textToFind");

java -jar jpedal.jar --inspect "inputFile.pdf"

PdfSigner.signPdf(
        "inputFile.pdf",
        "outputFile.pdf",
        "keystorePassword",
        "keystoreFile.p12",
        "signerName",
        "signerLocation",
        "signingReason",
        ACCESS_PERMISSION.P1
);
