Jacob Collins Jacob is the JPedal Product Lead and specialises in PDF creation and manipulation. He also develops Salesforce backend systems and contributes to marketing and support. Outside of work, he’s a 1900‑rated chess player, guitarist, and French learner.

How to remove text from a PDF in Java using JPedal (Tutorial)

TL;DR

True PDF redaction in Java requires two things: hiding the text visually and removing it from the content stream. This tutorial shows how to do both with JPedal in under 20 lines of code.

Why remove text from a PDF file?

Removing text from a PDF in Java is a common requirement when dealing with sensitive information such as names, email addresses, phone numbers, and other personally identifiable information. Whether you are meeting GDPR redaction obligations, preparing documents for external sharing, or sanitising files before archiving, this tutorial explains how to do it using the JPedal PDF library.

What redaction actually means

Removing text from a PDF is a two-part problem. First, you find the text. Then you redact it, which itself has two layers:

  1. Hide the text visually, usually done by drawing an opaque box over it
  2. Remove it from the underlying content stream so it cannot be extracted by a PDF reader or copy-paste

Both steps are critical. Drawing a black box without editing the content stream is not true redaction. The text is still there, just hidden behind the box, and anyone can still select, copy, or extract it. JPedal handles both steps, and together these are called redaction.
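To make the difference between the two layers concrete, here is a toy sketch in plain Java. It does not use JPedal: the class name, the hand-written content stream, and both helper methods are hypothetical, and real content streams are usually compressed and may split or encode text in ways a simple string replace would miss. It only illustrates why a box drawn on top leaves the text in the file.

```java
// Toy illustration only: a hand-written, uncompressed content stream, not JPedal code.
final class OverlayVersusRedaction {

    // Hypothetical page content: one text object that shows the string "Secret".
    static final String PAGE = "BT /F1 12 Tf 72 700 Td (Secret) Tj ET";

    // Layer 1 only: append a filled black rectangle that paints over the text.
    static String drawBlackBox(String stream) {
        return stream + "\n0 0 0 rg 70 695 60 16 re f";
    }

    // Layer 2: empty the string operand of the Tj operator that shows the secret.
    static String removeShownText(String stream, String secret) {
        return stream.replace("(" + secret + ") Tj", "() Tj");
    }

    public static void main(String[] args) {
        String boxedOnly = drawBlackBox(PAGE);
        String redacted = removeShownText(boxedOnly, "Secret");

        // The box hides the text on screen, but the string is still in the stream.
        System.out.println(boxedOnly.contains("Secret")); // true
        // After the operand is emptied, there is nothing left to extract.
        System.out.println(redacted.contains("Secret"));  // false
    }
}
```

A library that performs true redaction has to do the equivalent of the second step on the real, parsed content stream, which is considerably harder than this string replace suggests.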

Choosing a Java PDF library for text removal

Most developers reach for Apache PDFBox first, but programmatically removing text from a PDF in Java, rather than just drawing over it, requires direct access to the content stream. JPedal exposes this through a clean API, handling both the search and the redaction in a few lines of code without manual stream manipulation.

Find, delete and redact text from a PDF in Java using JPedal

Open the PDF, scan each page for the target text, redact every match, then write out the modified document. The key methods are findTextOnPage() to locate matches and redact() to remove them. pdf.apply() commits the redaction operations to the document before writing.

  1. Download the JPedal trial jar.
  2. Create a File handle to the PDF file.
  3. Include a password if the file is password protected.
  4. Open the PDF file.
  5. Scan the pages for text.
  6. Redact each match.
  7. Write the output and close.


findTextOnPage() returns a flat float array holding five values per match: the coordinates x1, y1, x2, y2, followed by a fifth value at index 4 of each group (a magic number described in the JPedal documentation). That is why the loop increments by 5. The output is a new PDF with every instance of the search term permanently removed from both the visual layer and the content stream.
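The stride-of-5 layout can be sketched in plain Java without the JPedal jar. The class name, the helper method, and the sample values below are hypothetical and only mirror the layout described above; the fifth value is skipped rather than interpreted, and the null check is a defensive assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of walking a flat coordinate array with five values per match.
final class MatchCoordinates {

    // x1, y1, x2, y2 plus one extra value per match, as described in the text.
    static final int VALUES_PER_MATCH = 5;

    // Splits the flat array into one {x1, y1, x2, y2} rectangle per match.
    static List<float[]> toRectangles(float[] coords) {
        List<float[]> rectangles = new ArrayList<>();
        if (coords == null) {
            return rectangles; // defensive assumption: treat null as "no matches"
        }
        if (coords.length % VALUES_PER_MATCH != 0) {
            throw new IllegalArgumentException("Expected a multiple of 5 values, got " + coords.length);
        }
        for (int i = 0; i < coords.length; i += VALUES_PER_MATCH) {
            // Index i + 4 holds the fifth value, which this sketch ignores.
            rectangles.add(new float[] {coords[i], coords[i + 1], coords[i + 2], coords[i + 3]});
        }
        return rectangles;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two matches on a page.
        float[] coords = {72f, 700f, 132f, 688f, 0f, 72f, 650f, 140f, 638f, 0f};
        for (float[] r : toRectangles(coords)) {
            System.out.println("x1=" + r[0] + " y1=" + r[1] + " x2=" + r[2] + " y2=" + r[3]);
        }
    }
}
```

Each rectangle produced this way is the area a redaction call would then target for that match.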
 
The original file is not modified unless you overwrite it. Add try-catch blocks around the file operations and PDF calls for production use. For other PDF text manipulation tasks in Java, such as extracting, searching, or modifying content programmatically, see the JPedal tutorials.
 
You can expand your understanding of the PDF format by reading our other articles. If there is a specific PDF term you would like to know more about, our PDF Glossary has an extensive list of common terms.

The JPedal PDF library allows you to solve these problems in Java

