Jacob Collins

Jacob is the JPedal Product Lead and specialises in PDF creation and manipulation. He also develops Salesforce backend systems and contributes to marketing and support. Outside work, he’s a 1900‑rated chess player, guitarist, and French learner.

Apache Tika PDF support in JPedal

JPedal now contains an Apache Tika Parser which can parse and extract unstructured text from PDF files.

How to use an Apache Tika PDF Parser

As with any other Apache Tika Parser, you call the parse() method with four arguments.

First, pass a TikaInputStream created from the path to your PDF file.

Second, pass a ContentHandler. It is advisable to set its character limit to -1; otherwise, extraction may stop before the end of the PDF file. Tika's BodyContentHandler, for example, defaults to a limit of 100,000 characters.

Third, pass a Metadata object. This can be an empty instance, or it can contain the password to the PDF file if the file is encrypted.

Finally, a ParseContext is not needed, so the last argument can be null.

The extracted text is now stored in the ContentHandler!
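The steps above can be sketched roughly as follows. This is a minimal illustration rather than JPedal's official example. It assumes tika-core and JPedal are on the classpath and uses Tika's BodyContentHandler as the ContentHandler. The class name TikaPdfTextExample and the helper extractText are hypothetical. The JPedal parser's class name is deliberately not hard-coded: it is read from the command line, so take it from the JPedal documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TikaPdfTextExample {

    /** Runs the given Tika Parser over the stream and returns the collected text. */
    static String extractText(Parser parser, InputStream stream)
            throws IOException, SAXException, TikaException {
        // -1 removes BodyContentHandler's default 100,000 character write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);

        // Blank Metadata. How a password is supplied is parser-specific and not shown here.
        Metadata metadata = new Metadata();

        // Null ParseContext, as the article describes for the JPedal parser.
        // Assumption: other Tika parsers may expect a real ParseContext instead.
        parser.parse(stream, handler, metadata, null);

        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: TikaPdfTextExample <parser-class-name> <pdf-path>");
            return;
        }
        // args[0]: fully qualified name of the JPedal Tika Parser class
        // (look it up in the JPedal documentation; it is not guessed here).
        Parser parser = (Parser) Class.forName(args[0])
                .getDeclaredConstructor()
                .newInstance();

        // args[1]: path to the PDF file, wrapped in a TikaInputStream.
        try (InputStream stream = TikaInputStream.get(Paths.get(args[1]))) {
            System.out.println(extractText(parser, stream));
        }
    }
}
```

Calling toString() on the BodyContentHandler after parse() returns is what yields the extracted text. Passing the Parser in as an argument keeps the helper usable with any Tika Parser implementation.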

Learn More

You can find more information about our Apache Tika Parser here.




