PDF extraction Archives - Java PDF Blog

PDF extraction

Mastering Server-Side PDF Processing in Java

The Hidden Risks in Server-Side PDF Processing: PDFs are the lifeblood of enterprise document workflows, but processing them at scale on a server is...

Jacob Collins
Dec 19, 2025 2 min read

Apache Tika PDF support in JPedal

JPedal now contains an Apache Tika parser that can parse and extract structured and unstructured text from PDF files. How to use an Apache...

Jacob Collins
Jan 24, 2023 1 min read

Understanding the PDF file format – Text, shapes and…

Recently I have been looking at an issue for a potential client that required generating different views of the page. This is...

Mark Stephens
May 26, 2010 1 min read

PDF mystery – what is the correct value for…

I came across an interesting issue with PDF text fields while debugging a file this week. We were sent a 2-page document created...

Chris Wade
Apr 19, 2010 1 min read

What text format and style information is in a…

Because PDF is very much an output and display format, it does not contain much text formatting information, such as paragraph breaks and spaces...

Mark Stephens
Sep 3, 2009 39 sec read