Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to read PDF metadata in Java (Tutorial)

If you work with PDFs, viewing PDF metadata is not straightforward because Java does not support it directly. This tutorial shows you how to check and extract metadata from a PDF file in a few simple steps using the JPedal Java PDF library.

What is PDF metadata?

PDF metadata is data about data: information about the PDF itself, such as the author, creation date, length and other details. It is embedded within the PDF document and describes its various attributes. This metadata is often used to organise, search and manage PDF files.
 
Early versions of PDF stored metadata in a document information dictionary. In 2001, Adobe introduced the Extensible Metadata Platform (XMP), which allows more complex and standardized metadata. The current version of the PDF standard, ISO 32000-2 (PDF 2.0, first published in 2017), further refines accessibility and security features.
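To give a feel for what XMP looks like, here is a minimal sketch that parses a small hand-written XMP packet with the JDK's XML parser. It does not use JPedal, the sample packet and the class name XmpSketch are made up for illustration, and a real packet extracted from a PDF will usually contain many more properties.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class XmpSketch {

    // Namespaces commonly used in XMP packets found in PDF files.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";

    // Hand-written sample packet (illustrative only, not taken from a real file).
    static final String SAMPLE =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
        + "<rdf:Description rdf:about=\"\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
        + " xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\">"
        + "<dc:creator><rdf:Seq><rdf:li>Jane Example</rdf:li></rdf:Seq></dc:creator>"
        + "<pdf:Producer>Example Producer 1.0</pdf:Producer>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    /** Returns the text of the first element with the given namespace and local name. */
    static Optional<String> firstValue(String xmp, String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XMP relies on XML namespaces
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(namespace, localName);
            if (nodes.getLength() == 0) {
                return Optional.empty();
            }
            return Optional.of(nodes.item(0).getTextContent().trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstValue(SAMPLE, DC_NS, "creator").orElse("(none)"));
        System.out.println(firstValue(SAMPLE, PDF_NS, "Producer").orElse("(none)"));
    }
}
```

Because XMP is plain XML, any namespace-aware parser can read it once the packet has been pulled out of the PDF.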
 

Why is PDF metadata important?

Metadata allows documents to be used across different systems and platforms. It can also record information that is relevant to compliance and auditing, and it tells you which PDF version a file uses.
 
Metadata also carries important details about encryption and permissions, which helps secure sensitive information. Likewise, tags and structural information can help improve the accessibility of documents for people with disabilities. With Java, you can extract PDF metadata using a few lines of code.
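One piece of metadata, the PDF version, can be read with the JDK alone, because a PDF file begins with a header comment such as %PDF-1.7. The sketch below is not the JPedal API, and the class and method names are hypothetical. It also only reports the header value, which a later entry in the document catalog may override, so treat the result as a first indication.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PdfHeaderVersion {

    // Matches the header comment at the start of a PDF file, e.g. "%PDF-1.7".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    /** Looks for the header in the given bytes and returns the version it names. */
    static Optional<String> parseHeaderVersion(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher matcher = HEADER.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** Reads the first 1024 bytes of the file and parses the header from them. */
    static Optional<String> readHeaderVersion(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return parseHeaderVersion(in.readNBytes(1024));
        }
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseHeaderVersion(sample).orElse("unknown")); // prints 1.7
    }
}
```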

How to read PDF metadata

There are online tools such as PDFescape and Smallpdf that let you read PDF metadata. However, if you want to view metadata programmatically, you will need a language-specific solution.
 
In Java, for example, JPedal lets you view PDF metadata and provides many additional features that give you more control over your PDFs.

How to find a PDF file page count

  1. Add JPedal to your class or module path (download the trial jar).
  2. Create a File handle, InputStream or URL pointing to the PDF file
  3. Include a password if the file is password protected
  4. Open the PDF file
  5. Read the page count
  6. Close the PDF file

and the Java code to get a page count…

You can try printing out the result to see if it’s working:

How to access a PDF file page size and rotation

  1. Add JPedal to your class or module path (download the trial jar).
  2. Create a File handle, InputStream or URL pointing to the PDF file
  3. Include a password if the file is password protected
  4. Open the PDF file
  5. Read the page size and rotation
  6. Close the PDF file

and the Java code to read PDF page size and rotation…

You can try printing out the result to see if it’s working (getPageDimensions returns a float[] with 5 values: x, y, w, h, pageRotation):
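A five-element float[] is easy to misread, so it can help to wrap it in a small named type before printing. The sketch below is a hypothetical helper, not part of JPedal. It assumes the array order given above (x, y, w, h, pageRotation) and, as a further assumption worth checking against the library documentation, that width and height describe the unrotated page.

```java
import java.util.Locale;

final class PageDimensions {

    final float x;
    final float y;
    final float width;
    final float height;
    final int rotation;

    /** Expects the five values in the order x, y, w, h, pageRotation. */
    PageDimensions(float[] values) {
        if (values == null || values.length != 5) {
            throw new IllegalArgumentException("Expected 5 values: x, y, w, h, pageRotation");
        }
        this.x = values[0];
        this.y = values[1];
        this.width = values[2];
        this.height = values[3];
        this.rotation = Math.round(values[4]);
    }

    /** True when the page is turned by 90 or 270 degrees. */
    boolean isSideways() {
        int normalized = ((rotation % 360) + 360) % 360;
        return normalized == 90 || normalized == 270;
    }

    /** Assumption: width and height are for the unrotated page, so they swap when sideways. */
    float displayedWidth() {
        return isSideways() ? height : width;
    }

    float displayedHeight() {
        return isSideways() ? width : height;
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "x=%.1f y=%.1f w=%.1f h=%.1f rotation=%d",
            x, y, width, height, rotation);
    }

    public static void main(String[] args) {
        // Made-up values standing in for the array returned by the library.
        PageDimensions page = new PageDimensions(new float[] {0f, 0f, 595f, 842f, 90f});
        System.out.println(page);
        System.out.println("Displayed size: " + page.displayedWidth() + " x " + page.displayedHeight());
    }
}
```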

How to access PDF Document properties

  1. Add JPedal to your class or module path (download the trial jar).
  2. Create a File handle, InputStream or URL pointing to the PDF file
  3. Include a password if the file is password protected
  4. Open the PDF file
  5. Access the properties
  6. Close the PDF file

and the Java code to read PDF Document properties…

You can try printing out the result to see if it’s working:
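Document properties such as the creation date are typically stored in PDF date format, for example D:20240131120000+01'00'. Whether a given library returns these strings raw or already converted is not covered here, so the following is only a sketch of how a raw value could be turned into a java.time object. The class name is hypothetical, and dates without a time zone are treated as UTC, which is a simplification.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PdfDateParser {

    // D:YYYYMMDDHHmmSSOHH'mm' where everything after the year is optional.
    private static final Pattern PDF_DATE = Pattern.compile(
        "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
        + "(?:(Z|[+-])(?:(\\d{2})'?)?(?:(\\d{2})'?)?)?");

    static OffsetDateTime parse(String pdfDate) {
        Matcher m = PDF_DATE.matcher(pdfDate.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + pdfDate);
        }
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);   // missing month and day default to 1
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);    // missing time fields default to 0
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        String sign = m.group(7);
        int offsetSeconds = 0;              // no zone or "Z": treated as UTC here
        if ("+".equals(sign) || "-".equals(sign)) {
            offsetSeconds = intOr(m.group(8), 0) * 3600 + intOr(m.group(9), 0) * 60;
            if ("-".equals(sign)) {
                offsetSeconds = -offsetSeconds;
            }
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0,
            ZoneOffset.ofTotalSeconds(offsetSeconds));
    }

    private static int intOr(String value, int fallback) {
        return value == null ? fallback : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20240131120000+01'00'"));
    }
}
```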

How to detect if embedded fonts are used in a PDF

  1. Add JPedal to your class or module path (download the trial jar).
  2. Create a File handle, InputStream or URL pointing to the PDF file
  3. Include a password if the file is password protected
  4. Open the PDF file
  5. Query the PDF file status
  6. Close the PDF file

and the Java code to detect embedded fonts…

Again, you can try printing out the result to see if it’s working:

In this article, I showed you how to view PDF metadata using Java.



The JPedal PDF library allows you to solve these problems in Java


Why do developers choose JPedal over alternatives?

  1. Actively developed commercial library with full support and no third-party dependencies.
  2. Simple licensing options and source code access for OEM users.
  3. Process PDF files up to 3x faster than alternative Java PDF libraries.
