Mark Stephens
Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to access PDF file metadata in Java


PDF files are not directly supported by Java. This tutorial shows you how to extract metadata from a PDF file in simple steps using the JPedal PDF library.

Why use a third party library to handle PDF files?

Java has no built-in support for PDF. A PDF file is a complex hybrid of binary and text data whose imaging model is derived from the even more complicated PostScript language. The data needs to be parsed and assembled from many sources to display pages or extract images from a PDF file. In this example, we will use our JPedal PDF library to make this task simple.
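
To give a sense of how little the JDK offers on its own, here is a minimal sketch using only standard Java classes (Java 11+). It reads the version header, for example %PDF-1.7, that a PDF file begins with. The class name PdfHeaderCheck is hypothetical, and the 1024-byte search window reflects a common reader tolerance rather than a strict requirement. Everything beyond this header (objects, pages, fonts, metadata) still needs a real parser such as JPedal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of JPedal): reads only the "%PDF-x.y" header.
final class PdfHeaderCheck {

    // Many readers accept the header anywhere in the first 1024 bytes.
    private static final int HEADER_SEARCH_LIMIT = 1024;
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    /** Returns the declared version, e.g. "1.7", or null if no header is found. */
    static String readVersion(byte[] firstBytes) {
        int len = Math.min(firstBytes.length, HEADER_SEARCH_LIMIT);
        // ISO-8859-1 maps each byte to exactly one char, so binary bytes decode safely.
        String text = new String(firstBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Reads at most the first 1024 bytes of the file (readNBytes needs Java 11+). */
    static String readVersion(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return readVersion(in.readNBytes(HEADER_SEARCH_LIMIT));
        }
    }
}
```

For example, PdfHeaderCheck.readVersion(Path.of("C:/pdfs/mypdf.pdf")) returns the declared version, or null when the file does not start like a PDF.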

The JPedal Java PDF Library contains a large number of utilities to access information about or inside a PDF file. Here are some common developer uses.

Get a page count

JPedal makes it very easy to find out how many pages a PDF file contains. This feature is built into all the examples, so it is also accessible from other examples or as part of the PdfUtilities class. Here is a simple example (Javadoc).
JPedal also includes some very powerful features for text search, including regular expressions.

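PdfUtilities extract = new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password"); // only needed for password-protected files
try {
    if (extract.openPDFFile()) {
        int pageCount = extract.getPageCount();
    }
} finally {
    // release the file even if opening or reading fails
    extract.closePDFfile();
}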

Access PDF page size and rotation

Every page in a PDF document can have its own dimensions and rotation. MediaBox is the actual page size and CropBox is the visible page size (we recommend you always use CropBox).

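PdfUtilities extract = new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password"); // only needed for password-protected files
int pageNum = 1; // page to measure (page numbers start at 1)
try {
    if (extract.openPDFFile()) {
        float[] pageDimensions = extract.getPageDimensions(pageNum, PageUnits.Inches,
                PageSizeType.CropBox);
    }
} finally {
    // release the file even if opening or reading fails
    extract.closePDFfile();
}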
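
The page boxes and rotation described above follow simple rules in the PDF specification: a box is stored as [llx lly urx ury] in user-space units of 1/72 inch by default, CropBox defaults to MediaBox and is clipped to it, and /Rotate is a multiple of 90 degrees. The minimal sketch below (hypothetical class PageGeometry, not JPedal's own code) applies those rules to raw box values, for example to sanity-check what a library reports. It assumes normalised boxes (lower-left corner first) and ignores the optional /UserUnit scaling.

```java
// Hypothetical helper (not part of JPedal) showing the PDF page-box arithmetic.
final class PageGeometry {

    // Default PDF user space: 72 units per inch (ignores /UserUnit).
    static final float POINTS_PER_INCH = 72f;

    /** CropBox defaults to MediaBox and is clipped to it. Boxes are {llx, lly, urx, ury}. */
    static float[] effectiveCropBox(float[] mediaBox, float[] cropBox) {
        if (cropBox == null) {
            return mediaBox.clone();
        }
        return new float[] {
            Math.max(mediaBox[0], cropBox[0]), Math.max(mediaBox[1], cropBox[1]),
            Math.min(mediaBox[2], cropBox[2]), Math.min(mediaBox[3], cropBox[3])
        };
    }

    /** Returns {width, height} in inches as the page is displayed after /Rotate. */
    static float[] displayedSizeInInches(float[] box, int rotate) {
        if (box.length != 4) {
            throw new IllegalArgumentException("box needs 4 values");
        }
        if (rotate % 90 != 0) {
            throw new IllegalArgumentException("Rotate must be a multiple of 90");
        }
        float width = Math.abs(box[2] - box[0]) / POINTS_PER_INCH;
        float height = Math.abs(box[3] - box[1]) / POINTS_PER_INCH;
        // Normalise values such as -90 or 450 into the range 0..359.
        int r = ((rotate % 360) + 360) % 360;
        // A quarter turn swaps the displayed width and height.
        return (r == 90 || r == 270) ? new float[] {height, width} : new float[] {width, height};
    }
}
```

For a US Letter MediaBox of [0 0 612 792] with no CropBox, the displayed size is 8.5 x 11 inches upright and 11 x 8.5 inches with /Rotate 90.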

Detect whether embedded fonts are used in a PDF document

JPedal allows you to check whether the PDF document uses embedded fonts.

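PdfUtilities extract = new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password"); // only needed for password-protected files
try {
    if (extract.openPDFFile()) {
        boolean usesEmbeddedFonts = extract.hasEmbeddedFonts();
    }
} finally {
    // release the file even if opening or reading fails
    extract.closePDFfile();
}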
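
Under the hood, a font counts as embedded when its font descriptor dictionary contains a font program under /FontFile (Type 1), /FontFile2 (TrueType) or /FontFile3 (other formats identified by /Subtype). The minimal sketch below (hypothetical class EmbeddedFontCheck, not JPedal's implementation) models a descriptor as a plain Map to show that rule. It ignores Type 3 fonts, whose glyphs are defined inside the PDF itself, and the descendant fonts of composite Type 0 fonts, which carry their own descriptors.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of JPedal): applies the font-descriptor rule to a Map.
final class EmbeddedFontCheck {

    // FontFile = Type 1, FontFile2 = TrueType, FontFile3 = format given by its /Subtype.
    private static final Set<String> FONT_PROGRAM_KEYS =
            Set.of("FontFile", "FontFile2", "FontFile3");

    /** True if the descriptor holds any embedded font program; a missing descriptor means no. */
    static boolean isEmbedded(Map<String, ?> fontDescriptor) {
        return fontDescriptor != null
                && FONT_PROGRAM_KEYS.stream().anyMatch(fontDescriptor::containsKey);
    }
}
```

A document "uses embedded fonts" in this sense if at least one of its font descriptors passes this check.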

Access PDF document properties

A PDF document can contain a set of predefined document properties (such as Title, Author and CreationDate) as well as an XML value that can hold any data.

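PdfUtilities extract = new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password"); // only needed for password-protected files
try {
    if (extract.openPDFFile()) {
        // predefined properties as name/value pairs
        Map mapOfValuePairs = extract.getDocumentPropertyStringValuesAsMap();
        // XML data stored in the document
        String xmlStringData = extract.getDocumentPropertyFieldsInXML();
    }
} finally {
    // release the file even if opening or reading fails
    extract.closePDFfile();
}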
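
Date properties such as CreationDate and ModDate are stored in PDF files as strings in the form D:YYYYMMDDHHmmSSOHH'mm', where every field after the year is optional. If the property map hands you these raw strings (an assumption here; check the JPedal Javadoc for the exact values returned), the minimal sketch below converts one into a java.time.OffsetDateTime. The class PdfDates is hypothetical, and treating a missing time zone as UTC is this sketch's own choice, since the specification leaves it unknown.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of JPedal): parses PDF date strings.
final class PdfDates {

    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    // 7 "Z", 8 offset sign, 9 offset hours, 10 offset minutes.
    private static final Pattern PDF_DATE = Pattern.compile(
            "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
                    + "(?:([Zz])(?:00'?(?:00'?)?)?|([+-])(\\d{2})'?(?:(\\d{2})'?)?)?");

    static OffsetDateTime parse(String value) {
        Matcher m = PDF_DATE.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date: " + value);
        }
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);   // missing fields take their lowest value
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // Assumption: no offset (or "Z") is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = m.group(8).equals("-") ? -1 : 1;
            int hh = Integer.parseInt(m.group(9));
            int mm = intOr(m.group(10), 0);
            offset = ZoneOffset.ofHoursMinutes(sign * hh, sign * mm);
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    private static int intOr(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }
}
```

For example, PdfDates.parse("D:20220314093000+01'00'") gives 14 March 2022, 09:30 at offset +01:00.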


JPedal makes it easy to access PDF file metadata




IDRsolutions Ltd 2022. All rights reserved.