Java has no built-in support for PDF files. This tutorial shows you how to extract metadata from a PDF file in a few simple steps using the JPedal PDF library.
Easy ways to access PDF metadata
The JPedal Java PDF Library contains a large number of utilities for accessing information about or inside a PDF file. Here are the most common developer uses:
1. Get a page count
JPedal makes it very easy to get the number of pages in a PDF file. This feature is built into all the examples, so it is accessible from any of them or through the PdfUtilities class.
PdfUtilities extract=new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
int pageCount=extract.getPageCount();
}
extract.closePDFfile();
2. Access PDF page size and rotation
Every page in a PDF document can have its own dimensions and rotation. MediaBox is the actual page size and CropBox is the visible page size (we recommend you always use CropBox).
PdfUtilities extract=new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
//pageNum is the number of the page to query (set by your own code)
float[] pageDimensions = extract.getPageDimensions(pageNum, PageUnits.Inches,
PageSizeType.CropBox);
}
extract.closePDFfile();
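To illustrate how page units and rotation interact, here is a minimal standalone sketch that does not use JPedal at all. It relies on the PDF convention that one point is 1/72 of an inch, and it assumes that a rotation of 90 or 270 degrees swaps the displayed width and height. The class and method names (PageSizeSketch, pointsToInches, displayedSize) are hypothetical helpers for illustration, not part of the JPedal API, and the sketch makes no claim about the layout of the array returned by getPageDimensions.

```java
// Standalone sketch: this is not JPedal code. All names here are hypothetical.
class PageSizeSketch {

    // Default PDF user space uses 72 points per inch.
    static final float POINTS_PER_INCH = 72f;

    // Convert a length in points to inches.
    static float pointsToInches(float points) {
        return points / POINTS_PER_INCH;
    }

    // Return {width, height} as displayed once the page rotation is applied.
    // Rotations of 90 and 270 degrees swap width and height.
    static float[] displayedSize(float width, float height, int rotation) {
        // Normalize so that negative or >360 values also work.
        int normalized = ((rotation % 360) + 360) % 360;
        if (normalized == 90 || normalized == 270) {
            return new float[] {height, width};
        }
        return new float[] {width, height};
    }

    public static void main(String[] args) {
        // A US Letter CropBox (612 x 792 points) on a page rotated by 90 degrees.
        float[] size = displayedSize(612f, 792f, 90);
        // prints "11.0 x 8.5 in"
        System.out.println(pointsToInches(size[0]) + " x " + pointsToInches(size[1]) + " in");
    }
}
```

Keeping the rotation handling separate from the unit conversion means the same helper works whichever units you request.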
3. Access PDF Document properties
A PDF document can contain a set of pre-defined Document properties or an XML value containing any data.
PdfUtilities extract=new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
Map mapOfValuePairs=extract.getDocumentPropertyStringValuesAsMap();
String XMLStringData=extract.getDocumentPropertyFieldsInXML();
}
extract.closePDFfile();
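As a rough sketch of what you might do with the returned name/value pairs, the standalone snippet below formats a map of properties for display and skips empty entries. It does not call JPedal. The sample keys (Title, Author, Subject) are standard PDF document information entries, but the values and the PropertyPrinter class are made up for illustration, and the sketch assumes the map holds String keys and String values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch: this is not JPedal code. PropertyPrinter is a hypothetical helper.
class PropertyPrinter {

    // Build one "name: value" line per property, skipping null or empty values.
    static String format(Map<String, String> properties) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.isEmpty()) {
                continue;
            }
            out.append(entry.getKey()).append(": ").append(value).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for the map returned by the PDF library.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("Title", "Quarterly Report");
        properties.put("Author", "A. Writer");
        properties.put("Subject", "");
        System.out.print(format(properties));
    }
}
```

A LinkedHashMap is used for the sample data only so that the output order matches the insertion order.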
4. Detect whether embedded fonts are used in a PDF document
JPedal lets you check whether embedded fonts are used in the PDF document.
PdfUtilities extract=new PdfUtilities("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
boolean usesEmbeddedFonts=extract.hasEmbeddedFonts();
}
extract.closePDFfile();