What is PDF Metadata?

Understanding PDF Metadata

PDF metadata is data about data: it describes the PDF document rather than forming part of its visible content. It may include the author, creation date, length and additional details. It is embedded within the PDF document and describes its various attributes. This metadata is often used for organising, searching and managing PDF files.
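As an illustration of how one of these values is stored, PDF files write dates such as the creation date as text in the form D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional. The following is a minimal sketch of converting such a string into a Java date. The class name PdfDateParser and the sample value are made up for this example, and treating a missing time zone as UTC is an assumption of the sketch, not something the format guarantees.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any PDF library.
public class PdfDateParser {

    // D:YYYYMMDDHHmmSSOHH'mm' where only the year is mandatory.
    private static final Pattern PDF_DATE = Pattern.compile(
            "^(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([Z+-])(?:(\\d{2})'?(\\d{2})?'?)?)?$");

    public static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + raw);
        }
        // Missing month and day default to 1, missing time fields to 0.
        int year = Integer.parseInt(m.group(1));
        int month = part(m.group(2), 1);
        int day = part(m.group(3), 1);
        int hour = part(m.group(4), 0);
        int minute = part(m.group(5), 0);
        int second = part(m.group(6), 0);

        // Assumption: no zone information (or "Z") is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        String zone = m.group(7);
        if (zone != null && !zone.equals("Z")) {
            int sign = zone.equals("-") ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                    sign * part(m.group(8), 0), sign * part(m.group(9), 0));
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    private static int part(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        // Sample value invented for this example.
        System.out.println(parse("D:20240131093015+01'00'"));
    }
}
```

Real files sometimes contain dates that deviate from this pattern, so a production parser would likely need to be more lenient than this sketch.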

Early versions of PDF stored metadata in a document information dictionary. In 2001, Adobe introduced the Extensible Metadata Platform (XMP), which allows more complex and standardized metadata. The current version of the PDF standard, ISO 32000-2 (PDF 2.0, first published in 2017), further refines accessibility and security features.
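Because XMP is serialized as RDF/XML, a packet that has already been extracted from a PDF can be read with the XML parser that ships with Java. The sketch below is only an illustration: the class name XmpReader and the sample packet are invented, it handles properties written as XML elements (XMP also allows simple values to be written as attributes), and it does not cover locating the metadata stream inside a PDF file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of any PDF library.
public class XmpReader {

    // Namespace URIs of common XMP schemas.
    static final String NS_DC = "http://purl.org/dc/elements/1.1/";
    static final String NS_XMP = "http://ns.adobe.com/xap/1.0/";
    static final String NS_PDF = "http://ns.adobe.com/pdf/1.3/";

    // Invented sample packet (without the xpacket wrapper).
    static final String SAMPLE =
            "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
            + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
            + "<rdf:Description rdf:about=''"
            + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
            + " xmlns:xmp='http://ns.adobe.com/xap/1.0/'"
            + " xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>"
            + "<dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Annual Report</rdf:li></rdf:Alt></dc:title>"
            + "<xmp:CreatorTool>ExampleWriter 1.0</xmp:CreatorTool>"
            + "<xmp:CreateDate>2024-01-31T09:30:15+01:00</xmp:CreateDate>"
            + "<pdf:Producer>ExamplePDF Library</pdf:Producer>"
            + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // Returns the properties found, keyed by local name, in lookup order.
    public static Map<String, String> read(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> out = new LinkedHashMap<>();
            put(out, doc, NS_DC, "title");
            put(out, doc, NS_DC, "creator");
            put(out, doc, NS_XMP, "CreatorTool");
            put(out, doc, NS_XMP, "CreateDate");
            put(out, doc, NS_PDF, "Producer");
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP packet", e);
        }
    }

    // Stores the text of the first matching element, if there is one.
    private static void put(Map<String, String> out, Document doc, String ns, String name) {
        NodeList nodes = doc.getElementsByTagNameNS(ns, name);
        if (nodes.getLength() > 0) {
            out.put(name, nodes.item(0).getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        read(SAMPLE).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Properties that are absent from the packet, such as dc:creator in the sample, are simply left out of the returned map.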

What does it tell you?

Besides the general information described above, metadata can tell you more about where a document came from. For example, it can identify the application used to create and modify the document.

Similarly, you can use it to find the PDF specification version the document conforms to. You also get access to tags and bookmarks, which help you better understand the document structure.
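The specification version is usually visible in the first bytes of the file, which begin with a header such as %PDF-1.7. The following minimal sketch reads that header with plain Java. The class name PdfHeaderVersion is made up for this example, the 1024-byte search window is an assumption, and readNBytes requires Java 11 or later. Note that a document can also declare a later version in a Version entry in its catalog, which this sketch does not inspect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any PDF library.
public class PdfHeaderVersion {

    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    // Looks for a "%PDF-x.y" header in the supplied leading bytes of a file.
    public static Optional<String> fromBytes(byte[] start) {
        // ISO-8859-1 maps every byte to one char, so binary data is harmless.
        String text = new String(start, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Reads only the start of the file (assumed window: 1024 bytes).
    public static Optional<String> fromFile(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return fromBytes(in.readNBytes(1024));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java PdfHeaderVersion <file.pdf>");
            return;
        }
        System.out.println(fromFile(Paths.get(args[0])).orElse("no PDF header found"));
    }
}
```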

Why is PDF metadata important?

Metadata allows documents to be used across different systems and platforms. It also gives the user a thorough understanding of the legal aspects of the document for compliance and auditing purposes.

Beyond that, it holds important details about encryption and permissions, which helps secure sensitive information. Likewise, tags and structural information can improve the accessibility of documents for people with disabilities. With Java, you can extract PDF metadata using a few lines of code.

How to check PDF Metadata

There are online tools such as PDFescape and Smallpdf that help you read PDFs. However, if you want to view your metadata programmatically, you will need a language-specific solution.

In Java, for example, JPedal can help you view your PDF metadata, and it provides many additional features that give you more control over your PDFs.

The JPedal PDF library allows you to solve these and other PDF problems in Java, such as viewing, image and text extraction, text search and printing:

//Open a PDF file in the JPedal viewer
Viewer viewer = new Viewer();
viewer.setupViewer();
viewer.executeCommand(ViewerCommands.OPENFILE, "pdfFile.pdf");

//Convenience static method (see class for additional options)
ExtractClippedImages.writeAllClippedImagesToDir("inputFileOrDirectory", "outputDir", "outputImageFormat", new String[] {"imageHeightAsFloat", "subDirectoryForHeight"});

//Convenience static method (see class for additional options)
ExtractTextAsWordList.writeAllWordlistsToDir("inputFileOrDirectory", "outputDir", -1);

//Convenience static method (see class for additional options)
ArrayList resultsForPages = FindTextInRectangle.findTextOnAllPages("/path/to/file.pdf", "textToFind");

//Print every page of a PDF on the named printer
PrintPdfPages print = new PrintPdfPages("C:/pdfs/mypdf.pdf");

if (print.openPDFFile()) {
    print.printAllPages("Printer Name");
}

Why do developers choose JPedal over alternatives?

Actively developed commercial library with full support and no third-party dependencies.
Simple licensing options and source code access for OEM users.
Process PDF files up to 3x faster than alternative Java PDF libraries.