Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to Compare PDF files



Asking how to compare PDF files is a frequent question on the PDF forums. It is important to understand what you are trying to compare…

Can different PDF files look identical?

Yes, they can. Different PDF creators can generate pages that look visually identical but are constructed in very different ways. PDF is a flexible file format with many features. For example, you could create two PDF versions of the same document, one using Acrobat and one using Ghostscript. The files would (hopefully) look identical when viewed, but they would be different sizes and have very different internal structures.

Is it possible to compare the object structure of PDF files?

In theory, you could scan the COS object tree of each file and compare the two. You would need to write your own custom tool to do this and be clear about which differences matter to you.
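To give a feel for what such a custom tool involves, here is a minimal sketch in Java. It assumes each file's object tree has already been loaded into plain nested collections (dictionaries as `Map`, arrays as `List`, everything else as simple values). That representation, the class name `ObjectTreeDiff` and the `ignoredKeys` parameter are all hypothetical and not part of any PDF library. A real tool would also have to resolve indirect references and guard against cycles such as `/Parent` links, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: reports differences between two nested object trees. */
final class ObjectTreeDiff {

    /** Returns one line per difference; keys in ignoredKeys are skipped entirely. */
    static List<String> diff(Object left, Object right, Set<String> ignoredKeys) {
        List<String> out = new ArrayList<>();
        walk("", left, right, ignoredKeys, out);
        return out;
    }

    // Assumes the trees are acyclic (indirect references already flattened).
    private static void walk(String path, Object l, Object r,
                             Set<String> ignored, List<String> out) {
        if (l instanceof Map && r instanceof Map) {
            Map<?, ?> lm = (Map<?, ?>) l;
            Map<?, ?> rm = (Map<?, ?>) r;
            // Visit the union of keys in sorted order so the report is stable.
            Set<String> keys = new TreeSet<>();
            for (Object k : lm.keySet()) keys.add(String.valueOf(k));
            for (Object k : rm.keySet()) keys.add(String.valueOf(k));
            for (String key : keys) {
                if (ignored.contains(key)) continue;
                walk(path + "/" + key, lm.get(key), rm.get(key), ignored, out);
            }
        } else if (l instanceof List && r instanceof List) {
            List<?> ll = (List<?>) l;
            List<?> rl = (List<?>) r;
            int n = Math.max(ll.size(), rl.size());
            for (int i = 0; i < n; i++) {
                Object lv = i < ll.size() ? ll.get(i) : null;
                Object rv = i < rl.size() ? rl.get(i) : null;
                walk(path + "[" + i + "]", lv, rv, ignored, out);
            }
        } else if (!Objects.equals(l, r)) {
            // Leaf values differ, or one side is missing (shown as null).
            out.add(path + ": " + l + " != " + r);
        }
    }
}
```

Passing a set such as `Set.of("Producer")` as `ignoredKeys` is one way to express "which differences matter": entries you expect to vary between PDF creators are simply left out of the report.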

Can I visually compare PDF files?

This is what most people mean by comparing PDF files.

In developing a Java PDF library, we need to do an awful lot of regression testing to make sure that we do not break anything. So we need to compare a lot of files. We also like to test each change individually so we can investigate any problems. The easiest way to do this is to rasterize them and compare the output.

We extract the text and convert the PDF to a PNG file using our Java code, then compare the results against a baseline. You still need a human to verify any changes, but it does provide very quick regression tests.

If the results are identical, we can be confident that the file has not changed. Doing the same with two PDF files allows you to quickly review any changes, especially if you get the comparison to highlight the area on the PNG that has changed.
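As an illustration of the comparison step, here is a minimal sketch that works on two pages already rasterized to `BufferedImage` (for example, loaded from PNG files with `ImageIO.read`). It is not our production code. The class name `PageImageDiff` and its methods are hypothetical, and it uses an exact pixel match, which may be too strict if your renderer's anti-aliasing varies between runs.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: compares two rasterized pages pixel by pixel. */
final class PageImageDiff {

    private static final int RED = 0xFFFF0000; // opaque red in ARGB

    /**
     * Returns the bounding box of all differing pixels, or null if the
     * images are identical. Images of different sizes are treated as
     * entirely different.
     */
    static Rectangle diffBounds(BufferedImage baseline, BufferedImage candidate) {
        int w = baseline.getWidth();
        int h = baseline.getHeight();
        if (w != candidate.getWidth() || h != candidate.getHeight()) {
            return new Rectangle(0, 0, Math.max(w, candidate.getWidth()),
                                       Math.max(h, candidate.getHeight()));
        }
        int minX = w, minY = h, maxX = -1, maxY = -1;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (baseline.getRGB(x, y) != candidate.getRGB(x, y)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no pixel differed
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    /** Returns a copy of the candidate with every differing pixel painted red. */
    static BufferedImage markDifferences(BufferedImage baseline, BufferedImage candidate) {
        int w = candidate.getWidth();
        int h = candidate.getHeight();
        if (w != baseline.getWidth() || h != baseline.getHeight()) {
            throw new IllegalArgumentException("Images must be the same size");
        }
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = candidate.getRGB(x, y);
                out.setRGB(x, y, rgb == baseline.getRGB(x, y) ? rgb : RED);
            }
        }
        return out;
    }
}
```

A `null` result from `diffBounds` means the page matches its baseline, so no human review is needed. Otherwise, the rectangle and the marked-up image tell the reviewer where to look.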

We find that to be a very good way to compare PDF files. What works for you?

You can also have a look at our other articles to understand the PDF format.


