PDF files sometimes reference external resources such as images or other documents. To make such PDF files more portable and easier to archive, these resources can be embedded within the PDF itself. Resources stored this way are known as attachments or embedded files.
To extract these embedded files from a PDF using Java, you will need a third-party library, because the Java standard library has no built-in support for processing PDF files.
This tutorial uses JPedal.
How to extract embedded files from a PDF file programmatically
- Add JPedal to your class or module path (download the trial jar)
- Run the following Java code:
ExtractEmbeddedFiles.extractAllFilesFromPdf("inputFile.pdf", "outputFolder");
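In a real program you may want to check the input and prepare the output folder before making that call. The following is a minimal sketch of one way to do so, not JPedal's own code. The `EmbeddedFileExtraction` class, its `extractTo` method and the `Extractor` interface are hypothetical names introduced here; the interface is only a stand-in so that the sketch compiles without the JPedal jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class EmbeddedFileExtraction {

    /** Hypothetical stand-in for the library call, so this sketch compiles without JPedal. */
    @FunctionalInterface
    interface Extractor {
        void extract(String inputPdf, String outputFolder) throws Exception;
    }

    /**
     * Verifies that the input PDF exists, creates the output folder if it is
     * missing, then hands both paths to the extractor.
     *
     * @return the output folder that was passed to the extractor
     */
    static Path extractTo(String inputPdf, String outputFolder, Extractor extractor)
            throws Exception {
        Path input = Paths.get(inputPdf);
        if (!Files.isRegularFile(input)) {
            throw new IOException("Input PDF not found: " + input.toAbsolutePath());
        }
        // createDirectories does nothing if the folder already exists.
        Path output = Files.createDirectories(Paths.get(outputFolder));
        extractor.extract(input.toString(), output.toString());
        return output;
    }
}
```

With JPedal on the class path, the call might look like `EmbeddedFileExtraction.extractTo("inputFile.pdf", "outputFolder", ExtractEmbeddedFiles::extractAllFilesFromPdf);`. This assumes the method is static and takes two strings, as the one-line call above suggests.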
How to extract embedded files from a PDF file using the command line
- Add JPedal to your class or module path (download the trial jar)
- Run the following command:
java -cp jpedal.jar org.jpedal.examples.acroform.ExtractEmbeddedFiles inputFile.pdf outputFolder
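If you need to launch that same command from another Java program (for example, from a build step), one option is `ProcessBuilder`. The sketch below is an illustration only: `ExtractCommand`, `build` and `run` are hypothetical names, and the jar path is whatever location you saved the trial jar to. Passing the arguments as a list means paths containing spaces do not need shell quoting.

```java
import java.io.IOException;
import java.util.List;

final class ExtractCommand {

    // Main class name copied from the command shown above.
    static final String MAIN_CLASS = "org.jpedal.examples.acroform.ExtractEmbeddedFiles";

    /** Builds the argument list; each element is passed to the JVM as-is. */
    static List<String> build(String jarPath, String inputPdf, String outputFolder) {
        return List.of("java", "-cp", jarPath, MAIN_CLASS, inputPdf, outputFolder);
    }

    /** Runs the command, forwards its output to this process, and returns the exit code. */
    static int run(String jarPath, String inputPdf, String outputFolder)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(build(jarPath, inputPdf, outputFolder))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

A call such as `ExtractCommand.run("jpedal.jar", "inputFile.pdf", "outputFolder")` would then be equivalent to typing the command by hand, assuming `java` is on the system path.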
You can expand your understanding of the PDF format by reading our other articles. If there is a specific PDF term you would like to know more about, our PDF Glossary has an extensive list of common terms.