PDF files sometimes reference external resources such as images or other documents. To make such PDF files more portable and easier to archive, these resources can be embedded within the PDF itself. Resources embedded this way are known as attachments or embedded files.
To extract these embedded files from a PDF using Java, you need a third-party library, because the Java standard library has no built-in support for processing PDF files.
This tutorial uses JPedal.
How to extract embedded files from a PDF file programmatically
- Add JPedal to your class or module path (download the trial jar)
- Run the following Java code:
ExtractEmbeddedFiles.extractAllFilesFromPdf("inputFile.pdf", "outputFolder");
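The one-line call above can be wrapped in a small helper that checks the input, creates the output folder, and reports what was written. The following is a minimal sketch, not part of JPedal: the class `EmbeddedFileExtraction` and the method `extractAndList` are hypothetical names, and the extraction step is passed in as a `BiConsumer` so that the sketch compiles without JPedal on the classpath. The `main` method uses a stand-in extractor purely for demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of JPedal.
class EmbeddedFileExtraction {

    /**
     * Checks that the PDF exists, creates the output folder if needed,
     * runs the supplied extractor, and returns the sorted names of the
     * files found in the output folder afterwards.
     */
    static List<String> extractAndList(String pdfPath, String outputFolder,
                                       BiConsumer<String, String> extractor) throws IOException {
        Path input = Paths.get(pdfPath);
        if (!Files.isRegularFile(input)) {
            throw new NoSuchFileException(pdfPath);
        }
        Path output = Files.createDirectories(Paths.get(outputFolder));

        // The actual extraction is delegated to the caller-supplied function.
        extractor.accept(input.toString(), output.toString());

        try (Stream<Path> entries = Files.list(output)) {
            return entries.map(p -> p.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("embedded-demo");
        Path pdf = Files.createFile(work.resolve("inputFile.pdf"));

        // Stand-in extractor (assumption for the demo): writes one dummy file
        // instead of calling JPedal.
        BiConsumer<String, String> standIn = (in, out) -> {
            try {
                Files.write(Paths.get(out, "attachment.txt"), "demo".getBytes());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };

        System.out.println(extractAndList(pdf.toString(),
                work.resolve("outputFolder").toString(), standIn));
    }
}
```

With JPedal on the class path, the stand-in would be replaced by a lambda such as `(in, out) -> ExtractEmbeddedFiles.extractAllFilesFromPdf(in, out)`. If that method declares a checked exception, the lambda needs a try/catch like the one in the stand-in.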
How to extract embedded files from a PDF file using the command line
- Add JPedal to your class or module path (download the trial jar)
- Run the following command:
java -cp jpedal.jar org.jpedal.examples.acroform.ExtractEmbeddedFiles inputFile.pdf outputFolder
You can expand your understanding of the PDF format by reading our other articles. If there is a specific PDF term you would like to know more about, our PDF Glossary has an extensive list of common terms.