Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to extract images from a PDF file in Java

PDF files are not directly supported by Java. This tutorial shows you how to extract images from a PDF file in 5 simple steps using the JPedal PDF library.

Why use a third-party library to handle PDF files?

A PDF file is a complex hybrid of binary and text data. The image data, color information and scaling details are all stored separately in a compressed format, so they need to be extracted and combined before you have a usable image.

A third-party library handles all of this for you automatically. In this example, we will use our JPedal PDF library to make this task simple.

How to extract images from PDF files with JPedal

Step 1 Create a File handle, InputStream or URL pointing to the PDF file

ExtractImages extract=new ExtractImages(fileOrInputStreamOrURL);
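
As a minimal sketch of the three kinds of source mentioned in Step 1, the snippet below builds a File, an InputStream and a URL for the same PDF using only standard Java classes. The class name PdfSources and its method names are hypothetical helpers for illustration and are not part of JPedal.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;

// Hypothetical helper (not part of JPedal): three ways to refer to the same PDF.
class PdfSources {

    // Option 1: a File handle for a PDF on the local disk.
    static File asFile(String path) {
        return new File(path);
    }

    // Option 2: an InputStream over the file; the caller must close it.
    static InputStream asStream(File pdf) throws IOException {
        return Files.newInputStream(pdf.toPath());
    }

    // Option 3: a URL; here a file: URL derived from the File.
    static URL asUrl(File pdf) throws IOException {
        return pdf.toURI().toURL();
    }
}
```

Whichever of the three you create is then the value passed to the ExtractImages constructor shown above.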

Step 2 Include a password if the file is password protected

extract.setPassword("password");

Step 3 Open the PDF file

if (extract.openPDFFile()) {

Step 4 Iterate over the images on each page

     int pageCount=extract.getPageCount();
     for (int page=1; page<=pageCount; page++) {
 
        int imagesOnPageCount=extract.getImageCount(page);
        for (int image=0; image<imagesOnPageCount; image++) {
             // Named img so it does not clash with the loop counter "image"
             BufferedImage img=extract.getImage(page, image, true);
         }
     }
 }
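
The loop above leaves each extracted image in a BufferedImage but does not store it anywhere. One way to write it to disk is the standard javax.imageio.ImageIO class, sketched below. The class name ExtractedImageWriter, the output directory parameter and the page_N_image_M.png naming scheme are assumptions made for this example and are not part of JPedal.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of JPedal): saves one extracted image as a PNG.
class ExtractedImageWriter {

    // Assumed naming scheme: page_<page>_image_<index>.png
    static String fileNameFor(int page, int imageIndex) {
        return "page_" + page + "_image_" + imageIndex + ".png";
    }

    static File save(BufferedImage img, File outputDir, int page, int imageIndex)
            throws IOException {
        // Create the output directory if it does not exist yet.
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IOException("Could not create directory " + outputDir);
        }
        File target = new File(outputDir, fileNameFor(page, imageIndex));
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(img, "png", target)) {
            throw new IOException("No PNG writer available for " + target);
        }
        return target;
    }
}
```

Called once per extracted image with the page number and image index from the loop, this would produce one PNG file per image in the chosen directory.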

Step 5 Close the PDF file

 extract.closePDFfile();





IDRsolutions Ltd 2020. All rights reserved.