Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to convert a PDF file to Image or extract embedded images in Java


Why use a library to handle PDF files?

PDF is a very complex hybrid binary/text format, based on the imaging model of the even more complicated PostScript language. The data needs to be parsed and assembled from many sources in order to display the pages or extract images from a PDF file. This is why we wrote JPedal: to make these tasks simple.

Simple PDF page to image Conversion in JPedal

The most popular use for JPedal is on a server, rasterizing the pages of a PDF file to BufferedImage objects which can then be saved as images. Here is how to do this conversion (Javadoc).

// Assumed for this example: render pages on an opaque background
boolean isBackgroundTransparent=false;

ConvertPagesToImages extract=new ConvertPagesToImages("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount=extract.getPageCount();
    // Pages are numbered from 1
    for (int page=1; page<=pageCount; page++) {
        BufferedImage image=extract.getPageAsImage(page, isBackgroundTransparent);
    }
}

extract.closePDFfile();
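
The loop above only holds each page in memory. As a minimal sketch of the "saving as images" step, the following uses the JDK's own ImageIO rather than any JPedal call. The class name, the helper methods and the file naming scheme are illustrative assumptions, and a blank BufferedImage stands in for a rendered page.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of JPedal: writes page images to disk as PNG.
final class PageImageSaver {

    // Builds a name such as "mypdf_page_0003.png" (naming scheme is an assumption).
    static String fileNameFor(String baseName, int page) {
        return String.format(Locale.ROOT, "%s_page_%04d.png", baseName, page);
    }

    // Writes one page image into outputDir and returns the file that was written.
    static File savePage(BufferedImage image, File outputDir, String baseName, int page) {
        File target = new File(outputDir, fileNameFor(baseName, page));
        try {
            // ImageIO.write returns false when no suitable writer is found
            if (!ImageIO.write(image, "png", target)) {
                throw new IOException("No PNG writer available for " + target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Stand-in for an image returned by getPageAsImage(...)
        BufferedImage page = new BufferedImage(200, 300, BufferedImage.TYPE_INT_ARGB);
        File outputDir = new File(System.getProperty("java.io.tmpdir"));
        File written = savePage(page, outputDir, "mypdf", 1);
        System.out.println("Wrote " + written.getAbsolutePath());
    }
}
```

PNG is used here because it keeps an alpha channel, which matters if pages are rendered with a transparent background.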

Highly Configurable PDF page to image Conversion in JPedal

Many of our customers also ask for additional options to create bigger images, create an image of a specific size, or make use of any high-resolution images in the PDF to produce higher quality output. Here is how to do this conversion (Javadoc). The additional HashMap argument carries these configuration settings, and a large number of them are available.

// Assumed for this example: render pages on an opaque background
boolean isBackgroundTransparent=false;

ConvertPagesToHiResImages extract=new ConvertPagesToHiResImages("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");

//see  https://javadoc.idrsolutions.com/org/jpedal/constants/JPedalSettings.html
HashMap options=new HashMap();

if (extract.openPDFFile()) {
    int pageCount=extract.getPageCount();
    // Pages are numbered from 1
    for (int page=1; page<=pageCount; page++) {
        BufferedImage image=extract.getPageAsHiResImage(page, isBackgroundTransparent, options);
    }
}

extract.closePDFfile();
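
When asking for a bigger or specific sized image, it helps to know how page size relates to pixels. PDF page dimensions are measured in points, with 72 points to the inch by default, so the arithmetic can be sketched as below. This is plain Java with hypothetical helper names; it does not show which JPedalSettings keys to put in the HashMap, which the Javadoc above covers.

```java
// Hypothetical helper, not part of JPedal: converts page sizes in points to pixels.
final class PageScaling {

    // Default PDF user space: 72 points per inch
    static final double POINTS_PER_INCH = 72.0;

    // Pixel length of a dimension given in points when rendered at the given DPI.
    static int pixelsAt(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    // Largest uniform scale at which the page still fits inside the pixel box.
    static double scaleToFit(double pageWidthPts, double pageHeightPts,
                             int maxWidthPx, int maxHeightPx) {
        return Math.min(maxWidthPx / pageWidthPts, maxHeightPx / pageHeightPts);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points
        int width = pixelsAt(612, 300);
        int height = pixelsAt(792, 300);
        System.out.println(width + " x " + height); // prints "2550 x 3300"

        // Scale needed to fit the same page inside a 1000 x 1000 pixel thumbnail
        System.out.println(scaleToFit(612, 792, 1000, 1000));
    }
}
```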

Extract images from PDF files with JPedal

PDF files can contain embedded images which are drawn when the PDF is displayed. JPedal provides direct image extraction (Javadoc).

ExtractImages extract=new ExtractImages("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount=extract.getPageCount();
    // Pages are numbered from 1
    for (int page=1; page<=pageCount; page++) {

        int imagesOnPageCount=extract.getImageCount(page);
        // Images on a page are indexed from 0
        for (int i=0; i<imagesOnPageCount; i++) {
            BufferedImage image=extract.getImage(page, i, true);
        }
    }
}

extract.closePDFfile();
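
Each extracted image arrives as a BufferedImage, so the usual Java2D tools apply to it. One common follow-up, assuming an extracted image happens to carry an alpha channel (not all will), is to flatten it onto a white background before writing it as JPEG, since JPEG has no transparency. The sketch below uses only the JDK; the class and method names are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of JPedal: removes transparency from an image.
final class ImageFlattener {

    // Returns an opaque RGB copy of src, with transparent areas painted white.
    static BufferedImage flattenOnWhite(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Paint the background first, then draw the source on top of it
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Stand-in for an extracted image: fully transparent except one red pixel
        BufferedImage extracted = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        extracted.setRGB(1, 1, 0xFFFF0000);

        BufferedImage flat = flattenOnWhite(extracted);
        System.out.println(Integer.toHexString(flat.getRGB(0, 0))); // prints "ffffffff"
        System.out.println(Integer.toHexString(flat.getRGB(1, 1))); // prints "ffff0000"
    }
}
```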

Extract Clipped images from PDF files with JPedal

Images in a PDF often have a clip applied. JPedal allows for the clipped image to be extracted (Javadoc).

ExtractClippedImages extract=new ExtractClippedImages("C:/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount=extract.getPageCount();
    // Pages are numbered from 1
    for (int page=1; page<=pageCount; page++) {

        int imagesOnPageCount=extract.getImageCount(page);
        // Images on a page are indexed from 0
        for (int i=0; i<imagesOnPageCount; i++) {
            BufferedImage image=extract.getClippedImage(page, i);
        }
    }
}

extract.closePDFfile();
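
To illustrate what a clip means for an image, here is a small Java2D sketch: only the part of the source inside the clip shape is drawn, and everything outside stays transparent. It demonstrates the concept with standard JDK classes and hypothetical names; it is not how JPedal implements clipped extraction internally.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.image.BufferedImage;

// Hypothetical demo, not part of JPedal: applies a clip shape to an image.
final class ClipDemo {

    // Draws src through the clip; pixels outside the clip remain transparent.
    static BufferedImage applyClip(BufferedImage src, Shape clip) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setClip(clip);
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for an embedded image: a 10 x 10 block of opaque blue
        BufferedImage source = new BufferedImage(10, 10, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < 10; y++) {
            for (int x = 0; x < 10; x++) {
                source.setRGB(x, y, 0xFF0000FF);
            }
        }

        // The clip covers pixels 2..5 in both directions
        BufferedImage clipped = applyClip(source, new Rectangle(2, 2, 4, 4));
        System.out.println(Integer.toHexString(clipped.getRGB(3, 3))); // prints "ff0000ff"
        System.out.println(Integer.toHexString(clipped.getRGB(8, 8))); // prints "0"
    }
}
```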

As you can see, the JPedal API provides a great deal of easy-to-use functionality for handling PDF files and images. Are there any other features you would like to see?

IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.


IDRsolutions Ltd 2019. All rights reserved.