Bethan Palmer Bethan is a Java developer and a Java Champion. She has spoken at conferences including JavaOne/Code One, DevFest and NetBeans days. She has a degree in English Literature.

How to extract images from a PDF file

1 min read

PDF files contain the raw images used on the PDF pages. You can extract and combine this data into a self-contained image. In this post, I am going to give you a brief overview and some useful links to learn how to do this in JPedal Java PDF library.

How are images stored in PDF files?

First, you need to understand how images are stored in a PDF file. A PDF contains a raw image (which may be much better quality than the the scaled version displayed), a transformation (which can scale, sheer, rotate, stretch the image), and a clip (which may remove parts of the image).

We can make use of all this when we extract the images from the PDF file. More information about understanding this can be found in our previous blog post. A classic use for this is extracting high quality images of products from existing catalogues for your online store (or all those cute kitten pictures from that PDF you downloaded).

The image data is not stored in PDFs as an image such as a JPG, PNG, TIFF etc. Instead images are stored as XObjects within the file, which contain information about the image. The  binary data used for the pixels,the colorspace information, clipping are all separate and ‘merged’ together to create the final image when the PDF is displayed. Further details about this can be found in our previous article on how images are stored in PDF files.

Find out more about image extraction from the PDF

In our JPedal library, we have already done all the hard work of making it possible to extract images from PDFs. We also have lots of example code and documentation on image extraction to get you started. And if you just want to extract the clipped image, we have an option for that too.

Did you know...

IDRsolutions offers a whole range of online file converters to convert PDF and Microsoft Excel, Word and Office Documents to HTML5, SVG or image formats?

It is free to use for single file conversions and also includes Developer links if you want to use our commercial software for bulk conversions. Find out more on this page

Bethan Palmer Bethan is a Java developer and a Java Champion. She has spoken at conferences including JavaOne/Code One, DevFest and NetBeans days. She has a degree in English Literature.

How to read HEIC image files in Java with…

In this article, I will explain how to read HEIC files into Java as a BufferedImage. ImageIO does not read HEIC file types so...
Mark Stephens
1 min read

How to convert WMF files to SVG in java…

This article will show you how to convert WMF files into SVG files using our JDeli Java Image library. What is WMF? WMF is...
Amy Pearson
1 min read

How to write WebP images in Java

In this article, I will walk you through how to write out images as WebP images in Java. ImageIO does not support WebP images...
Mark Stephens
1 min read

Leave a Reply

Your email address will not be published. Required fields are marked *

IDRsolutions Ltd 2020. All rights reserved.