Bethan Palmer Bethan is a Java developer and product manager for JPedal at IDRsolutions. She has spoken at conferences including JavaOne and NetBeans day and has a degree in English Literature.

Back to Basics: The 2017 Guide to PDF Files – Extracting images from PDF files


Did you know that you can not only convert PDF files into images (as I explained last time), but also extract the actual images used on the page? In this post, I am going to tell you more…

How are images stored in PDF files?

First, you need to understand how images are stored in a PDF file. A PDF contains a raw image (which may be of much higher quality than the scaled version displayed), a transformation (which can scale, shear, rotate or stretch the image), and a clip (which may remove parts of the image).

We can make use of all this when we extract the images from the PDF file. Our previous blog post explains this in more detail. A classic use for this is extracting high-quality images of products from existing catalogues for your online store (or all those cute kitten pictures from that PDF you downloaded).
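To make the three parts described above (raw image, transformation, clip) more concrete, here is a minimal sketch using plain Java 2D rather than a PDF library. The class and method names (PlacedImageSketch, render, solidImage) are made up for illustration, and the sketch ignores PDF specifics such as the page coordinate system; it only shows how a displayed image can be much smaller than, and a cropped version of, the raw image behind it.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical illustration only: not part of JPedal or any PDF library.
final class PlacedImageSketch {

    /** Paints the raw image onto a transparent page-sized canvas through a transform and a clip. */
    static BufferedImage render(BufferedImage raw, AffineTransform transform, Shape clip,
                                int pageWidth, int pageHeight) {
        BufferedImage page = new BufferedImage(pageWidth, pageHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = page.createGraphics();
        try {
            g.setClip(clip);                   // only pixels inside the clip get painted
            g.drawImage(raw, transform, null); // scale, shear or rotate the raw pixels
        } finally {
            g.dispose();
        }
        return page;
    }

    /** Creates a single-colour stand-in for the raw image stored in the file. */
    static BufferedImage solidImage(int width, int height, Color colour) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(colour);
            g.fillRect(0, 0, width, height);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage raw = solidImage(400, 400, Color.RED);                    // raw image: 400x400
        AffineTransform quarter = AffineTransform.getScaleInstance(0.25, 0.25); // shown at 100x100
        Shape leftHalf = new Rectangle2D.Double(0, 0, 50, 100);                 // clip hides the right half

        BufferedImage page = render(raw, quarter, leftHalf, 200, 200);
        System.out.println("inside clip:  " + Integer.toHexString(page.getRGB(10, 10)));
        System.out.println("outside clip: " + Integer.toHexString(page.getRGB(75, 10)));
    }
}
```

In this sketch the page only ever shows a 50x100 strip, yet the raw image is still 400x400. Extracting the raw image gives you the full-resolution original, while extracting the clipped image gives you what the reader actually sees.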

The image data is not stored in a PDF as an image file such as a JPG, PNG or TIFF. Instead, images are stored as XObjects within the file, which contain information about the image. The binary data used for the pixels, the colorspace information and the clipping are all separate, and are ‘merged’ together to create the final image when the PDF is displayed. Further details about this can be found in our previous article on how images are stored in PDF files.
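As a rough illustration of that ‘merging’ step, the sketch below turns a block of pixel samples plus a component count into a Java BufferedImage. It is a simplification under stated assumptions: the samples are already decompressed, are 8 bits per component, and use either one component (grey) or three (red, green, blue). Real PDF images can use other colorspaces, bit depths and compression filters, and the names here (RawSamplesSketch, fromSamples) are invented for the example.

```java
import java.awt.image.BufferedImage;

// Hypothetical illustration only: not part of JPedal or any PDF library.
final class RawSamplesSketch {

    /**
     * Builds an image from interleaved 8-bit samples stored row by row.
     * components must be 1 (grey) or 3 (red, green, blue).
     */
    static BufferedImage fromSamples(byte[] samples, int width, int height, int components) {
        if (components != 1 && components != 3) {
            throw new IllegalArgumentException("Only 1 or 3 components are handled in this sketch");
        }
        if (samples.length != width * height * components) {
            throw new IllegalArgumentException("Sample count does not match the image dimensions");
        }

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r, g, b;
                if (components == 1) {
                    r = g = b = samples[i++] & 0xFF; // grey: reuse one sample for all channels
                } else {
                    r = samples[i++] & 0xFF;
                    g = samples[i++] & 0xFF;
                    b = samples[i++] & 0xFF;
                }
                image.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        // Two RGB pixels: one red, one blue.
        byte[] samples = {(byte) 255, 0, 0, 0, 0, (byte) 255};
        BufferedImage image = fromSamples(samples, 2, 1, 3);
        System.out.println(Integer.toHexString(image.getRGB(0, 0)));
        System.out.println(Integer.toHexString(image.getRGB(1, 0)));
    }
}
```

The point of the sketch is that the same bytes mean different things depending on the colorspace information stored alongside them, which is why the pixel data alone is not a usable image file.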

Find out more about image extraction from the PDF

In our JPedal library, we have already done all the hard work of making it possible to extract images from PDFs. We also have lots of example code and documentation on image extraction to get you started. And if you just want to extract the clipped image, we have an option for that too.

Next time we will take a look at some further reading you can do on PDF & Java.

If you’re a first-time reader, or simply want to be notified when we post new articles and updates, you can keep up to date via social media (Twitter, Facebook and Google+) or the Blog RSS.
