There are several versions of each image inside your PDF file

When you look at a PDF file, you see images displayed. In fact, there are several versions of each image…

Firstly, there is the raw, unclipped version of the image. This may be stored in an unusual colorspace (see this previous posting for a good example).
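To give a feel for what handling an unusual colorspace involves, here is a minimal Java sketch that converts raw DeviceCMYK samples (one common non-RGB colorspace in PDF files) to RGB using the naive textbook formula R = 255 × (1 − C) × (1 − K). The class and method names are hypothetical, and this is an illustration under stated assumptions, not how our library performs the conversion: proper conversion uses ICC profiles, so colors produced by this formula are only approximate.

```java
/**
 * Hypothetical helper: a naive, non-color-managed conversion from
 * DeviceCMYK samples (0-255 per component) to packed 0xRRGGBB values.
 * Real PDF renderers use ICC profiles; this is only the textbook approximation.
 */
final class NaiveCmykConverter {

    private NaiveCmykConverter() {}

    /** Converts one CMYK pixel (each component 0-255) to a packed RGB int. */
    static int cmykToRgb(int c, int m, int y, int k) {
        // R = 255 * (1 - C) * (1 - K), done in integer arithmetic on 0-255 values
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }

    /** Converts interleaved CMYK bytes (4 bytes per pixel) into packed RGB ints. */
    static int[] convert(byte[] cmyk) {
        if (cmyk.length % 4 != 0) {
            throw new IllegalArgumentException("CMYK data must have 4 bytes per pixel");
        }
        int[] rgb = new int[cmyk.length / 4];
        for (int i = 0; i < rgb.length; i++) {
            // Mask with 0xFF because Java bytes are signed
            int c = cmyk[4 * i] & 0xFF;
            int m = cmyk[4 * i + 1] & 0xFF;
            int y = cmyk[4 * i + 2] & 0xFF;
            int k = cmyk[4 * i + 3] & 0xFF;
            rgb[i] = cmykToRgb(c, m, y, k);
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Two raw pixels: pure cyan, then roughly 50% black
        byte[] raw = {(byte) 255, 0, 0, 0, 0, 0, 0, (byte) 128};
        for (int p : convert(raw)) {
            System.out.printf("#%06X%n", p);
        }
    }
}
```

Because the raw samples stay untouched until this step, an extraction tool can choose to keep the original CMYK data or convert it, depending on what the extracted image is for.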

This RAW image may also be much bigger than what you see onscreen. This can be useful if you want to generate the highest-quality version of the extracted image, for example when putting content from a catalogue on a website. There is a good example of this on the clipped image tab of the extraction examples page, which links to a documented example. In that example, we use the high-quality raw image (if present) and scale the clip and the page transformation up to the image's resolution, rather than scaling the image down to the page.
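Scaling the clip up to the image, rather than the image down to the page, comes down to mapping the clip rectangle from page space into the raw image's pixel space. Below is a minimal Java sketch using `java.awt.geom.AffineTransform`. It assumes the PDF convention that an image is painted into the unit square and placed on the page by its current transformation matrix (CTM), with the first row of pixels at the top of that square. The class and method names are hypothetical, and this is not the code behind the documented example.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical helper: maps a clip rectangle given in PDF page space into the
 * pixel space of the raw image, so the clip can be applied at full resolution.
 */
final class ClipToImageSpace {

    private ClipToImageSpace() {}

    /**
     * @param ctm    image CTM built from a PDF matrix [a b c d e f]; maps the unit square onto the page
     * @param width  raw image width in pixels
     * @param height raw image height in pixels
     * @param clip   clip rectangle in page space
     * @return bounding box of the clip in raw image pixel coordinates
     */
    static Rectangle2D clipInImagePixels(AffineTransform ctm, int width, int height, Rectangle2D clip) {
        // Pixel (x, y), with y growing downwards, -> unit square (x / w, 1 - y / h)
        AffineTransform pixelToUnit = new AffineTransform(1.0 / width, 0, 0, -1.0 / height, 0, 1);
        // concatenate() applies pixelToUnit first, then the CTM: pixel -> unit square -> page
        AffineTransform pixelToPage = new AffineTransform(ctm);
        pixelToPage.concatenate(pixelToUnit);
        try {
            // Invert to go page -> pixel; this also handles rotated or sheared images
            AffineTransform pageToPixel = pixelToPage.createInverse();
            return pageToPixel.createTransformedShape(clip).getBounds2D();
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("Image CTM is not invertible", e);
        }
    }

    public static void main(String[] args) {
        // A 2000 x 1000 pixel image drawn into a 200 x 100 pt box at (50, 300): cm [200 0 0 100 50 300]
        AffineTransform ctm = new AffineTransform(200.0, 0.0, 0.0, 100.0, 50.0, 300.0);
        Rectangle2D clipOnPage = new Rectangle2D.Double(100, 350, 50, 50);
        System.out.println(clipInImagePixels(ctm, 2000, 1000, clipOnPage));
    }
}
```

With these example values, the 50 × 50 point clip maps to roughly a 500 × 500 pixel region of the raw image (x from 500 to 1000, y from 0 to 500), so cropping the raw image directly keeps ten raw pixels per point instead of one.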

The RAW image may also be rotated differently and have a background that is not visible in the final PDF. When it is drawn on the page, a transformation is applied (which can include scaling, rotation and shearing), and the result may also be clipped. In Java, we also convert the images to sRGB.
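The step from raw image to what appears on the page can be sketched with plain Java 2D: draw the raw image through an `AffineTransform` onto a page-sized canvas that has a clip set. This is a minimal illustration, assuming the transform maps raw pixels straight to page pixels (it skips PDF's unit square and upward y axis for brevity); the class and method names are hypothetical, and it is not how our renderer is implemented.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/**
 * Hypothetical helper: paints a raw image onto a page-sized canvas by applying
 * a transform (scale, rotation, shear) and a clip, producing the "final" image.
 */
final class DrawRawImage {

    private DrawRawImage() {}

    /**
     * @param raw       the raw image, in its own pixel coordinates
     * @param transform maps raw pixel coordinates to page pixel coordinates
     * @param clip      area of the page (in page pixels) that may be painted
     */
    static BufferedImage render(BufferedImage raw, AffineTransform transform,
                                Rectangle2D clip, int pageWidth, int pageHeight) {
        // Start from a transparent page so it is clear which pixels the image covered
        BufferedImage page = new BufferedImage(pageWidth, pageHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = page.createGraphics();
        try {
            // Nearest-neighbour sampling keeps the raw pixel values unchanged
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
            g.setClip(clip);                   // anything outside the clip is discarded
            g.drawImage(raw, transform, null); // applies whatever scale, rotation or shear the transform holds
        } finally {
            g.dispose();
        }
        return page;
    }

    public static void main(String[] args) {
        // A 4 x 2 raw image, solid opaque red
        BufferedImage raw = new BufferedImage(4, 2, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < raw.getHeight(); y++) {
            for (int x = 0; x < raw.getWidth(); x++) {
                raw.setRGB(x, y, 0xFFFF0000);
            }
        }
        // Each call is concatenated, so points are scaled by 2, rotated 90 degrees, then translated
        AffineTransform t = new AffineTransform();
        t.translate(12, 2);
        t.quadrantRotate(1);
        t.scale(2, 2);
        // The rotated image covers x 8..12, y 2..10 on the page; the clip keeps only y < 6
        BufferedImage page = render(raw, t, new Rectangle2D.Double(0, 0, 20, 6), 20, 20);
        System.out.println(Integer.toHexString(page.getRGB(10, 4))); // inside image and clip
        System.out.println(Integer.toHexString(page.getRGB(10, 8))); // inside image, clipped away
    }
}
```

The 4 × 2 raw image ends up as a rotated, enlarged shape on the page with its lower half clipped away, which is why the final image can look quite different from the raw one.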

The FINAL image is what you see on the PDF page, so all of these other versions are ‘hidden’.

When you view the PDF page, you will always see the final image, but if you are doing extraction, it can be useful to differentiate between the different versions, because sometimes the others are more useful. What would you use them for?

This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so click here to visit our series index!


Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

