When you look at a PDF file you see images displayed on the page. In fact, there can be ‘several’ versions of each image.
Firstly, there is the raw, unclipped version of the image. This may be in an ‘odd colorspace’ (see this previous posting for a good example).
This RAW image may also be much bigger than what you see onscreen. This is useful if you want to generate the highest-quality version of the extracted image, for example when putting content from a catalogue on a website. There is a good example of this on the clipped image tab at the extraction examples page, which links to a documented example. In it we use the high-quality raw image (if present) and scale the clip up to the image, rather than scaling the image down to the page.
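To make the idea of scaling the clip up to the image concrete, here is a minimal sketch in plain Java. It is not the code behind the documented example: the class and method names are hypothetical, and it assumes the clip is given in page units relative to the top-left corner of the displayed image and that the image is not rotated or sheared.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, for illustration only
final class RawImageClip {

    /**
     * Maps a clip in page units (relative to the top-left corner of the
     * displayed image) onto the pixel grid of the larger raw image.
     * Assumes no rotation or shear between the raw and displayed image.
     */
    static Rectangle clipToRawPixels(Rectangle2D clipOnPage,
                                     double displayedWidth, double displayedHeight,
                                     int rawWidth, int rawHeight) {
        // How many raw pixels correspond to one page unit
        double scaleX = rawWidth / displayedWidth;
        double scaleY = rawHeight / displayedHeight;

        int x = (int) Math.floor(clipOnPage.getX() * scaleX);
        int y = (int) Math.floor(clipOnPage.getY() * scaleY);
        int w = (int) Math.ceil(clipOnPage.getWidth() * scaleX);
        int h = (int) Math.ceil(clipOnPage.getHeight() * scaleY);

        // Keep the result inside the bounds of the raw image
        return new Rectangle(x, y, w, h)
                .intersection(new Rectangle(0, 0, rawWidth, rawHeight));
    }

    /** Crops the raw image at full resolution instead of shrinking it first. */
    static BufferedImage extractHighQuality(BufferedImage raw, Rectangle2D clipOnPage,
                                            double displayedWidth, double displayedHeight) {
        Rectangle r = clipToRawPixels(clipOnPage, displayedWidth, displayedHeight,
                                      raw.getWidth(), raw.getHeight());
        return raw.getSubimage(r.x, r.y, r.width, r.height);
    }
}
```

With a 2000 x 1000 raw image displayed at 200 x 100, a clip of 100 x 50 page units would yield a 1000 x 500 pixel extract, so none of the extra detail in the raw image is thrown away. A clip that falls entirely outside the image is not handled here.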
The RAW image may also be rotated differently and have a background which is not present in the final PDF. When it is drawn on the page, a transformation is applied (which can include scaling, rotation, shearing and clipping). In Java we also convert the images to sRGB.
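The step from RAW to FINAL can be pictured with standard Java2D calls. This is only a rough sketch under stated assumptions, not how any particular PDF library does it: the class and method names are hypothetical, and the transform and clip are assumed to be already expressed in the output image's pixel coordinates.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper, for illustration only
final class FinalImageSketch {

    /**
     * Draws the raw image through a transform (scale, rotate, shear)
     * and an optional clip into a new sRGB image of the given size.
     */
    static BufferedImage render(BufferedImage raw, AffineTransform transform,
                                Shape clip, int width, int height) {
        // TYPE_INT_ARGB uses Java's default sRGB color model
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            if (clip != null) {
                g.setClip(clip); // pixels outside the clip stay transparent
            }
            g.drawImage(raw, transform, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

For example, passing `AffineTransform.getScaleInstance(2, 2)` doubles the size of the raw image, and a rectangular clip smaller than the result hides everything outside it, which is exactly the part of the raw image you never see on the page.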
The FINAL image is what you see on the PDF page so all of these other versions are ‘hidden’.
When you view the PDF page, you will always see the final image, but if you are doing extraction it can be useful to differentiate between the different versions. Sometimes one of the hidden versions is the more useful one. What would you use them for?
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.