How to extract JPG data from PDF


Overview

It is possible to extract some images from a PDF file in their raw form. In general, image files do not exist as such inside a PDF: TIFFs and PNGs are broken apart and their pixel data is stored in separate objects, compressed with one of several filters (JBIG2, CCITT, Flate, LZW).

However, one of the filters used for image data is DCT (DCTDecode). This data is actually a JPEG: if you take the binary data out and save it in a file with a .jpg extension, you can open it. It includes not just the pixel data but also the JPEG header at the start, so it is a complete file.
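A quick way to sanity-check extracted data is to look for the JPEG Start Of Image marker (the bytes 0xFF 0xD8) at the very start. The following is a minimal sketch; the class and method names are made up for illustration and are not part of any library.

```java
// Hypothetical helper (names are illustrative only).
final class JpegSignature {

    /** Returns true if the data begins with the JPEG Start Of Image marker FF D8. */
    static boolean startsWithSoi(byte[] data) {
        return data != null
                && data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }
}
```

If this check fails on data taken from a DCTDecode stream, the bytes were probably not copied from the right offset, or the stream has been transformed in some other way.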


How is the JPEG data stored?

If you open a PDF file, you will find the JPEG data stored in an image XObject. Here is an example.

14 0 obj
<<
/Intent/RelativeColorimetric
/Type/XObject
/ColorSpace/DeviceGray
/Subtype/Image
/Name/X
/Width 2988
/BitsPerComponent 8
/Length 134030
/Height 2286
/Filter/DCTDecode
>>
stream (binary data) endstream

Key Indicators in the PDF Object

The /Type and /Subtype values show that this object is an image XObject. The key entry is the /Filter value: DCTDecode indicates a JPEG. JPXDecode indicates a JPEG 2000, and the same approach works for it.

The data sits between the stream and endstream keywords. You need to copy out the raw bytes for the JPEG file (cutting and pasting the text is unlikely to work because the data is binary). The /Length value gives the number of bytes.
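The steps above can be sketched in plain Java. This is a rough illustration, not a PDF parser: it assumes the file is not encrypted, that the image is a top-level object, that /Filter is the single name DCTDecode, and that /Length is a direct number rather than an indirect reference. The class name DctStreamExtractor and the output file names are invented for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scans raw PDF bytes for DCTDecode streams.
final class DctStreamExtractor {

    // Assumes a single filter written as /Filter/DCTDecode or /Filter /DCTDecode.
    private static final Pattern FILTER = Pattern.compile("/Filter\\s*/DCTDecode\\b");
    // Matches a direct /Length value; "/Length 15 0 R" (indirect) does not match.
    private static final Pattern LENGTH = Pattern.compile("/Length\\s+(\\d+)\\s*[/>]");

    static List<byte[]> extract(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        List<byte[]> images = new ArrayList<>();
        int pos = 0;
        int s;
        while ((s = text.indexOf("stream", pos)) >= 0) {
            pos = s + 6;
            if (s >= 3 && text.startsWith("end", s - 3)) {
                continue; // this is the tail of "endstream"
            }
            // Treat the text between the preceding "obj" and "stream" as the dictionary.
            int objStart = Math.max(0, text.lastIndexOf("obj", s));
            String dict = text.substring(objStart, s);
            Matcher len = LENGTH.matcher(dict);
            if (!FILTER.matcher(dict).find() || !len.find()) {
                continue;
            }
            int length = Integer.parseInt(len.group(1));
            int start = s + 6;
            // The stream keyword is followed by CRLF or LF before the data begins.
            if (start < pdf.length && pdf[start] == '\r') start++;
            if (start < pdf.length && pdf[start] == '\n') start++;
            if (length > pdf.length - start) {
                continue; // /Length runs past the end of the file
            }
            images.add(Arrays.copyOfRange(pdf, start, start + length));
            pos = start + length; // skip the binary data
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> images = extract(Files.readAllBytes(Paths.get(args[0])));
        for (int i = 0; i < images.size(); i++) {
            Files.write(Paths.get("image-" + i + ".jpg"), images.get(i));
        }
    }
}
```

Working on bytes rather than text is the point: the copy uses the offset after the stream keyword and the /Length value, so the binary data is never passed through a character encoding that could alter it.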

Understanding the Colour Space

Lastly, the /ColorSpace is important because it shows the colour encoding used in the JPEG. If it is DeviceRGB, the extracted image will look the same as it does in the PDF display. Not many viewers handle types like DeviceCMYK correctly, so you may need a heavyweight package like Photoshop to see it correctly.
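If you only have the extracted file and not the PDF dictionary, the JPEG itself gives a hint. Its Start Of Frame segment records the number of colour components: 1 corresponds to DeviceGray, 3 to DeviceRGB and 4 to DeviceCMYK. The sketch below walks the JPEG segments to read that count. It assumes a well-formed file whose header segments come before the first scan, and the names are illustrative only.

```java
// Hypothetical helper (names are illustrative only).
final class JpegComponents {

    /** Returns the component count from the first SOF segment, or -1 if none is found. */
    static int componentCount(byte[] jpeg) {
        int i = 2; // skip the SOI marker (FF D8)
        while (i + 3 < jpeg.length) {
            if ((jpeg[i] & 0xFF) != 0xFF) {
                return -1; // not positioned on a marker, give up
            }
            int marker = jpeg[i + 1] & 0xFF;
            if (marker == 0xFF) {
                i++; // fill byte before the real marker
                continue;
            }
            if (marker == 0xDA) {
                return -1; // start of scan: compressed data follows
            }
            // SOF markers are C0..CF except C4 (DHT), C8 (JPG) and CC (DAC).
            boolean sof = marker >= 0xC0 && marker <= 0xCF
                    && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
            if (sof) {
                // Layout: marker(2) length(2) precision(1) height(2) width(2) components(1)
                int idx = i + 9;
                return idx < jpeg.length ? (jpeg[idx] & 0xFF) : -1;
            }
            // The segment length counts its own two bytes but not the marker.
            int segmentLength = ((jpeg[i + 2] & 0xFF) << 8) | (jpeg[i + 3] & 0xFF);
            i += 2 + segmentLength;
        }
        return -1;
    }
}
```

A result of 4 is a useful warning that the file is likely CMYK and may display with wrong colours, or not at all, in a simple viewer.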

Notes on Clipped Images

If the image is clipped in the PDF, the extracted file may show background details that are not visible in the PDF display. The image may also be a different size, or even upside down, because the PDF scales and positions it when drawing the page. But you have extracted the raw image data!

As experienced Java developers, we help you work with images in Java and bring over a decade of hands-on experience with many image file formats.

Are you a Java Developer working with Image files?

// Read an AVIF image
BufferedImage bufferedImage = JDeli.read(avifImageFile);

// Write an AVIF image
JDeli.write(bufferedImage, "avif", outputStreamOrFile);

// Read a DICOM image
BufferedImage bufferedImage = JDeli.read(dicomImageFile);

// Read a HEIC image
BufferedImage bufferedImage = JDeli.read(heicImageFile);

// Write a HEIC image
JDeli.write(bufferedImage, "heic", outputStreamOrFile);

// Read a JPEG image
BufferedImage bufferedImage = JDeli.read(jpegImageFile);

// Write a JPEG image
JDeli.write(bufferedImage, "jpeg", outputStreamOrFile);

// Read a JPEG 2000 image
BufferedImage bufferedImage = JDeli.read(jpeg2000ImageFile);

// Write a JPEG 2000 image
JDeli.write(bufferedImage, "jpx", outputStreamOrFile);

// Write an image as a PDF
JDeli.write(bufferedImage, "pdf", outputStreamOrFile);

// Read a PNG image
BufferedImage bufferedImage = JDeli.read(pngImageFile);

// Write a PNG image
JDeli.write(bufferedImage, "png", outputStreamOrFile);

// Read a TIFF image
BufferedImage bufferedImage = JDeli.read(tiffImageFile);

// Write a TIFF image
JDeli.write(bufferedImage, "tiff", outputStreamOrFile);

// Read a WebP image
BufferedImage bufferedImage = JDeli.read(webpImageFile);

// Write a WebP image
JDeli.write(bufferedImage, "webp", outputStreamOrFile);

5 Replies to “How to extract JPG data from PDF”

Sleeper says:
April 29, 2016 at 4:26 pm
One caveat, if the PDF has been encrypted then this won’t work. The binary data in the stream that makes up the Jpeg will have been encrypted. The file produced by this procedure won’t be a valid Jpeg.
Edie says:
August 7, 2017 at 2:42 pm
Is it possible to do this with TIFF images as well? (The PDF samples I am looking at have been compressed – FlateDecode)
  Mark Stephens says:
  August 7, 2017 at 2:44 pm
  Only if it is CCITT encoded and you write your own meta header onto it.
Waldek says:
March 29, 2019 at 9:43 am
What about GIF image (LZW compression)?
  Mark Stephens says:
  March 29, 2019 at 9:48 am
  There is no GIF header on the LZW data and you would still need to build in the colorspace (which could need data conversion).
