Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

How to read ‘JPEG’ data inside a PDF in Java

How is the data stored?

PDF files can store images as compressed raw data. When an image uses the DCTDecode filter (JPEG compression), this raw data is sometimes equivalent to a JPEG file, so if you extract it and save it with a .jpeg file extension, it will open as a JPEG.
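A minimal sketch of that first step in Java. It assumes you have already pulled the image stream bytes out of the PDF into a byte[] (PDF parsing is not shown) and that DCTDecode is the only filter on the stream; if another filter such as FlateDecode is applied as well, you would need to undo that first. The class and method names are hypothetical. The sketch checks for the JPEG start-of-image marker (0xFF 0xD8) and then writes the bytes out unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawJpegSaver {

    // True if the bytes begin with the JPEG SOI marker 0xFF 0xD8.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null && data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    // Writes the raw DCTDecode stream bytes, unchanged, to the target file.
    // Assumption: 'data' is the image stream from the PDF with no other filter left to undo.
    static Path saveAsJpeg(byte[] data, Path target) throws IOException {
        if (!looksLikeJpeg(data)) {
            throw new IOException("Stream does not start with a JPEG SOI marker");
        }
        return Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a file holding the stream bytes taken from your PDF.
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        System.out.println("Wrote " + saveAsJpeg(data, Path.of("extracted.jpeg")));
    }
}
```

Opening extracted.jpeg in an ordinary image viewer is a quick way to see whether the colours come out right or need extra processing.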

"Sometimes" is the key word here, because you may well need to interpret the data using the colour information in the PDF file. For example, if the data is encoded as DeviceGray or DeviceRGB, it will look correct when you open the JPEG. But if it needs additional details (such as indexed colours) or is encoded as YCCK, you will see the image, but the colours will be wrong.
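One way to tell some of these cases apart before decoding is to look at the JPEG headers themselves. The sketch below (hypothetical class name, and a simplified marker walk rather than a full JPEG parser) reads the component count from the first SOFn frame header and the colour-transform flag from an Adobe APP14 segment, if there is one. Following the convention libjpeg uses, 4 components with transform 2 means YCCK, while 4 components with transform 0 or no Adobe segment is normally treated as CMYK. Indexed colour, by contrast, is only visible in the PDF's /ColorSpace entry, not in the JPEG data.

```java
import java.util.function.IntPredicate;

public class JpegColourProbe {

    // Walks the marker segments that precede the compressed scan data and
    // returns the offset of the first one whose marker byte matches 'wanted',
    // or -1 if none does. Simplified: assumes a well-formed stream.
    static int findSegment(byte[] jpeg, IntPredicate wanted) {
        int i = 2; // skip the SOI marker (0xFF 0xD8)
        while (i + 3 < jpeg.length) {
            if ((jpeg[i] & 0xFF) != 0xFF) {
                return -1; // not positioned on a marker, so give up
            }
            int marker = jpeg[i + 1] & 0xFF;
            if (marker == 0xFF) { // fill byte before a marker
                i++;
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) { // start of scan or end of image
                return -1;
            }
            if (wanted.test(marker)) {
                return i;
            }
            // segment length is big-endian and includes its own two bytes
            int length = ((jpeg[i + 2] & 0xFF) << 8) | (jpeg[i + 3] & 0xFF);
            i += 2 + length;
        }
        return -1;
    }

    // Components declared in the first SOFn frame header:
    // 1 = gray, 3 = YCbCr or RGB, 4 = CMYK or YCCK. Returns -1 if not found.
    static int componentCount(byte[] jpeg) {
        int at = findSegment(jpeg, m -> m >= 0xC0 && m <= 0xCF
                && m != 0xC4 && m != 0xC8 && m != 0xCC); // C4/C8/CC are not frame headers
        // frame header: marker(2) length(2) precision(1) height(2) width(2) count(1)
        return (at < 0 || at + 9 >= jpeg.length) ? -1 : jpeg[at + 9] & 0xFF;
    }

    // Adobe APP14 colour transform flag: 0 = none (RGB or CMYK), 1 = YCbCr,
    // 2 = YCCK. Returns -1 if there is no Adobe APP14 segment.
    static int adobeTransform(byte[] jpeg) {
        int at = findSegment(jpeg, m -> m == 0xEE);
        // APP14 layout: marker(2) length(2) "Adobe"(5) version(2) flags0(2) flags1(2) transform(1)
        if (at < 0 || at + 15 >= jpeg.length) {
            return -1;
        }
        boolean adobe = jpeg[at + 4] == 'A' && jpeg[at + 5] == 'd' && jpeg[at + 6] == 'o'
                && jpeg[at + 7] == 'b' && jpeg[at + 8] == 'e';
        return adobe ? jpeg[at + 15] & 0xFF : -1;
    }
}
```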

Although Java cannot always make sense of this JPEG data (because the colour details are held in the PDF rather than in the JPEG data itself), you can still use ImageIO to open it and access the pixel data, which is returned as a Raster object.

So if you want to recreate the image, you will need to get the pixel data and ‘merge’ it with the colour data. Here is how you can read the actual pixel data in Java: even if Java does not understand the colours, it can still access the pixels themselves.
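As an illustration of what that ‘merge’ can involve, here is a minimal sketch for the YCCK case, assuming you already have the 4-band Raster returned by readRaster in Step 2. It first applies the YCbCr-to-RGB formula with the standard JFIF coefficients and inverts the result to get C, M and Y, passing K through (the same YCCK-to-CMYK step libjpeg performs), then uses a naive CMYK-to-RGB formula. This is not colour-managed; a real PDF renderer would normally use an ICC-based conversion, so treat the output as approximate. The class name is hypothetical, and the inverted flag is an assumption to check per file, since some files (for example Adobe-written CMYK JPEGs, or images with a /Decode array of [1 0 1 0 1 0 1 0] in the PDF) store the values inverted.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

public class YcckToRgb {

    // Rounds and clamps a channel value into the 0..255 range.
    private static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }

    // Converts a 4-band Raster of raw YCCK samples into an RGB image.
    // 'inverted' is an assumption to check per file: set it when the
    // CMYK values are stored inverted (255 = no ink).
    static BufferedImage convert(Raster ycck, boolean inverted) {
        int w = ycck.getWidth();
        int h = ycck.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        int[] px = new int[4];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                ycck.getPixel(ycck.getMinX() + x, ycck.getMinY() + y, px);
                double lum = px[0];
                double cb = px[1] - 128;
                double cr = px[2] - 128;
                // YCbCr -> RGB (JFIF coefficients), inverted to give C, M, Y
                int c = 255 - clamp(lum + 1.402 * cr);
                int m = 255 - clamp(lum - 0.344136 * cb - 0.714136 * cr);
                int ye = 255 - clamp(lum + 1.772 * cb);
                int k = px[3]; // K is stored as-is
                if (inverted) {
                    c = 255 - c;
                    m = 255 - m;
                    ye = 255 - ye;
                    k = 255 - k;
                }
                // naive CMYK -> RGB, no ICC colour management
                int r = (255 - c) * (255 - k) / 255;
                int g = (255 - m) * (255 - k) / 255;
                int b = (255 - ye) * (255 - k) / 255;
                out.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return out;
    }
}
```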

Step 1 Read the JPEG data

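import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

//read the image data - data is a byte[] containing the JPEG stream from the PDF
ByteArrayInputStream in = new ByteArrayInputStream(data);

//choose a JPEG decoder that can return raw Raster data
ImageReader iir = null;
Iterator<ImageReader> iterator = ImageIO.getImageReadersByFormatName("JPEG");
while (iterator.hasNext())
{
    ImageReader candidate = iterator.next();
    if (candidate.canReadRaster())
    {
        iir = candidate;
        break;
    }
}
if (iir == null)
{
    throw new IOException("No JPEG ImageReader supports readRaster()");
}

//turn off ImageIO's disk cache and attach the stream to the reader
ImageIO.setUseCache(false);
ImageInputStream iin = ImageIO.createImageInputStream(in);
iir.setInput(iin, true);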

Step 2 Read the pixels

import java.awt.image.Raster;

//this is the actual pixel data, read without any colour conversion
Raster ras = iir.readRaster(0, null);
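Putting the two steps together, here is a self-contained sketch (the class name and command-line handling are hypothetical) that decodes JPEG bytes to a Raster and prints its size, its band count and the raw samples of the top-left pixel. Because readRaster skips colour conversion, the bands of a typical 3-component JPEG should hold YCbCr values rather than RGB.

```java
import java.awt.image.Raster;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class JpegRasterReader {

    // Decodes JPEG bytes (for example a DCTDecode stream from a PDF) into a
    // Raster of raw samples, without converting the colours to sRGB.
    static Raster readRaster(byte[] data) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JPEG");
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            if (!reader.canReadRaster()) {
                continue;
            }
            ImageIO.setUseCache(false);
            try (ImageInputStream iin =
                    ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
                reader.setInput(iin, true);
                return reader.readRaster(0, null);
            } finally {
                reader.dispose(); // release the decoder's resources
            }
        }
        throw new IOException("No installed JPEG reader supports readRaster");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Path.of(args[0]));
        Raster ras = readRaster(data);
        System.out.printf("%d x %d pixels, %d bands%n",
                ras.getWidth(), ras.getHeight(), ras.getNumBands());
        // raw sample of every band at the top-left pixel
        for (int b = 0; b < ras.getNumBands(); b++) {
            System.out.println("band " + b + ": " + ras.getSample(ras.getMinX(), ras.getMinY(), b));
        }
    }
}
```

Note that ImageIO.setUseCache(false) is a global setting, so it also affects every later ImageIO call in the same JVM.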

Are you working with JPEG Images in Java?

You might like to check out our JDeli image library. It offers a number of advantages over ImageIO and free alternatives, such as:

  • prevent heap-related JVM crashes
  • support for additional image formats such as HEIC
  • reduce output file size
  • improve read/write performance
  • control over output
  • support threading
  • superior image scaling algorithms

Would you like to learn more about PDF files?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 20 years' worth of PDF knowledge and tips, so visit our series index!



IDRsolutions Ltd 2020. All rights reserved.