The PDF file I have been looking at today has an issue which is more of a Java 'feature' than a PDF bug, but it touches on some PDF features, so it is worth covering.
When you define an ICCColorspace, you can also define an alternate colorspace which can be used to display the data (generally DeviceRGB for 3 color components and DeviceCMYK for 4). This can be used instead of the ICC profile and gives good enough results in most cases. Because applying an ICC profile is relatively slow in Java, we use the alternate colorspace in preference for image conversion. And if the data is compressed with DCTDecode, we can use ImageIO to decode it. So far so good.
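As a small illustration of the alternate colorspace idea, here is a minimal sketch of picking a fallback colorspace from the number of components in the ICC data. The class and method names are hypothetical (they are not from our codebase), and the mapping assumes the usual PDF convention that an ICC-based colorspace has 1, 3 or 4 components.

```java
// Hypothetical helper for illustration only.
final class IccAlternate {

    /**
     * Returns the device colorspace name to fall back on for an ICC-based
     * colorspace with the given number of components.
     */
    static String defaultAlternate(int components) {
        switch (components) {
            case 1:
                return "DeviceGray";
            case 3:
                return "DeviceRGB";
            case 4:
                return "DeviceCMYK";
            default:
                throw new IllegalArgumentException(
                        "Expected 1, 3 or 4 components, got " + components);
        }
    }
}
```

If the file names an alternate colorspace explicitly, that one should of course be used instead of this default.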
However, we have found a file which does not work with ImageIO. It gives an exception inside Java itself:
java.awt.color.CMMException: Invalid image format
at com.sun.imageio.plugins.jpeg.JPEGImageReader.readImage(Native Method)
The JPEG does work, however, if you decode it as an ICC JPEG by extracting the Raster and then converting it manually. So we have adopted a pragmatic solution. We still try to decode it with the alternate colorspace first, so we get all the benefits. But if that fails, we treat it as an ICCColorspace instead. It seems a reasonable workaround for an issue in the JVM.
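The try-then-fall-back approach described above could be sketched roughly as follows. This is not our actual code: the class DctFallbackDecoder and its methods are hypothetical names, and the sketch assumes the raw samples in the Raster are already in the profile's colorspace with 8 bits per sample. JPEGs stored as YCbCr or YCCK, or with inverted CMYK values, would need an extra conversion step which is left out here.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper for illustration only.
final class DctFallbackDecoder {

    /** Tries the fast ImageIO path first, then falls back to raster + ICC. */
    static BufferedImage decode(byte[] jpegData, ICC_Profile profile) throws IOException {
        try {
            BufferedImage fast = ImageIO.read(new ByteArrayInputStream(jpegData));
            if (fast != null) {
                return fast;
            }
        } catch (IOException | RuntimeException e) {
            // CMMException is a RuntimeException, so it is caught here too.
        }
        return decodeViaRaster(jpegData, profile);
    }

    /** Reads the raw samples without color conversion and applies the profile. */
    static BufferedImage decodeViaRaster(byte[] jpegData, ICC_Profile profile) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        if (!readers.hasNext()) {
            throw new IOException("No JPEG ImageReader available");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream in =
                ImageIO.createImageInputStream(new ByteArrayInputStream(jpegData))) {
            reader.setInput(in);
            Raster raster = reader.readRaster(0, null);
            return toRgb(raster, new ICC_ColorSpace(profile));
        } finally {
            reader.dispose();
        }
    }

    /** Converts each pixel through the ICC colorspace. Simple, but slow. */
    private static BufferedImage toRgb(Raster raster, ColorSpace cs) throws IOException {
        int bands = raster.getNumBands();
        if (bands != cs.getNumComponents()) {
            throw new IOException("Raster has " + bands + " bands but profile expects "
                    + cs.getNumComponents());
        }
        int w = raster.getWidth();
        int h = raster.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        int[] samples = new int[bands];
        float[] normalized = new float[bands];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                raster.getPixel(raster.getMinX() + x, raster.getMinY() + y, samples);
                for (int i = 0; i < bands; i++) {
                    normalized[i] = samples[i] / 255f; // assumes 8-bit samples
                }
                float[] rgb = cs.toRGB(normalized);
                int r = Math.round(rgb[0] * 255f);
                int g = Math.round(rgb[1] * 255f);
                int b = Math.round(rgb[2] * 255f);
                out.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return out;
    }
}
```

The per-pixel loop is deliberately naive to keep the sketch short. That slowness is the reason to keep the alternate colorspace as the first choice and only pay for the ICC conversion when the fast path fails.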
I have come across quite a few issues with ICCColorspaces in Java 5, and there are still some in Java 6, so I hope Java 7/8 will improve ICC support. Do you have any favorite workarounds for ICCColorspace limitations in Java?
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so click here to visit our series index!