There are some ‘odd’ images in PDF files. They pretend to be CMYK but in fact they are not… Here is a description of what they really are and how to handle them.
PDF files can contain image data which is DCT encoded (i.e. it is a JPEG image). These JPEGs can use any colorspace (sRGB, CMYK, etc). However, not every image that claims to be CMYK actually is. If you were to view one of these (even in a package which can handle CMYK JPEGs), it would look horrible.
These images are actually encoded as YCCK. You can tell by looking at the JPEG header, or by using this Java code:
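import java.awt.image.Raster;
// Note: com.sun.image.codec.jpeg is a Sun-internal API and is absent from Java 9 and later.
com.sun.image.codec.jpeg.JPEGImageDecoder decoder = com.sun.image.codec.jpeg.JPEGCodec.createJPEGDecoder(in);
Raster currentRaster = decoder.decodeAsRaster();
// 4 is CMYK, 7 is YCCK
int colorType = decoder.getJPEGDecodeParam().getEncodedColorID();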
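If you would rather look at the header yourself, here is a minimal sketch that does not depend on the Sun codec. It assumes the JPEG was written with an Adobe APP14 marker segment, which is where Adobe-style encoders record a color transform flag (0 = no transform, 1 = YCbCr, 2 = YCCK). The class and method names are made up for illustration, and the sketch only walks the segments before the scan data, so treat it as a starting point rather than a complete JPEG parser.

```java
import java.nio.charset.StandardCharsets;

class AdobeMarkerSniffer {

    /**
     * Returns the color transform flag from the Adobe APP14 segment
     * (0 = none, 1 = YCbCr, 2 = YCCK), or -1 if no such segment is
     * found before the scan data starts.
     */
    static int readAdobeTransform(byte[] jpeg) {
        // Every JPEG stream starts with the SOI marker FF D8.
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return -1;
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return -1; // lost marker sync; give up rather than guess
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // fill byte in front of a marker
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {
                return -1; // start of scan or end of image: the header is over
            }
            // The segment length is big-endian and counts its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int payload = pos + 4;
            int payloadLength = length - 2;
            if (payloadLength < 0 || payload + payloadLength > jpeg.length) {
                return -1; // truncated or corrupt segment
            }
            // APP14 is marker FF EE; Adobe's payload starts with "Adobe",
            // followed by version (2), flags0 (2), flags1 (2), transform (1).
            if (marker == 0xEE && payloadLength >= 12
                    && "Adobe".equals(new String(jpeg, payload, 5, StandardCharsets.US_ASCII))) {
                return jpeg[payload + 11] & 0xFF;
            }
            pos = payload + payloadLength;
        }
        return -1;
    }
}
```

A result of 2 means the four channels are YCCK and need the conversion described in this post. A result of -1 only means this sketch found no Adobe marker; in that case a PDF viewer would also have to consult the image's own dictionary entries.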
Like CMYK, YCCK is made up of 4 channels but they are not the same.
CMYK consists of a mix of Cyan, Magenta, Yellow and Key (black). YCCK instead stores the first three channels as YCbCr (one brightness channel and two color-difference channels), so that information the human eye is less sensitive to can be discarded, allowing it to keep more of the detail which our eye would notice. This is the YCC bit (K is the same).
So for each pixel value we need to translate the YCC parts into CMY values. Luckily there is a standard formula for doing this, defined in the original PostScript format (the Red Book). Here it is:
R = clip(Y + 1.402 * Cr - 179.456);
G = clip(Y - 0.34414 * Cb - 0.71414 * Cr + 135.45984);
B = clip(Y + 1.772 * Cb - 226.816);
This gives us an intermediate RGB value. It is not the final RGB value for the pixel (we have not included the K value yet), but we can translate it into CMY using another formula:
C = 255 - (int)R;
M = 255 - (int)G;
Y = 255 - (int)B;
This gives us the CMY pixel values which, together with the unaltered K value, give us CMYK. We can then translate this into sRGB using ICC profiles or one of several formulae. Dealing with colors is a very colorful experience!
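To make the steps concrete, here is a minimal per-pixel sketch of the two formulae above in Java. The class and method names are hypothetical, and clip is assumed to mean "clamp to the range 0 to 255". The last method is one of the simplest CMYK to RGB formulae (multiplying out the ink coverage); it is an assumption added for illustration and is not color-accurate the way an ICC profile conversion would be.

```java
class YcckConverter {

    /** Clamps a value to 0..255 and truncates it, matching the (int) casts above. */
    static int clip(double value) {
        return (int) Math.max(0.0, Math.min(255.0, value));
    }

    /** Converts one YCCK pixel (each channel 0..255) to {C, M, Y, K}. */
    static int[] ycckToCmyk(int y, int cb, int cr, int k) {
        // Step 1: YCbCr to an intermediate RGB value.
        int r = clip(y + 1.402 * cr - 179.456);
        int g = clip(y - 0.34414 * cb - 0.71414 * cr + 135.45984);
        int b = clip(y + 1.772 * cb - 226.816);
        // Step 2: invert to get CMY; K is passed through unchanged.
        return new int[] {255 - r, 255 - g, 255 - b, k};
    }

    /**
     * Naive CMYK to RGB conversion (an illustrative assumption, not a
     * profile-based conversion): each channel is scaled by the black level.
     */
    static int[] cmykToRgbNaive(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return new int[] {r, g, b};
    }
}
```

For example, ycckToCmyk(100, 100, 200, 40) returns {55, 197, 205, 40}. In a real decoder you would run this over every pixel of the raster, and for accurate results you would hand the resulting CMYK values to a color profile rather than to the naive formula.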
Do you have any tips for color conversion?
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so click here to visit our series index!
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog our team post about anything interesting they learn about.