There are some ‘odd’ images in PDF files. They pretend to be CMYK but in fact they are not… Here is a description of what they really are and how to handle them.
PDF files can contain image data which is DCT encoded (i.e. it is a JPEG image). These JPEGs can use any colorspace (sRGB, CMYK, etc). However, not every image flagged as CMYK actually holds CMYK data. If you were to view one as CMYK (even in a package which can handle CMYK JPEGs), it would look horrible.
These images are actually encoded as YCCK; you can tell by looking at the header or by using this Java code:
com.sun.image.codec.jpeg.JPEGImageDecoder decoder = com.sun.image.codec.jpeg.JPEGCodec.createJPEGDecoder(in);
Raster currentRaster = decoder.decodeAsRaster();
// 4 is CMYK, 7 is YCCK
int colorType = decoder.getJPEGDecodeParam().getEncodedColorID();
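The com.sun.image.codec.jpeg classes were never part of the standard Java API, so it can be handy to check the header by hand instead. The sketch below assumes the JPEG carries an Adobe APP14 marker segment, whose last byte is a transform flag (0 = no transform, 1 = YCbCr, 2 = YCCK). The class and method names are made up for illustration, and this is a minimal sketch under those assumptions, not a full JPEG parser.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: scans JPEG marker segments for the Adobe APP14 segment.
final class AdobeTransformSniffer {

    private AdobeTransformSniffer() {}

    /**
     * Returns the Adobe APP14 transform byte (0 = none, 1 = YCbCr, 2 = YCCK),
     * or -1 if no Adobe APP14 segment is found before the image data starts.
     */
    static int readAdobeTransform(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        try {
            if (data.readUnsignedShort() != 0xFFD8) { // every JPEG starts with SOI
                throw new IOException("Not a JPEG stream (missing SOI marker)");
            }
            while (true) {
                if (data.readUnsignedByte() != 0xFF) {
                    continue; // skip anything that is not a marker prefix
                }
                int marker = data.readUnsignedByte();
                while (marker == 0xFF) { // 0xFF fill bytes may precede a marker
                    marker = data.readUnsignedByte();
                }
                if (marker == 0xDA || marker == 0xD9) {
                    return -1; // SOS or EOI: header is over, no Adobe segment seen
                }
                if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD8)) {
                    continue; // stand-alone markers carry no length or payload
                }
                int length = data.readUnsignedShort(); // includes the 2 length bytes
                if (length < 2) {
                    throw new IOException("Corrupt segment length: " + length);
                }
                byte[] payload = new byte[length - 2];
                data.readFully(payload);
                // APP14 payload: "Adobe", version (2), flags0 (2), flags1 (2), transform (1)
                if (marker == 0xEE && payload.length >= 12
                        && "Adobe".equals(new String(payload, 0, 5, StandardCharsets.US_ASCII))) {
                    return payload[11] & 0xFF;
                }
            }
        } catch (EOFException e) {
            return -1; // stream ended before any Adobe APP14 segment was found
        }
    }
}
```

Calling AdobeTransformSniffer.readAdobeTransform(in) on the raw DCT stream and getting 2 back would suggest the four channels are YCCK rather than plain CMYK.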
Like CMYK, YCCK is made up of 4 channels but they are not the same.
CMYK consists of a mix of Cyan, Magenta, Yellow and Key (black). YCCK encodes the data so that information the human eye is less sensitive to can be discarded (YCbCr), allowing it to keep more of the detail which our eye would notice. This is the YCC bit (K is the same).
So for each pixel value we need to translate the YCC parts into CMY values. Luckily there is a standard formula for doing this, defined in the original PostScript format (the Red Book). Here it is:
R = clip(Y + 1.402 * Cr - 179.456);
G = clip(Y - 0.34414 * Cb - 0.71414 * Cr + 135.45984);
B = clip(Y + 1.772 * Cb - 226.816);
This gives us a value for RGB, which is not yet the real RGB value for the pixel (we have not included the K value). But we can translate it into CMY using another formula:
C = 255 - (int)R;
M = 255 - (int)G;
Y = 255 - (int)B; // this Y is Yellow, not the Y of YCC
This gives us the CMY pixel values, which together with the unaltered K value give us CMYK. We can translate this into sRGB using profiles or several formulae. Dealing with colors is a very colorful experience!
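Putting the two formulas together, a per-pixel conversion might look like the sketch below. The class and method names are hypothetical and all channel values are assumed to be in the range 0 to 255. The final CMYK to RGB step uses one simple profile-free formula (assuming 0 means no ink and 255 means full ink), so it is only a rough approximation of what an ICC profile would give.

```java
// Hypothetical helper illustrating the YCCK -> CMYK -> RGB steps described above.
final class YcckConverter {

    private YcckConverter() {}

    /** Clamps a value into the range 0..255. */
    static double clip(double value) {
        return Math.max(0.0, Math.min(255.0, value));
    }

    /** Converts one YCCK pixel to CMYK, returned as {C, M, Y, K}. */
    static int[] ycckToCmyk(int y, int cb, int cr, int k) {
        // Step 1: YCC -> intermediate RGB (K is not involved yet)
        double r = clip(y + 1.402 * cr - 179.456);
        double g = clip(y - 0.34414 * cb - 0.71414 * cr + 135.45984);
        double b = clip(y + 1.772 * cb - 226.816);

        // Step 2: intermediate RGB -> CMY; K passes through unaltered
        int cyan = 255 - (int) r;
        int magenta = 255 - (int) g;
        int yellow = 255 - (int) b;
        return new int[] {cyan, magenta, yellow, k};
    }

    /**
     * Naive CMYK -> RGB without a color profile, returned as {R, G, B}.
     * Assumption: 0 = no ink, 255 = full ink. Not color-accurate.
     */
    static int[] cmykToRgbNaive(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return new int[] {r, g, b};
    }
}
```

A caller would run ycckToCmyk over every pixel of the decoded raster and then either hand the CMYK values to a profile-based conversion or, for a quick preview, pass them to cmykToRgbNaive.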
Do you have any tips for color conversion?
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.