There are some ‘odd’ images in PDF files. They pretend to be CMYK but in fact they are not… Here is a description of what they really are and how to handle them.
PDF files can contain image data which is DCT encoded (i.e. it is a JPEG image). These JPEGs can be in any color space (sRGB, CMYK, etc). However, not every image labelled as CMYK actually holds CMYK data. If you were to view one as it stands (even in a package which can handle CMYK JPEGs), it would look horrible.
These images are actually encoded as YCCK. You need to look at the image header to discover this: the Adobe APP14 marker segment in the JPEG data carries a transform flag, and a value of 2 means YCCK.
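As a rough illustration of "looking at the image header", here is a minimal sketch that walks the JPEG marker segments and returns the transform byte from the Adobe APP14 segment (0 = no transform, 1 = YCbCr, 2 = YCCK). The class name AdobeMarker, the method readTransform and the NOT_FOUND constant are made up for this example, and it assumes the whole DCT stream is already in a byte array.

```java
import java.nio.charset.StandardCharsets;

final class AdobeMarker {
    static final int NOT_FOUND = -1;

    /**
     * Scans the JPEG header segments for the Adobe APP14 marker and returns
     * its transform byte (0 = none, 1 = YCbCr, 2 = YCCK), or NOT_FOUND.
     */
    static int readTransform(byte[] jpeg) {
        // Every JPEG stream starts with the SOI marker FF D8.
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return NOT_FOUND;
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return NOT_FOUND; // lost sync with the marker structure
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // fill byte before a marker, skip it
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {
                return NOT_FOUND; // start of scan or end of image: header is over
            }
            // The segment length is big-endian and counts its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int payload = pos + 4;
            int payloadLength = length - 2;
            if (length < 2 || payload + payloadLength > jpeg.length) {
                return NOT_FOUND; // truncated or malformed segment
            }
            // APP14 is marker EE; the Adobe payload is "Adobe", version (2 bytes),
            // flags0 (2 bytes), flags1 (2 bytes), then the transform byte.
            if (marker == 0xEE && payloadLength >= 12
                    && "Adobe".equals(new String(jpeg, payload, 5, StandardCharsets.US_ASCII))) {
                return jpeg[payload + 11] & 0xFF;
            }
            pos = payload + payloadLength;
        }
        return NOT_FOUND;
    }
}
```

A 4-channel JPEG for which readTransform returns 2 should be treated as YCCK rather than plain CMYK.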
Like CMYK, YCCK is made up of 4 channels but they are not the same.
CMYK consists of a mix of Cyan, Magenta, Yellow and Key (black). YCCK stores the color part as YCbCr (one brightness channel and two color-difference channels), so that information the human eye is less sensitive to can be discarded, allowing it to keep more of the detail which our eye would notice. This is the YCC bit (K is the same).
So for each pixel value we need to translate the YCC parts into CMY values. Luckily there is a standard formula for doing this, defined in the original PostScript format (the Red Book). Here it is:
R = clip(Y + 1.402 * Cr - 179.456);
G = clip(Y - 0.34414 * Cb - 0.71414 * Cr + 135.45984);
B = clip(Y + 1.772 * Cb - 226.816);
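To make the formula concrete, here is a minimal Java sketch of it. The class name YccToRgb and the method names are invented for the example, and it assumes 8-bit samples with clip meaning "clamp to 0..255". It also rounds to the nearest integer rather than truncating, which is my own choice to avoid off-by-one results from floating point error.

```java
final class YccToRgb {

    /** Clamps a value to the 8-bit range 0..255 and rounds it to the nearest integer. */
    static int clip(double value) {
        return (int) Math.round(Math.max(0.0, Math.min(255.0, value)));
    }

    /** Converts one pixel's Y, Cb and Cr samples (each 0..255) to {R, G, B}. */
    static int[] convert(int y, int cb, int cr) {
        int r = clip(y + 1.402 * cr - 179.456);
        int g = clip(y - 0.34414 * cb - 0.71414 * cr + 135.45984);
        int b = clip(y + 1.772 * cb - 226.816);
        return new int[] {r, g, b};
    }
}
```

The constants 179.456, 135.45984 and 226.816 are just the Cb and Cr coefficients multiplied by 128, so a pixel with Cb = Cr = 128 has no color offset and comes out as a neutral grey equal to Y.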
This gives us RGB values, but they are not the final RGB values for the pixel, because we have not yet included the K value. We can translate them into CMY using another formula or an ICC profile.
C = 255 - (int)R;
M = 255 - (int)G;
Y = 255 - (int)B;
(Note that Y here is Yellow, not the Y brightness channel used in the previous formula.)
This gives us the CMY pixel values, which together with the unaltered K value give us CMYK. We can translate this into sRGB using profiles or several formulae. Dealing with colors is a very colorful experience!
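Putting the steps together, here is a minimal sketch of the whole journey for a single pixel: YCCK to CMYK, then CMYK to RGB. The class name YcckPipeline and its methods are invented for this example. The CMYK to RGB step uses one of the simple multiplicative formulae and is only an assumption about what "several formulae" might look like; it is not color managed, so an ICC profile will give noticeably better results.

```java
final class YcckPipeline {

    /** Clamps a value to 0..255 and rounds it to the nearest integer. */
    static int clip(double value) {
        return (int) Math.round(Math.max(0.0, Math.min(255.0, value)));
    }

    /** Converts one YCCK pixel to {C, M, Y, K}; the K sample passes through unchanged. */
    static int[] ycckToCmyk(int y, int cb, int cr, int k) {
        // Step 1: YCC to intermediate RGB.
        int r = clip(y + 1.402 * cr - 179.456);
        int g = clip(y - 0.34414 * cb - 0.71414 * cr + 135.45984);
        int b = clip(y + 1.772 * cb - 226.816);
        // Step 2: complement each channel to get CMY.
        return new int[] {255 - r, 255 - g, 255 - b, k};
    }

    /**
     * Naive CMYK to RGB (an approximation, not color managed): each ink is
     * complemented and then scaled by how much light the K channel lets through.
     * Returns the pixel packed as 0xRRGGBB.
     */
    static int cmykToRgb(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }
}
```

For a whole image you would call ycckToCmyk for each group of four samples and then either feed the CMYK values through an ICC profile or fall back to something like cmykToRgb for a quick preview.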