  1. George

    I am working with USN ship deck logs provided by the National Archives. My objective is to extract the page images for image processing, in order to locate certain features on the page(s). My software is able to locate each image's stream . . . endstream section and its length in the PDF.

    In early versions of the PDFs, the pages consisted of JPEG images embedded within the PDF. In the latest PDFs, the images are stored in what appears to be an LZ77 format. I can extract and inflate the image stream using zlib, but the resulting image looks like salt and pepper.

    Are you aware of any other info defined in the PDF that might be needed for the inflate op?

    BTW – the image stream starts with 0x78 0x9C, that is, there is no header in the stream before those two bytes.

  2. markstephens

    If it is a DCTDecode block, my guess is that it is not RGB. In that case you would need to post-process it. I would recommend using a tool like Photoshop or iText’s RUPS to drill down and see what is happening.
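
The inflate step described in the question can be sketched roughly as below. This is only a minimal sketch, not the poster's actual code: the function names are hypothetical, and the width, height, component count and bits per component are assumed to have been read from the image dictionary (`/Width`, `/Height`, `/ColorSpace`, `/BitsPerComponent`) by other means. It inflates the stream with Python's `zlib` and compares the output size with the size expected for raw samples. An output that is exactly one byte per row larger suggests the rows carry PNG-style predictor tag bytes (declared via `/DecodeParms`), which would need to be undone before the pixels make sense.

```python
import zlib


def inflate_image_stream(raw: bytes) -> bytes:
    """Inflate the bytes found between 'stream' and 'endstream'."""
    # 0x78 is the usual first byte of a zlib-wrapped deflate stream.
    if raw[:1] != b"\x78":
        raise ValueError("stream does not start with a zlib header")
    # decompressobj tolerates trailing bytes (e.g. an end-of-line before
    # 'endstream'); they end up in unused_data instead of raising.
    inflater = zlib.decompressobj()
    return inflater.decompress(raw) + inflater.flush()


def expected_sizes(width: int, height: int, components: int, bits: int):
    """Return (size without tag bytes, size with one tag byte per row)."""
    row_bytes = (width * components * bits + 7) // 8  # rows are byte-aligned
    return row_bytes * height, (row_bytes + 1) * height


def diagnose(data: bytes, width: int, height: int, components: int, bits: int) -> str:
    """Hypothetical helper: classify inflated data by its length only."""
    plain, tagged = expected_sizes(width, height, components, bits)
    if len(data) == plain:
        # Length alone cannot rule out a TIFF-style predictor (Predictor 2).
        return "no per-row tag byte"
    if len(data) == tagged:
        return "one tag byte per row (PNG-style predictor likely)"
    return "size mismatch"
```

If the size matches the tagged case, checking the stream dictionary for a `/DecodeParms` entry with `/Predictor`, `/Colors` and `/Columns` would be the next thing to look at. A size mismatch may instead point to a different `/ColorSpace` or `/BitsPerComponent` than assumed.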

