CCITT encoding in PDF files – Converting PDF CCITT data into a TIFF image

CCITT compression is used in PDF files to compress black and white (1-bit) image data. It uses Huffman-coded run lengths to squeeze the data into a much smaller stream. You can manually extract the CCITT data and the DecodeParms dictionary values (K, BlackIs1, etc.), or use one of several libraries to do this.

It turns out that CCITT is also a compression format used in TIFF files. By adding a header and some tag bytes in front of the data and saving it in a file ending .tif, you can create a TIFF of the raw image without having to decode the data. Below is some Java code (it should be easy to recode in any language) which takes the raw data and adds the required bytes.

There may be some differences between the raw image and the image in the PDF – remember this is the raw image which may be inverted, coloured, clipped, etc.

We have been doing some work on improving our CCITT decoder, so it seemed like a good item to write a series of articles about. You can read all the published articles in the series by clicking here.

public static void saveAsTIFF(int w, int h,PdfObject DecodeParms,
byte[] data, String fileName){

/**
 * get values from stream
 **/
boolean isBlack = false;  //flag to show if default is black/white
int k = 0;

//DecodeParms is our Java object to contain all the data
// - you will need to manually get these items with the CCITT data
if(DecodeParms!=null){

   //get k (type of encoding)
   k = DecodeParms.getInt(PdfDictionary.K);

   int columnsSet = DecodeParms.getInt(PdfDictionary.Columns);
   if(columnsSet!=-1)
      w=columnsSet;

   //get flag for white/black as default
   isBlack=DecodeParms.getBoolean(PdfDictionary.BlackIs1);

}

/**
 * build the image
 **/
ByteArrayOutputStream bos=new ByteArrayOutputStream();

/**
 * tiff header (id, version, offset)
 **/
String[] headerValues={"4d","4d","00","2a","00","00","00","08"};
for(int i=0;i<headerValues.length;i++)
   bos.write(Integer.parseInt(headerValues[i],16));

int tagCount=9; //appears to be minimum needed

//writeWord and writeTag are convenience methods to add the values as bytes to the stream

/**
 * IFD (image file directory)
 **/
writeWord(String.valueOf(tagCount),bos); //num of entries
writeTag("256", "04", "01", String.valueOf(w), bos); /**width*/
writeTag("257", "04", "01", String.valueOf(h), bos); /**length*/

/**bitsPerSample 258 - b&w 1 bit image*/
writeTag("258", "03", "01", "00010000h", bos);

if (k == 0){
   writeTag("259", "03", "01", "00030000h", bos); //compression
}else if (k > 0)
   writeTag("259", "03", "01", "00020000h", bos); //compression
else if (k < 0)
   writeTag("259", "03", "01", "00040000h", bos); //compression

//photometricInterpretation
if(!isBlack)
   writeTag("262", "03", "01", "00000000h", bos);
else
   writeTag("262", "03", "01", "00010000h", bos);

//stripOffsets -start of data after tables
writeTag("273", "04", "1","122", bos); 

//samplesPerPixel
writeTag("277", "03", "01", "00010000h", bos);
//rowsPerStrip - uses height
writeTag("278", "04", "01", String.valueOf(h), bos);
//stripByteCount - 1 strip so all data
writeTag("279", "04", "1", String.valueOf(data.length),bos);
// write next IFD offset - zero as no other table
writeDWord("0",bos); 

/**
 * write the CCITT image data at the end
 **/
try{

   bos.write(data);
   bos.close();

} catch (IOException e) {
   LogWriter.writeLog("[PDF] Tiff exception  "+e);
}

/**save data as image */
try {

   FileOutputStream fos=new FileOutputStream(fileName);
   fos.write(bos.toByteArray());
   fos.close();

} catch (Error err) {
   LogWriter.writeLog("[PDF] Tiff error "+err);
} catch (Exception e1) {
   LogWriter.writeLog("[PDF] Tiff exception  "+e1);
}
}
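
The listing above calls writeWord, writeDWord and writeTag without showing them, and hard-codes 122 as the strip offset. The following is a minimal, self-contained sketch of how such helpers could look. It is not the JPedal implementation: the class name CcittTiffSketch, the wrap method and the int-based signatures are all assumptions made for illustration. It shows where 122 comes from (8 bytes of header, 2 bytes for the entry count, 9 entries of 12 bytes each, and 4 bytes for the next IFD offset) and why a SHORT value of 1 appears as 00010000h in a big-endian file (the 2-byte value sits in the first half of the 4-byte value field). The K-to-compression mapping is copied from the listing above unchanged.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper class for illustration only; not part of JPedal. */
final class CcittTiffSketch {

    static final int SHORT = 3;   // TIFF field type: 16-bit unsigned
    static final int LONG = 4;    // TIFF field type: 32-bit unsigned
    static final int TAG_COUNT = 9;

    // 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset = 122
    static final int DATA_OFFSET = 8 + 2 + TAG_COUNT * 12 + 4;

    /** write 2 bytes, high byte first (big-endian) */
    static void writeWord(int value, ByteArrayOutputStream bos) {
        bos.write((value >> 8) & 0xff);
        bos.write(value & 0xff);
    }

    /** write 4 bytes, high byte first (big-endian) */
    static void writeDWord(int value, ByteArrayOutputStream bos) {
        writeWord(value >>> 16, bos);
        writeWord(value, bos);
    }

    /** write one 12-byte IFD entry holding a single value */
    static void writeTag(int id, int type, int value, ByteArrayOutputStream bos) {
        writeWord(id, bos);
        writeWord(type, bos);
        writeDWord(1, bos); // count: one value per tag
        if (type == SHORT) {
            // a SHORT is left-justified in the 4-byte value field,
            // which is why 1 shows up as 00 01 00 00 in a big-endian file
            writeWord(value, bos);
            writeWord(0, bos);
        } else {
            writeDWord(value, bos);
        }
    }

    /** prepend a big-endian TIFF header and single IFD to raw CCITT data */
    static byte[] wrap(int w, int h, int k, boolean blackIs1, byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        // header: "MM" (big-endian), version 42, offset of the first IFD
        bos.write('M');
        bos.write('M');
        writeWord(42, bos);
        writeDWord(8, bos);

        // same K-to-compression mapping as the listing above
        int compression = (k == 0) ? 3 : (k > 0) ? 2 : 4;

        writeWord(TAG_COUNT, bos);
        writeTag(256, LONG, w, bos);                  // width
        writeTag(257, LONG, h, bos);                  // length
        writeTag(258, SHORT, 1, bos);                 // bitsPerSample
        writeTag(259, SHORT, compression, bos);       // compression
        writeTag(262, SHORT, blackIs1 ? 1 : 0, bos);  // photometricInterpretation
        writeTag(273, LONG, DATA_OFFSET, bos);        // stripOffsets
        writeTag(277, SHORT, 1, bos);                 // samplesPerPixel
        writeTag(278, LONG, h, bos);                  // rowsPerStrip
        writeTag(279, LONG, data.length, bos);        // stripByteCount
        writeDWord(0, bos);                           // no further IFD

        bos.write(data, 0, data.length);
        return bos.toByteArray();
    }
}
```

The returned array can be written straight to a .tif file, for example with java.nio.file.Files.write. Computing the offset from the tag count means the value stays correct if a tag is added later, provided TAG_COUNT is updated to match.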

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years worth of PDF knowledge and tips, so click here to visit our series index!

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

8 thoughts on “CCITT encoding in PDF files – Converting PDF CCITT data into a TIFF image”

  1. Ash

    Can you publish the code for writeWord and writeTag?

    • Here is the whole class (we also use it to decode CCITT with JAI)

      /**
      * ===========================================
      * Java Pdf Extraction Decoding Access Library
      * ===========================================
      *
      * Project Info: http://www.jpedal.org
      * (C) Copyright 1997-2008, IDRsolutions and Contributors.
      *
      * This file is part of JPedal
      *
      @LICENSE@
      *
      * —————
      * TiffDecoder.java
      * —————
      */
      package org.jpedal.io;

      import java.awt.image.DataBuffer;
      import java.awt.image.DataBufferByte;
      import java.awt.image.Raster;

      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.util.Map;

      import org.jpedal.objects.raw.PdfDictionary;
      import org.jpedal.objects.raw.PdfObject;
      import org.jpedal.utils.LogWriter;

      /**
      * converts CCITT stream into either an image of bytestream
      *
      * Many thanks to Brian Burkhalter for all his help
      */
      public class TiffDecoder {

      private byte[] bytes;

      /**
      * called with values from PDF
      * Map contains values from PDF as stream pair
      */
      public TiffDecoder(int w, int h,Map values,byte[] data){

      //return value
      bytes=null;

      /**
      * get values from stream
      */
      //flag to show if default is black or white
      boolean isBlack = false;
      //int columns = 1728; //in PDF spec
      int k = 0;
      //boolean isByteAligned=false; //in PDF spec

      //get k (type of encoding)
      String value = (String) values.get("K");
      if (value != null)
      k = Integer.parseInt(value);

      /**
      //get flag for white/black as default
      value = (String) values.get("EncodedByteAlign");
      if (value != null)
      isByteAligned = Boolean.valueOf(value).booleanValue();*/

      //get flag for white/black as default
      value = (String) values.get("BlackIs1");
      if (value != null){
      isBlack = Boolean.valueOf(value).booleanValue();

      }

      /**not used but in Map from PDF
      value = (String) values.get("Rows");
      if (value != null)
      rows = Integer.parseInt(value);

      value = (String) values.get("Columns");
      if (value != null)
      columns= Integer.parseInt(value);*/

      buildImage(w, h, data, isBlack, k);
      }

      public TiffDecoder(int w, int h,PdfObject DecodeParms,byte[] data){

      //Map values=new HashMap();
      //return value
      bytes=null;

      /**
      * get values from stream
      */
      //flag to show if default is black or white
      boolean isBlack = false;
      //int columns = 1728; //in PDF spec
      int k = 0;
      //boolean isByteAligned=false; //in PDF spec

      if(DecodeParms!=null){

      //get k (type of encoding)
      k = DecodeParms.getInt(PdfDictionary.K);

      int columnsSet = DecodeParms.getInt(PdfDictionary.Columns);
      if(columnsSet!=-1)
      w=columnsSet;

      //get flag for white/black as default
      isBlack=DecodeParms.getBoolean(PdfDictionary.BlackIs1);

      }

      /**not used but in Map from PDF
      value = (String) values.get("Rows");
      if (value != null)
      rows = Integer.parseInt(value);

      value = (String) values.get("Columns");
      if (value != null)
      columns= Integer.parseInt(value);*/

      buildImage(w, h, data, isBlack, k);
      }

      /**
      * convenience method to add a header to a CCITT data block so it can be viewed as a TIFF
      * @param w
      * @param h
      * @param DecodeParms
      * @param data
      * @param fileName
      */
      public static void saveAsTIFF(int w, int h,PdfObject DecodeParms,byte[] data, String fileName){

      /**
      * get values from stream
      */
      boolean isBlack = false; //flag to show if default is black or white
      int k = 0;

      if(DecodeParms!=null){

      //get k (type of encoding)
      k = DecodeParms.getInt(PdfDictionary.K);

      int columnsSet = DecodeParms.getInt(PdfDictionary.Columns);
      if(columnsSet!=-1)
      w=columnsSet;

      //get flag for white/black as default
      isBlack=DecodeParms.getBoolean(PdfDictionary.BlackIs1);

      }

      /**
      * build the image
      */
      ByteArrayOutputStream bos=new ByteArrayOutputStream();

      /**
      * tiff header (id, version, offset)
      * */
      final String[] headerValues={"4d","4d","00","2a","00","00","00","08"};
      for(int i=0;i<headerValues.length;i++)
      bos.write(Integer.parseInt(headerValues[i],16));

      int tagCount=9; //appears to be minimum needed

      /**
      * IFD (image file directory)
      */
      writeWord(String.valueOf(tagCount),bos); //num of entries
      writeTag("256", "04", "01", String.valueOf(w), bos); /**width 256 */
      writeTag("257", "04", "01", String.valueOf(h), bos); /**length 257 */
      writeTag("258", "03", "01", "00010000h", bos); /**bitsPerSample 258 - b&w 1 bit image*/

      if (k == 0)
      writeTag("259", "03", "01", "00030000h", bos); /**compression 259 */
      else if (k > 0)
      writeTag("259", "03", "01", "00020000h", bos); /**compression 259 */
      else if (k < 0)
      writeTag("259", "03", "01", "00040000h", bos); /**compression 259 */

      if(!isBlack)
      writeTag("262", "03", "01", "00000000h", bos); /**photometricInterpretation 262 */
      else
      writeTag("262", "03", "01", "00010000h", bos); /**photometricInterpretation 262 */

      writeTag("273", "04", "1","122", bos); /**stripOffsets 273 -start of data after tables */
      writeTag("277", "03", "01", "00010000h", bos); /**samplesPerPixel 277 */
      writeTag("278", "04", "01", String.valueOf(h), bos); /**rowsPerStrip 278 - uses height */
      writeTag("279", "04", "1", String.valueOf(data.length),bos); /**stripByteCount 279 - 1 strip so all data */
      writeDWord("0",bos); /** write next IFD offset zero as no other table*/

      /**
      * write the CCITT image data at the end
      */
      try{
      bos.write(data);
      bos.close();
      } catch (IOException e) {
      if(LogWriter.isOutput())
      LogWriter.writeLog("[PDF] Tiff exception "+e);
      }

      /**save image */
      try {
      java.io.FileOutputStream fos=new java.io.FileOutputStream(fileName);
      fos.write(bos.toByteArray());
      fos.close();
      } catch (Error err) {
      if(LogWriter.isOutput())
      LogWriter.writeLog("[PDF] Tiff error "+err);
      } catch (Exception e1) {
      if(LogWriter.isOutput())
      LogWriter.writeLog("[PDF] Tiff exception "+e1);
      }
      }

      private void buildImage(int w, int h, byte[] data, boolean isBlack, int k) {

      /**
      * build the image
      */
      ByteArrayOutputStream bos=new ByteArrayOutputStream();

      /**
      * tiff header (id, version, offset)
      * */
      final String[] headerValues={"4d","4d","00","2a","00","00","00","08"};
      for(int i=0;i<headerValues.length;i++)
      bos.write(Integer.parseInt(headerValues[i],16));

      int tagCount=9; //appears to be minimum needed

      /**
      * IFD (image file directory)
      */
      writeWord(String.valueOf(tagCount),bos); //num of entries
      writeTag("256", "04", "01", String.valueOf(w), bos); /**width 256 */
      writeTag("257", "04", "01", String.valueOf(h), bos); /**length 257 */
      writeTag("258", "03", "01", "00010000h", bos); /**bitsPerSample 258 - b&w 1 bit image*/

      if (k == 0)
      writeTag("259", "03", "01", "00030000h", bos); /**compression 259 */
      else if (k > 0)
      writeTag("259", "03", "01", "00020000h", bos); /**compression 259 */
      else if (k < 0)
      writeTag("259", "03", "01", "00040000h", bos); /**compression 259 */

      if(!isBlack)
      writeTag("262", "03", "01", "00000000h", bos); /**photometricInterpretation 262 */
      else
      writeTag("262", "03", "01", "00010000h", bos); /**photometricInterpretation 262 */

      writeTag("273", "04", "1","122", bos); /**stripOffsets 273 -start of data after tables */
      writeTag("277", "03", "01", "00010000h", bos); /**samplesPerPixel 277 */
      writeTag("278", "04", "01", String.valueOf(h), bos); /**rowsPerStrip 278 - uses height */
      writeTag("279", "04", "1", String.valueOf(data.length),bos); /**stripByteCount 279 - 1 strip so all data */
      writeDWord("0",bos); /** write next IFD offset zero as no other table*/

      /**
      * write the CCITT image data at the end
      */
      try{
      bos.write(data);
      bos.close();
      } catch (IOException e) {
      if(LogWriter.isOutput())
      LogWriter.writeLog("[PDF] Tiff exception "+e);
      }

      /**setup image */
      try {

      /**write out to debug*
      System.out.println("mac_"+data.length+".tiff");
      java.io.FileOutputStream fos=new java.io.FileOutputStream("mac_"+data.length+".tiff");
      fos.write(bos.toByteArray());
      fos.close();
      /***/
      JAIHelper.confirmJAIOnClasspath();

      com.sun.media.jai.codec.ByteArraySeekableStream fss=new com.sun.media.jai.codec.ByteArraySeekableStream(bos.toByteArray());//.wrapInputStream(bis,true);
      javax.media.jai.RenderedOp op = (javax.media.jai.JAI.create("stream",fss));

      Raster raster=op.getData();
      //Raster raster = img2.getData();

      DataBuffer db = raster.getDataBuffer();
      DataBufferByte dbb = (DataBufferByte) db;
      bytes=dbb.getData();

      if(!isBlack){ //invert if needed
      int bcount=bytes.length;
      for(int i=0;i<bcount;i++)
      [...]

      bos.write((value>>8)); //high byte
      bos.write(value & 0xFF); //low byte

      }

      /**write Dword (4 bytes to stream) */
      private static void writeDWord(String i, ByteArrayOutputStream bos) {

      int value=0;

      //allow decimal,octal or hex
      if(i.endsWith("h"))
      value=Integer.parseInt(i.substring(0,i.length()-1),16);
      else if(i.endsWith("o"))
      value=Integer.parseInt(i.substring(0,i.length()-1),8);
      else
      value=Integer.parseInt(i);

      bos.write((value>>24) & 0xff); //high byte
      bos.write((value>>16) & 0xff);
      bos.write((value>>8) & 0xff);
      bos.write(value & 0xFF); //low byte

      }

      /**write a tag to stream*/
      private static void writeTag(String TagId, String dataType, String DataCount, String DataOffset, ByteArrayOutputStream bos) {

      writeWord(TagId,bos);
      writeWord(dataType,bos);
      writeDWord(DataCount,bos);
      writeDWord(DataOffset,bos);

      }

      }

      • Ash

        Mark,
        Thanks for the post. I noticed that because of the big-endian encoding you had to write the tag values in hex, i.e. a value of 1 = 00010000h, etc. I changed the methods to write little-endian in case someone is interested.

        /**
        *
        * @param data
        * @param w
        * @param h
        * @param encodedType
        * @param isBlack
        * @param fileName
        */
        private static void saveAsTIFF(byte[] data, int w, int h, int encodedType, Boolean isBlack, String fileName) throws IOException {

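        String[] headerValues = {"49", "49","2a","00","08","00","00","00"};

        /* build the image
        **/
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        /**
        * tiff header (id, version, offset)
        **/

        for(int i = 0; i < headerValues.length; i++)
        bos.write(Integer.parseInt(headerValues[i], 16));

        [...]

        else if (encodedType > 0)
        writeTag(259, 3, 1, 2, bos); //compression
        else if (encodedType < 0)
        writeTag(259, 3, 1, 4, bos); //compression

        [...]

        bos.write((value >> 8)); //high byte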
        }

        /**
        * write Dword (4 bytes to stream)
        *
        * */
        private static void writeDWord(int value, ByteArrayOutputStream bos) {

        bos.write(value & 0xff); //low byte
        bos.write((value >> 8) & 0xff);
        bos.write((value >> 16) & 0xff);
        bos.write((value >> 24) & 0xff); //high byte
        }
        /**
        * write a tag to stream
        *
        * */
        private static void writeTag(int TagId, int dataType, int DataCount, int DataOffset, ByteArrayOutputStream bos) {

        writeWord(TagId, bos);
        writeWord(dataType, bos);
        writeDWord(DataCount, bos);
        writeDWord(DataOffset, bos);

        }
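
The byte-order point raised in this comment comes down to how a 12-byte IFD entry is laid out. As a rough illustration (the class name IfdEntrySketch and its methods are made up for this example and appear nowhere in the code above), the sketch below builds the same SHORT entry in both byte orders with java.nio.ByteBuffer. In both cases the 2-byte value is written first and padded with two zero bytes. Read back as a 4-byte number, that field is 00010000h in a big-endian file but plain 1 in a little-endian file, which is why the little-endian version can pass ordinary integers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration class; not taken from the code in this thread. */
final class IfdEntrySketch {

    /** build a 12-byte IFD entry holding one SHORT value in the given byte order */
    static byte[] shortEntry(int tagId, int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(12).order(order);
        buf.putShort((short) tagId);  // tag id
        buf.putShort((short) 3);      // field type 3 = SHORT
        buf.putInt(1);                // count
        buf.putShort((short) value);  // value, left-justified in the 4-byte field
        buf.putShort((short) 0);      // padding
        return buf.array();
    }

    /** read the 4-byte value field of an entry as one integer in the given byte order */
    static int valueFieldAsInt(byte[] entry, ByteOrder order) {
        return ByteBuffer.wrap(entry).order(order).getInt(8);
    }
}
```

For tag 258 with a value of 1, the big-endian entry is 01 02 00 03 00 00 00 01 00 01 00 00 and the little-endian entry is 02 01 03 00 01 00 00 00 01 00 00 00.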

  2. Thanks for the alternative suggestion.

  3. William Lofgren

    Hi guys, Did either of you ever get this to work? I am trying to implement it via C++, using the little endian style. It looks like it is working (sizes are right, file is “complete” in that all data is there) but I keep getting an indication that the .tif file is not valid. Am I missing something? Note, I have a routine (GetFilterParams) that does what it says. Here is the code.

    long PDFHelper::sDecodeCCITT(char* sInput, int nK, int nCols, int nHeight,
    int nSize, bool bIsBlack)
    {
    char* sTIFF;
    int nComma, i, j;

    sTIFF = new char [TIFF_HEADER_SIZE];
    ZeroMemory(sTIFF, TIFF_HEADER_SIZE);
    CString sHeaderValues = "73,73,42,00,08,00,00,00";
    nComma = sHeaderValues.Find(',');
    i = 0;

    while (nComma != -1)
    {
    sTIFF[i] = atoi(sHeaderValues.Mid(nComma-2, nComma));
    nComma = sHeaderValues.Find(',', nComma +1);
    i++;
    }
    sTIFF[i] = atoi(sHeaderValues.Mid(sHeaderValues.GetLength() - 2));
    i++;
    i = writeWord(9 , i, sTIFF);// num of directory entries
    // nID, nType, nDataCnt, nOffset
    i = writeTag(256, 4, 1, nCols, i, sTIFF);// width
    i = writeTag(257, 4, 1, nHeight, i, sTIFF);// height = length = scan lines = rows
    /**BitsPerSample 258 – b&w 1 bit image*/
    i = writeTag(258, 3, 1, 1, i, sTIFF);

    if (nK == 0)
    {
    i = writeTag(259, 3, 1, 3, i, sTIFF);//compression
    }
    else if (nK > 0)
    {
    i = writeTag(259, 3, 1, 2, i, sTIFF);//compression
    }
    else if (nK < 0)
    {
    i = writeTag(259, 3, 1, 4, i, sTIFF);//compression
    }

    //photometricInterpretation
    if(!bIsBlack)
    {
    i = writeTag(262, 3, 1, 0, i, sTIFF);
    }
    else
    {
    i = writeTag(262, 3, 1, 1, i, sTIFF);
    }

    //stripOffsets -start of data after tables
    i = writeTag(273, 4, 1, 122, i, sTIFF);

    //samplesPerPixel
    i = writeTag(277, 3, 1, 1, i, sTIFF);
    //rowsPerStrip – uses height
    i = writeTag(278, 4, 1, nHeight, i, sTIFF);
    //stripByteCount – 1 strip so all data
    i = writeTag(279, 4, 1, nSize, i, sTIFF);
    // write next IOD offset zero as no other table
    i = writeDWord(0, i, sTIFF);

    // Copy sTmp data to sTIFF
    for (j = 0; j < i; j++)
    {
    m_sStream[j] = sTIFF[j];
    }

    // write the CCITT image data at the end
    for (j = i; j <
    [...]

    sStream[i] = nID >>8;//high byte
    i++;
    return(i);
    }

    int PDFHelper::writeTag(int nID, int nType, int nDataCnt,
    int nOffset, int i, char* sStream)
    {
    //writeTag(256, 04, 01, nCols, i, sTIFF);// width
    i = writeWord(nID, i, sStream);
    i = writeWord(nType, i, sStream);
    i = writeDWord(nDataCnt, i, sStream);
    i = writeDWord(nOffset, i, sStream);
    return(i);
    }

    int PDFHelper::writeDWord(int nID, int i, char* sStream)
    {
    sStream[i] = nID & 0xFF;//low byte
    i++;
    sStream[i] = nID >>8;
    i++;
    sStream[i] = nID >16;
    i++;
    sStream[i] = nID >>24;//high byte
    i++;
    return(i);
    }

  4. No. We already have a Java implementation which works.

  5. Hi Mark Stephens,
    >No. We already have a Java implementation which works.
    Could you share your code with me?

    I followed your code above, but there are formatting problems in it, so I get errors.

    Thank you.

    • This code is just putting a header onto the Tiff data and asking Java to decode it. Java has some issues with some Tiff Data.

      There are some improvements to Java support for Tiffs in Java9 and we also now offer our own commercial image library with much better Tiff support (https://www.idrsolutions.com/jdeli).
