What is CCITT data?
CCITT compression is used to compress black-and-white image data. Using Huffman encoding, the data is squeezed into a much smaller compressed stream.
CCITT is also a compression format used in the TIFF file format. By adding some additional bytes to your raw CCITT data and saving it in a file ending in .tif, you can create a TIFF image from raw CCITT data. My example is written in Java (but it should be easy to recode in any language). It takes the raw data and adds the required bytes.
CCITT data in PDF files
CCITT is used as a compression format in PDF files for images in XObjects. You can manually extract the CCITT data and the Dictionary values (K, isBlack, etc) from PDF files if you want to reuse the images. If you have extracted the CCITT data from a PDF, there may be some differences between the raw image and the image in the PDF – remember this is the raw image which may be inverted, coloured, clipped, etc.
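For context, the CCITT data and its parameters live in the image XObject's stream dictionary inside the PDF. A typical entry looks something like this (the values here are illustrative, not taken from any particular file):

```
<< /Type /XObject
   /Subtype /Image
   /Width 1728
   /Height 2339
   /BitsPerComponent 1
   /ColorSpace /DeviceGray
   /Filter /CCITTFaxDecode
   /DecodeParms << /K -1 /Columns 1728 /BlackIs1 false >>
>>
```

K < 0 selects Group 4 (T.6) encoding, K = 0 one-dimensional Group 3, and K > 0 mixed two-dimensional Group 3; BlackIs1 controls whether 1 bits are black.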
How to convert CCITT to a Tiff
- Get the CCITT parameters
- Create a metadata header
- Append the raw CCITT data
and the Java code to write TIFF…
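Before the full class, here is a minimal sketch of the layout arithmetic behind step 2 (the class name and helper methods here are mine, for illustration only): the metadata header is an 8-byte TIFF header, a 2-byte entry count, nine 12-byte IFD entries, and a 4-byte next-IFD offset, which is why the image data starts at byte 122.

```java
// Sketch only (not the JPedal class shown further down): the layout
// arithmetic behind the metadata header prepended to the raw CCITT bytes.
public class CcittTiffSketch {

    // the header carries 9 IFD entries: width, height, bitsPerSample,
    // compression, photometricInterpretation, stripOffsets,
    // samplesPerPixel, rowsPerStrip, stripByteCount
    static final int TAG_COUNT = 9;

    // 8-byte TIFF header + 2-byte entry count + 9 tags x 12 bytes
    // + 4-byte next-IFD offset = 122, the offset of the image data
    static int dataOffset() {
        return 8 + 2 + (TAG_COUNT * 12) + 4;
    }

    // big-endian TIFF signature: 'M','M', version 42, first IFD at byte 8
    static byte[] tiffHeader() {
        return new byte[]{0x4d, 0x4d, 0x00, 0x2a, 0x00, 0x00, 0x00, 0x08};
    }

    public static void main(String[] args) {
        System.out.println(dataOffset()); // prints 122
    }
}
```

The 122 computed here matches the hard-coded stripOffsets value used in the code below.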
Are you a Java Developer working with Image files?
What is JDeli?
JDeli is a commercial Java Image library that is used to read, write, convert, manipulate and process many different image formats.
Why use JDeli?
To handle many well-known formats such as JPEG, PNG, TIFF as well as newer formats like AVIF, HEIC and JPEG XL in Java with no calls to any external system or third-party library.
What licenses are available?
We have 3 licenses available:
Server for on-premises and cloud servers, Distribution for use in named end-user applications, and Custom for more demanding requirements.
How does JDeli compare?
We work hard to make sure JDeli performance is better than or similar to other Java image libraries. Check out our benchmarks to see just how well JDeli performs.
Can you publish the code for writeWord and writeTag?
Here is the whole class (we also use it to decode CCITT with JAI):
/**
* ===========================================
* Java Pdf Extraction Decoding Access Library
* ===========================================
*
* Project Info: http://www.jpedal.org
* (C) Copyright 1997-2008, IDRsolutions and Contributors.
*
* This file is part of JPedal
*
@LICENSE@
*
* ---------------
* TiffDecoder.java
* ---------------
*/
package org.jpedal.io;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import org.jpedal.objects.raw.PdfDictionary;
import org.jpedal.objects.raw.PdfObject;
import org.jpedal.utils.LogWriter;
/**
* converts CCITT stream into either an image or a byte stream
*
* Many thanks to Brian Burkhalter for all his help
*/
public class TiffDecoder {
private byte[] bytes;
/**
* called with values from PDF
* Map contains values from PDF as stream pair
*/
public TiffDecoder(int w, int h,Map values,byte[] data){
//return value
bytes=null;
/**
* get values from stream
*/
//flag to show if default is black or white
boolean isBlack = false;
//int columns = 1728; //in PDF spec
int k = 0;
//boolean isByteAligned=false; //in PDF spec
//get k (type of encoding)
String value = (String) values.get("K");
if (value != null)
k = Integer.parseInt(value);
/**
//get flag for white/black as default
value = (String) values.get("EncodedByteAlign");
if (value != null)
isByteAligned = Boolean.valueOf(value).booleanValue();*/
//get flag for white/black as default
value = (String) values.get("BlackIs1");
if (value != null){
isBlack = Boolean.valueOf(value).booleanValue();
}
/**not used but in Map from PDF
value = (String) values.get("Rows");
if (value != null)
rows = Integer.parseInt(value);
value = (String) values.get("Columns");
if (value != null)
columns= Integer.parseInt(value);*/
buildImage(w, h, data, isBlack, k);
}
public TiffDecoder(int w, int h,PdfObject DecodeParms,byte[] data){
//Map values=new HashMap();
//return value
bytes=null;
/**
* get values from stream
*/
//flag to show if default is black or white
boolean isBlack = false;
//int columns = 1728; //in PDF spec
int k = 0;
//boolean isByteAligned=false; //in PDF spec
if(DecodeParms!=null){
//get k (type of encoding)
k = DecodeParms.getInt(PdfDictionary.K);
int columnsSet = DecodeParms.getInt(PdfDictionary.Columns);
if(columnsSet!=-1)
w=columnsSet;
//get flag for white/black as default
isBlack=DecodeParms.getBoolean(PdfDictionary.BlackIs1);
}
/**not used but in Map from PDF
value = (String) values.get("Rows");
if (value != null)
rows = Integer.parseInt(value);
value = (String) values.get("Columns");
if (value != null)
columns= Integer.parseInt(value);*/
buildImage(w, h, data, isBlack, k);
}
/**
* convenience method to add a header to a CCITT data block so it can be viewed as a TIFF
* @param w
* @param h
* @param DecodeParms
* @param data
* @param fileName
*/
public static void saveAsTIFF(int w, int h,PdfObject DecodeParms,byte[] data, String fileName){
/**
* get values from stream
*/
boolean isBlack = false; //flag to show if default is black or white
int k = 0;
if(DecodeParms!=null){
//get k (type of encoding)
k = DecodeParms.getInt(PdfDictionary.K);
int columnsSet = DecodeParms.getInt(PdfDictionary.Columns);
if(columnsSet!=-1)
w=columnsSet;
//get flag for white/black as default
isBlack=DecodeParms.getBoolean(PdfDictionary.BlackIs1);
}
/**
* build the image
*/
ByteArrayOutputStream bos=new ByteArrayOutputStream();
/**
* tiff header (id, version, offset)
* */
final String[] headerValues={"4d","4d","00","2a","00","00","00","08"};
for(int i=0;i<headerValues.length;i++)
bos.write(Integer.parseInt(headerValues[i],16));
writeWord("9",bos); /**number of tags in IFD */
writeTag("256", "04", "01", String.valueOf(w), bos); /**imageWidth 256 */
writeTag("257", "04", "01", String.valueOf(h), bos); /**imageHeight 257 */
writeTag("258", "03", "01", "00010000h", bos); /**bitsPerSample 258 - b&w 1 bit image */
if (k >= 0)
writeTag("259", "03", "01", "00020000h", bos); /**compression 259 */
else
writeTag("259", "03", "01", "00040000h", bos); /**compression 259 */
if(!isBlack)
writeTag("262", "03", "01", "00000000h", bos); /**photometricInterpretation 262 */
else
writeTag("262", "03", "01", "00010000h", bos); /**photometricInterpretation 262 */
writeTag("273", "04", "1","122", bos); /**stripOffsets 273 -start of data after tables */
writeTag("277", "03", "01", "00010000h", bos); /**samplesPerPixel 277 */
writeTag("278", "04", "01", String.valueOf(h), bos); /**rowsPerStrip 278 - uses height */
writeTag("279", "04", "1", String.valueOf(data.length),bos); /**stripByteCount 279 - 1 strip so all data */
writeDWord("0",bos); /** write next IFD offset zero as no other table*/
/**
* write the CCITT image data at the end
*/
try{
bos.write(data);
bos.close();
} catch (IOException e) {
if(LogWriter.isOutput())
LogWriter.writeLog("[PDF] Tiff exception "+e);
}
/**save image */
try {
java.io.FileOutputStream fos=new java.io.FileOutputStream(fileName);
fos.write(bos.toByteArray());
fos.close();
} catch (Error err) {
if(LogWriter.isOutput())
LogWriter.writeLog("[PDF] Tiff error "+err);
} catch (Exception e1) {
if(LogWriter.isOutput())
LogWriter.writeLog("[PDF] Tiff exception "+e1);
}
}
private void buildImage(int w, int h, byte[] data, boolean isBlack, int k) {
/**
* build the image
*/
ByteArrayOutputStream bos=new ByteArrayOutputStream();
/**
* tiff header (id, version, offset)
* */
final String[] headerValues={"4d","4d","00","2a","00","00","00","08"};
for(int i=0;i<headerValues.length;i++)
bos.write(Integer.parseInt(headerValues[i],16));
writeWord("9",bos); /**number of tags in IFD */
writeTag("256", "04", "01", String.valueOf(w), bos); /**imageWidth 256 */
writeTag("257", "04", "01", String.valueOf(h), bos); /**imageHeight 257 */
writeTag("258", "03", "01", "00010000h", bos); /**bitsPerSample 258 - b&w 1 bit image */
if (k >= 0)
writeTag("259", "03", "01", "00020000h", bos); /**compression 259 */
else
writeTag("259", "03", "01", "00040000h", bos); /**compression 259 */
if(!isBlack)
writeTag("262", "03", "01", "00000000h", bos); /**photometricInterpretation 262 */
else
writeTag("262", "03", "01", "00010000h", bos); /**photometricInterpretation 262 */
writeTag("273", "04", "1","122", bos); /**stripOffsets 273 -start of data after tables */
writeTag("277", "03", "01", "00010000h", bos); /**samplesPerPixel 277 */
writeTag("278", "04", "01", String.valueOf(h), bos); /**rowsPerStrip 278 - uses height */
writeTag("279", "04", "1", String.valueOf(data.length),bos); /**stripByteCount 279 - 1 strip so all data */
writeDWord("0",bos); /** write next IFD offset zero as no other table*/
/**
* write the CCITT image data at the end
*/
try{
bos.write(data);
bos.close();
} catch (IOException e) {
if(LogWriter.isOutput())
LogWriter.writeLog("[PDF] Tiff exception "+e);
}
/**setup image - hand the in-memory TIFF to JAI to decode */
try {
JAIHelper.confirmJAIOnClasspath();
com.sun.media.jai.codec.ByteArraySeekableStream fss=new com.sun.media.jai.codec.ByteArraySeekableStream(bos.toByteArray());
javax.media.jai.RenderedOp op = (javax.media.jai.JAI.create("stream",fss));
Raster raster=op.getData();
DataBuffer db = raster.getDataBuffer();
DataBufferByte dbb = (DataBufferByte) db;
bytes=dbb.getData();
if(!isBlack){ //invert if needed
int bcount=bytes.length;
for(int i=0;i<bcount;i++)
bytes[i]= (byte) ~bytes[i];
}
} catch (Exception e) {
if(LogWriter.isOutput())
LogWriter.writeLog("[PDF] Tiff exception "+e);
}
}
/**return the decoded image data */
public byte[] getBytes() {
return bytes;
}
/**write word (2 bytes) to stream */
private static void writeWord(String i, ByteArrayOutputStream bos) {
int value=0;
//allow decimal,octal or hex
if(i.endsWith("h"))
value=Integer.parseInt(i.substring(0,i.length()-1),16);
else if(i.endsWith("o"))
value=Integer.parseInt(i.substring(0,i.length()-1),8);
else
value=Integer.parseInt(i);
bos.write((value>>8) & 0xff); //high byte
bos.write(value & 0xFF); //low byte
}
/**write Dword (4 bytes to stream) */
private static void writeDWord(String i, ByteArrayOutputStream bos) {
int value=0;
//allow decimal,octal or hex
if(i.endsWith("h"))
value=Integer.parseInt(i.substring(0,i.length()-1),16);
else if(i.endsWith("o"))
value=Integer.parseInt(i.substring(0,i.length()-1),8);
else
value=Integer.parseInt(i);
bos.write((value>>24) & 0xff); //high byte
bos.write((value>>16) & 0xff);
bos.write((value>>8) & 0xff);
bos.write(value & 0xFF); //low byte
}
/**write a tag to stream*/
private static void writeTag(String TagId, String dataType, String DataCount, String DataOffset, ByteArrayOutputStream bos) {
writeWord(TagId,bos);
writeWord(dataType,bos);
writeDWord(DataCount,bos);
writeDWord(DataOffset,bos);
}
}
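One detail of the class above worth spelling out: because the file is big-endian, a 2-byte SHORT value sits in the first two bytes of the 4-byte IFD value field, so a SHORT value of 2 is passed as the hex string "00020000h". This small sketch (my helper, not part of the class) shows the packing:

```java
// Sketch: why a SHORT value of 2 is written as "00020000h" above.
// In a big-endian TIFF the 2-byte SHORT occupies the first two bytes of
// the 4-byte value field, so read as a 4-byte big-endian integer it is
// simply the value shifted left 16 bits.
public class ShortOffsetSketch {

    static int packShort(int value) {
        return value << 16; // e.g. 2 -> 0x00020000
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(packShort(2))); // prints 20000
    }
}
```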
Mark,
Thanks for the post. I noticed that because of the big-endian encoding you had to write the tag values in hex, i.e. a value of 1 = 00010000h, etc. I changed the methods to write in little-endian in case someone is interested.
/**
*
* @param data
* @param w
* @param h
* @param encodedType
* @param isBlack
* @param fileName
*/
private static void saveAsTIFF(byte[] data, int w, int h, int encodedType, Boolean isBlack, String fileName) throws IOException {
/**
* tiff header (id, version, offset) - little-endian
**/
String[] headerValues = {"49", "49", "2a", "00", "08", "00", "00", "00"};
/**
* build the image
**/
ByteArrayOutputStream bos = new ByteArrayOutputStream();
for(int i = 0; i < headerValues.length; i++)
bos.write(Integer.parseInt(headerValues[i], 16));
writeWord(9, bos); //number of directory entries
writeTag(256, 4, 1, w, bos); //imageWidth
writeTag(257, 4, 1, h, bos); //imageHeight
writeTag(258, 3, 1, 1, bos); //bitsPerSample - b&w 1 bit image
if (encodedType == 0)
writeTag(259, 3, 1, 3, bos); //compression
else if (encodedType > 0)
writeTag(259, 3, 1, 2, bos); //compression
else
writeTag(259, 3, 1, 4, bos); //compression
if (!isBlack)
writeTag(262, 3, 1, 0, bos); //photometricInterpretation
else
writeTag(262, 3, 1, 1, bos); //photometricInterpretation
writeTag(273, 4, 1, 122, bos); //stripOffsets - start of data after tables
writeTag(277, 3, 1, 1, bos); //samplesPerPixel
writeTag(278, 4, 1, h, bos); //rowsPerStrip - uses height
writeTag(279, 4, 1, data.length, bos); //stripByteCount - 1 strip so all data
writeDWord(0, bos); //next IFD offset - zero as no other table
//write the CCITT image data at the end
bos.write(data);
bos.close();
//save image
java.io.FileOutputStream fos = new java.io.FileOutputStream(fileName);
fos.write(bos.toByteArray());
fos.close();
}
/**
* write word (2 bytes to stream)
*
* */
private static void writeWord(int value, ByteArrayOutputStream bos) {
bos.write(value & 0xff); //low byte
bos.write((value >> 8) & 0xff); //high byte
}
/**
* write Dword (4 bytes to stream)
*
* */
private static void writeDWord(int value, ByteArrayOutputStream bos) {
bos.write(value & 0xff); //low byte
bos.write((value >> 8) & 0xff);
bos.write((value >> 16) & 0xff);
bos.write((value >> 24) & 0xff); //high byte
}
/**
* write a tag to stream
*
* */
private static void writeTag(int TagId, int dataType, int DataCount, int DataOffset, ByteArrayOutputStream bos) {
writeWord(TagId, bos);
writeWord(dataType, bos);
writeDWord(DataCount, bos);
writeDWord(DataOffset, bos);
}
Thanks for the alternative suggestion.
Hi guys, did either of you ever get this to work? I am trying to implement it in C++, using the little-endian style. It looks like it is working (sizes are right, the file is "complete" in that all data is there) but I keep getting an indication that the .tif file is not valid. Am I missing something? Note, I have a routine (GetFilterParams) that does what it says. Here is the code.
long PDFHelper::sDecodeCCITT(char* sInput, int nK, int nCols, int nHeight,
int nSize, bool bIsBlack)
{
char* sTIFF;
int nComma, i, j;
sTIFF = new char [TIFF_HEADER_SIZE];
ZeroMemory(sTIFF, TIFF_HEADER_SIZE);
CString sHeaderValues = "73,73,42,00,08,00,00,00";
nComma = sHeaderValues.Find(',');
i = 0;
while (nComma != -1)
{
sTIFF[i] = atoi(sHeaderValues.Mid(nComma-2, nComma));
nComma = sHeaderValues.Find(',', nComma +1);
i++;
}
sTIFF[i] = atoi(sHeaderValues.Mid(sHeaderValues.GetLength() - 2));
i++;
i = writeWord(9 , i, sTIFF);// num of directory entries
// nID, nType, nDataCnt, nOffset
i = writeTag(256, 4, 1, nCols, i, sTIFF);// width
i = writeTag(257, 4, 1, nHeight, i, sTIFF);// height = length = scan lines = rows
/**BitsPerSample 258 – b&w 1 bit image*/
i = writeTag(258, 3, 1, 1, i, sTIFF);
if (nK == 0)
{
i = writeTag(259, 3, 1, 3, i, sTIFF);//compression
}
else if (nK > 0)
{
i = writeTag(259, 3, 1, 2, i, sTIFF);//compression
}
else if (nK < 0)
{
i = writeTag(259, 3, 1, 4, i, sTIFF);//compression
}
//photometricInterpretation
if(!bIsBlack)
{
i = writeTag(262, 3, 1, 0, i, sTIFF);
}
else
{
i = writeTag(262, 3, 1, 1, i, sTIFF);
}
//stripOffsets -start of data after tables
i = writeTag(273, 4, 1, 122, i, sTIFF);
//samplesPerPixel
i = writeTag(277, 3, 1, 1, i, sTIFF);
//rowsPerStrip – uses height
i = writeTag(278, 4, 1, nHeight, i, sTIFF);
//stripByteCount – 1 strip so all data
i = writeTag(279, 4, 1, nSize, i, sTIFF);
// write next IOD offset zero as no other table
i = writeDWord(0, i, sTIFF);
// Copy sTmp data to sTIFF
for (j = 0; j < i; j++)
{
m_sStream[j] = sTIFF[j];
}
// write the CCITT image data at the end
for (j = i; j < i + nSize; j++)
{
m_sStream[j] = sInput[j - i];
}
return (i + nSize);
}
int PDFHelper::writeWord(int nID, int i, char* sStream)
{
sStream[i] = nID & 0xFF;//low byte
i++;
sStream[i] = nID >>8;//high byte
i++;
return(i);
}
int PDFHelper::writeTag(int nID, int nType, int nDataCnt,
int nOffset, int i, char* sStream)
{
//writeTag(256, 04, 01, nCols, i, sTIFF);// width
i = writeWord(nID, i, sStream);
i = writeWord(nType, i, sStream);
i = writeDWord(nDataCnt, i, sStream);
i = writeDWord(nOffset, i, sStream);
return(i);
}
int PDFHelper::writeDWord(int nID, int i, char* sStream)
{
sStream[i] = nID & 0xFF;//low byte
i++;
sStream[i] = nID >>8;
i++;
sStream[i] = nID >>16;
i++;
sStream[i] = nID >>24;//high byte
i++;
return(i);
}
No. We already have a Java implementation which works.
Hi Mark Stephens,
>No. We already have a Java implementation which works.
Could you share your code?
I followed your code above, but there are formatting issues, so it errors.
Thank you.
This code is just putting a header onto the TIFF data and asking Java to decode it. Java has some issues with some TIFF data.
There are some improvements to Java's TIFF support in Java 9, and we also now offer our own commercial image library with much better TIFF support (https://www.idrsolutions.com/jdeli).