How does CCITT compress image data?

How does CCITT compression work?

CCITT encodes black and white data. It does this by encoding runs of black or white pixels. We can do this in various ways (G31D, G32D and G42D), which are also known as Group 3 and Group 4 compression. We explain how the most common type (G31D) works in detail below.

As most images contain more white than black, we assume that we start with white. For cases where we do not start with white, we add a marker at the start to show this.
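As a rough illustration of this idea, the sketch below turns one row of pixels into alternating run lengths that always begin with white. It assumes the start marker is a zero-length white run (which is how G31D handles it, as described later). The class and method names are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

final class RunLengthSketch {

    /**
     * Converts one row of pixels (true = black, false = white) into
     * alternating run lengths, always starting with a white run.
     */
    static List<Integer> toRuns(boolean[] row) {
        List<Integer> runs = new ArrayList<>();
        boolean current = false; // assume every row starts with white
        int length = 0;
        for (boolean pixel : row) {
            if (pixel == current) {
                length++;
            } else {
                runs.add(length); // this is 0 when the row starts with black
                current = pixel;
                length = 1;
            }
        }
        runs.add(length); // close the final run
        return runs;
    }
}
```

A row of white, white, black, white gives the runs 2, 1, 1, while a row of black, black, white gives 0, 2, 1, where the leading 0 is the marker.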

If we encode black as value 1, we just set these bits in our decompressed data – we do not explicitly need to set white values (because it is binary, not setting a value to black means that it is white).
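To make this concrete, here is a minimal sketch that expands alternating run lengths into a packed 1-bit row. It assumes black is stored as 1, that the most significant bit of each byte is the leftmost pixel, and that the runs add up to the row width. These are assumptions for illustration, and the names are hypothetical. Only black runs touch the output; white runs just advance the position.

```java
final class PackedRowSketch {

    /** Expands alternating run lengths (white first) into packed bits. */
    static byte[] paint(int[] runs, int width) {
        byte[] row = new byte[(width + 7) / 8]; // a new array is all zeros, i.e. all white
        int x = 0;
        boolean black = false; // the first run is white
        for (int run : runs) {
            if (black) {
                for (int i = x; i < x + run; i++) {
                    row[i >> 3] |= (byte) (0x80 >> (i & 7)); // set one black pixel
                }
            }
            x += run;       // white runs only move the position
            black = !black; // colours alternate
        }
        return row;
    }
}
```

For example, the runs 2, 3, 3 on an 8-pixel row produce the single byte 00111000.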

But sometimes we find that there are more black pixels than white. In this case, we can just invert the image (flipping bits is very fast) and then we get the best compression.

All we need is a flag (BlackIs1 in the PDF file format, whose default value is false) to signal that the image data needs inversion to appear correctly.
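The inversion step itself is tiny. The following is a sketch of how it might look on packed 1-bit data, with hypothetical names. Note that it also flips any padding bits in the last byte of a row, which a real implementation may need to mask off.

```java
final class InvertSketch {

    /** Flips every bit in place so that black and white swap. */
    static void invert(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ~data[i];
        }
    }
}
```

Applying it twice returns the original data, so the same routine can be used before encoding and after decoding.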

How does G31D compression work?

This is the simplest form of CCITT to decode. First, here are some key terms that make it easier to understand how G31D works.

Key Terms

Pixel run - A block of consecutive pixels that are all the same colour. Each pixel is usually 1 bit: 1 for black and 0 for white.
Scan line - One row of pixel data running from one edge of the page to the other.
Code word - A bit pattern that describes a run. A code word is either a make-up code or a terminating code.
Run length - The number of pixels in a white or black run to be encoded or decoded.
End of line (EOL) - A unique 12-bit code word used to mark the start and end of a scan line.
Return to control (RTC) - Six consecutive EOL code words, which usually mark the end of the file. EOL and RTC will become clearer in later blog posts.

Overview of G31D

G31D CCITT is a variation of the Huffman coding scheme. To decode G31D data, each scan line is read as a series of pixel runs, with each code word representing a number of white or black pixels.

The black and white run lengths alternate and vary in length, and each one is uniquely identified when decoded. The maximum run length is bounded by the width of the scan line (the page width).

More frequently occurring run lengths are assigned shorter code words, while less frequently occurring run lengths are assigned longer code words. This is particularly useful because a typical handwritten or printed document contains more short run lengths than long ones.

Encoding and Decoding Process

While still on the subject of pixel runs and run lengths, it is worth covering how pixel runs are encoded, which in turn makes them easier to decode.
Pixel runs between 0 and 63 pixels in length are encoded using a single terminating code, while runs between 64 and 2623 pixels are encoded using a single make-up code followed by a terminating code.
Runs longer than 2623 pixels are encoded using as many make-up codes as needed, followed by a single terminating code.
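The split described above can be sketched as follows. This assumes make-up codes cover multiples of 64 up to 2560 (consistent with the 2623 limit, since 2560 + 63 = 2623) and that longer runs repeat the 2560 make-up code first. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RunSplitSketch {

    static final int MAX_MAKE_UP = 2560; // largest make-up value assumed here

    /** Splits a run length into make-up values followed by one terminating value. */
    static List<Integer> split(int runLength) {
        List<Integer> parts = new ArrayList<>();
        int remaining = runLength;
        // Very long runs: emit the largest make-up code as often as needed.
        while (remaining >= MAX_MAKE_UP) {
            parts.add(MAX_MAKE_UP);
            remaining -= MAX_MAKE_UP;
        }
        // One further make-up code for the largest multiple of 64 that fits.
        if (remaining >= 64) {
            int makeUp = (remaining / 64) * 64;
            parts.add(makeUp);
            remaining -= makeUp;
        }
        parts.add(remaining); // terminating value, always 0 to 63
        return parts;
    }
}
```

For example, a run of 63 is just the terminating value 63, a run of 64 becomes 64 plus a terminating 0, and a run of 5000 becomes 2560, 2432 and a terminating 8.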

First, a pre-calculated lookup table has to be created for both the black and the white pixel runs, against which the incoming data is compared. You also need to keep track of your current bit position in the scan line.

This is so that the decoder can group the bits read since the last code word into the next code word, either a make-up code word (for longer runs) or a terminating code word (for shorter runs), which is then checked against the table for the current colour and decoded as needed.

The make-up code words represent long run lengths, while the terminating code words represent short run lengths. The sum of the length values of the code words gives the run length. The process is repeated as each new EOL is hit.
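A minimal sketch of this decoding loop is shown below. For readability it takes the bits as a String of '0' and '1' characters rather than packed bytes, and the tables hold only a small excerpt of the code words, which should be checked against the T.4 specification before being relied on. The class and method names are hypothetical, and EOL handling is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class G31DDecodeSketch {

    // Excerpt only: a real decoder needs the complete tables for both colours.
    static final Map<String, Integer> WHITE = new HashMap<>();
    static final Map<String, Integer> BLACK = new HashMap<>();
    static {
        WHITE.put("00110101", 0);
        WHITE.put("000111", 1);
        WHITE.put("0111", 2);
        WHITE.put("1000", 3);
        WHITE.put("1011", 4);
        WHITE.put("11011", 64);      // make-up code
        BLACK.put("0000110111", 0);
        BLACK.put("010", 1);
        BLACK.put("11", 2);
        BLACK.put("10", 3);
        BLACK.put("011", 4);
        BLACK.put("0000001111", 64); // make-up code
    }

    /** Decodes one scan line into alternating run lengths, white first. */
    static List<Integer> decodeLine(String bits) {
        List<Integer> runs = new ArrayList<>();
        boolean black = false;                    // each line starts with white
        int run = 0;                              // length accumulated so far
        StringBuilder word = new StringBuilder(); // bits of the current code word
        for (int pos = 0; pos < bits.length(); pos++) {
            word.append(bits.charAt(pos));
            Integer value = (black ? BLACK : WHITE).get(word.toString());
            if (value == null) {
                continue;                         // not a complete code word yet
            }
            run += value;
            word.setLength(0);
            if (value < 64) {                     // terminating code ends the run
                runs.add(run);
                run = 0;
                black = !black;
            }
        }
        return runs;
    }
}
```

With these tables, the bits 1011 10 1000 decode to 4 white, 3 black and 3 white pixels, and 11011 0111 decodes to a single white run of 64 + 2 = 66 pixels.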

It is also worth mentioning that each scan line usually starts with a white run-length code word. But there are some unusual cases where a line does not follow the norm, i.e. it begins with a black run length.

In this situation, the beginning of that scan line is preceded by a zero-length white run-length code word. Finally, if six EOLs are hit consecutively, this denotes the end of the file, i.e. RTC.
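Checking for RTC can be sketched as a simple comparison against six back-to-back EOL code words. As a simplification, the bits are again held in a String of '0' and '1' characters, and any fill bits that some encoders insert before an EOL are ignored. The names are hypothetical.

```java
final class RtcSketch {

    static final String EOL = "000000000001"; // 12-bit EOL: eleven 0s then a 1

    /** Returns true if six consecutive EOL code words start at bit position pos. */
    static boolean isRtc(String bits, int pos) {
        for (int i = 0; i < 6; i++) {
            if (!bits.startsWith(EOL, pos + i * EOL.length())) {
                return false;
            }
        }
        return true;
    }
}
```

A decoder could call this each time it reaches an EOL and stop reading further scan lines when it returns true.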

Advantages and Disadvantages

Advantages

Good compression of black and white data.

Disadvantages

Cannot optimise across lines or for multiple empty lines.
Takes a while to get to grips with the algorithm.

Do you need to read or write Tiff files in Java?

Our JDeli image library (the best enterprise-level Java image library for performance and efficiency) offers a range of advantages over ImageIO and alternatives for Tiff files, including:

prevents heap related JVM crashes
reads 1-32 bit bilevel, grayscale, rgb, argb, cmyk, acmyk, ycbcr Colorspaces, and converts to sRGB BufferedImage
implements both Little and Big Endian Byte Ordering
decompresses uncompressed, CCITT group 3 and 4, Deflate/Adobe Deflate, LZW, Packbits
supports Single, Multi-file, Tiling, Planar (Chunky, Separated), Predictor, and 16- and 32-bit floating samples
improves read performance
supports threading
provides superior image scaling algorithms

Learn more about JDeli, or download it to try it yourself.

As experienced Java developers, we help you work with images in Java and bring over a decade of hands-on experience with many image file formats.

2 Replies to “How does CCITT compress image data?”

Jun says:
September 23, 2011 at 12:11 am
Hi Mark,
Question on CCITT Group 4 encoding: is this equivalent to MMR encoding, no more, no less?

Mark Stephens says:
September 23, 2011 at 6:26 am
They are essentially the same.
