Chika Okereke Chika is a Java developer. When not experimenting with the new features of Java, he is a keen basketball player (he is the tall guy you might see at Devoxx).

CCITT encoding in PDF files – Decoding CCITT data


As part of the rewrite of our Java CCITT Decoder, I have spent a lot of time with the format. It is quite a complicated topic so I thought some articles would be helpful.

There are several types of CCITT encoding. When decoding CCITT-encoded data in PDF files, you may encounter three different formats. They are differentiated by the K-value in the PDF file (the K entry in the decode parameters of the CCITTFaxDecode filter). The K-value defaults to 0; however, it can be greater or less than zero depending on the CCITT format used to encode the data.

Have a look at this example from a PDF file created with OpenOffice, which has a K-value of -1. Notice also that we ignore Rows because it has a value of zero, which cannot be the real number of rows in the image.

The main types of CCITT format used in encoding PDFs include:

  • Group 3 One-Dimensional (G31D): usually has a K-value of 0.
  • Group 3 Two-Dimensional (G32D): usually has a K-value greater than 0.
  • Group 4 Two-Dimensional (G42D): usually has a K-value less than 0.
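
The mapping in the list above can be sketched in a few lines of Java. This is only an illustration, not the code from our decoder: the class, enum and method names are hypothetical, and it assumes the decode parameters have already been parsed into a simple map. The fallback to the image height when Rows is zero is also an assumption about one reasonable way to handle that case.

```java
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical.
final class CcittKValueDemo {

    /** The three CCITT variants named in the list above. */
    enum Format { G31D, G32D, G42D }

    /** Maps a K-value to a format: K < 0 is G42D, K == 0 is G31D, K > 0 is G32D. */
    static Format fromK(int k) {
        if (k < 0) {
            return Format.G42D;
        }
        if (k == 0) {
            return Format.G31D;
        }
        return Format.G32D;
    }

    /** Reads K from already-parsed decode parameters, using the default of 0 when absent. */
    static int readK(Map<String, Integer> decodeParms) {
        return decodeParms.getOrDefault("K", 0);
    }

    /** Assumption: when Rows is missing or zero, fall back to the image height. */
    static int effectiveRows(Map<String, Integer> decodeParms, int imageHeight) {
        int rows = decodeParms.getOrDefault("Rows", 0);
        return rows > 0 ? rows : imageHeight;
    }

    public static void main(String[] args) {
        // Hypothetical values: K of -1 and Rows of 0, with an image 1200 pixels high.
        Map<String, Integer> parms = Map.of("K", -1, "Rows", 0);
        System.out.println(fromK(readK(parms)));        // G42D
        System.out.println(effectiveRows(parms, 1200)); // 1200
        System.out.println(fromK(readK(Map.of())));     // G31D (K absent, so 0)
    }
}
```

Keeping the K-value check in one place means the rest of a decoder can switch on the format rather than repeating comparisons against zero.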

All of these formats have their advantages, and each decodes the data in the PDF stream differently. The names of the formats are fairly self-explanatory about how they work, except for G42D, which works in a more interesting manner. This will be covered in future articles. Stay tuned!

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so visit our series index!



IDRsolutions Ltd 2020. All rights reserved.