In my second article on CCITT encoding, I am going to explain exactly how 1D decoding works. Just to make life complicated, this mode goes by several names. I will refer to Group 3 One-Dimensional as G31D; in our office it has also been called 1D CCITT (why complicate things, eh?).
A PDF data stream encoded in this mode is one of the easier CCITT data structures to decode. First, here are some key terms that make it easier to understand how G31D works.
Pixel run - A block of consecutive pixels that are all the same colour. Each pixel is usually 1 bit: 1 for black and 0 for white.
Scan line - One row of pixels running from one side of the page to the other.
Code word - A bit pattern that tells the decoder what the data represents, e.g. a make-up or a terminating code.
Run length - The number of consecutive white or black pixels in a run to be decoded or encoded.
End of line (EOL) - A unique 12-bit code word used to mark the start and end of a scan line.
Return to control (RTC) - Six consecutive EOL code words, which usually mark the end of the file. EOL and RTC will become clearer in later posts.
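To make these terms a little more concrete, here is a minimal Python sketch rather than a real decoder. The function names (pixel_runs, split_run_length, ends_with_rtc) are made up for illustration. It assumes the pixel values used above (1 for black, 0 for white) and relies on three details from the Group 3 standard that are not spelled out in this article: every scan line starts with a white run (of length zero if the first pixel is black), terminating codes cover run lengths 0 to 63 while make-up codes cover multiples of 64, and the EOL code word is eleven 0 bits followed by a 1. It also assumes the bits are held in a plain string with no fill bits.

```python
from itertools import groupby

WHITE, BLACK = 0, 1          # pixel values as used in the glossary
EOL = "000000000001"         # 12-bit end-of-line code word
RTC = EOL * 6                # return to control: six EOLs in a row


def pixel_runs(scan_line):
    """Split one scan line (a list of 0/1 pixels) into (colour, run length) pairs."""
    runs = [(colour, len(list(group))) for colour, group in groupby(scan_line)]
    # A G31D scan line always starts with a white run, so add an empty one if needed.
    if runs and runs[0][0] == BLACK:
        runs.insert(0, (WHITE, 0))
    return runs


def split_run_length(length):
    """Split a run length into a make-up part (multiple of 64) and a terminating part (0-63)."""
    return (length // 64) * 64, length % 64


def ends_with_rtc(bits):
    """Check whether a string of '0'/'1' characters ends with six consecutive EOLs."""
    return bits.endswith(RTC)


if __name__ == "__main__":
    print(pixel_runs([0, 0, 0, 1, 1, 0, 0, 0, 0]))  # [(0, 3), (1, 2), (0, 4)]
    print(split_run_length(200))                    # (192, 8)
    print(ends_with_rtc("0101" + EOL * 6))          # True
```

A run of 200 pixels, for example, would be written as a make-up code word for 192 followed by a terminating code word for 8. Looking up the actual bit patterns for those code words is left out here.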
That is quite a lot of jargon, so in my next article I will explain how it all fits together and how we read this data. Any questions so far?
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip.