In my second article on CCITT encoding, I am going to start explaining exactly how 1D decoding works. Just to make life complicated, this mode goes by several names. I will be referring to Group 3 One-Dimensional encoding as G31D. It has also been referred to as 1D CCITT in our office (why complicate things, eh?).
A PDF data stream encoded in this mode is one of the easier CCITT data structures to decode. First, here are some keywords that will make it easier to understand how G31D works.
Pixel run - A block of consecutive pixels that are all the same colour. Each pixel is usually 1 bit: 1 for black and 0 for white.
Scan line - One row of pixels running from one edge of the page to the other.
Code words - The bit patterns that describe the data. Each one is either a make-up code word or a terminating code word.
Run length - The number of consecutive white or black pixels in a run, which is the value that gets encoded and decoded.
End of line (EOL) - A unique 12-bit code word used to mark where one scan line ends and the next begins.
Return to control (RTC) - Six EOL code words occurring consecutively, which usually marks the end of the file. EOL and RTC will become clearer in later articles.
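To make the terms above a little more concrete, here is a minimal Python sketch. It is not a decoder and not how any particular library does it; the function names (`pixel_runs`, `split_run_length`, `count_trailing_eols`) are made up for illustration, and the scan line is represented as a plain string of '0' and '1' characters purely to keep things readable. It assumes 1 is black and 0 is white, that make-up code words cover multiples of 64 while terminating code words cover 0 to 63, and that the EOL code word is eleven 0 bits followed by a single 1 bit.

```python
from itertools import groupby

# EOL code word: eleven 0 bits followed by a single 1 bit (12 bits in total).
EOL = "000000000001"


def pixel_runs(scan_line):
    """Split a scan line (a string of '0'/'1' pixels) into (colour, run_length) pairs.

    Assumes 1 = black and 0 = white. A G31D line starts with a white run,
    so a zero-length white run is added when the line begins with black.
    """
    runs = [("black" if bit == "1" else "white", len(list(group)))
            for bit, group in groupby(scan_line)]
    if runs and runs[0][0] == "black":
        runs.insert(0, ("white", 0))
    return runs


def split_run_length(length):
    """Split a run length into a make-up part (multiple of 64) and a terminating part (0-63).

    Simplification: very long runs need more than one make-up code word,
    which this sketch does not model.
    """
    return (length // 64) * 64, length % 64


def count_trailing_eols(bits):
    """Count how many EOL code words sit back to back at the end of a bit string."""
    count = 0
    while bits.endswith(EOL):
        bits = bits[:-len(EOL)]
        count += 1
    return count
```

For example, `pixel_runs("0001100000")` gives three runs (white 3, black 2, white 5), `split_run_length(200)` gives `(192, 8)`, and a stream whose `count_trailing_eols` result is 6 or more ends with what looks like an RTC.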
That is quite a lot of jargon, so in my next article I will explain how it all fits together and how we read all this data. Any questions so far?
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so visit our series index!