Mark Stephens Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

CCITT encoding in PDF files – rows and height gotcha

44 sec read

While rewriting our CCITT decoder (almost finished!) we came across an interesting item to take care of. When reading the values of a CCITT encoded data stream, you need to be careful with the values you read in. In general, the CCITT dictionary can contain a /Rows value which is actually the height of the image data.

Have a look at this example from a PDF file created with OpenOffice.

 0 obj
BitsPerComponent 1
/ColorSpace/DeviceGray
/Filter[/CCITTFaxDecode]
/DecodeParms[<>]
/Length 8 0 R
>>

In this there is a /Rows value of zero (which is obviously impossible). So I double-checked the PDF File specification and a value of zero is allowed (in fact it is the default). But it is ignored and you use the height value instead. So why does OpenOffice write out a default value which is ignored (making the file larger)? And why does the PDF file specification allow a value it then ignores in the first place?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years worth of PDF knowledge and tips, so click here to visit our series index!

IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.

Mark Stephens Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Leave a Reply

Your email address will not be published. Required fields are marked *