Update: JPDF2HTML5 has been rebranded as BuildVu and JPDFForms has been rebranded as FormVu.

LZW decompression – Early Change

Working on a Java PDF library means that we see all sorts of PDF documents that take liberties with the PDF specification. This week, however, I altered our LZW decompression algorithm to take into account a parameter in the PDF reference that we hadn't encountered in the 10-odd years of developing JPedal.

We were sent a PDF document to debug that displayed a rare, but depressing sight in our viewer:  a blank page where there should be stuff.  The image that was meant to be shown had a parameter for the LZWDecode filter showing /EarlyChange 0.  The PDF specification says:

(LZWDecode only) An indication of when to increase the code length. If the value of this entry is 0, code length increases are postponed as long as possible. If the value is 1, code length increases occur one code early. This parameter is included because LZW sample code distributed by some vendors increases the code length one code earlier than necessary. Default value: 1.

The meaning of this may appear obvious to you, but it seemed a little vague to me and certainly seemed more complicated than what the eventual solution turned out to be.

LZW compression involves creating codes for sequences of data and replacing the data with the codes, hopefully ending up with less data than you started with.  This form of compression has no idea how many codes it will eventually need, so it starts with each code being a certain width (9 bits) and then, once the codes outgrow that width, it makes the code width one bit longer.
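To make the "codes for sequences" idea concrete, here is a minimal sketch of the encoding side that stops at the list of codes, before any bit packing. It is only an illustration: the class and method names are made up for this post, it is not JPedal code, and it assumes the PDF conventions that 256 is the clear-table code, 257 is end-of-data and new entries start at 258. It also ignores what happens when the table fills up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: turns bytes into LZW codes (no bit packing).
final class LzwCodesSketch {
    static List<Integer> encode(byte[] input) {
        // Seed the table with one entry per single byte value (codes 0..255).
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) table.put(String.valueOf((char) i), i);

        List<Integer> codes = new ArrayList<>();
        codes.add(256);                       // clear-table marker
        int nextCode = 258;                   // first free table entry
        String current = "";                  // longest sequence matched so far
        for (byte b : input) {
            char c = (char) (b & 0xFF);
            String extended = current + c;
            if (table.containsKey(extended)) {
                current = extended;           // keep growing the match
            } else {
                codes.add(table.get(current));     // emit code for the match
                table.put(extended, nextCode++);   // remember the longer sequence
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) codes.add(table.get(current));
        codes.add(257);                       // end-of-data marker
        return codes;                         // simplification: no full-table handling
    }

    public static void main(String[] args) {
        byte[] data = {45, 45, 45, 45, 45, 65, 45, 45, 45, 66};
        System.out.println(encode(data)); // prints [256, 45, 258, 258, 65, 259, 66, 257]
    }
}
```

Ten input bytes become six data codes plus the two markers. The saving only shows up on longer, more repetitive data, and every code from 258 upwards is one the decoder has to rebuild for itself in the same order.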

It turns out that, depending on how the LZW compressor was implemented, the point at which the code length increases can differ by one code, and the decompressor has to switch at exactly the same point.  If the decompression algorithm isn’t aware of this, all the data associated with the codes gets out of whack and you end up with a blank page and red writing spewed into the output console.
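One way to picture the difference is as a threshold on the decoder's next free table index. The sketch below is an assumption about how to express the rule, not a quote from the spec or from JPedal; the names are invented, and it takes the usual PDF limits of 9 to 12 bits per code.

```java
// Hypothetical helper: how wide is the next code the decoder should read?
final class LzwCodeWidth {
    /**
     * @param nextCode    next free table index as seen by the decoder (starts at 258)
     * @param earlyChange the /EarlyChange value: 1 (the default) or 0
     * @return code width in bits, between 9 and 12
     */
    static int widthFor(int nextCode, int earlyChange) {
        int width = 9;
        // With earlyChange = 1 the comparison trips one table entry sooner.
        while (width < 12 && nextCode + earlyChange >= (1 << width)) width++;
        return width;
    }

    public static void main(String[] args) {
        for (int next : new int[] {510, 511, 512, 1023, 1024}) {
            System.out.println("next free entry " + next
                    + ": EarlyChange 1 -> " + widthFor(next, 1) + " bits"
                    + ", EarlyChange 0 -> " + widthFor(next, 0) + " bits");
        }
    }
}
```

Under this formulation the two settings only disagree at a handful of table sizes (511, 1023 and 2047), which is why a file can look fine for its first few hundred codes and then fall apart.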

The solution is simple:  if early change is enabled, you read the data associated with the code and then check to see if you should increase the bit width of the code.  If early change is disabled, you check to see if you should increase the bit width, increase it if needed, and then get the data associated with the code.  I think I would have got the answer quicker if I hadn't read the PDF spec first!
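For anyone who wants to experiment, here is a compact decoder sketch with the early change setting as a parameter. To be clear, this is not the JPedal implementation: instead of reordering the "read" and "check" steps as described above, it folds the setting into a single comparison after each table update, which is a different (but commonly used) way of getting the same switch points. All names are hypothetical, and running out of input is simply treated as end-of-data.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of a PDF-style LZW decoder with an /EarlyChange switch.
final class LzwEarlyChangeSketch {
    static final int CLEAR = 256, EOD = 257, FIRST_FREE = 258, MAX_WIDTH = 12;

    /** earlyChange is the /EarlyChange value from the filter parameters: 1 or 0. */
    static byte[] decode(byte[] data, int earlyChange) {
        byte[][] table = new byte[1 << MAX_WIDTH][];
        for (int i = 0; i < 256; i++) table[i] = new byte[] {(byte) i};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nextCode = FIRST_FREE, width = 9;
        int bitBuf = 0, bitCount = 0, pos = 0;
        byte[] prev = null;

        while (true) {
            // Buffer bytes (most significant bit first) until a whole code is available.
            while (bitCount < width) {
                if (pos >= data.length) return out.toByteArray(); // assumption: treat as EOD
                bitBuf = (bitBuf << 8) | (data[pos++] & 0xFF);
                bitCount += 8;
            }
            bitCount -= width;
            int code = bitBuf >>> bitCount;
            bitBuf &= (1 << bitCount) - 1;

            if (code == EOD) break;
            if (code == CLEAR) { nextCode = FIRST_FREE; width = 9; prev = null; continue; }

            byte[] entry;
            if (code < nextCode) entry = table[code];
            else if (code == nextCode && prev != null) entry = append(prev, prev[0]);
            else throw new IllegalArgumentException("Corrupt LZW data: code " + code);

            out.write(entry, 0, entry.length);
            if (prev != null && nextCode < table.length) {
                table[nextCode++] = append(prev, entry[0]);
            }
            prev = entry;

            // The only place the setting matters: with earlyChange = 1 the width
            // grows one table entry sooner than with earlyChange = 0.
            if (width < MAX_WIDTH && nextCode + earlyChange >= (1 << width)) width++;
        }
        return out.toByteArray();
    }

    private static byte[] append(byte[] prefix, byte last) {
        byte[] result = Arrays.copyOf(prefix, prefix.length + 1);
        result[prefix.length] = last;
        return result;
    }

    public static void main(String[] args) {
        // The 9-bit codes 256, 45, 258, 258, 65, 259, 66, 257 packed into bytes.
        byte[] stream = {(byte) 0x80, 0x0B, 0x60, 0x50, 0x22, 0x0C, 0x0C, (byte) 0x85, 0x01};
        System.out.println(Arrays.toString(decode(stream, 1)));
        // prints [45, 45, 45, 45, 45, 65, 45, 45, 45, 66]
    }
}
```

A stream this short never leaves 9-bit codes, so it decodes identically with either setting. The difference only appears once the table reaches roughly 511 entries, which fits the symptom of an image that starts decoding and then turns to garbage.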

Daniel

Developer at IDR Solutions
When not delving into obscure PDF or Java bugs, Daniel is exploring the new features in JavaFX.

4 thoughts on “LZW decompression – Early Change”

  1. Josh

    Sounds like my PDF finally hit your blog queue. 🙂

    • Actually the file came from another source but the fix should benefit everyone.

  2. Josh

    🙁

    I was going by the “10 years” comment. 🙂

    • Daniel

      So you were the one that sent in that PDF on a papyrus!
