PDF mystery – what is the correct value for a Text Field

I came across an interesting issue with PDF Text fields while debugging a file this week. We were sent a 2 page document created with IText, containing some text fields and we were displaying both pages with text fields containing identical values – they appear different in Acrobat. Obviously Acrobat is always right (even when it disagrees with the PDF specification) so we dug deeper to see what was going on….

With PDF forms, all form objects can share common Parent objects and they can then inherit values from them. So if a text field does not have a text value, it can inherit its Parent’s value. This is really useful because you can avoid having to repeat common values.

In this PDF, the Text fields on both pages shared the same Parent and because they had no text values, we were inheriting the value from the Parent. So our viewer displayed the same text value on both pages. However, form objects can also have an Appearance Stream which defines the display of the form object. This is what accounts for the different appearance.

So I found out that it is “allowed” to have 2 forms with different Appearance Streams, with a single parent that defined the text value for the field. So they both had the same text value but the appearance was different.

So either the appearance over-rides the text value in read only text fields, or the child value is more important in defining the display of the form. So in this example the appearance streams are more important than the text value of the form object.

Its not an ideal way to work, because any software reading the text value for the form will not get the value which the user sees. For reading text values, the file is essentially broken. But our viewer now displays it as Adobe would (which is all most users care about at the end of the day).

We are working on a way to generate a readable string from the appearance stream, so we can make this file more useful in text extraction, so keep watching this space.

So that is another mystery solved for me, and yet another way to interpret the spec. Have you come across any interesting and mysterious PDF files where things are not as they should be?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years worth of PDF knowledge and tips, so click here to visit our series index!

Related Posts:

The following two tabs change content below.
Chris developed much of the Forms handling code and also the hooks for the XFA.
Chris

About Chris Wade

Chris developed much of the Forms handling code and also the hooks for the XFA.

One thought on “PDF mystery – what is the correct value for a Text Field

  1. Nice Job 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>