In this article I aim to provide a general introduction to XFA forms (the XML Forms Architecture) and how their data is stored in a PDF file.
So how do you find XFA forms in a PDF?
As with the AcroForms format, you will find a tag called ‘AcroForm’ in the PDF. Within this tag there are others, one of which we found last time as ‘Fields’, which defines the AcroForms fields. Also within the ‘AcroForm’ tag you may be able to find an ‘XFA’ tag. Don’t worry if you cannot; that just means you do not have any XFA forms in that PDF document.
If you do, then you have found the XFA forms inside your PDF. Congratulations, that is the easier bit. Now have a look at what is inside it…
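The lookup described above can be sketched in a few lines of Python. This is only an illustration: it assumes the PDF catalog has already been parsed into plain nested dictionaries, and both the `catalog` structure and the `find_xfa` helper are hypothetical rather than part of any PDF library. Note that the key is spelled `AcroForm` in the PDF itself.

```python
def find_xfa(catalog):
    """Return the XFA entry from a parsed PDF catalog, or None if absent.

    `catalog` is a hypothetical plain-dict stand-in for the document catalog.
    """
    acro_form = catalog.get("AcroForm")
    if acro_form is None:
        return None  # no interactive forms at all
    return acro_form.get("XFA")  # None means there are no XFA forms


# Hypothetical catalogs, for illustration only.
with_xfa = {"AcroForm": {"Fields": [], "XFA": ["template", b"<template/>"]}}
without_xfa = {"AcroForm": {"Fields": []}}

print(find_xfa(with_xfa) is not None)     # True
print(find_xfa(without_xfa) is not None)  # False
```

A real PDF parser would hand back its own object types instead of dictionaries, but the two-step lookup stays the same.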
So what are all the parts to an XFA form?
Well, it is a set of XML documents:
- preamble (which can be ignored)
- postamble (which again can be ignored)
- config (which is defined to hold some permissions, and may do so in newer versions, but we have found that ignoring it does not really affect things much at all)
- template (the most useful document out of them all; it details everything about the appearance of the XFA fields)
- datasets (which holds values for the fields defined in the template document; not all fields will have values defined here)
- there may be other documents defined here, but as yet we have not found any examples that alter the XFA form's appearance or use, though I am sure there will be some in the future as XFA keeps improving and developing.
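As a rough illustration of how these pieces fit together, the sketch below joins the packets in order and parses the result as one XML document. The packet contents are invented and heavily trimmed, and the sketch assumes the common layout in which the preamble holds the opening `xdp` wrapper tag and the postamble holds the matching closing tag.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed packets; real ones are far larger.
packets = [
    ("preamble", '<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">'),
    ("config", "<config/>"),
    ("template", '<template><subform name="form1"/></template>'),
    ("datasets", "<datasets><data><form1/></data></datasets>"),
    ("postamble", "</xdp:xdp>"),
]

# Joined in order, the packets form a single well-formed XML document.
root = ET.fromstring("".join(text for _, text in packets))


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return element.tag.rsplit("}", 1)[-1]


parts = [local_name(child) for child in root]
print(parts)  # ['config', 'template', 'datasets']
```

This also shows why the preamble and postamble can be ignored once the document is parsed: they only provide the wrapper around the parts that matter.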
As you look through the ‘template’ XML document, you will see objects nested within objects. There are a lot of objects that can be used, including PageSets (which define the page dimensions) and Buttons (which define a button field).
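The nesting is easiest to see by walking the tree recursively. The template below is an invented, simplified sample (real templates carry namespaces and many more elements), and the `walk` helper is hypothetical. The button is written as a `field` whose `ui` child contains a `button` element.

```python
import xml.etree.ElementTree as ET

# Invented, simplified template for illustration only.
TEMPLATE = """
<template>
  <subform name="form1">
    <pageSet>
      <pageArea name="Page1"/>
    </pageSet>
    <subform name="details">
      <field name="submit" x="10mm" y="20mm" w="30mm" h="8mm">
        <ui><button/></ui>
      </field>
    </subform>
  </subform>
</template>
"""


def walk(element, depth=0, lines=None):
    """Collect one indented line per element, depth-first."""
    if lines is None:
        lines = []
    label = element.tag
    if "name" in element.attrib:
        label += " " + element.attrib["name"]
    lines.append("  " * depth + label)
    for child in element:
        walk(child, depth + 1, lines)
    return lines


outline = walk(ET.fromstring(TEMPLATE))
print("\n".join(outline))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the template.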
The structure in which you read the document is vital to allocating the field dimensions, values, attributes, actions and so on to the correct fields. This structure is the classic object-oriented style, and lends itself to languages like Java.
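Here is a minimal Python sketch of why the structure matters. It builds a dotted path for each field from the subforms that enclose it, then follows the same path through datasets to find the value. The XML samples and the `field_paths` and `value_at` helpers are hypothetical, and the matching is reduced to comparing names; real XFA data binding has more rules than this.

```python
import xml.etree.ElementTree as ET

# Invented, simplified samples for illustration only.
TEMPLATE = ET.fromstring(
    '<template><subform name="form1"><subform name="address">'
    '<field name="city" w="40mm" h="8mm"/>'
    '<field name="zip" w="20mm" h="8mm"/>'
    "</subform></subform></template>"
)
DATASETS = ET.fromstring(
    "<datasets><data><form1><address><city>Leeds</city>"
    "</address></form1></data></datasets>"
)


def field_paths(element, prefix=()):
    """Yield (path, field) pairs, where path names every enclosing subform."""
    for child in element:
        name = child.get("name")
        here = prefix + (name,) if name else prefix
        if child.tag == "field":
            yield ".".join(here), child
        else:
            yield from field_paths(child, here)


def value_at(data, path):
    """Follow a dotted path through the data; None if the field has no value."""
    node = data
    for name in path.split("."):
        node = node.find(name)
        if node is None:
            return None
    return node.text


data = DATASETS.find("data")
values = {path: value_at(data, path) for path, _ in field_paths(TEMPLATE)}
print(values)  # {'form1.address.city': 'Leeds', 'form1.address.zip': None}
```

The `zip` field comes back as `None` because, as noted earlier, not every field in the template has a value in datasets.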
We have lots more articles about both XFA forms and PDF files in our series of articles on understanding the PDF file format.