In this article I aim to provide a general introduction to XFA forms (the XML Forms Architecture) and to how their data is stored in a PDF file.
So what is an XFA form?
XFA forms are a (now deprecated) technology introduced into the PDF specification by Adobe. Unlike the original AcroForms, an XFA form keeps its data in separate XML structures within the PDF file. XFA was deprecated in PDF 2.0, so you may still need to work with existing XFA documents, but you should use AcroForms for new documents and workflows.
So how do you find XFA forms in a PDF?
As with the AcroForms format, you start at the tag called ‘AcroForm’. Within this tag there are others, one of which we found last time: ‘Fields’, which defines the AcroForm fields. Also within the ‘AcroForm’ tag you may be able to find an ‘XFA’ tag. Don’t worry if you cannot; that just means you do not have any XFA forms in that PDF document.
If you do, then you have found the XFA form inside your PDF. Congratulations, that is the easier bit. Now have a look at what is inside it…
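The lookup described above can be sketched in a few lines of Python. This is only an illustration: it assumes some PDF library has already parsed the document catalog into nested dict-like objects, and that keys are written with a leading slash (`/AcroForm`, `/XFA`). The function name `find_xfa` and the sample catalogs are made up for this example.

```python
def find_xfa(catalog):
    """Return the value of the XFA entry, or None if there is no XFA form.

    `catalog` is assumed to be the document catalog, already parsed into
    nested dict-like objects by a PDF library (hypothetical input shape).
    """
    if "/AcroForm" not in catalog:
        return None  # no interactive form at all
    acro_form = catalog["/AcroForm"]
    if "/XFA" not in acro_form:
        return None  # AcroForm fields only, no XFA
    return acro_form["/XFA"]


# Hand-built stand-ins for parsed catalogs (illustrative only).
acroform_only = {"/AcroForm": {"/Fields": []}}
with_xfa = {"/AcroForm": {"/Fields": [], "/XFA": ["template", b"<template/>"]}}

print(find_xfa(acroform_only))  # None
print(find_xfa(with_xfa) is not None)  # True
```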
So what are all the parts to an XFA form?
Well it is a set of XML documents:
- preamble (which can be ignored)
- postamble (which again can be ignored)
- config (which is defined to hold some permissions, and may do so in newer versions, but we have found that ignoring it really does not affect things much at all)
- template (the most useful document of them all; it details everything about the appearance of the XFA fields)
- datasets (which holds values for the fields defined in the template document; not all fields will have values defined here)
- other documents may also be defined here, but as yet we have not found examples of any that alter the XFA form’s appearance or use, though I am sure there will be some in the future.
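In the PDF itself, the ‘XFA’ entry is either a single stream holding the whole XML, or an array that alternates a packet name with the stream for that packet. Here is a minimal sketch of turning that value into a name-to-XML mapping. It assumes a PDF library has already decoded the streams to bytes; `xfa_packets`, the `"xdp"` label and the sample content are hypothetical.

```python
def xfa_packets(xfa_value):
    """Map packet names (template, datasets, ...) to their XML bytes.

    Assumes `xfa_value` is either one bytes object (the whole XFA
    document) or a list alternating packet name and packet bytes.
    """
    if isinstance(xfa_value, (bytes, bytearray)):
        # Single-stream form; "xdp" is just a label chosen for this sketch.
        return {"xdp": bytes(xfa_value)}
    if len(xfa_value) % 2 != 0:
        raise ValueError("expected name/packet pairs")
    names = xfa_value[0::2]   # even positions: packet names
    bodies = xfa_value[1::2]  # odd positions: packet contents
    return dict(zip(names, bodies))


# Illustrative array; real packets are far larger.
sample = [
    "preamble", b'<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">',
    "config", b"<config/>",
    "template", b"<template/>",
    "datasets", b"<datasets/>",
    "postamble", b"</xdp:xdp>",
]
packets = xfa_packets(sample)
print(list(packets))  # ['preamble', 'config', 'template', 'datasets', 'postamble']
```

In this sample the preamble and postamble only open and close the wrapper element, which is why they can be skipped when you go straight to ‘template’ and ‘datasets’.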
As you look through the ‘template’ XML document, you will see objects nested within objects. There are a lot of objects that can be used, including page sets (which define the page dimensions) and buttons (which define a button field).
The order and nesting in which you read the document is vital to allocating the field dimensions, values, attributes, actions, etc. to the correct fields.
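To show why the nesting matters, here is a rough sketch that walks a tiny hand-written template in document order, records each field under the path of the subforms that contain it, and then looks for a value at the same path in the datasets. The XML snippets are invented for this example, and matching data to fields purely by name path is a simplifying assumption; real XFA binding has more rules than this.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature packets (illustrative, not from a real form).
TEMPLATE = """
<template xmlns="http://www.xfa.org/schema/xfa-template/3.3/">
  <subform name="form1">
    <pageSet>
      <pageArea name="Page1"><medium short="210mm" long="297mm"/></pageArea>
    </pageSet>
    <subform name="address">
      <field name="city" x="10mm" y="20mm" w="60mm" h="8mm"/>
      <field name="send" x="10mm" y="40mm" w="30mm" h="8mm">
        <ui><button/></ui>
      </field>
    </subform>
  </subform>
</template>
"""
DATASETS = """
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <xfa:data>
    <form1><address><city>Leeds</city></address></form1>
  </xfa:data>
</xfa:datasets>
"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def walk_fields(element, path=()):
    """Yield (path, field element) pairs in document order."""
    for child in element:
        kind, name = local(child.tag), child.get("name")
        if kind == "subform" and name:
            yield from walk_fields(child, path + (name,))
        elif kind == "field" and name:
            yield path + (name,), child
        else:
            # Other containers (pageSet, unnamed subforms) keep the same path.
            yield from walk_fields(child, path)


def lookup_value(data_root, path):
    """Follow `path` through the data by element name; None if absent."""
    node = data_root
    for name in path:
        node = next((c for c in node if local(c.tag) == name), None)
        if node is None:
            return None
    return node.text


template_root = ET.fromstring(TEMPLATE)
datasets_root = ET.fromstring(DATASETS)
data_root = next(c for c in datasets_root if local(c.tag) == "data")

fields = {}
for path, field in walk_fields(template_root):
    fields[".".join(path)] = {
        "x": field.get("x"),
        "w": field.get("w"),
        "value": lookup_value(data_root, path),
    }

for name, info in fields.items():
    print(name, info)
```

The button field has no entry in the datasets, so its value comes back as `None`, which matches the point that not every field has a value there.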