XFA data is stored in a different format from normal PDF data. This article is designed to help you understand how this works and how you can make use of it. At the end I will link to a tutorial showing how we allow users to access this data in our commercial Java software library.
What is XFA?
XFA is a way of describing both dynamic form content and static pages using an XML mark-up language. It was invented by a company called JetForm, which was acquired by Adobe. The idea was to replace the old FDF form technology with a new XML-based format and a new forms application (LiveCycle) to create and edit it. It is now heavily used by many government and corporate organisations that have a significant investment in Adobe technology.
Because XFA is a very complex add-on to PDF (and is not part of the PDF specification), most tools do not support it. If you open an XFA form in one of these tools, you will typically see a screen telling you that you need to upgrade to the latest version of Adobe Acrobat. The only tools I am aware of with good XFA support are Adobe's toolset, Foxit, iText and our JPedal PDF library.
Getting at the XFA data inside the PDF file
XFA is stored as a set of XML streams inside traditional PDF objects (so they can be encrypted and compressed), so reading it really needs a PDF library that can access and decode PDF objects for you. Unlike annotations, which are defined at page level, the XFA streams are defined at document level: each XML stream is in its own PDF object, referenced from the XFA entry of the AcroForm dictionary.
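To make that structure a little more concrete: the PDF specification allows the AcroForm XFA entry to be either a single stream or an array of alternating packet names and streams. Commonly seen packet names include "preamble", "config", "template", "datasets" and "postamble". The Java sketch below assumes a PDF library has already located, decrypted and decompressed each stream, and represents every stream simply as its XML text. The `XfaPackets` class and its methods are hypothetical names for illustration only, not part of JPedal or any other library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: models the AcroForm XFA array AFTER a PDF library has
 * already decrypted and decompressed each stream. Each stream is represented
 * here just as its decoded XML text.
 */
public class XfaPackets {

    /** Pairs the alternating [name, stream, name, stream, ...] entries, keeping their order. */
    public static Map<String, String> toMap(List<String> xfaArray) {
        if (xfaArray.size() % 2 != 0) {
            throw new IllegalArgumentException("XFA array must contain name/stream pairs");
        }
        Map<String, String> packets = new LinkedHashMap<>();
        for (int i = 0; i < xfaArray.size(); i += 2) {
            packets.put(xfaArray.get(i), xfaArray.get(i + 1));
        }
        return packets;
    }

    /** Concatenates the packet contents in order to rebuild one XDP document. */
    public static String toXdp(Map<String, String> packets) {
        return String.join("", packets.values());
    }

    public static void main(String[] args) {
        // Illustrative, heavily trimmed packets; real files contain far more XML.
        List<String> xfaArray = List.of(
                "preamble", "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">",
                "template", "<template/>",
                "datasets", "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\"/>",
                "postamble", "</xdp:xdp>");
        Map<String, String> packets = toMap(xfaArray);
        System.out.println(packets.keySet()); // prints [preamble, template, datasets, postamble]
        System.out.println(toXdp(packets));
    }
}
```

Joining the packets in order gives back a complete XDP document, which is why the first and last entries usually hold only the opening and closing `xdp:xdp` tags.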
The layout of the forms and pages (the template) is separate from the actual dataset, which can also define several items you might think of as part of the layout, such as the number of rows. So if you only need to access or edit the form data, you just need to access the dataset.
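Because the form data lives in its own packet, reading the values needs nothing more than an XML parser once the datasets stream has been decoded. The sketch below uses only the JDK's DOM API and assumes the datasets XML is already available as a string. The class name, method names and the dotted-path output format are our own illustrative choices, not an existing API; the namespace URI is the one declared on the `xfa:datasets` element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper that flattens the xfa:data part of a datasets packet. */
public class XfaDatasetReader {

    // Namespace URI of the xfa:datasets and xfa:data elements.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /** Maps dotted element paths (e.g. "form1.name") to their text values. */
    public static Map<String, String> readFields(String datasetsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match xfa:data by namespace URI, not by prefix
            // PDF content is untrusted input, so refuse DTDs (guards against XXE).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(datasetsXml)));

            Map<String, String> fields = new LinkedHashMap<>();
            NodeList dataNodes = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
            if (dataNodes.getLength() > 0) {
                collect((Element) dataNodes.item(0), "", fields);
            }
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse datasets packet", e);
        }
    }

    /** Walks child elements depth-first, recording the text of each leaf element. */
    private static void collect(Element parent, String prefix, Map<String, String> out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, etc.
            }
            Element element = (Element) child;
            String path = prefix.isEmpty()
                    ? element.getLocalName()
                    : prefix + "." + element.getLocalName();
            if (hasChildElements(element)) {
                collect(element, path, out);
            } else {
                // A repeated path (e.g. several table rows) overwrites the earlier value here.
                out.put(path, element.getTextContent().trim());
            }
        }
    }

    private static boolean hasChildElements(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String datasets =
                "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
              + "<xfa:data><form1>"
              + "<name>Jane Smith</name>"
              + "<address><city>Leeds</city></address>"
              + "</form1></xfa:data></xfa:datasets>";
        System.out.println(readFields(datasets));
        // prints {form1.name=Jane Smith, form1.address.city=Leeds}
    }
}
```

This sketch keeps only the last value when a path repeats, so a reader for real forms with repeating rows would also need to track element indices.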
This is what it looks like in a sample PDF file, using the Acrobat object viewer.
Is the XFA self-contained?
No. It can also access data in other PDF objects (for example image data or Signature details).
Where can I learn more?
If you want to learn more about XFA, there is a wiki post on XFA and we have a whole series of XFA articles. iText Software has tutorials and tools on its site. And if you want to see how we access XFA data in our PDF library, click here.
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog our team post about anything interesting they learn about.