One of the big enhancements in the Version 5 release of our PDF library is support for XFA, so I thought it would be helpful to explain what XFA is and why it matters.
XFA is an area of the PDF specification which is very poorly supported. I can only think of two viewers (three from next month!) which provide proper XFA support, and one of them is Acrobat. The latest version of iText also offers support for editing XFA data. But it is a very important technology because it makes it much easier to create, edit and manipulate Forms.
When Adobe originally added forms support to the PDF format, they used AcroForms, introduced in PDF version 1.2. This creates custom PDF objects which contain all the Form data, so the data is locked inside the PDF in lots of different places. This makes it very hard to manipulate and use in other tools.
A company called JetForm developed an XML-based forms architecture (called XFA), where the data is held in several XML streams (with data and layout being nicely separated out). JetForm was acquired by Adobe, who added the technology into PDF version 1.5 and created a tool called LiveCycle to allow you to manipulate the forms. In some ways the PDF file format becomes a wrapper for the far more flexible XFA form format. Being XML makes it much easier to use Forms. XFA is about much more than Forms, because it can include all the page description (text, shapes, images) as well as the forms.
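To give a feel for that separation of data and layout, here is a minimal sketch in Python. The XML is a heavily simplified, hand-written sample (the `order` form and its fields are made up for illustration, and real XFA packets carry far more detail), and the helper names are my own rather than part of any library. It simply splits the XML into its packets and reads the field definitions and the field values independently.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample: the "order" form and its fields are
# hypothetical. The template packet describes layout, the datasets packet
# holds the values the user entered.
XDP_SAMPLE = """<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <template xmlns="http://www.xfa.org/schema/xfa-template/3.0/">
    <subform name="order">
      <field name="customer"/>
      <field name="quantity"/>
    </subform>
  </template>
  <xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
    <xfa:data>
      <order>
        <customer>Alice</customer>
        <quantity>3</quantity>
      </order>
    </xfa:data>
  </xfa:datasets>
</xdp:xdp>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def split_packets(xdp_text):
    """Return a dict mapping packet name (template, datasets, ...) to its element."""
    root = ET.fromstring(xdp_text)
    return {local_name(child.tag): child for child in root}


def field_names(template):
    """Collect the names of all fields declared in the layout packet."""
    return [el.get("name") for el in template.iter() if local_name(el.tag) == "field"]


def data_values(datasets):
    """Collect leaf element values from the data packet, keyed by element name."""
    values = {}
    for el in datasets.iter():
        if len(el) == 0 and el.text and el.text.strip():
            values[local_name(el.tag)] = el.text.strip()
    return values


packets = split_packets(XDP_SAMPLE)
layout_fields = field_names(packets["template"])
form_data = data_values(packets["datasets"])
print(layout_fields)
print(form_data)
```

Because the values live in their own packet, a tool can read or replace them with any ordinary XML parser and never touch the layout, which is exactly what is so awkward to do with the older form objects.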
Adobe added a ‘legacy’ mode (where the data is essentially duplicated using the old data structures), but most tools cannot handle pure XFA. You usually get a page display which tells you to upgrade to the latest version of Acrobat.
As we convert Forms internally into our own data structures, we will also gain the ability to convert XFA PDF files into HTML5 (as well as allowing you to access all the XML data). Maybe something for another blog article?