The actual XML Structure
Let us take a simple example to start building the parser from scratch.
<?xml version="1.0" encoding="UTF-8"?>
<!-- this is comment on defining the class of subforms -->
<subform id="class1" name="class">
<field name="student" id="s112"></field>
<field name="student" id="s113"></field>
<subform id="class2" name="class">
<field name="student" id="k200"></field>
Now I will examine each section in turn with some suggestions.
1. Processing instructions:
In the above example, the XML version declaration and the template designer instruction are processing instructions. They start with <? and end with ?>. You need to either ignore them or delete them before proceeding.
2. Comments:
Do not treat comments as XML nodes; ignore them. Comments start with <!-- and end with -->.
3. CDATA section:
Some XML files may contain CDATA sections and DOCTYPE definitions. You can skip them unless you need to validate the files.
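As a rough sketch of the clean-up steps described so far, a single regular-expression pass can drop processing instructions, comments, CDATA sections and DOCTYPE definitions before any nodes are parsed. The function name stripNonNodes is a hypothetical placeholder, and the code sticks to ES5 syntax on the assumption that it should also run under Nashorn. Note that skipping CDATA this way discards the text inside it.

```javascript
// Hypothetical helper (not part of XFA or any library): removes markup
// that the parser does not treat as nodes.
function stripNonNodes(xml) {
    // One alternation, scanned left to right, so whichever construct
    // opens first is removed whole.
    var nonNodes = new RegExp(
        "<\\?[\\s\\S]*?\\?>" +                        // processing instructions and the XML declaration
        "|<!--[\\s\\S]*?-->" +                        // comments
        "|<!\\[CDATA\\[[\\s\\S]*?\\]\\]>" +           // CDATA sections
        "|<!DOCTYPE[^>\\[]*(?:\\[[\\s\\S]*?\\])?\\s*>", // DOCTYPE, with optional internal subset
        "g"
    );
    return xml.replace(nonNodes, "");
}

var cleaned = stripNonNodes(
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<!-- this is comment on defining the class of subforms -->' +
    '<subform id="class1" name="class"></subform>'
);
// cleaned is '<subform id="class1" name="class"></subform>'
```

A regular expression is enough here only because these constructs cannot nest; it is not a substitute for a validating parser.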
4. Empty Nodes:
Some XFA files contain empty nodes, with or without attributes, such as the script node in the above example.
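One possible way to deal with empty nodes is to normalise them first, so that an empty open/close pair and a self-closing tag look the same to the rest of the parser. The sketch below is only an illustration under that assumption; collapseEmptyNodes is a hypothetical name, and the pattern deliberately leaves a tag alone if an attribute value contains "<" or ">".

```javascript
// Hypothetical helper: rewrites <tag ...></tag> as <tag .../> so that
// empty nodes have a single form. Attributes are kept unchanged.
function collapseEmptyNodes(xml) {
    // \1 forces the closing tag name to equal the opening tag name.
    return xml.replace(
        /<([A-Za-z_][\w:.\-]*)((?:\s[^<>]*?)?)><\/\1\s*>/g,
        "<$1$2/>"
    );
}

var collapsed = collapseEmptyNodes(
    '<subform id="class1"><field name="student" id="s112"></field><script/></subform>'
);
// collapsed is '<subform id="class1"><field name="student" id="s112"/><script/></subform>'
```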
5. White spaces between nodes:
If you are viewing XML in pretty-printed format, you may end up with whitespace (tabs, line breaks and spaces) between two nodes. Use a regular expression to remove it.
6. Handling attributes:
Attributes are separated by spaces, and each attribute name is separated from its value by an “=” sign.
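The two rules above (whitespace between nodes can be removed with a regular expression, and attributes are name/value pairs joined by an “=” sign) could be sketched as follows. Both function names are hypothetical, and the whitespace pattern assumes that whitespace-only text between tags carries no meaning, which will not hold for every document.

```javascript
// Hypothetical helper: removes tabs, line breaks and spaces that sit
// between two tags, plus leading and trailing whitespace.
function stripWhitespaceBetweenNodes(xml) {
    return xml.replace(/>\s+</g, "><").replace(/^\s+|\s+$/g, "");
}

// Hypothetical helper: turns the inside of a start tag into an object
// of attribute names and values. Accepts single or double quotes.
function parseAttributes(tagBody) {
    var attrs = {};
    var re = /([A-Za-z_][\w:.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    var m;
    while ((m = re.exec(tagBody)) !== null) {
        // m[2] is set for double-quoted values, m[3] for single-quoted ones.
        attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
    }
    return attrs;
}

var compact = stripWhitespaceBetweenNodes("<a>\n\t<b/>\n</a>\n");
// compact is "<a><b/></a>"
var attrs = parseAttributes("subform id='class1' name=\"class\"");
// attrs.id is "class1", attrs.name is "class"
```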
Unlike a W3C DOM parser, an ECMAScript parser uses object and array notation to access child elements.
For example, to access the second student of the class1 subform in the root element:
1. In W3C DOM:
2. In ECMAScript:
However, the “draw” child of the root element should be accessed as root.draw, without array notation. So an object property has to be defined as an array if there is more than one child with the same name; otherwise it has to be treated as a single object.
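To make the difference concrete: with the W3C DOM API the second student would typically be reached through calls such as root.getElementsByTagName("subform").item(0).getElementsByTagName("field").item(1), whereas object notation gives something like root.class[0].student[1]. The sketch below is one possible way to build such a tree, not the implementation used in this series. It assumes the input is already free of comments and processing instructions, that a child is keyed by its name attribute when it has one and by its tag name otherwise, and that attributes are kept under a $attrs property. The names parseXml, addChild, $tag and $attrs are illustrative, and the sample document is a completed version of the example with a form root element, closing tags and a draw node added as assumptions.

```javascript
// Stores a child on its parent: a single object for the first child with
// a given name, promoted to an array once a second one appears.
function addChild(parent, key, child) {
    if (!Object.prototype.hasOwnProperty.call(parent, key)) {
        parent[key] = child;
    } else if (Array.isArray(parent[key])) {
        parent[key].push(child);
    } else {
        parent[key] = [parent[key], child];
    }
}

// Hypothetical sketch: builds plain objects from pre-cleaned XML.
// Text content is ignored; only elements and attributes are kept.
function parseXml(xml) {
    var tagRe = /<(\/?)([A-Za-z_][\w:.\-]*)((?:"[^"]*"|'[^']*'|[^'">])*?)(\/?)>/g;
    var attrRe = /([A-Za-z_][\w:.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    var holder = {};          // synthetic parent of the document element
    var stack = [holder];
    var m, a;
    while ((m = tagRe.exec(xml)) !== null) {
        if (m[1] === "/") {   // closing tag: step back to the parent
            if (stack.length > 1) { stack.pop(); }
            continue;
        }
        var node = { $tag: m[2], $attrs: {} };
        attrRe.lastIndex = 0; // reset the shared regex for each tag
        while ((a = attrRe.exec(m[3])) !== null) {
            node.$attrs[a[1]] = a[2] !== undefined ? a[2] : a[3];
        }
        // Assumption: key by name attribute if present, else by tag name.
        addChild(stack[stack.length - 1], node.$attrs.name || node.$tag, node);
        if (m[4] !== "/") {   // not self-closing, so children may follow
            stack.push(node);
        }
    }
    return holder;
}

var sample =
    '<form>' +
    '<subform id="class1" name="class">' +
    '<field name="student" id="s112"></field>' +
    '<field name="student" id="s113"></field>' +
    '</subform>' +
    '<subform id="class2" name="class">' +
    '<field name="student" id="k200"></field>' +
    '</subform>' +
    '<draw id="d1"/>' +
    '</form>';

var root = parseXml(sample).form;
var secondStudent = root.class[0].student[1]; // array notation: two students
var draw = root.draw;                         // single object: only one draw
// secondStudent.$attrs.id is "s113", draw.$attrs.id is "d1"
```

One consequence of this rule is visible in the sample: class2 has only one student, so root.class[1].student is a single object rather than an array, and calling code has to allow for both shapes.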
Learning more about ECMA
You can find more information on events, attributes and methods in the LiveCycle® Designer ES Scripting Reference. This reference is divided into three sections: methods, objects and properties. We need to implement the properties and methods in our XML parser to support XFA events.
For the moment however, I have chosen to go with a Java solution for our Java PDF Viewer because I have found performance issues with my current approach when using Nashorn. I will try to document this (and possible solutions) in a later article.
This post is part of our “XFA Articles Index”. In these articles, we aim to help you understand XFA.