Recently I have been working on implementing full script support for XFA in our PDF products. As well as flat content and JavaScript, XFA can contain events which are implemented as part of the XML and which I needed to support in both Java and HTML5. This required taking the XML apart and representing it in a form I can work with in JavaScript.
The goal of these articles is to share some tips and techniques that you may find useful if you are writing your own XML parser. We needed to support XFA events in HTML5, so I developed an XML parser in JavaScript to provide this.
The actual XML Structure
Let us take a simple example as a starting point for building the parser from scratch:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!-- this is a comment on defining the class of subforms -->
<subform id='class1' name="class">
<field name="student" id="s112"></field>
<field name="student" id="s113"></field>
</subform>
<subform id='class2' name="class">
<field name="student" id="k200"></field>
</subform>
<draw><ui></ui></draw>
<script/>
<?template Designer?>
</root>
Now I will examine each section in turn with some suggestions.
1. Processing instructions:
In the above example, the XML version declaration and the template designer line use processing instruction syntax. They start with <? and end with ?>. You need to either ignore them or delete them before going any further.
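As a rough illustration of the "delete it" option, the sketch below removes processing instructions with a regular expression. It assumes the standard XML delimiters <? and ?>, and the function name stripProcessingInstructions is hypothetical, not something taken from our parser.

```javascript
// Hypothetical helper: removes anything written as <? ... ?>,
// which covers the XML declaration and processing instructions.
function stripProcessingInstructions(xml) {
  // [\s\S]*? matches lazily across line breaks up to the first "?>".
  return xml.replace(/<\?[\s\S]*?\?>/g, "");
}

const withInstructions =
  '<?xml version="1.0" encoding="UTF-8"?><root><?template Designer?></root>';
console.log(stripProcessingInstructions(withInstructions)); // prints "<root></root>"
```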
2. Comments:
Do not treat comments as XML nodes; ignore them. Comments start with <!-- and end with -->.
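One possible way to ignore comments is to strip them before any node handling starts. This is only a sketch, assuming the standard <!-- and --> delimiters; stripComments is a hypothetical name.

```javascript
// Hypothetical helper: removes XML comments so they never become nodes.
function stripComments(xml) {
  // Lazy match so that two separate comments are not merged into one.
  return xml.replace(/<!--[\s\S]*?-->/g, "");
}

const withComment = "<root><!-- class of subforms --><draw></draw></root>";
console.log(stripComments(withComment)); // prints "<root><draw></draw></root>"
```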
3. CDATA section:
Some XML files may contain CDATA sections and DOCTYPE definitions. You can skip them unless you need to do any validation on the files.
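If you do decide to skip these, something along the following lines may be enough. It is a sketch under a few assumptions: the name stripCDataAndDoctype is hypothetical, the DOCTYPE pattern only copes with a single optional internal subset in square brackets, and removing a CDATA section also discards the text inside it, so only do this when you do not need that text.

```javascript
// Hypothetical helper: drops CDATA sections and DOCTYPE definitions.
function stripCDataAndDoctype(xml) {
  return xml
    // <![CDATA[ ... ]]> including any line breaks inside it
    .replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, "")
    // <!DOCTYPE name ...> with an optional [ ... ] internal subset
    .replace(/<!DOCTYPE[^>\[]*(\[[\s\S]*?\])?\s*>/gi, "");
}

const sample =
  "<!DOCTYPE root [<!ELEMENT root ANY>]><root><![CDATA[a < b]]></root>";
console.log(stripCDataAndDoctype(sample)); // prints "<root></root>"
```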
4. Empty Nodes:
Some XFA files contain empty nodes, with or without attributes, such as script in the above example.
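One way to deal with empty nodes written in the short form (such as <script/>) is to expand them into an open and close pair, so the rest of the parser only ever sees one form. The sketch below assumes that attribute values do not themselves contain < or > characters; expandEmptyNodes is a hypothetical name.

```javascript
// Hypothetical helper: rewrites <name attrs/> as <name attrs></name>.
function expandEmptyNodes(xml) {
  // $1 is the tag name, $2 is the (possibly empty) attribute text.
  return xml.replace(
    /<([A-Za-z_][\w.:-]*)((?:\s+[^<>]*?)?)\s*\/>/g,
    "<$1$2></$1>"
  );
}

console.log(expandEmptyNodes("<root><script/></root>"));
// prints "<root><script></script></root>"
console.log(expandEmptyNodes('<field name="a" />'));
// prints '<field name="a"></field>'
```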
5. White space between nodes:
If you are viewing XML in pretty-printed format you may end up with whitespace (tabs, line breaks and spaces) between two nodes. Use a regular expression to remove it.
6. Handling attributes:
Attributes are separated by spaces, and each attribute name is separated from its value by an “=” sign.
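The whitespace and attribute points above could be sketched roughly as follows. Both function names are hypothetical, and the attribute pattern assumes values are always quoted with either single or double quotes, as in the example XML.

```javascript
// Hypothetical helper: removes tabs, line breaks and spaces that sit
// between a closing ">" and the next opening "<".
function stripInterNodeWhitespace(xml) {
  return xml.replace(/>\s+</g, "><").trim();
}

// Hypothetical helper: turns the inside of a start tag, for example
// subform id='class1' name="class", into a plain name/value object.
function parseAttributes(tagBody) {
  const attrs = {};
  const pattern = /([A-Za-z_][\w.:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let match;
  while ((match = pattern.exec(tagBody)) !== null) {
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    attrs[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return attrs;
}

const pretty = "<root>\n  <draw>\n\t<ui></ui>\n  </draw>\n</root>\n";
console.log(stripInterNodeWhitespace(pretty));
// prints "<root><draw><ui></ui></draw></root>"
console.log(parseAttributes("subform id='class1' name=\"class\""));
```

Note that the whitespace pattern only removes runs made up entirely of whitespace, so text content inside a node is left alone.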
Unlike a W3C DOM parser, an ECMAScript parser uses object and array notation to access child elements.
For example, to access the second student of the class1 subform in the root element:
1. In W3C DOM:
root.getElementsByTagName("subform")[0].getElementsByTagName("field")[1];
2. In ECMAScript:
root.subform[0].field[1];
However, the “draw” child of the root element should be accessed as root.draw, without array notation. So an object property has to be defined as an array if the element has more than one child with the same name; otherwise it has to be treated as a single object.
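To make the array-or-single-object rule concrete, here is a minimal sketch of a parser that builds such a structure. It is not the parser used in our products. It assumes the input has already had processing instructions and comments removed and that attribute values contain no > characters. The names parseXml and addChild, the "@" prefix for attribute keys and the "#text" key are all hypothetical choices made for this example.

```javascript
// Attach a child under its tag name: a single object for the first
// occurrence, promoted to an array when a second sibling with the
// same name turns up.
function addChild(parent, name, child) {
  if (!Object.prototype.hasOwnProperty.call(parent, name)) {
    parent[name] = child;
  } else if (Array.isArray(parent[name])) {
    parent[name].push(child);
  } else {
    parent[name] = [parent[name], child];
  }
}

function parseXml(xml) {
  // Split the input into tags and the text runs between them.
  const tokens = xml.match(/<[^>]+>|[^<]+/g) || [];
  const doc = {};
  const stack = [doc]; // the last entry is the node currently open
  for (const token of tokens) {
    if (token.startsWith("</")) {
      stack.pop(); // closing tag: go back up to the parent
    } else if (token.startsWith("<")) {
      const selfClosing = token.endsWith("/>");
      const body = token.slice(1, selfClosing ? -2 : -1).trim();
      const name = body.split(/\s+/)[0];
      const node = {};
      const attr = /([A-Za-z_][\w.:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
      let m;
      while ((m = attr.exec(body)) !== null) {
        node["@" + m[1]] = m[2] !== undefined ? m[2] : m[3];
      }
      addChild(stack[stack.length - 1], name, node);
      if (!selfClosing) stack.push(node);
    } else if (token.trim() !== "") {
      stack[stack.length - 1]["#text"] = token.trim();
    }
  }
  return doc;
}

const doc = parseXml(
  "<root><subform id='class1' name=\"class\">" +
  "<field name=\"student\" id=\"s112\"></field>" +
  "<field name=\"student\" id=\"s113\"></field></subform>" +
  "<subform id='class2' name=\"class\">" +
  "<field name=\"student\" id=\"k200\"></field></subform>" +
  "<draw><ui></ui></draw><script/></root>"
);
console.log(doc.root.subform[0].field[1]["@id"]); // prints "s113"
console.log(Array.isArray(doc.root.draw)); // prints false
```

One consequence of this rule shows up in the example itself: the class2 subform has only one field, so it is reached as doc.root.subform[1].field rather than doc.root.subform[1].field[0]. Code that walks the structure needs to check for both cases.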
In my next article I will provide more details on what I do with this data in JavaScript. See you then…
Learning more about ECMA
You can find more information on events, attributes and methods in the LiveCycle® Designer ES Scripting Reference. This reference is divided into three sections: methods, objects and properties. We need to implement properties and methods in our XML parser to support XFA events.
Final Thoughts
The solution above works really nicely in JavaScript running on WebKit and in browsers.
For the moment, however, I have chosen to go with a Java solution for our Java PDF Viewer because I have found performance issues with my current approach when using Nashorn. I will try to document this (and possible solutions) in a later article.
This post is part of our “XFA Articles Index”. In these articles, we aim to help you understand XFA.