Dissecting XML for parsing in JavaScript

Recently I have been working on implementing full script support for XFA in our PDF products. As well as flat content and JavaScript, XFA can contain events, which are implemented as part of the XML and which I needed to support in both Java and HTML5. This required taking apart the XML and representing it in a way I can handle in JavaScript.

The goal of these articles is to provide some possible tips and techniques that you may find useful if you are attempting to write your own XML parser. We needed to support XFA events in HTML5 so I developed an XML parser to provide this in JavaScript.

The actual XML Structure

Let us take a simple example to start building the parser from scratch:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!-- this is a comment on defining the class of subforms -->
  <subform id='class1' name="class">
    <field name="student" id="s112"></field>
    <field name="student" id="s113"></field>
  </subform>
  <subform id='class2' name="class">
    <field name="student" id="k200"></field>
  </subform>
  <draw><ui></ui></draw>
  <script/>
  <%template Designer%>
</root>

Now I will examine each section in turn with some suggestions.

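1. Processing instructions:
In the above example, the XML declaration (the version line) and the template designer line are treated as processing instructions. Standard XML processing instructions start with <? and end with ?>; the template designer line in the example is delimited by <% and %>. Either skip them while parsing or strip them out before going any further.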

2. Comments:
Comments start with <!-- and end with -->. Do not treat a comment as an XML node; ignore it.

3. CDATA sections:
Some XML files contain CDATA sections and DOCTYPE definitions. You can skip them unless you need to validate the files.

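4. Empty nodes:
Some XFA files contain empty nodes, with or without attributes, such as the script element in the above example. The parser has to accept both the self-closing form (<script/>) and an empty open/close pair (<ui></ui>).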

5. White space between nodes:
If you are viewing the XML in pretty-printed format, you may end up with white space (tabs, line breaks and spaces) between two nodes. Use a regular expression to remove it.

6. Handling attributes:
Attributes are separated by spaces, and each attribute name is separated from its value by an “=” sign. Values may be wrapped in either single or double quotes, as the sample shows.
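The clean-up steps in the list above could be sketched roughly as follows. This is only an illustration built on regular expressions, not the code used in our products and not a conforming XML tokenizer; the function names stripNonElementMarkup and parseAttributes are made up for this example, and the DOCTYPE pattern assumes there is no internal subset.

```javascript
// Sketch only: regex-based clean-up, assuming reasonably simple XML.
// stripNonElementMarkup and parseAttributes are hypothetical names.

function stripNonElementMarkup(xml) {
  return xml
    .replace(/<!--[\s\S]*?-->/g, '')           // comments
    .replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, '')  // CDATA sections (content is dropped)
    .replace(/<!DOCTYPE[^>]*>/gi, '')          // simple DOCTYPE, no internal subset
    .replace(/<\?[\s\S]*?\?>/g, '')            // <? ... ?> processing instructions
    .replace(/<%[\s\S]*?%>/g, '')              // <% ... %> blocks as in the sample
    .replace(/>\s+</g, '><')                   // white space between nodes
    .trim();
}

// Reads name="value" or name='value' pairs out of the text of one opening tag.
function parseAttributes(tagText) {
  const attributes = {};
  const pattern = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let match;
  while ((match = pattern.exec(tagText)) !== null) {
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    attributes[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return attributes;
}

const sample = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<root>',
  '  <!-- a comment -->',
  "  <subform id='class1' name=\"class\">",
  '    <field name="student" id="s112"></field>',
  '  </subform>',
  '  <script/>',
  '  <%template Designer%>',
  '</root>'
].join('\n');

console.log(stripNonElementMarkup(sample));
console.log(parseAttributes("<subform id='class1' name=\"class\">"));
```

Stripping everything up front keeps the later element-matching code simple, at the cost of losing any text held inside CDATA sections, so this only suits cases where that content is not needed.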

Unlike a W3C DOM parser, an ECMAScript-style parser uses object and array notation to access child elements.

For example, to access the second student of the class1 subform in the root element:
1. In W3C DOM:
root.getElementsByTagName("subform")[0].getElementsByTagName("field")[1];
2. In ECMAScript:
root.subform[0].field[1];

However, the “draw” child of the root element should be accessed as root.draw, without array notation. So an object property has to be defined as an array if the parent has more than one child with the same name; otherwise it has to be treated as a single object.
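One possible way to implement the "single object unless repeated" rule is sketched below. It assumes the XML has already been cleaned of comments, processing instructions and text content, and that no attribute value contains a ">" character. The names addChild and buildTree, and the choice to keep attributes under an "attributes" property, are assumptions for this example rather than a description of the actual parser.

```javascript
// Sketch only: builds a plain object tree from already-cleaned, element-only XML.
// addChild, buildTree and the "attributes" property are hypothetical choices.

// Attach a child so that a repeated name becomes an array and a
// unique name stays a single object.
function addChild(parent, name, child) {
  if (!Object.prototype.hasOwnProperty.call(parent, name)) {
    parent[name] = child;                  // first child with this name
  } else if (Array.isArray(parent[name])) {
    parent[name].push(child);              // third or later child
  } else {
    parent[name] = [parent[name], child];  // second child: promote to array
  }
}

function buildTree(cleanXml) {
  const tagPattern = /<(\/?)([\w:.-]+)[^>]*?(\/?)>/g;
  const container = {};
  const stack = [container];
  let match;
  while ((match = tagPattern.exec(cleanXml)) !== null) {
    const closing = match[1];
    const name = match[2];
    const selfClosing = match[3];
    if (closing) {
      stack.pop();                         // end tag: go back to the parent
      continue;
    }
    // Note: a child element literally named "attributes" would clash here.
    const node = { attributes: {} };
    const attrPattern = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let attr;
    while ((attr = attrPattern.exec(match[0])) !== null) {
      node.attributes[attr[1]] = attr[2] !== undefined ? attr[2] : attr[3];
    }
    addChild(stack[stack.length - 1], name, node);
    if (!selfClosing) {
      stack.push(node);                    // open tag: later tags are its children
    }
  }
  return container;
}

const cleanXml =
  '<root>' +
  '<subform id="class1"><field id="s112"></field><field id="s113"></field></subform>' +
  '<subform id="class2"><field id="k200"></field></subform>' +
  '<draw><ui></ui></draw>' +
  '<script/>' +
  '</root>';

const tree = buildTree(cleanXml);
console.log(tree.root.subform[0].field[1].attributes.id);
console.log(Array.isArray(tree.root.draw));
```

One consequence of this rule is that the shape depends on the data: the class2 subform has only one field, so it is reached as tree.root.subform[1].field with no index. Calling code has to be ready for either form.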

In my next article I will provide more details on what I do with this data in JavaScript. See you then…

Learning more about ECMA

You can find more information on events, attributes and methods in the LiveCycle® Designer ES Scripting Reference. This reference is divided into three sections: methods, objects and properties. We need to implement properties and methods in our XML parser to support XFA events.

Final Thoughts

The solution above works really nicely in JavaScript running on WebKit and in browsers.

For the moment however, I have chosen to go with a Java solution for our Java PDF Viewer because I have found performance issues with my current approach when using Nashorn. I will try to document this (and possible solutions) in a later article.

 

This post is part of our “XFA Articles Index”. In these articles, we aim to help you understand XFA.


suda

Java EE developer at IDRSolutions
Suda is the Senior Java EE Developer at IDR Solutions, and specialises in XFA, Fonts, True-Type Fonts, application servers and conversions. He is a keen science-fiction fan in his spare time.


2 thoughts on “Dissecting XML for parsing in JavaScript”

  1. tanu

    Nice work well done

  2. Skew

    You made this sound simple. I could follow your article. I think I learned more about XML from here than the last year. Good job Suda
