When we create an HTML5 form from a PDF file, we have a choice of how to name the form fields. We can either use an autogenerated value, or we can use the name of the form field from the PDF. The second is much more useful because it makes it easy to compare the PDF and the HTML5. There is, however, a slight catch (isn’t there always!).
Some characters which are allowed in a PDF form field name are not valid in an HTML5 form name (space, ‘.’, etc). So we need to replace these in the HTML version (we use underscore, so dodgy name.1 would become dodgy_name_1). This works fine, but it means the HTML name no longer matches the PDF name exactly, so we cannot easily use it to identify the original PDF form field.
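To make the catch concrete, here is a minimal sketch of this kind of replacement in Python. It is not our converter's actual code: the function name sanitize_field_name is hypothetical, and the set of characters we keep (ASCII letters, digits, underscore and hyphen) is an assumption chosen for illustration.

```python
import re

# Assumption for illustration: keep ASCII letters, digits, '_' and '-',
# and replace every other character with an underscore.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def sanitize_field_name(pdf_name: str) -> str:
    """Return an HTML-friendly version of a PDF form field name."""
    return _UNSAFE.sub("_", pdf_name)


print(sanitize_field_name("dodgy name.1"))   # dodgy_name_1
print(sanitize_field_name("dodgy.name 1"))   # dodgy_name_1
```

Two different PDF names end up with the same HTML name, so the replacement cannot be reversed. That is why the sanitized name alone is not enough to find the original field.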
However, we can easily add additional attributes to the HTML. So we add a new attribute (pdfFieldName), which the browser will ignore but which allows any user to easily identify the original PDF form field. Here is an example.
<input type="text" tabindex="12" id="Geb-Datum" value="" pdfFieldName="Geb.Datum" />
<input type="text" tabindex="34" id="Ehepartner_AHV-Nr" value="" pdfFieldName="Ehepartner AHV-Nr" />
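As a rough sketch of how the extra attribute could be used, the Python below builds a lookup from the original PDF field name to the HTML id, using only the standard library's html.parser. The class name PdfFieldIndex is hypothetical. Note that HTML attribute names are case-insensitive and this parser reports them in lowercase, so the code looks for pdffieldname.

```python
from html.parser import HTMLParser


class PdfFieldIndex(HTMLParser):
    """Collects a mapping of PDF field name -> HTML id from input tags."""

    def __init__(self):
        super().__init__()
        self.by_pdf_name = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names, so pdfFieldName
        # arrives here as "pdffieldname".
        attr_map = dict(attrs)
        pdf_name = attr_map.get("pdffieldname")
        if tag == "input" and pdf_name is not None:
            self.by_pdf_name[pdf_name] = attr_map.get("id")


html = """
<input type="text" tabindex="12" id="Geb-Datum" value="" pdfFieldName="Geb.Datum" />
<input type="text" tabindex="34" id="Ehepartner_AHV-Nr" value="" pdfFieldName="Ehepartner AHV-Nr" />
"""

index = PdfFieldIndex()
index.feed(html)
print(index.by_pdf_name["Geb.Datum"])          # Geb-Datum
print(index.by_pdf_name["Ehepartner AHV-Nr"])  # Ehepartner_AHV-Nr
```

The same idea works in the browser with an attribute selector, since the attribute is kept in the page even though it has no effect on how the form behaves.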
At the cost of making the HTML5 file slightly bigger, we get the ability to easily match each HTML form component to its PDF original. Would any other PDF file information be useful inside the HTML?