Author Archives: Mark Stephens

About Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a speaker at user groups, Business of Software, Seybold and JavaOne. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Interesting PDF bugs – How wrong can the references be?

Today I have been looking at a customer's PDF file which highlights how ‘elastic’ the PDF file specification can be…

In theory, every object is pointed to by a reference in the cross-reference table, so if you go to the byte offset given for object 100, you would see:

100 0 obj

......

endobj

The data starts with the object number (100) and the generation number (0). So far so good :-)

The file I was looking at today gave me this data for object 100:

0 endobj

100 0 obj

end

In other words, the offset points 8 bytes too early in the file, so you get the end of the previous object before the correct data for object 100. Most people would regard this as a PDF bug, but the file opens in Acrobat (of course it does), which is very forgiving and does lots of error checking.

The real problem is not fixing this particular file, but fixing the issue without adding code that slows down or breaks the billions of PDF files out there which work correctly. It is also why you need a very large library of PDF files to regression test any code changes. I have just bought some new i7 PCs with SSD drives to help with running continuous tests on our collection of PDF files!

After a morning of coding, I now have a code tweak which handles this (and does not slow down our library in any way), but it is a good example of the issues you can find in badly created PDF files.
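One possible approach (not necessarily the tweak described above) is to check for the expected "N G obj" header at the offset from the cross-reference table and, only if it is missing, scan a short distance forward for it. Correct files take the fast path, so they pay for just one extra comparison. This is a minimal Java sketch assuming single spaces between the header tokens; the class name, method names and the search window size are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ObjectOffsetCheck {

    // Returns true if the bytes at 'offset' start with "<objNum> <gen> obj".
    // Assumption: single spaces between tokens (the spec allows any whitespace).
    static boolean headerAt(byte[] pdf, int offset, int objNum, int gen) {
        byte[] header = (objNum + " " + gen + " obj").getBytes(StandardCharsets.US_ASCII);
        if (offset < 0 || offset + header.length > pdf.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (pdf[offset + i] != header[i]) {
                return false;
            }
        }
        // Reject a match inside a longer number, e.g. "100 0 obj" within "1100 0 obj".
        return offset == 0 || pdf[offset - 1] < '0' || pdf[offset - 1] > '9';
    }

    // Returns the real start of the object: the xref offset if the header is
    // there, otherwise the first header found within 'window' bytes after it,
    // or -1 so the caller can fall back to other recovery (e.g. a full rescan).
    static int resolveOffset(byte[] pdf, int xrefOffset, int objNum, int gen, int window) {
        if (headerAt(pdf, xrefOffset, objNum, gen)) {
            return xrefOffset; // fast path: correct files pay only this check
        }
        int end = Math.min(pdf.length, xrefOffset + window);
        for (int pos = Math.max(0, xrefOffset + 1); pos < end; pos++) {
            if (headerAt(pdf, pos, objNum, gen)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String data = "99 0 obj\n(x)\nendobj\n100 0 obj\n(y)\nendobj\n";
        byte[] pdf = data.getBytes(StandardCharsets.US_ASCII);
        int correct = data.indexOf("100 0 obj");
        int tooEarly = correct - 8; // mimics the customer file: offset 8 bytes early
        System.out.println(resolveOffset(pdf, correct, 100, 0, 64));  // prints 20
        System.out.println(resolveOffset(pdf, tooEarly, 100, 0, 64)); // prints 20
    }
}
```

Keeping the forward scan small and bounded is what stops the fix from costing anything on well-formed files; a file whose offsets are wildly wrong would still need a slower full-file recovery.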

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so visit our series index!

If you’re a first-time reader, or simply want to be notified when we post new articles and updates, you can keep up to date by social media (Twitter, Facebook and Google+) or the Blog RSS.

XFA updates

We have had lots of questions about the first official release of our XFA support. The current status is that it is live in the versions used in development/QA and will be enabled by default in daily builds next week. If we have no major issues in testing, we will do a release on Friday 21st June. We will give you a full list of supported and unsupported features at release. It will also be possible to disable the new XFA support.

It is also working well in our PDF to HTML converter, so that will also go live on Friday next week (well, that is the plan)…

If you would like to test it next week, we would welcome your feedback from the daily builds.

June edition of Entrepreneur Country now available online as an HTML5 magazine

We are regular followers of the excellent online Entrepreneur Country magazine for two reasons. Firstly, it uses our PDF to HTML5 converter, so we get some really good feedback and ideas from them. It also showcases our technology really nicely (with page turning and the ability to send out social media links). As pure HTML5, it is fully searchable by Google, gives you all the analytics and displays on all modern devices.

But we also follow it closely because we are entrepreneurs and it is a great read for anyone involved in business (or thinking of diving in). Topics this month include big data, crowd-sourcing and how to retain talent. On page 9, there is a great interview with David Courtier-Dutton, and on page 34 there is a pitch for investors from Seek and Adore. There is even advice on how large companies can retain the innovation of entrepreneurs.

You can read the magazine online for free at http://www.entrepreneurcountry.com/ and there are lots of useful resources on the main site as well. It is well worth reading, which you can now do on PC, tablet and iPhone…

And if you have a magazine which deserves the HTML5 treatment, maybe you should have a look at our online converter (which is currently free to use).

Customising your keyboard shortcuts in NetBeans IDE

One of my favourite features of NetBeans IDE is how easily you can customise the keyboard shortcuts. This feature is accessed from the NetBeans Preferences window.

Firstly, you can choose not only the standard NetBeans shortcuts but also the Eclipse, IDEA and Emacs sets. This is very useful if you started with a different IDE and do not want to relearn all the settings.

The search boxes allow you to filter all the commands by keyword or by typing in a keyboard shortcut (for example, to check what is assigned to Ctrl+F).

All the columns are editable, so you can edit a box to remove a keyboard shortcut, or press your chosen combination of keys and it will be inserted.

Once you click Apply, the changes are saved and NetBeans recognises your new settings. You will see the new keyboard shortcuts on the menus.

All this makes it very easy to configure and set up NetBeans exactly as you want. My keyboard combination is the IDEA set (which is what I learnt), with the Format shortcut redefined to Indent. What combination do you use?

PDF puzzlers – when is a return character significant in a stream?

The PDF file format is a very ‘flexible’ file format. You can put returns into the middle of most objects. There is a certain PDF creation tool which believes lines should never exceed 80 characters, so it inserts a break if a line is too long…

So when parsing a PDF file you need to be very ‘flexible’ too. In the following cases, we have string objects containing escaped octal character sequences and some returns. Some of the returns need to be ignored and some are a legitimate part of the data. So in which cases should we use the value, and in which should we ignore it?

I have added (13) to show the exact byte (a carriage return, decimal 13).

Case A

2 0 obj << /Title (\376\377\000U\000m\000l\000a\000u\000t\000e\000:\000\344\000,\000 \000\304\(13)\000,\000 \000\366\000,\000 \000\326\000,\000 \000\374\000,\000 \000\334\(13))

Case B

/V (\376\377\000N\000\260\000 \000i\000d\000e\000n\000t\000i\000f\000i\000a\000n\000t\000 \0009\0009\000\(13)9\0009\0009\0009\0009\000X)

Case C

/V (\376\377\000O\000b\000j\000e\000t\000 \000:\000 \000A\000t\000t\000e\000s\000t\000a\000t\000i\000o\000\(13)
n\000 \000d\000e\000 \000p\000a\000i\000e\000m\000e\000n\000t\000 \000d\000\351\000l\000i\000v\000r\000\(13)
\351\000e\000 \000p\000a\000r\000 \000p\000o\000l\000e\000-\000e\000m\000p\000l\000o\000i\000.\000f\000\(13)
r)
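For reference, the literal-string rules in the PDF specification (ISO 32000-1, section 7.3.4.2) say that a backslash followed by an end-of-line marker (CR, LF or CR LF) is a line continuation and is dropped, while an end-of-line inside the string without a backslash is kept, normalised to a single line feed. Octal escapes take one to three digits. The Java sketch below applies just those strict rules; the class and method names are hypothetical, and real-world files may need more forgiving handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LiteralStringDecoder {

    // Decodes a PDF literal string. 'start' is the index just after the
    // opening '('; decoding stops at the matching ')'.
    static byte[] decode(byte[] in, int start) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int depth = 1; // balanced, unescaped parentheses are allowed
        int i = start;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            if (b == '\\') {
                i++;
                if (i >= in.length) break;
                int c = in[i] & 0xFF;
                switch (c) {
                    case 'n': out.write('\n'); i++; break;
                    case 'r': out.write('\r'); i++; break;
                    case 't': out.write('\t'); i++; break;
                    case 'b': out.write('\b'); i++; break;
                    case 'f': out.write('\f'); i++; break;
                    case '(': case ')': case '\\': out.write(c); i++; break;
                    case '\r': // backslash + EOL is a line continuation: write nothing
                        i++;
                        if (i < in.length && in[i] == '\n') i++; // CR LF is one EOL
                        break;
                    case '\n':
                        i++;
                        break;
                    default:
                        if (c >= '0' && c <= '7') {
                            int value = 0;
                            int digits = 0;
                            while (digits < 3 && i < in.length && in[i] >= '0' && in[i] <= '7') {
                                value = value * 8 + (in[i] - '0');
                                i++;
                                digits++;
                            }
                            out.write(value & 0xFF); // high-order overflow is ignored
                        } else {
                            out.write(c); // unknown escape: the backslash is ignored
                            i++;
                        }
                }
            } else if (b == '\r') { // unescaped EOL is kept as a single LF
                out.write('\n');
                i++;
                if (i < in.length && in[i] == '\n') i++;
            } else if (b == '(') {
                depth++;
                out.write(b);
                i++;
            } else if (b == ')') {
                depth--;
                if (depth == 0) break;
                out.write(b);
                i++;
            } else {
                out.write(b);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Like Case B: "\0009\000", then a backslash followed by CR, then "9)".
        byte[] raw = "\\0009\\000\\\r9)".getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = decode(raw, 0);
        System.out.println(decoded.length); // prints 4: 00 39 00 39, i.e. "99" in UTF-16BE
    }
}
```

Under these rules, each backslash immediately followed by a (13) in the cases above is a continuation and contributes no bytes; a bare (13) with no backslash in front would instead become a line feed in the decoded string.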

Over to you???

Related Posts: