Mark Stephens Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

PDF puzzlers – when is a return character significant in a stream?


The PDF file format is very ‘flexible’. You can put returns into the middle of most objects. There is a certain PDF creation tool which believes lines should never exceed 80 characters, and so it inserts a break whenever a line is too long…

So when parsing a PDF file you need to be very ‘flexible’ too. The following cases are string objects containing escaped octal character sequences and some returns. Some of the returns need to be ignored and some are legitimate parts of the data. So in which cases should we use the value and in which should we ignore it?

I have added (13) to show the exact byte (a carriage return, decimal 13).

Case A

2 0 obj << /Title (\376\377\000U\000m\000l\000a\000u\000t\000e\000:\000\344\000,\000 \000\304\(13)\000,\000 \000\366\000,\000 \000\326\000,\000 \000\374\000,\000 \000\334\(13))

Case B

/V (\376\377\000N\000\260\000 \000i\000d\000e\000n\000t\000i\000f\000i\000a\000n\000t\000 \0009\0009\000\(13)9\0009\0009\0009\0009\000X)

Case C

/V (\376\377\000O\000b\000j\000e\000t\000 \000:\000 \000A\000t\000t\000e\000s\000t\000a\000t\000i\000o\000\(13)
n\000 \000d\000e\000 \000p\000a\000i\000e\000m\000e\000n\000t\000 \000d\000\351\000l\000i\000v\000r\000\(13)
\351\000e\000 \000p\000a\000r\000 \000p\000o\000l\000e\000-\000e\000m\000p\000l\000o\000i\000.\000f\000\(13)
r)


Over to you???
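
One way to reason about the cases is the literal-string rule in the PDF specification (ISO 32000-1, section 7.3.4.2): a backslash immediately followed by an end-of-line marker is a line continuation and both are dropped, while an end-of-line marker with no backslash before it is data and is read as a single line feed (10). The Python sketch below applies that rule. `decode_pdf_literal` is a hypothetical helper written for illustration, not code from any PDF library, and it assumes it is handed the raw bytes between the outer parentheses, with a real byte 13 wherever the cases above show (13).

```python
# Hypothetical helper: a minimal sketch of PDF literal-string decoding.
# Assumes `body` is the raw content between the outer ( and ) of the string.

# Single-character escapes defined for literal strings.
_ESCAPES = {
    ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09,
    ord("b"): 0x08, ord("f"): 0x0C,
    ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C,
}

CR, LF, BACKSLASH = 0x0D, 0x0A, 0x5C


def decode_pdf_literal(body: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c == BACKSLASH:
            i += 1
            if i >= n:
                break  # lone trailing backslash: nothing left to escape
            e = body[i]
            if e in _ESCAPES:
                out.append(_ESCAPES[e])
                i += 1
            elif 0x30 <= e <= 0x37:
                # Octal escape: one to three digits, high-order overflow dropped.
                value, digits = 0, 0
                while i < n and digits < 3 and 0x30 <= body[i] <= 0x37:
                    value = value * 8 + (body[i] - 0x30)
                    i += 1
                    digits += 1
                out.append(value & 0xFF)
            elif e == CR:
                # Backslash + CR (or CR LF): line continuation, emit nothing.
                i += 1
                if i < n and body[i] == LF:
                    i += 1
            elif e == LF:
                # Backslash + LF: line continuation, emit nothing.
                i += 1
            else:
                # Unknown escape: the backslash is ignored, the character kept.
                out.append(e)
                i += 1
        elif c == CR:
            # Unescaped CR or CR LF is data, normalised to a single LF.
            out.append(LF)
            i += 1
            if i < n and body[i] == LF:
                i += 1
        else:
            out.append(c)
            i += 1
    return bytes(out)


# A shortened fragment in the style of Case A: the return follows a backslash.
fragment = b"\\376\\377\\000\\304\\\r\\000,"
decoded = decode_pdf_literal(fragment)
text = decoded.decode("utf-16")  # the \376\377 prefix marks big-endian UTF-16
```

Under this reading, every (13) shown above follows a backslash, so all of them would be dropped, which also keeps the two-byte pairs after the \376\377 marker aligned. That is one interpretation of the specification rather than a definitive answer to the puzzle, and real-world files may still need more lenient handling.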

IDRsolutions develop a Java PDF Viewer and SDK, an Adobe forms to HTML5 forms converter, a PDF to HTML5 converter and a Java ImageIO replacement. On the blog our team post anything interesting they learn about.



© IDRsolutions Ltd 2019. All rights reserved.