Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

PDF puzzlers – when is a return character significant in a stream?


The PDF file format is a very ‘flexible’ file format. You can put returns into the middle of most objects. There is a certain PDF creation tool which believes lines should never exceed 80 characters and so inserts a break if the line is too long…

So when parsing a PDF file you need to be very ‘flexible’ too. The following cases are string objects that contain escaped octal character sequences and some returns. Some of the returns need to be ignored and some are legitimate parts of the data. So in which cases should we use the value and in which should we ignore it?

I have added (13) to show the exact byte (a carriage return, decimal value 13).

Case A

2 0 obj << /Title (\376\377\000U\000m\000l\000a\000u\000t\000e\000:\000\344\000,\000 \000\304\(13)\000,\000 \000\366\000,\000 \000\326\000,\000 \000\374\000,\000 \000\334\(13))

Case B

/V (\376\377\000N\000\260\000 \000i\000d\000e\000n\000t\000i\000f\000i\000a\000n\000t\000 \0009\0009\000\(13)9\0009\0009\0009\0009\000X)

Case C

/V (\376\377\000O\000b\000j\000e\000t\000 \000:\000 \000A\000t\000t\000e\000s\000t\000a\000t\000i\000o\000\(13)
n\000 \000d\000e\000 \000p\000a\000i\000e\000m\000e\000n\000t\000 \000d\000\351\000l\000i\000v\000r\000\(13)
\351\000e\000 \000p\000a\000r\000 \000p\000o\000l\000e\000-\000e\000m\000p\000l\000o\000i\000.\000f\000\(13)
r)
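
For anyone who wants to experiment before answering, here is a minimal sketch of a literal-string decoder. It assumes the rules in the PDF specification's section on literal strings (ISO 32000-1, 7.3.4.2) as I read them: a backslash immediately followed by an end-of-line marker is a line continuation and produces no data, while an end-of-line marker without a backslash is data and is read as a single line feed. The class name PdfStringUnescape and the method decode are hypothetical names for illustration, not part of any library. The input is expected to be the raw bytes between the outer parentheses.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only, not taken from any PDF library.
final class PdfStringUnescape {

    /** Decodes the raw bytes found between the outer ( and ) of a literal string. */
    static byte[] decode(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length) {
            int b = raw[i++] & 0xFF;
            if (b == '\\') {
                if (i >= raw.length) {
                    break; // trailing backslash with nothing after it
                }
                int c = raw[i++] & 0xFF;
                if (c == '\r') {
                    // Backslash + CR (or CR LF): line continuation, emit nothing.
                    if (i < raw.length && raw[i] == '\n') {
                        i++;
                    }
                } else if (c == '\n') {
                    // Backslash + LF: line continuation, emit nothing.
                } else if (c >= '0' && c <= '7') {
                    // Octal escape: one to three octal digits.
                    int value = c - '0';
                    int digits = 1;
                    while (digits < 3 && i < raw.length
                            && raw[i] >= '0' && raw[i] <= '7') {
                        value = value * 8 + (raw[i++] - '0');
                        digits++;
                    }
                    out.write(value & 0xFF); // high-order overflow is dropped
                } else {
                    switch (c) {
                        case 'n': out.write('\n'); break;
                        case 'r': out.write('\r'); break;
                        case 't': out.write('\t'); break;
                        case 'b': out.write('\b'); break;
                        case 'f': out.write('\f'); break;
                        default:  out.write(c);    // \( \) \\ and unknown escapes
                    }
                }
            } else if (b == '\r') {
                // Unescaped CR or CR LF is data, normalised to a single LF.
                if (i < raw.length && raw[i] == '\n') {
                    i++;
                }
                out.write('\n');
            } else {
                out.write(b); // ordinary byte, including an unescaped LF
            }
        }
        return out.toByteArray();
    }
}
```

Feeding it the bytes \000\(13)9 gives the two bytes 0 and '9', so the return disappears. In a string that starts with \376\377 (the UTF-16BE byte order mark) that keeps the two-byte pairs aligned, which is one way to check your own answer against each case.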

 

Over to you???




