Make Your Own PDF – Part 2b: Create your own non-working PDF

This is part of a series on How to make your own PDF files.

In the previous article of this series, we looked at how a PDF file is structured. This time we’re going to use a text editor to create a PDF file with that structure. The only problem with the PDF we are going to make is that it won’t work. It will, however, give us an error message in the Acrobat PDF viewer that we can understand, and that will form the basis for creating a working PDF file in the posts that follow. The ingredients you require are: a text editor, a hex editor (I’m going to use HxD) and an at least partially functioning human brain. Preferably your own.

We are going to create all the parts I mentioned in the last article in a text editor, and work out the byte address of each thing we put in our file using HxD. We’ll also see what error messages we can get out of Acrobat.

Firstly, I’m gonna make a new blank file called myPdf.pdf. Just because I can, I’m gonna load it in Acrobat to see what it says:

“Adobe Reader could not open ‘myPdf.pdf’ because it is either not a supported file type or because the file has been damaged.”

Hardly surprising, but if you get this message from a supposedly working PDF in the future, you can be sure it’s a bit knackered.
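
An empty file gives a viewer nothing to recognise. Every PDF has to start with a header line, %PDF- followed by a version number, which is the first thing we add below. We can’t see how Acrobat actually does this check, so here is just a simplified Python sketch of the idea (pdf_version is a made-up helper name): it looks for the header at the very start of the file and returns the version it finds.

```python
import re
from typing import Optional

# The header is "%PDF-" followed by a version number such as 1.7 or 2.0
HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF header at the very start of the
    file, or None if the file doesn't start with one."""
    match = HEADER.match(data)  # match() only looks at the start
    return match.group(1).decode("ascii") if match else None


print(pdf_version(b""))               # None: our empty myPdf.pdf
print(pdf_version(b"%PDF-2.0\r\n"))   # 2.0
```

Real viewers may be more forgiving than this, so treat it as an illustration of why an empty file is rejected outright rather than a copy of Acrobat’s logic.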

Now I’m adding the header part, which only requires a version number in the form %PDF-2.0. Next comes the body section, where all the objects go. For this section we’re just going to have one object: object number 1 (the 0 after it is its generation number), and it’s going to be a dictionary object (that we are not going to put anything in…yet!).

%PDF-2.0
1 0 obj << >>
endobj

Next we want the Cross Reference Table section. First we need the xref keyword. Then a line with the number of the first object in our list and the number of entries that follow. So far we have two: 1 0 obj in our body section, and object 0, which is the head of the linked list of free objects that I described in Part 2. So we end up with a line with 0 2 on it. The entries that follow hold the information about our objects. They all have the same format: a 10-digit number (for an object in use, its byte address), a space, a 5-digit generation number, a space, and then a letter saying whether the entry is in use (n) or free (f). Each entry is exactly 20 bytes long, including its two-character line ending.

xref
0 2
0000000000 65535 f
0000000010 00000 n
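
Because every entry has a fixed width, it is easy to generate. A minimal Python sketch (xref_entry is a hypothetical helper name) that formats one entry, assuming CR LF as the two-character line ending. For free entries the 10-digit field holds the number of the next free object rather than an address, which is why entry 0 is all zeros: there is no next free object.

```python
def xref_entry(offset: int, generation: int, in_use: bool) -> bytes:
    """Format one cross-reference entry: 10-digit number, space,
    5-digit generation, space, 'n' or 'f', then CR LF."""
    kind = "n" if in_use else "f"
    entry = f"{offset:010d} {generation:05d} {kind}\r\n".encode("ascii")
    assert len(entry) == 20  # every entry must be exactly 20 bytes
    return entry


# The two entries used in this article
print(xref_entry(0, 65535, in_use=False))  # head of the free list
print(xref_entry(10, 0, in_use=True))      # object 1, starting at byte 10
```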

Notice I’ve put 10 as the address of object 1. As each character is a byte, it’s pretty easy to count: the 8 characters of %PDF-2.0 plus the two line-ending characters (carriage return and line feed on Windows) make 10. If you want to check, you can open your file in HxD (set the width box to 10 and the number system to decimal to make life easier) and click on the 1 of 1 0 obj to get its starting address.
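
Counting by hand is easy to get wrong, and the answer depends on which line endings your text editor saves. A minimal Python sketch of the same count (line_offsets is a made-up helper, and it assumes every line is plain ASCII) that gives the starting byte of each line:

```python
from typing import List


def line_offsets(lines: List[str], eol: str) -> List[int]:
    """Return the byte offset at which each line starts, given the
    line-ending sequence written after every line."""
    offsets = []
    position = 0
    for line in lines:
        offsets.append(position)
        position += len(line.encode("ascii")) + len(eol.encode("ascii"))
    return offsets


lines = ["%PDF-2.0", "1 0 obj << >>", "endobj", "xref"]
print(line_offsets(lines, "\r\n"))  # [0, 10, 25, 33] with CR LF
print(line_offsets(lines, "\n"))    # [0, 9, 23, 30] with LF only
```

With CR LF, object 1 starts at byte 10 and the xref keyword at byte 33. If your editor saves LF-only line endings, everything shifts and the numbers in the Cross Reference Table and after startxref have to change to match.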

Next you need the final part, which is the trailer section. You need the trailer keyword followed by a trailer dictionary containing the size (the number of entries in the Cross Reference Table) and a reference to the root object, then the startxref keyword:

trailer <</Size 2/Root 1 0 R>>
startxref

Then you need the address of the Cross Reference Table (the byte where the xref keyword starts), which is 33 on mine. Finish off the file with %%EOF. So you end up with:

%PDF-2.0
1 0 obj << >>
endobj
xref
0 2
0000000000 65535 f
0000000010 00000 n
trailer <</Size 2/Root 1 0 R>>
startxref
33
%%EOF
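
If you’d rather not recount bytes every time the file changes, you can let a script do it. A minimal Python sketch, assuming Windows-style CR LF line endings (which is what gives the 10 and 33 used here); build_pdf is a made-up name. It produces the same file as the listing above, computing both addresses from the bytes written so far, and saves it as myPdf.pdf.

```python
from pathlib import Path

EOL = b"\r\n"  # assumption: Windows-style line endings, as in this article


def build_pdf() -> bytes:
    """Assemble the deliberately non-working PDF, computing byte
    offsets from what has been written instead of counting by hand."""
    out = bytearray()

    # Header
    out += b"%PDF-2.0" + EOL

    # Body: note where object 1 starts
    obj1_offset = len(out)
    out += b"1 0 obj << >>" + EOL + b"endobj" + EOL

    # Cross Reference Table: note where the xref keyword starts
    xref_offset = len(out)
    out += b"xref" + EOL
    out += b"0 2" + EOL
    out += b"%010d %05d f" % (0, 65535) + EOL
    out += b"%010d %05d n" % (obj1_offset, 0) + EOL

    # Trailer, then the address of the Cross Reference Table
    out += b"trailer <</Size 2/Root 1 0 R>>" + EOL
    out += b"startxref" + EOL
    out += str(xref_offset).encode("ascii") + EOL
    out += b"%%EOF" + EOL
    return bytes(out)


pdf = build_pdf()
Path("myPdf.pdf").write_bytes(pdf)
print(pdf.decode("ascii"))
```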

If you open this in Acrobat you’ll get a different kind of error. If you hold down Ctrl while clicking OK, you see another part of the error message: “Expected a dict object.” That’s fair enough, as we haven’t put any values in it.
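
A PDF viewer reads the file from the end: it finds startxref, jumps to the address after it, and uses the Cross Reference Table entries to locate objects. A minimal Python sketch of that lookup (check_structure is a hypothetical helper, not how Acrobat is implemented, and it assumes CR LF line endings and a single "0 2"-style subsection) can confirm our addresses line up, which fits with Acrobat complaining about the dictionary rather than a damaged file.

```python
from typing import List

# The finished file, byte for byte, with CR LF line endings
PDF = (
    b"%PDF-2.0\r\n"
    b"1 0 obj << >>\r\n"
    b"endobj\r\n"
    b"xref\r\n"
    b"0 2\r\n"
    b"0000000000 65535 f\r\n"
    b"0000000010 00000 n\r\n"
    b"trailer <</Size 2/Root 1 0 R>>\r\n"
    b"startxref\r\n"
    b"33\r\n"
    b"%%EOF\r\n"
)


def check_structure(pdf: bytes) -> List[int]:
    """Locate the xref table via startxref, then check every in-use
    entry points at the matching 'N G obj'. Returns the object numbers
    checked; raises ValueError on a mismatch."""
    # The number after the last 'startxref' is the xref table's address
    xref_offset = int(pdf[pdf.rindex(b"startxref"):].split()[1])
    if not pdf[xref_offset:].startswith(b"xref"):
        raise ValueError(f"startxref says {xref_offset}, but 'xref' isn't there")

    lines = pdf[xref_offset:].split(b"\r\n")
    first, count = map(int, lines[1].split())  # e.g. "0 2"

    checked = []
    for number, entry in enumerate(lines[2:2 + count], start=first):
        offset, generation, kind = entry.split()
        if kind != b"n":
            continue  # free entries have no object to look up
        expected = b"%d %d obj" % (number, int(generation))
        if not pdf[int(offset):].startswith(expected):
            raise ValueError(f"object {number} is not at byte {int(offset)}")
        checked.append(number)
    return checked


print(check_structure(PDF))  # [1]
```

Changing the 33 after startxref to 32 makes check_structure raise ValueError, because byte 32 is the line feed at the end of endobj, not the x of xref.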

Next time: DIY Blank Page!