A PDF file consists of a ‘dump’ of PDF objects, plus a reference table (the cross-reference table) that records where each object is located in the PDF file and which object is the root. This makes the PDF file format very powerful: a viewer only needs to read objects as they are required.
The PDF file format was also designed so that it could be easily updated. Rather than rewriting the whole file, you can append new or changed objects to the end of the file and then add a new reference table describing the changes. So you might have a hypothetical PDF file containing objects 1, 2, 3 and 4 with a reference table. You then edit the PDF file, producing a new version of object 4 and a new object 5. The updated object 4 and the new object 5 are added to the end of the file, followed by a new table telling the PDF viewer where to find them. The original version of object 4 can still be in the file but is now ignored.
The new reference table contains the locations of the changed objects and a /Prev pointer to the previous table. You can chain any number of reference tables this way. So to read our hypothetical PDF, we would start with the newest table and note the locations of objects 4 and 5. We would also note that there is a /Prev entry and go to that table. There we would read the locations of objects 1, 2 and 3 but ignore object 4, because we have already found a newer version. That table has no /Prev entry, so we would stop there.
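The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real PDF parser: the tables are modelled as plain dictionaries, and the names `resolve_offsets`, `entries` and `prev`, along with the byte offsets, are made up for the example.

```python
from typing import Dict, Optional


def resolve_offsets(tables: Dict[str, dict], newest: str) -> Dict[int, int]:
    """Walk the chain of reference tables from newest to oldest.

    `tables` is a hypothetical in-memory model: each table has an
    "entries" dict (object number -> byte offset) and an optional
    "prev" key naming the previous table, standing in for /Prev.
    """
    offsets: Dict[int, int] = {}
    visited = set()
    current: Optional[str] = newest

    while current is not None:
        if current in visited:
            # A damaged file could make /Prev loop back on itself.
            raise ValueError("circular /Prev chain")
        visited.add(current)

        table = tables[current]
        for obj_num, offset in table["entries"].items():
            # Keep the first location we see: it comes from the newest table.
            offsets.setdefault(obj_num, offset)

        current = table.get("prev")

    return offsets


# Our hypothetical file: objects 1-4 originally, then an update that
# replaces object 4 and adds object 5 (all offsets are invented).
tables = {
    "original": {"entries": {1: 15, 2: 80, 3: 140, 4: 210}, "prev": None},
    "update": {"entries": {4: 400, 5: 470}, "prev": "original"},
}

locations = resolve_offsets(tables, "update")
print(locations[4])  # the updated object 4, at offset 400
```

Because `setdefault` only stores an object number the first time it is seen, the older copy of object 4 is skipped without any special handling.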
The location of the first table to read (the newest one) is always recorded at the very end of the file, after the startxref keyword. So if we are appending data, it is easy to add a new pointer at the end that refers to the new table.
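As a rough sketch of how a reader might find that pointer, the Python below scans the tail of the file for the last `startxref` keyword followed by a byte offset and the `%%EOF` marker. The function name `find_startxref` and the 1024-byte tail size are assumptions for this example, and real-world files can be messier than this handles (for instance, extra bytes after `%%EOF`).

```python
import re


def find_startxref(data: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset of the newest reference table.

    Only the last `tail_size` bytes are searched, since the pointer
    sits at the end of the file. The tail size is an arbitrary choice.
    """
    tail = data[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref pointer found near end of file")
    # An updated file can contain several pointers; the last one wins.
    return int(matches[-1])


# A stand-in for an updated file: the original pointer (116) is still
# present, but the appended one (512) is what a viewer should use.
sample = (
    b"%PDF-1.4\n...original objects and table...\n"
    b"startxref\n116\n%%EOF\n"
    b"...appended objects and new table...\n"
    b"startxref\n512\n%%EOF\n"
)

print(find_startxref(sample))
```

Reading only the tail keeps the lookup cheap even for very large files, which fits the idea that a viewer reads only what it needs.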
This is one of the key features which gives the PDF file format its power and flexibility.
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.