A PDF file consists of a ‘dump’ of PDF objects, plus a reference table that records where each object is located in the file and which object is the root. This makes the PDF file format very powerful: a viewer can jump straight to the objects it needs and read them only as required, rather than parsing the whole file.
The PDF file format was also designed so that it could be easily updated. Rather than rewriting the whole file, you can append new or changed objects to the end of the file and then add a new reference table that lists only the changes. So you might have a hypothetical PDF file containing objects 1, 2, 3 and 4 with a reference table. You then edit the PDF file, producing a new version of object 4 and a new object 5. The updated object 4 and the new object 5 are added to the end of the file, followed by a new table telling the PDF viewer where to find them. The original version of object 4 can still be in the file, but it is now ignored.
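The append step can be sketched in a few lines of Python. This is a simplified illustration rather than a complete PDF writer: the function name `append_update`, its parameters and the placeholder `original` bytes are all assumptions made for the example, and it ignores details such as generation numbers, free entries and cross-reference streams.

```python
def append_update(original: bytes, new_objects: dict, prev_xref: int,
                  root_obj: int, size: int) -> bytes:
    """Append new or changed objects plus a new reference table (hypothetical helper).

    new_objects maps object number -> raw object body.
    prev_xref is the byte offset of the previous reference table.
    """
    out = bytearray(original)          # the original bytes are never modified
    if not out.endswith(b"\n"):
        out += b"\n"

    # Write each new or changed object at the end and remember where it starts.
    offsets = {}
    for num, body in sorted(new_objects.items()):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Write a table that lists only the objects added in this update.
    xref_offset = len(out)
    out += b"xref\n"
    for num, off in sorted(offsets.items()):
        out += b"%d 1\n" % num                 # subsection: first object, count
        out += b"%010d 00000 n \n" % off       # 20-byte entry: offset, generation, in use

    # The trailer points back to the previous table through /Prev.
    out += b"trailer\n<< /Size %d /Root %d 0 R /Prev %d >>\n" % (size, root_obj, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# Placeholder bytes standing in for a file that holds objects 1 to 4
# and a table at (assumed) offset 500.
original = b"%PDF-1.4\n(original objects and table would be here)\n"
updated = append_update(
    original,
    {4: b"<< /Note (updated object 4) >>", 5: b"<< /Note (new object 5) >>"},
    prev_xref=500, root_obj=1, size=6,
)
```

Note that `updated` starts with the untouched `original` bytes; everything the edit produced sits after them.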
The new reference table will contain the locations of the changed objects and a /Prev pointer to the previous table. You can chain any number of reference tables in this way. To read our hypothetical PDF, we would start with the most recent table and note the locations of objects 4 and 5. We would also note that there is a /Prev entry and follow it to the previous table. There we would read the locations of objects 1, 2 and 3 but ignore the entry for object 4, because we have already found a newer version. That table has no /Prev entry, so we would stop there.
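The reading order described above can be sketched as follows. The tables are modelled as plain in-memory values rather than parsed from a real file, and the names `XrefTable` and `resolve_offsets`, along with all the offsets, are invented for this illustration.

```python
from typing import Dict, NamedTuple, Optional


class XrefTable(NamedTuple):
    entries: Dict[int, int]   # object number -> byte offset of that object
    prev: Optional[int]       # offset of the previous table (/Prev), or None


def resolve_offsets(tables: Dict[int, XrefTable], newest: int) -> Dict[int, int]:
    """Walk the /Prev chain from the newest table and keep the first location seen."""
    resolved: Dict[int, int] = {}
    visited = set()
    offset: Optional[int] = newest
    while offset is not None:
        if offset in visited:                 # guard against a corrupt, looping chain
            raise ValueError("cycle in /Prev chain")
        visited.add(offset)
        table = tables[offset]
        for num, location in table.entries.items():
            # A newer table has already been read, so its entry wins.
            resolved.setdefault(num, location)
        offset = table.prev
    return resolved


# The hypothetical file from the text: an original table and one update.
tables = {
    500: XrefTable({1: 15, 2: 80, 3: 150, 4: 220}, prev=None),
    900: XrefTable({4: 620, 5: 700}, prev=500),
}
locations = resolve_offsets(tables, newest=900)
print(sorted(locations.items()))
# [(1, 15), (2, 80), (3, 150), (4, 620), (5, 700)]
```

Object 4 resolves to offset 620 from the newer table; the older entry at 220 is still present in the first table but is never used.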
The location of the most recent table (the first one we read) is always recorded at the end of the file, so when we append data it is easy to add a new pointer to the latest table there.
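In a PDF file this pointer is written as the keyword `startxref`, followed by the byte offset of the table and the `%%EOF` marker. A minimal sketch of locating it might look like this; the function name `find_last_xref_offset`, the 1024-byte tail size and the sample bytes are assumptions chosen for the example.

```python
import re


def find_last_xref_offset(data: bytes, tail_size: int = 1024) -> int:
    """Return the table offset named by the last startxref marker in the file."""
    tail = data[-tail_size:]          # only the end of the file needs scanning
    last = None
    for last in re.finditer(rb"startxref\s+(\d+)\s+%%EOF", tail):
        pass                          # keep iterating so we end on the final match
    if last is None:
        raise ValueError("no startxref marker found near the end of the file")
    return int(last.group(1))


# Placeholder bytes: an original file ending plus one appended update.
sample = (
    b"%PDF-1.4\n(objects 1-4 and first table)\n"
    b"startxref\n500\n%%EOF\n"
    b"(updated object 4, new object 5 and second table)\n"
    b"startxref\n900\n%%EOF\n"
)
offset = find_last_xref_offset(sample)   # 900, the table added by the update
```

Because each update ends with its own marker, the last one in the file always names the newest table.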
This is one of the key features which gives the PDF file format its power and flexibility.