When you access a PDF file across the Internet (using a URL), it can take some time to open the file. This is down to the way a PDF file is designed: it consists of lots of PDF objects (which describe the pages) and a cross-reference table recording where each of these objects sits in the file.
This makes it very fast on a file system: the PDF viewer reads the table (at the end of the PDF file) and then loads just the objects required for any page using random access. A file system allows you to access any bytes in a file without having to start at the beginning. A plain URL stream does not allow this: you have to read the bytes in order from the start. So to read the table at the end of the file, you need to download the whole file first; you cannot just skip to the end of the stream.
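As a rough illustration of the random-access step described above, the sketch below seeks straight to the end of a file-like object and reads the offset stored after the `startxref` keyword, which is where a PDF records the position of its table. The function name `find_startxref`, the `tail_size` default and the tiny hand-made `fake_pdf` bytes are all assumptions made for this example; a real viewer does far more validation.

```python
import io
import re


def find_startxref(f, tail_size=1024):
    """Return the byte offset written after the last 'startxref' keyword.

    'f' must be seekable (a file on disk, or BytesIO here). A sequential
    stream offers no seek(), which is exactly the problem described above.
    """
    f.seek(0, io.SEEK_END)              # jump to the end without reading the body
    size = f.tell()
    f.seek(max(0, size - tail_size))    # step back a little from the end
    tail = f.read()
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("no startxref found in the last bytes of the file")
    return int(matches[-1])


# Hypothetical stand-in for a PDF: just enough structure for the demo.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)
fake_pdf = (
    body
    + b"xref\n0 1\n0000000000 65535 f \n"
    + b"trailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(xref_offset).encode() + b"\n%%EOF\n"
)

offset = find_startxref(io.BytesIO(fake_pdf))
```

Only the last few hundred bytes are read to locate the table, which is why the same file opens quickly from disk and slowly from a stream that has to be consumed from the start.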
However, you can create PDF files so that the table and all the objects for the first page are stored at the start of the file. This means that the first page can be displayed much sooner. This is known as Linearized PDF (sometimes called Fast Web View). This mode allows you to view the PDF before it is fully downloaded and to access the pages as soon as they are available.
So the answer to the question depends on how the PDF files are made. If they are linearized, you can access them much faster. Otherwise, you will have to download the whole file, because all the important information is stored at the end of the file.
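If you want to check which kind of file you have, one simple heuristic is to look for the `/Linearized` key near the start of the file, since the PDF specification requires the linearization dictionary to sit within the first 1024 bytes. The sketch below is only that heuristic: the function name `is_linearized` and the two sample byte strings are made up for illustration, and a file that has been modified after linearization may still carry the key without really being laid out for fast viewing.

```python
import re


def is_linearized(head: bytes) -> bool:
    """Heuristic check: is there a /Linearized key in the first 1024 bytes?

    'head' is the start of the file, which is the part a sequential
    download delivers first, so no random access is needed.
    """
    return re.search(rb"/Linearized\s+[\d.]+", head[:1024]) is not None


# Hypothetical file beginnings, written by hand for this example.
linearized_head = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Linearized 1 /L 12345 /H [ 500 200 ] /O 3 /E 4000 /N 2 /T 12000 >>\n"
    b"endobj\n"
)
plain_head = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"

linearized_result = is_linearized(linearized_head)
plain_result = is_linearized(plain_head)
```

Because the check only needs the first bytes of the file, it works just as well on a download in progress as on a file on disk.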
Do you have any questions about the PDF file format? If you would like us to try and answer them in a blog post, contact us.
This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so click here to visit our series index!
IDRsolutions develop a Java PDF library, a PDF forms to HTML5 converter, a PDF to HTML5 or SVG converter and a Java Image Library that doubles as an ImageIO replacement. On the blog, our team post about anything interesting they learn.