PDF page size in bytes

An interesting question on our forums made me look at PDF files in a new light. We know how big a PDF file is in bytes, but how big is each page?

To answer this, you need to understand a little about the contents of a PDF file and how the file is constructed. A PDF file is essentially a dump of PDF objects. It consists of the objects themselves and a trailer, which is metadata that allows each object to be found. Each object usually consists of a set of key/value pairs and often a compressed binary stream of data. The binary data is usually image data, colour data or the set of instructions used to draw the page. The data is decompressed in memory when the object is read.
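To make the object structure concrete, here is a minimal Python sketch of a single stream object: a few key/value pairs followed by Flate-compressed binary data. The object number, the drawing instructions and the `stream_sizes` helper are all made up for illustration, and the parsing is deliberately naive (a real reader would use the `/Length` entry and the file's object index rather than searching for keywords).

```python
import zlib

# Hypothetical page drawing instructions, repeated so compression is visible.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 50
compressed = zlib.compress(content)

# A stream object: key/value pairs, then the compressed binary data.
obj = (
    b"4 0 obj\n<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)

def stream_sizes(raw_object):
    """Return (compressed, uncompressed) byte counts for the one Flate stream
    in raw_object. Naive keyword search; an illustration, not a PDF parser."""
    start = raw_object.index(b"stream\n") + len(b"stream\n")
    end = raw_object.rindex(b"\nendstream")
    data = raw_object[start:end]
    return len(data), len(zlib.decompress(data))

compressed_size, uncompressed_size = stream_sizes(obj)
print(compressed_size, uncompressed_size)
```

The first number is what the stream costs in the file; the second is what it costs once it has been read and decompressed in memory.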

A page itself does not have a size: you cannot say it starts at one point in the file and ends at another. What you can say, however, is that a page consists of a set of objects:

1. The Page objects, which describe the page and contain the binary stream of page instructions used to construct its contents.

2. The local Resources objects, which contain the colors, fonts and images used on the page.

3. Global Resources objects (which may be used on any page), which also consist of colors, fonts and images.

4. A proportion of the PDF file metadata.

The last item is probably small enough to be reasonably ignored, and we can also reasonably ignore the non-binary content of the objects.

So a good guess for a page size is the sum of the binary streams which might be used on it. The compressed size probably provides a good guess at how much of the PDF file the page accounts for, and the uncompressed size might well be an equally good guess at how much memory an unrendered page (i.e. not drawn) would use, if you needed that.
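As a rough sketch of that sum in Python: assume you have already collected, by whatever means, the compressed and uncompressed stream sizes for each object and the list of object numbers a page might use. The `StreamInfo` type, the `estimate_page_size` function and all the numbers below are hypothetical. Objects are counted once each, even if the page refers to them more than once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamInfo:
    compressed: int    # bytes as stored in the PDF file
    uncompressed: int  # bytes once decompressed in memory

def estimate_page_size(streams, page_object_ids):
    """Sum stream sizes over the distinct objects a page might use.

    Returns (estimated bytes in file, estimated bytes in memory)."""
    used = set(page_object_ids)  # avoid counting a shared object twice
    in_file = sum(streams[i].compressed for i in used)
    in_memory = sum(streams[i].uncompressed for i in used)
    return in_file, in_memory

# Made-up sizes keyed by object number.
streams = {
    4: StreamInfo(1200, 5400),        # page drawing instructions
    7: StreamInfo(18000, 42000),      # font (global resource)
    9: StreamInfo(250000, 1440000),   # image (local resource)
}
page_ids = [4, 7, 9, 7]  # object 7 is referenced twice

in_file, in_memory = estimate_page_size(streams, page_ids)
print(in_file, in_memory)  # 269200 1487400
```

Note that a global resource such as the font is charged in full to every page that might use it, so these per-page estimates will generally add up to more than the size of the file.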

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip.

Mark Stephens

System Architect and Lead Developer at IDRSolutions
Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.