Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

What is PDF pagesize?

Have you ever wondered how big the page of a PDF Document is? This is a surprisingly complex question because a PDF file can contain several possible values.

What is a MediaBox?

Every page in a PDF file has a MediaBox, which defines the size of the page in PDF units (1/72 of an inch by default). Values 0,0,595,842 (listed in the file as an Array [0 0 595 842]) are the normal values for an A4 portrait page, although they can also be any custom settings. The values define a rectangle and can be converted into inches or centimetres (Adobe Acrobat does this). Values can even be negative, although that is a little unusual.
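As a rough illustration of the conversion, here is a minimal Java sketch. It assumes the box values are in the default PDF unit of 1/72 inch with no extra scaling applied to the page; the class and method names are made up for this example and are not part of JPedal or any other library.

```java
/** Converts PDF box values, assumed to be in default units of 1/72 inch, to physical sizes. */
final class PageSizeUnits {

    // Default PDF unit size; this sketch assumes the page applies no other scaling.
    static final double UNITS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Width of a box stored as [x1 y1 x2 y2]; abs() copes with negative or swapped corners. */
    static double width(double[] box) {
        return Math.abs(box[2] - box[0]);
    }

    /** Height of a box stored as [x1 y1 x2 y2]. */
    static double height(double[] box) {
        return Math.abs(box[3] - box[1]);
    }

    static double toInches(double units) {
        return units / UNITS_PER_INCH;
    }

    static double toMillimetres(double units) {
        return units / UNITS_PER_INCH * MM_PER_INCH;
    }

    public static void main(String[] args) {
        double[] mediaBox = {0, 0, 595, 842}; // the A4 portrait values from the text
        System.out.println("Width:  " + toInches(width(mediaBox)) + " in, "
                + toMillimetres(width(mediaBox)) + " mm");
        System.out.println("Height: " + toInches(height(mediaBox)) + " in, "
                + toMillimetres(height(mediaBox)) + " mm");
    }
}
```

For [0 0 595 842] this works out at roughly 210 x 297 mm, which is the A4 size you would expect.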

What is a CropBox?

Sometimes though, a PDF file might also have a CropBox value. This is usually the same size or smaller than the MediaBox. In this case, the CropBox is actually the page size you will see – it is the visible page area. Things can still be drawn on the area of the MediaBox but will not generally be visible.

There are several reasons why this happens. The most likely is that, because PDF has its origins in the print industry, pages are often created bigger than the final output to allow room for CMYK color boxes, crop marks, etc.

So always take the CropBox if present – otherwise look at the MediaBox to see the actual page size.
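The rule above can be sketched in a few lines of Java. This is only an illustration: it assumes the boxes have already been read from the file into [x1 y1 x2 y2] arrays, and the names used here are hypothetical rather than taken from JPedal.

```java
import java.util.Optional;

/** Picks the box that gives the visible page size: the CropBox if present, else the MediaBox. */
final class VisiblePageBox {

    /** Both boxes are [x1 y1 x2 y2] arrays; the CropBox is optional, the MediaBox is not. */
    static double[] visibleBox(double[] mediaBox, Optional<double[]> cropBox) {
        return cropBox.orElse(mediaBox);
    }

    public static void main(String[] args) {
        double[] mediaBox = {0, 0, 595, 842};
        double[] cropBox = {20, 20, 575, 822}; // made-up CropBox inside the MediaBox

        double[] withCrop = visibleBox(mediaBox, Optional.of(cropBox));
        double[] withoutCrop = visibleBox(mediaBox, Optional.empty());

        System.out.println("With CropBox:    " + (withCrop[2] - withCrop[0]) + " x " + (withCrop[3] - withCrop[1]));
        System.out.println("Without CropBox: " + (withoutCrop[2] - withoutCrop[0]) + " x " + (withoutCrop[3] - withoutCrop[1]));
    }
}
```

Using Optional for the CropBox makes the "if present" part of the rule explicit in the method signature.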

For completeness, a page can also have an optional ArtBox, BleedBox and TrimBox, but if you do not already know what these are, you can generally ignore them.

The PDFReference guides from Adobe are good examples of files with both a CropBox and a MediaBox, and the screenshots show the figures converted in Adobe Acrobat and unconverted in the JPedal PDF Viewer.

What happens if the CropBox is larger than the MediaBox?

The CropBox is usually the visible page area within a larger MediaBox. It is often used to hide things like printers' crop marks and is generally the visible part you see in a PDF viewer. So what happens if the CropBox is larger than the MediaBox? Does the PDF even work?

Here is the raw Page object from an example PDF file I have been looking at.

3 0 obj<</CropBox[0 0 595.22 842] /Parent 2 0 R
/B[347 0 R] /Contents 4 0 R /Rotate 0 /BleedBox[0 0 595.22 842] /ArtBox[0 0 595.22 842] /MediaBox[56.6929 56.6929 476.22 651.969] /TrimBox[0 0 595.22 842] /Resources<< /Font<> /ProcSet[/PDF/Text/ImageB/ImageC/ImageI] /Properties<> /ExtGState<>>>/Type/Page>>
endobj

As you can see the MediaBox is actually inside the CropBox. Do we:-

1. Use the CropBox value and display a ‘margin’ around the actual page data.

2. Use the smaller MediaBox as the CropBox value.

3. Throw an error.

As usual, our guide is how Acrobat behaves – the correct answer is 2. Did you guess correctly?
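One way to get answer 2 in code is to clip the CropBox to the MediaBox, which also matches the PDF specification's rule that boxes extending beyond the MediaBox are reduced to their intersection with it. The Java sketch below is an illustration under that assumption, not Acrobat's or JPedal's actual code, and its names are hypothetical. It also does not handle boxes that do not overlap at all.

```java
import java.util.Arrays;

/** Clips a CropBox to the MediaBox so the visible area never extends past the page. */
final class CropBoxClipper {

    /** Returns the box as [left, bottom, right, top] whichever corners were supplied. */
    static double[] normalise(double[] box) {
        return new double[] {
            Math.min(box[0], box[2]), Math.min(box[1], box[3]),
            Math.max(box[0], box[2]), Math.max(box[1], box[3])
        };
    }

    /** Intersection of the two boxes; assumes they overlap. */
    static double[] clipToMediaBox(double[] cropBox, double[] mediaBox) {
        double[] crop = normalise(cropBox);
        double[] media = normalise(mediaBox);
        return new double[] {
            Math.max(crop[0], media[0]), Math.max(crop[1], media[1]),
            Math.min(crop[2], media[2]), Math.min(crop[3], media[3])
        };
    }

    public static void main(String[] args) {
        // Values from the example Page object in the text.
        double[] cropBox = {0, 0, 595.22, 842};
        double[] mediaBox = {56.6929, 56.6929, 476.22, 651.969};

        // The CropBox surrounds the MediaBox, so the result is the MediaBox itself.
        System.out.println(Arrays.toString(clipToMediaBox(cropBox, mediaBox)));
    }
}
```

When the CropBox sits inside the MediaBox, as it normally does, the same method returns the CropBox unchanged, so it can be applied to every page without a special case.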




2 Replies to “What is PDF pagesize?”

  1. It’s [0 0 595 842], and not “0,0,795,842” as you write most incorrectly.
    (note the first “5” !)
