Java PDF Blog

Make your own PDF file. Part 4: Hello World PDF

Back when dinosaurs roamed the earth, I talked about the different objects that are used to form a Pdf file. One type I mentioned was stream objects. A stream can hold any sequence of bytes; the kind we care about here, the content stream, holds the instructions describing what a Pdf page is going to look like. By the end of this article we are going to be able to make a Hello World Pdf. To put some text in a Pdf document, I'm going to have to make use of a stream object.

If you open up any old Pdf in a text editor, most of the text you see will be inside stream objects. Their format is slightly different from the other objects: a stream starts with a dictionary, which must have a /Length entry giving the length of the stream data in bytes. That length covers everything between the keywords stream and endstream, except the end-of-line marker straight after stream and the optional end-of-line marker just before endstream. Normally, when you open a Pdf, the data in the stream is compressed. You can tell what kind of compression is used from the /Filter key in the stream's dictionary. For example:

10 0 obj<</Length 40 /Filter /FlateDecode>>
stream

…bunch of compressed stuff…

endstream
endobj
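If you build streams in code, /Length is the easiest thing to get wrong. Here is a minimal Java sketch, using only java.util.zip.Deflater from the standard library, that writes a stream object and counts /Length for you. StreamObjects, flate and streamObject are my own hypothetical names, not part of any Pdf library. With compression switched on, /Length is the number of compressed bytes, because those are the bytes that sit between stream and endstream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class StreamObjects {

    // Compress with Deflater's default zlib format, which is what /FlateDecode expects.
    static byte[] flate(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Write "N 0 obj<<...>>", the data between stream and endstream, then endobj.
    // /Length counts only the data: not the EOL after "stream" or the one before "endstream".
    static byte[] streamObject(int objNum, byte[] data, boolean compress) {
        byte[] body = compress ? flate(data) : data;
        String dict = "<</Length " + body.length
                + (compress ? " /Filter /FlateDecode" : "") + ">>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes((objNum + " 0 obj" + dict + "\nstream\n").getBytes(StandardCharsets.US_ASCII));
        out.writeBytes(body);
        out.writeBytes("\nendstream\nendobj\n".getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET"
                .getBytes(StandardCharsets.US_ASCII);
        // Uncompressed: the dictionary reads <</Length 43>>
        System.out.print(new String(streamObject(10, content, false), StandardCharsets.ISO_8859_1));
        // Compressed: /Length is the size of the deflated bytes; print only the first line
        String packed = new String(streamObject(10, content, true), StandardCharsets.ISO_8859_1);
        System.out.println(packed.substring(0, packed.indexOf('\n')));
    }
}
```

Counting the length from the byte array that actually gets written, rather than from the text you meant to write, is what keeps /Length honest once compression is involved.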

If you went to the trouble of uncompressing this data, you would find a list of instructions. These instructions are the operators that create all the content on the page. Here are the contents of the stream, uncompressed:

BT
/F1 24 Tf
175 720 Td
(Hello World!)Tj
ET
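To do that uncompressing yourself in Java, the standard java.util.zip.Inflater reads the zlib format that /FlateDecode uses. In this sketch the compressed bytes are produced with Deflater, standing in for the bytes a real reader would take from between stream and endstream; FlateDecodeDemo and its methods are hypothetical names. A real reader would also have to respect /Length and any other filters named in /Filter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateDecodeDemo {

    // Undo /FlateDecode: Inflater expects zlib-wrapped data by default.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No output and no way to continue: truncated data or a preset dictionary we lack
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported Flate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // Stand-in for the bytes found between "stream" and "endstream" in a real file.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        String content = "BT\n/F1 24 Tf\n175 720 Td\n(Hello World!)Tj\nET\n";
        byte[] compressed = deflate(content.getBytes(StandardCharsets.US_ASCII));
        // Prints the five instructions again, one per line
        System.out.print(new String(inflate(compressed), StandardCharsets.US_ASCII));
    }
}
```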

BT means Begin Text and ET means End Text. The instructions in between, Tf, Td and Tj, set the font, the position and what the text is going to say. Note that the values each instruction needs (its operands) are written before the instruction itself. So the first instruction, Tf, needs a font name (/F1, which I'll come back to in a bit) and a font size (24). Td sets the text position; because it is the first positioning instruction after BT, it is measured from the bottom-left corner of the page. The first number is the number of units from the left and the second is the number of units from the bottom. The units are quite interesting. They belong to a logical coordinate system, called user space, in which one unit is 1/72 of an inch by default, and they only get translated to real-world coordinates when something has to be rendered on a physical device, such as a printer or a monitor. This is what keeps the size and position of text consistent across different media. Finally, the Tj instruction draws the characters in the brackets onto the page.
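Because the operands come first, reading a content stream is a matter of collecting operands until you hit an operator, which then consumes them. Here is a minimal Java sketch of that idea. It is my own illustration, not a full Pdf lexer: it only understands names, numbers, simple bracketed strings (no escapes or nested brackets) and operators, and ContentTokenizer, Instruction and toInches are hypothetical names. The last line applies the default 1/72 inch unit to the Td operands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ContentTokenizer {

    // One instruction: the operator plus the operands written before it.
    record Instruction(String operator, List<String> operands) {}

    // Minimal sketch: names, numbers, simple (...) strings and operators only.
    static List<Instruction> parse(String content) {
        List<Instruction> result = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Literal string: assumes no escaped or nested brackets inside it
                int end = content.indexOf(')', i);
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                pending.add(content.substring(start, end + 1));
                i = end + 1;
                continue;
            }
            i++;
            while (i < content.length() && !Character.isWhitespace(content.charAt(i))
                    && "()/".indexOf(content.charAt(i)) < 0) {
                i++;
            }
            String token = content.substring(start, i);
            if (c == '/' || c == '-' || c == '+' || c == '.' || Character.isDigit(c)) {
                pending.add(token);                                      // operand: wait for its operator
            } else {
                result.add(new Instruction(token, List.copyOf(pending))); // operator consumes the operands
                pending.clear();
            }
        }
        return result;
    }

    // Default user space: one unit is 1/72 of an inch.
    static double toInches(double units) {
        return units / 72.0;
    }

    public static void main(String[] args) {
        for (Instruction ins : parse("BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET")) {
            System.out.println(ins.operator() + " <- " + ins.operands());
        }
        System.out.println(String.format(Locale.ROOT, "175 720 Td = %.2f in from the left, %.2f in from the bottom",
                toInches(175), toInches(720)));
    }
}
```

Running it prints one line per instruction, for example `Tf <- [/F1, 24]` and `Tj <- [(Hello World!)]`, followed by `175 720 Td = 2.43 in from the left, 10.00 in from the bottom`.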

Before I add that to my Pdf document, we have to sort out that reference to /F1. Inside a stream you can't refer to objects the way you do outside one (i.e. 10 0 R). Instead you have to map the name /F1 to an object and make that mapping available through the page's /Resources dictionary. The page dictionary holds the /Resources entry alongside a /Contents entry that points to your stream object:

3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 7 0 R>>
endobj
4 0 obj<</Font 5 0 R>>
endobj
5 0 obj<</F1 6 0 R>>
endobj
6 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
7 0 obj<</Length 40>>
stream
BT
/F1 24 Tf
…..

So we are making use of a /Page object. The page's /Contents entry points to a stream object that prints our text. The stream needs to know which object /F1 points to. Our /Resources dictionary is at 4 0 R and only contains a /Font entry, which points to the dictionary where /F1 is mapped. You can see in 5 0 obj that /F1 maps to an object representing one of the standard fonts that Pdf viewers provide: Helvetica. Even though it seems a bit long-winded, this actually helps speed up a Pdf viewer. Instead of loading a font, the viewer just hangs on to the reference; if the font never gets used (you don't look at the page it is on), it never has to be loaded.
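To make that lazy loading concrete, here is a toy Java model of objects 3 to 6 from the snippet above, with dictionaries held as Maps. The Ref record, the dict helper, lookupFont and loadFont are hypothetical names of my own; a real viewer parses these objects out of the file and does far more work when it loads a font. The point is only the order of events: resolving /F1 hands back a reference, and the font itself is touched only when the page is drawn.

```java
import java.util.HashMap;
import java.util.Map;

class ResourceLookup {

    // Hypothetical stand-in for an indirect reference such as "6 0 R".
    record Ref(int objNum) {}

    // Build a dictionary from alternating keys and values (keys without the leading slash).
    static Map<String, Object> dict(Object... keysAndValues) {
        Map<String, Object> d = new HashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            d.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return d;
    }

    // Objects 3 to 6 of the snippet, keyed by object number.
    static final Map<Integer, Map<String, Object>> OBJECTS = Map.of(
            3, dict("Type", "Page", "Parent", new Ref(2), "Resources", new Ref(4), "Contents", new Ref(7)),
            4, dict("Font", new Ref(5)),
            5, dict("F1", new Ref(6)),
            6, dict("Type", "Font", "Subtype", "Type1", "BaseFont", "Helvetica"));

    static int fontsLoaded = 0;

    // Follow an indirect reference to its object; direct values pass through unchanged.
    static Object resolve(Object value) {
        return value instanceof Ref r ? OBJECTS.get(r.objNum()) : value;
    }

    // Page -> /Resources -> /Font -> name. Stops at the reference, so nothing is loaded yet.
    @SuppressWarnings("unchecked")
    static Ref lookupFont(Map<String, Object> page, String name) {
        Map<String, Object> resources = (Map<String, Object>) resolve(page.get("Resources"));
        Map<String, Object> fonts = (Map<String, Object>) resolve(resources.get("Font"));
        return (Ref) fonts.get(name);
    }

    // The expensive step, postponed until the page is actually drawn.
    static String loadFont(Ref ref) {
        fontsLoaded++;
        return (String) OBJECTS.get(ref.objNum()).get("BaseFont");
    }

    public static void main(String[] args) {
        Ref f1 = lookupFont(OBJECTS.get(3), "F1");
        System.out.println("/F1 -> " + f1.objNum() + " 0 R, fonts loaded: " + fontsLoaded);
        System.out.println("drawing the page loads " + loadFont(f1) + ", fonts loaded: " + fontsLoaded);
    }
}
```

The first line reports `/F1 -> 6 0 R, fonts loaded: 0`; only the second line, which stands in for drawing the page, bumps the count to 1.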

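Anyway, put it all together with the text from Part 3: DIY Blank Page and you get, possibly, a world first: how to make a “Hello World” Pdf document! (This time the /Font dictionary is written directly inside 4 0 obj rather than as a separate object, so the font becomes 5 0 obj and the stream 6 0 obj.)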

%PDF-2.0
1 0 obj <</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>
endobj
3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R>>
endobj
4 0 obj<</Font <</F1 5 0 R>>>>
endobj
5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
6 0 obj
<</Length 43>>
stream
BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000056 00000 n
0000000111 00000 n
0000000212 00000 n
0000000250 00000 n
0000000317 00000 n
trailer <</Size 7/Root 1 0 R>>
startxref
408
%%EOF
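Counting those xref offsets by hand is tedious and easy to get wrong, so here is a minimal Java sketch that writes the same objects and works out the offsets as it goes. It assumes every character is ASCII (one character is one byte) and that lines end with a single line feed; the class name HelloWorldPdf and the output file name hello.pdf are my own choices. It writes each xref entry as exactly 20 bytes (ten-digit offset, space, five-digit generation, space, n or f, then a space and a line feed), as the specification requires; the trailing space is easy to lose when copying a listing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HelloWorldPdf {

    static String build() {
        String content = "BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET";
        // The objects from the listing, byte for byte (ASCII only, so 1 char = 1 byte).
        String[] objects = {
            "1 0 obj <</Type /Catalog /Pages 2 0 R>>\nendobj\n",
            "2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>\nendobj\n",
            "3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R>>\nendobj\n",
            "4 0 obj<</Font <</F1 5 0 R>>>>\nendobj\n",
            "5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>\nendobj\n",
            // /Length excludes the line feed before endstream
            "6 0 obj\n<</Length " + content.length() + ">>\nstream\n" + content + "\nendstream\nendobj\n"
        };
        StringBuilder pdf = new StringBuilder("%PDF-2.0\n");
        List<Integer> offsets = new ArrayList<>();
        for (String obj : objects) {
            offsets.add(pdf.length());   // byte offset where "N 0 obj" starts
            pdf.append(obj);
        }
        int xrefOffset = pdf.length();   // startxref points at the "xref" keyword
        int size = objects.length + 1;   // +1 for the free entry of object 0
        pdf.append("xref\n0 ").append(size).append('\n');
        pdf.append("0000000000 65535 f \n");   // 18 chars + space + LF = 20 bytes
        for (int offset : offsets) {
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        pdf.append("trailer <</Size ").append(size).append("/Root 1 0 R>>\n");
        pdf.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        return pdf.toString();
    }

    public static void main(String[] args) throws IOException {
        Files.write(Path.of("hello.pdf"), build().getBytes(StandardCharsets.US_ASCII));
    }
}
```

Open the resulting hello.pdf in any viewer and you should see the text near the top left of the page. Because the offsets are measured rather than typed, you can add or reformat objects and the xref table and startxref value stay correct.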

Next Time: Drawing lines