Daniel: When not delving into obscure PDF or Java bugs, Daniel is exploring the new features in JavaFX.


Make your own PDF file. Part 4: Hello World PDF


Back when dinosaurs roamed the earth I talked about the different objects that are used to form a PDF file. One type I mentioned was the stream object. Among other things, stream objects hold the instructions describing what a PDF page is going to look like. By the end of this article we will be able to make a Hello World PDF, and to put some text in the document I'm going to have to use a stream object.

If you open up any old PDF in a text editor, most of the text you will see is inside stream objects. Their format is slightly different from the other objects: a stream starts with a dictionary, which must have a /Length entry giving the length of the stream data in bytes. The data is everything between the end-of-line marker after the keyword stream and the keyword endstream, not counting the end-of-line marker just before endstream if there is one. Normally when you open a PDF the data in a stream is compressed. The /Filter key in the stream's dictionary tells you which kind of compression was used. For example:

10 0 obj<</Length 40 /Filter /FlateDecode>>
stream

…bunch of compressed stuff…

endstream
endobj
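Here is a minimal Python sketch of how /Length and /Filter relate, assuming the data is compressed with the standard zlib module (zlib's deflate format is what /FlateDecode expects). The function names are hypothetical, and the reader assumes a direct /Length value and a single LF after the stream keyword, which real files don't always have.

```python
import re
import zlib


def make_flate_stream(obj_num: int, content: bytes) -> bytes:
    """Build an indirect stream object whose data is Flate-compressed.

    /Length is the byte count of the *compressed* data, i.e. exactly the
    bytes between the LF after 'stream' and the LF before 'endstream'.
    """
    data = zlib.compress(content)
    header = f"{obj_num} 0 obj<</Length {len(data)} /Filter /FlateDecode>>\nstream\n"
    return header.encode("ascii") + data + b"\nendstream\nendobj\n"


def read_flate_stream(obj: bytes) -> bytes:
    """Use /Length to slice out the stream data, then undo the compression."""
    length = int(re.search(rb"/Length (\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")  # first byte of the data
    return zlib.decompress(obj[start:start + length])


content = b"BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET"
obj = make_flate_stream(10, content)
print(read_flate_stream(obj))  # prints b'BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET'
```

Because the compressed bytes can contain anything, including the characters of endstream, a reader relies on /Length rather than searching for the closing keyword.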

If you went to the trouble of decompressing this data, you would find a list of instructions: the commands that create all the content on a PDF page. Here are the contents of the stream, uncompressed:

BT
/F1 24 Tf
175 720 Td
(Hello World!)Tj
ET

BT means Begin Text and ET means End Text. The instructions in between, Tf, Td and Tj, set the font, set the position and say what the text is going to be. Note how the values (operands) that each instruction needs are written before the instruction itself. So the first instruction, Tf, takes the name of a font (/F1, I'll come back to that in a bit) and a font size (24). The Td operator moves the text position by the given amounts; because this is the first move after BT, it is measured from the origin, which here is the bottom-left corner of the page. So the first number is the number of units from the left and the second is the number of units from the bottom. Finally, the Tj instruction draws the characters inside the round brackets onto the page.

The units are quite interesting. They belong to a device-independent coordinate system (user space) in which, by default, one unit is 1/72 of an inch, and they only get translated to real-world coordinates when something has to be rendered on a real device such as a printer or a monitor. This keeps the size and position of text consistent across different media: here the text starts 175/72, about 2.43 inches, from the left and 720/72 = 10 inches from the bottom.
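Because the operands come first, a reader can simply collect values until it meets an operator, which then consumes them. Here is a deliberately simplified Python sketch of that idea; parse_content is a hypothetical helper, and it ignores things real content streams can contain, such as escaped or nested brackets in strings, hex strings, arrays and inline images.

```python
import re

# One token: a name (/F1), a literal string without nested or escaped
# brackets, a number, or an operator keyword such as BT, Tf or T*.
TOKEN = re.compile(rb"\s*(/[^\s/()<>\[\]]+|\([^()]*\)|[-+]?\d*\.?\d+|[A-Za-z'*]+)")
NUMBER = re.compile(rb"[-+]?\d*\.?\d+")


def parse_content(stream: bytes) -> list[tuple[str, list]]:
    """Return (operator, operands) pairs from a simple content stream."""
    operands: list = []
    ops: list[tuple[str, list]] = []
    pos = 0
    while pos < len(stream):
        m = TOKEN.match(stream, pos)
        if m is None:
            break  # only trailing whitespace (or unsupported syntax) remains
        tok = m.group(1)
        pos = m.end()
        if tok.startswith(b"/"):
            operands.append(tok.decode("ascii"))          # a name operand
        elif tok.startswith(b"("):
            operands.append(tok[1:-1].decode("latin-1"))  # string contents
        elif NUMBER.fullmatch(tok):
            operands.append(float(tok) if b"." in tok else int(tok))
        else:
            ops.append((tok.decode("ascii"), operands))   # operator takes the operands
            operands = []
    return ops
```

For the Hello World stream, parse_content(b"BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET") returns [('BT', []), ('Tf', ['/F1', 24]), ('Td', [175, 720]), ('Tj', ['Hello World!']), ('ET', [])].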

Before I add that to my PDF document, we have to sort out that reference to /F1. Inside a content stream you can't refer to objects the way you do outside one (i.e. with a reference such as 10 0 R). Instead, you map the name /F1 to an object in the page's /Resources dictionary. The /Resources entry sits in the same /Page dictionary as the /Contents entry, which points to your stream object:

3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 7 0 R>>
endobj
4 0 obj<</Font 5 0 R>>
endobj
5 0 obj<</F1 6 0 R>>
endobj
6 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
7 0 obj<</Length 40>>
stream
BT
/F1 24 Tf
…..

So we are making use of a /Page object. The page's /Contents entry points to the stream object that prints our text. The stream needs to know what object /F1 refers to. Our /Resources dictionary is object 4 (4 0 R) and only contains a /Font entry, which points to the dictionary where /F1 is mapped. You can see in 5 0 obj that /F1 maps to object 6, which represents one of the standard 14 fonts: Helvetica. Even though this seems a bit long-winded, it actually helps speed up a PDF viewer. Instead of loading a font straight away, the viewer can just hang on to the reference; if it never gets used (you don't look at the page the font is on), the viewer doesn't have to load the font.
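The indirection is easier to see in code. This Python sketch models objects 3 to 6 from the snippet as dictionaries; the Ref type, the OBJECTS table and the function names are all hypothetical, and load_font merely stands in for whatever expensive work a real viewer does when it reads a font.

```python
from functools import lru_cache
from typing import NamedTuple


class Ref(NamedTuple):
    """Stands in for an indirect reference such as 5 0 R."""
    num: int
    gen: int = 0


# Objects 3-6 from the snippet (objects 2 and 7 are left out).
OBJECTS = {
    3: {"Type": "/Page", "Parent": Ref(2), "Resources": Ref(4),
        "MediaBox": [0, 0, 500, 800], "Contents": Ref(7)},
    4: {"Font": Ref(5)},
    5: {"F1": Ref(6)},
    6: {"Type": "/Font", "Subtype": "/Type1", "BaseFont": "/Helvetica"},
}

loaded: list[str] = []  # records which fonts were actually "loaded"


def deref(value):
    """Follow an indirect reference to the object it points at."""
    return OBJECTS[value.num] if isinstance(value, Ref) else value


@lru_cache(maxsize=None)
def load_font(ref: Ref) -> dict:
    """Pretend to do the expensive loading; runs at most once per font."""
    font = deref(ref)
    loaded.append(font["BaseFont"])
    return font


def font_for(page_num: int, name: str) -> dict:
    """Resolve a content-stream font name such as /F1 via /Resources -> /Font."""
    page = OBJECTS[page_num]
    fonts = deref(deref(page["Resources"])["Font"])
    return load_font(fonts[name.lstrip("/")])
```

Until something calls font_for(3, "/F1"), the loaded list stays empty; the first call follows 4 0 R, then 5 0 R, then 6 0 R and loads Helvetica, and later calls reuse the cached result.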

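Anyway, put it all together with the text from Part 3: DIY Blank Page and you get, possibly, a world first: how to make a “Hello World” PDF document! Note that in the full file the /Font dictionary is written directly inside object 4, so the font becomes object 5 and the content stream becomes object 6. The byte offsets in the xref table assume each line ends with a single line feed (LF) character.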

%PDF-2.0
1 0 obj <</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>
endobj
3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R>>
endobj
4 0 obj<</Font <</F1 5 0 R>>>>
endobj
5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
6 0 obj
<</Length 43>>
stream
BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000056 00000 n
0000000111 00000 n
0000000212 00000 n
0000000250 00000 n
0000000317 00000 n
trailer <</Size 7/Root 1 0 R>>
startxref
408
%%EOF
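Counting byte offsets by hand is error-prone, so here is a minimal Python sketch (an illustration, not IDRSolutions code) that writes the same file and works out the xref offsets and the startxref value for you. It assumes single LF line endings, as the listing does; BODY_OBJECTS, CONTENT and build_pdf are names made up for the example.

```python
# Objects 1-5, written exactly as in the listing above.
BODY_OBJECTS = [
    b"1 0 obj <</Type /Catalog /Pages 2 0 R>>\nendobj\n",
    b"2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>\nendobj\n",
    b"3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R"
    b" /MediaBox [0 0 500 800] /Contents 6 0 R>>\nendobj\n",
    b"4 0 obj<</Font <</F1 5 0 R>>>>\nendobj\n",
    b"5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>\nendobj\n",
]

CONTENT = b"BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET"


def build_pdf() -> tuple[bytes, list[int], int]:
    out = bytearray(b"%PDF-2.0\n")
    # /Length counts only the data, not the LF before endstream.
    stream_obj = (
        b"6 0 obj\n<</Length %d>>\nstream\n" % len(CONTENT)
        + CONTENT
        + b"\nendstream\nendobj\n"
    )
    offsets = []
    for obj in BODY_OBJECTS + [stream_obj]:
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += obj
    startxref = len(out)  # byte offset of the "xref" keyword
    out += b"xref\n0 %d\n" % (len(offsets) + 1)
    # Every xref entry must be exactly 20 bytes; with a bare LF ending,
    # a space before the LF makes up the 2-byte end-of-line.
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer <</Size %d/Root 1 0 R>>\nstartxref\n%d\n%%%%EOF\n" % (
        len(offsets) + 1, startxref)
    return bytes(out), offsets, startxref


if __name__ == "__main__":
    pdf, offsets, startxref = build_pdf()
    with open("hello.pdf", "wb") as f:
        f.write(pdf)
    print(offsets, startxref)
```

Running it writes hello.pdf and prints [9, 56, 111, 212, 250, 317] 408: the object offsets match the xref table above, and startxref is the position of the xref keyword.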

Next Time: Drawing lines




7 Replies to “Make your own PDF file. Part 4: Hello World…”

  1. The values in the xref table are incorrect. They are:
    %PDF-1.4
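    1 0 obj <</Type /Catalog /Pages 2 0 R>>
    endobj
    2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>
    endobj
    3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R>>
    endobj
    4 0 obj<</Font <</F1 5 0 R>>>>
    endobj
    5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
    endobj
    6 0 obj
    <</Length 44>>
    stream
    BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET
    endstream
    endobj
    xref
    0 7
    0000000000 65535 f
    0000000010 00000 n
    0000000059 00000 n
    0000000116 00000 n
    0000000219 00000 n
    0000000259 00000 n
    0000000328 00000 n
    trailer <</Size 7/Root 1 0 R>>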
    startxref
    425
    %%EOF

  2. Thank you for this!

    I am going through the PDF specification, but it does not have anything like paragraphs, tables, etc. Do we need to create these manually?

    If I need to create a paragraph, do I need to calculate the width of the words and, when a line gets wider than the page width, move a few pixels down and print the next words?

    Similarly for tables do I need to draw lines to create the table?

    1. There are no specific structures for these. Tools like iText allow you to create such structures in PDF, and you can also define them for extraction using marked content, but you have to do all the work here.

    1. We would really recommend using iText for document creation. As the article hopefully makes clear, creating PDF files manually is not something for the faint-hearted.
