We have been working with PDF files since 1999 and developed complex software to display PDF files. We have learnt a lot about the PDF file format in that time and share our knowledge in the articles below.
If you are interested in using our software to display your PDF documents (we can rasterize them, convert them to HTML5 or SVG, or provide a complete Java PDF Viewer), why not set up a call with us and see if we can help?
Here is an overview of the topics covered in this article:
- Quick Tutorials
- Frequently Asked Questions
- The PDF File itself
- Images in PDF
- Color handling in PDF
- Text in PDF
- Fonts in PDF
- PDF Forms, Annotations & Interactive Elements
- PDF Security
- PDF Bugs
- CCITT Encoding in PDF
- Make your own PDF file manually
Quick tutorials showing how to use our software to solve common PDF tasks
Questions developers often ask us
Why can’t I just open and edit a PDF File?
How do I find out the PDF version used?
How big is a PDF Page in bytes?
What does an OCR PDF file contain?
What is PDF Pagesize? CropBox, MediaBox, ArtBox, BleedBox, TrimBox?
How to calculate PDF Page Size in Inches or Centimetres?
Why is my PDF Producer showing in Chinese?
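As a quick illustration of the page size question above, here is a minimal Java sketch of the arithmetic. It assumes the default PDF user space unit of 1/72 inch (no /UserUnit entry on the page) and uses a hypothetical A4-sized MediaBox; the class and method names are made up for this example and are not part of any library.

```java
public class PageSizeSketch {

    // Default PDF user space: 1 unit = 1/72 inch (assumes the page has no /UserUnit entry).
    static final double POINTS_PER_INCH = 72.0;
    static final double CM_PER_INCH = 2.54;

    static double pointsToInches(double points) {
        return points / POINTS_PER_INCH;
    }

    static double pointsToCentimetres(double points) {
        return pointsToInches(points) * CM_PER_INCH;
    }

    public static void main(String[] args) {
        // Hypothetical MediaBox [llx lly urx ury], roughly A4 in points.
        double[] mediaBox = {0, 0, 595, 842};

        // Width and height are differences, because the lower-left corner need not be 0,0.
        double widthPts = mediaBox[2] - mediaBox[0];
        double heightPts = mediaBox[3] - mediaBox[1];

        System.out.println("inches: " + pointsToInches(widthPts) + " x " + pointsToInches(heightPts));
        System.out.println("cm:     " + pointsToCentimetres(widthPts) + " x " + pointsToCentimetres(heightPts));
    }
}
```

If the page also has a CropBox, the same arithmetic applies to its four values; which box is the right one to measure depends on what you mean by page size.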
This section covers the actual file format and how it works
How to view PDF objects
Where do your PDF objects start in a PDF file?
How Text, Shapes and Images work together in a PDF file
What are PDF Object Streams?
Multiple Trailers in a PDF File
What are PDF Xref tables?
Understanding PDF Text Objects
How does a decodeArray work on Images?
How are images stored in a PDF file?
What is a PDF Dictionary?
Linearized PDF Files
2 Problems with Corrupt PDF Data Streams
How can a PDF file be broken?
How do stacks work in PDF files
Identifying a PDF File from its first line
No Startxref found in last 1024 bytes?
Embedding your own data in PDF Files
Intriguing PDF xref Issue
Strange PDF File of the Week
Why writing a PDF parser is such a challenging task (Part 234)
Corrupt PDFs? Maybe this is your problem
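The first-line and version questions listed above can be sketched in a few lines of Java. This is only an illustration, not how our software does it: it looks for the `%PDF-` marker in the first 1024 bytes (an assumption based on the leniency some viewers show towards junk before the header) and reads the version digits that follow. The class and method names are hypothetical. Note that the document catalog can also carry a /Version entry that overrides the header, which this sketch does not look at.

```java
import java.nio.charset.StandardCharsets;

public class PdfHeaderSketch {

    /** Returns the version after "%PDF-" (for example "1.7"), or null if no header is found. */
    static String headerVersion(byte[] firstBytes) {
        // ISO-8859-1 maps every byte to one char, so binary junk cannot break the decoding.
        int limit = Math.min(firstBytes.length, 1024);
        String head = new String(firstBytes, 0, limit, StandardCharsets.ISO_8859_1);

        int marker = head.indexOf("%PDF-");
        if (marker < 0) {
            return null; // not a PDF, or the header is further in than we are willing to look
        }

        // Collect the digits and dots that follow the marker, e.g. "1.4".
        int start = marker + "%PDF-".length();
        int end = start;
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return end > start ? head.substring(start, end) : null;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7\n1 0 obj\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(headerVersion(sample));
    }
}
```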
Images can be stored in PDF files in several ways
Images – An Overview
3 Examples of unusual ways to use PDF Image Masks
3 Types of Image Mask
PDF Image DPI
Advantages of JBIG2 compression in PDF explained
There are several versions of each image inside your PDF file
Do you need an image that big in your PDF file?
Small Images can cause big problems in PDF Files
A suggestion to the Prawn development team on making smaller PDF files
Making sure image names are unique in PDF files
Large images in a PDF File
Extract Raw JPEG Images from a PDF File
Filter and DecodeParms Objects for a PDF Image
Color support inside PDF files is very powerful and complex.
Color – An Overview
PDF Image Color Depth
The Color White in PDF Files
YCCK Color Conversion in PDF Files
CMYK does not always mean CMYK
Fine Tuning PDF Image Color with ICC Profiles
Convert PDF to Grayscale or Black and White
How Text is stored, displayed and extracted from a PDF file
PDF Text – An Overview
Does a PDF file contain any format and style information for text?
PDF Text Co-ordinates
Carriage returns, spaces and other gaps
PDF Mystery – What is the correct value for a Text Field
PDF Text Extraction with Java
The easy way to discover if a PDF File contains structured content
Why can I not extract text from this GhostScript generated PDF file?
Why can’t I extract text from this PDF file?
Extracting Text References from a PDF File
Extracting Structured Text from PDF Files
Space is a special character
Text Spaces in PDF Files
Space: The Final Frontier… in PDF
PDF files can use three different font technologies for display
PDF Fonts – An Overview
Introduction to PDF Font Technologies
Embedded CMAP Tables
What are CID Fonts?
Custom Font Encodings
Are there really 3 types of fonts in PDF files?
Standard Font Information
Glyph Names – What is in a name?
TrueType Font Hinting
Why the TrueType Hinting Patent Expiration Matters
Be careful with your PDF Fonts
Are your TrueType CMap Tables lying to you?
Mystery of the PDF file and the missing euro character
Problems caused by Arial fonts in PDF files
Differences in the PDF Differences Tables
TrueType Hinting – Big Screens for Small Details
Why are CID Fonts far more complicated than non-CID Fonts?
Embedded PDF TrueType Fonts are always Mac encoded unless they are not
PDF with odd Type3 Fonts in Ghostscript 8.50
PDF files can contain interactive elements
Introduction to PDF Forms
Introduction to AcroForms
Introduction to XFA Forms
Layers in PDFs
Extracting Flattened Form Data from a PDF File
The Mystery Behind PDF Form Names
What is PDF Form Flattening?
What are PDF readonly text fields?
Not all forms are PDF forms
PDF files have their own security systems and processes
PDF Security (Passwords and Certificates)
Brief Overview of Security Features offered by the PDF file format
PDF Password Protection
Protecting PDF Content
Why do I need the PDF password to open the PDF file?
Creating your own test certificates and keys for signing PDF files
Here we write up some of the more intriguing bugs we have investigated in PDF files.
An Extreme Case of Recursion
Using SMask and Image ‘the opposite way’ round
Zero Bytes in a String
X Marks the spot (or not)
ICC Colorspace Alt Setting
Simulating an SMask with Vector Graphics
Mixed up Font Object
PDF Text is really a tiny image with a big SMask
Tiny Dash Values and the Java JVM
Values out of Range
Missing Image Data
Missing Image Data 2
Dealing with 3 Types of Fonts
Pointless Font Inclusion
Odd text rendering issue in Acrobat on Mac
Phantom PDF Objects
CCITT encoding is used to store compressed black and white image data inside PDF files.
CCITT Encoding in PDF – Converting CCITT data into a TIFF Image
CCITT Encoding in PDF – Black and White Facts
CCITT Encoding in PDF – Rows and Height Gotcha
CCITT Encoding in PDF – Decoding CCITT Data
CCITT Encoding in PDF – G31D CCITT Data Overview
CCITT Encoding in PDF – Decoding G31D CCITT Data
One of our developers bravely set out to write the ‘Hello World’ tutorial of PDF files, creating a PDF file from scratch manually, in a text editor. Follow the series:
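For a flavour of what such a hand-made file contains, here is a minimal sketch that assembles a one-page 'Hello World' PDF. Instead of typing the file in a text editor as the series does, it builds the same kind of structure in Java so that the xref byte offsets are computed rather than counted by hand. The class name, method name, output file name, page size and text position are all assumptions made for this example, and the result has only been reasoned through against the file structure, not checked in every viewer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

public class HelloPdfSketch {

    static byte[] buildHelloPdf() {
        // Page content: select font F1 at 24pt, move to (72, 720), draw the string.
        String content = "BT /F1 24 Tf 72 720 Td (Hello World) Tj ET";

        // Objects 1..5: catalog, page tree, page, content stream, font.
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Contents 4 0 R"
                + " /Resources << /Font << /F1 5 0 R >> >> >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };

        // Everything is ASCII, so character positions equal byte offsets.
        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");
        int[] offsets = new int[objects.length];
        for (int i = 0; i < objects.length; i++) {
            offsets[i] = pdf.length(); // byte offset of "N 0 obj"
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }

        // Cross-reference table: each entry is exactly 20 bytes including its line ending.
        int xrefOffset = pdf.length();
        pdf.append("xref\n0 ").append(objects.length + 1).append('\n');
        pdf.append("0000000000 65535 f \n"); // object 0 is always the head of the free list
        for (int offset : offsets) {
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }

        // Trailer points at the catalog; startxref gives the offset of the xref table.
        pdf.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        pdf.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");

        return pdf.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // "hello.pdf" is a hypothetical output name in the working directory.
        Files.write(Paths.get("hello.pdf"), buildHelloPdf());
    }
}
```

Opening the resulting file in a text editor shows the same pieces a manual build has to get right: the header, the numbered objects, the xref table and the trailer.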