We have been working with PDF files since 1999 and have developed complex software to display them. We have learnt a lot about the PDF file format in that time and share that knowledge in the articles below.
If you are interested in using our software to display your PDF documents (we can rasterize them, convert them to HTML5 or SVG, or provide a complete Java PDF Viewer), why not set up a call with us and see if we can help?
Here is an overview of the topics covered in this article:
- Quick Tutorials
- Frequently Asked Questions
- The PDF File itself
- Images in PDF
- Color handling in PDF
- Text in PDF
- Fonts in PDF
- PDF Forms, Annotations & Interactive Elements
- PDF Security
- CCITT Encoding in PDF
- Make your own PDF file manually
Quick tutorials showing how to use our software to solve common PDF tasks
Questions developers often ask us
Why can’t I just open and edit a PDF File?
How do I find out the PDF version used?
How big is a PDF Page in bytes?
What does an OCR PDF file contain?
What is PDF Pagesize? CropBox, MediaBox, ArtBox, BleedBox, TrimBox?
How to calculate PDF Page Size in Inches or Centimetres?
Why is my PDF Producer showing in Chinese?
How to Embed PDF files in HTML Web Pages
This section covers the actual file format and how it works
How to view PDF objects
How to read a PDF file
Where do your PDF objects start in a PDF file?
How Text, Shapes and Images work together in a PDF file
What are PDF Object Streams?
Multiple Trailers in a PDF File
What are PDF Xref tables?
Understanding PDF Text Objects
How does a decodeArray work on Images?
What is a PDF Dictionary?
What is a Linearized PDF File?
What are Form XObjects?
How are stacks used in PDF files?
How to identify a PDF File
No Startxref found in last 1024 bytes?
How to Embed your own data in PDF files
Why writing a PDF parser is such a challenging task (Part 234)
Corrupt PDFs? Maybe this is your problem
This section explores image related topics in the PDF File format
How are images stored in a PDF file?
How are images displayed in a PDF file?
What are PDF Image Masks?
How to calculate PDF Image DPI?
How to extract Raw JPEG Images from a PDF File?
How do Filter and DecodeParms Objects change a PDF Image?
Color support inside PDF files is very powerful and complex.
How does Color work in PDF files?
How does image color depth work in PDF files?
What is an Indexed Colorspace in a PDF file?
Why is white a special color in PDF Files?
What are ICCBased Colorspaces?
What is a YCCK colorspace in a PDF file?
How to convert YCCK color to RGB color
How Text is stored, displayed and extracted from a PDF file
How is text stored in a PDF file?
Why is PDF text extraction problematic?
What text format and style information is in a PDF file?
How to find out if a PDF file contains ‘structured content’
What does the ActualText dictionary tag do?
How do PDF Text Coordinates work?
How are carriage returns, spaces and other gaps defined in a PDF file?
PDF Mystery – What is the correct value for a Text Field?
PDF Text extraction – Why can I not extract text from a PDF file?
How are text links defined in a PDF file?
How are Text spaces created in a PDF file?
PDF files can use three different font technologies for display
Introductory PDF font tutorial
Introduction to PDF Font Technologies
How are Embedded CMAP tables defined in a PDF File?
What are CID Fonts?
What are subsetted fonts in PDF files?
Where do PDF viewers get font data for non-embedded fonts?
Glyph Names – What is in a name?
Are your TrueType CMap Tables lying to you?
Embedded TrueType Fonts are always MAC encoded unless they are not
Hercule Poirot solves the mystery of the PDF file and the missing Euro
Problems caused by Arial fonts in PDF files
How does TrueType Hinting work?
Why are CID Fonts far more complicated than non-CID Fonts?
PDF files can contain interactive elements with Forms and Annotations
What are PDF Forms?
What are AcroForms?
What are XFA Forms?
How do PDF files add interactive elements?
How do Layers work in a PDF file?
Is it possible to extract flattened form data from a PDF file?
PDF Form Names explained
What is PDF Form Flattening?
PDF files have their own security systems and processes
CCITT is used to store compressed data inside PDF files.
CCITT Encoding in PDF – Converting CCITT data into a TIFF Image
CCITT Encoding in PDF – Black and White Facts
CCITT Encoding in PDF – Rows and Height Gotcha
CCITT Encoding in PDF – Decoding CCITT Data
CCITT Encoding in PDF – G31D CCITT Data Overview
CCITT Encoding in PDF – Decoding G31D CCITT Data
One of our developers bravely set out to write the ‘Hello World’ tutorial of PDF files, creating a PDF file from scratch manually, in a text editor. Follow the series:
Are you a Developer working with PDF files?
Our developer's guide contains a large number of technical posts to help you understand the PDF file format.
Do you need to solve any of these problems?
|Display PDF documents in a Web app|
|Use PDF Forms in a web browser|
|Convert PDF Documents to an image|
|Work with PDF Documents in Java|