Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems.

PDF

Table order in OTF fonts

As part of our TrueType to OpenType font conversion (we need this for PDF to HTML5 conversion to ensure fonts display on all browsers),...
Mark Stephens
49 sec read

How to extract Structured text from PDF files in…

TL;DR: PDFs use complex binary/compressed data that standard text editors can’t read. To inspect the internal structure, use JPedal (for debugging content streams), RUPS...
Mark Stephens
2 min read

How are Embedded CMAP tables defined in a PDF…

Every glyph inside a PDF file can have a display value and a different extraction value. This is useful because often you need to...
Mark Stephens
2 min read
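The teaser above is cut short. One common place a PDF stores a glyph's extraction value separately from its display is a ToUnicode CMap, where `bfchar` entries map a character code to a UTF-16BE string. Below is a minimal sketch, assuming a plain-text CMap that uses only `beginbfchar`/`endbfchar` sections (no `bfrange`). `parse_bfchar` is a hypothetical helper for illustration and is not part of any library:

```python
import re

# Each bfchar section sits between the keywords "beginbfchar" and "endbfchar".
BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
# A pair is <source code> <destination string>, both written as hex.
PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict:
    """Map character codes to extraction (Unicode) text from bfchar entries only."""
    mapping = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in PAIR.findall(block):
            # The destination is UTF-16BE, so one code can expand to several
            # characters (e.g. a ligature glyph extracting as "ff").
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping
```

For example, an entry `<001F> <00660066>` makes code 0x1F extract as "ff" even though the page may draw a single ligature glyph.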

How does a decodeArray work?

When you create an image in a PDF file, it is possible to specify that it is inverted or to control the range of values...
Mark Stephens
31 sec read
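The post is truncated here, but the PDF specification (ISO 32000) defines the `/Decode` array as a linear mapping per colour component: a sample `x` with `n` bits per component becomes `Dmin + x * (Dmax - Dmin) / (2^n - 1)`. Swapping the pair, for example `[1 0]` instead of `[0 1]`, is what inverts an image. Below is a minimal sketch of that formula. The function name `decode_sample` is hypothetical:

```python
def decode_sample(sample: int, bits_per_component: int, d_min: float, d_max: float) -> float:
    """Apply one [Dmin Dmax] pair of a PDF /Decode array to a raw sample value."""
    max_sample = (1 << bits_per_component) - 1  # 2^n - 1
    if not 0 <= sample <= max_sample:
        raise ValueError("sample out of range for the given bits per component")
    # Linear interpolation from [0, 2^n - 1] onto [Dmin, Dmax].
    return d_min + sample * (d_max - d_min) / max_sample


# A 1-bit image with /Decode [1 0]: a 0 bit now decodes to 1.0 (inverted).
print(decode_sample(0, 1, 1.0, 0.0))
```

With the default `[0 1]` the same bit would decode to 0.0, so the only difference between a normal and an inverted image is the order of the pair.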

What does the ActualText dictionary tag do?

Text is defined in the PDF file format as a display value (normally what you see onscreen) and an extraction value. It is useful...
Mark Stephens
29 sec read

PDF to HTML conversion – matching PDF page size

PDF files are designed to be resolution-independent: they are defined using resolution-independent units so that the page will always appear...
Mark Stephens
52 sec read
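The excerpt stops before the details. As background, the default PDF user-space unit is 1/72 inch, optionally scaled by the page's `/UserUnit` (PDF 1.6+), while a CSS pixel is defined as 1/96 inch. Matching the HTML page size is therefore a fixed scale of 96/72 on the MediaBox. Below is a minimal sketch, assuming the MediaBox is the box you want to match and ignoring `/CropBox` and `/Rotate`. The helper `mediabox_to_css_px` is hypothetical:

```python
def mediabox_to_css_px(mediabox, user_unit=1.0):
    """Convert a PDF MediaBox [llx lly urx ury] to CSS pixel (width, height)."""
    llx, lly, urx, ury = mediabox
    # 1 PDF unit = user_unit / 72 inch; 1 CSS px = 1/96 inch,
    # so each PDF unit is user_unit * 96 / 72 CSS px.
    width = abs(urx - llx) * user_unit * 96.0 / 72.0
    height = abs(ury - lly) * user_unit * 96.0 / 72.0
    return width, height


# US Letter (612 x 792 pt) becomes an 816 x 1056 px page.
print(mediabox_to_css_px([0, 0, 612, 792]))
```

A page rotated by 90 or 270 degrees would need its width and height swapped, which this sketch does not handle.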