Text refers to fonts and text content, and to anything styled with the text formatting properties.

Text

4 ways to let go – Knowing how and…

In the interest of improving the usability and maintainability of our PDF search code, I have recently been creating a single PDF search method to...
Kieran France
2 min read

Understanding the PDF file Format – What are CID…

There are two main font technologies used in PDF font files (PostScript/Type 1 and TrueType). There is also a ‘merged’ format which borrows features from...
Mark Stephens
1 min read

Problems with using non-standard characters from Unicode 3.0

Recently I have been looking at an issue for one of our potential clients. The text extraction was not working correctly due to an...
Kieran France
1 min read

Understanding the PDF file Format – PDF Text extraction…

This post was written in response to a request about how PDF text extraction works. If you have a specific PDF question, please feel...
Mark Stephens
1 min read

Text spaces in PDF files

Many PDF files do not actually contain any text spaces. They contain gaps between letters and the software has to guess if there is...
Mark Stephens
1 min read
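
The gap-guessing idea in the "Text spaces in PDF files" excerpt can be illustrated with a minimal sketch. This is not the method from the post, which is truncated here. It assumes each glyph has already been extracted with a left edge and an advance width, and it treats any gap wider than a fraction of the average glyph width as a space. The names Glyph, join_glyphs and gap_ratio are hypothetical, and the 0.3 threshold is an arbitrary starting point rather than a recommended value.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """A positioned character on one line of text (hypothetical structure)."""
    char: str
    x: float      # left edge of the glyph, in page units
    width: float  # advance width of the glyph, in page units


def join_glyphs(glyphs, gap_ratio=0.3):
    """Rebuild a line of text, guessing a space wherever the horizontal gap
    between two neighbouring glyphs exceeds gap_ratio * average glyph width."""
    if not glyphs:
        return ""
    # Work left to right, regardless of the order the glyphs were drawn in.
    ordered = sorted(glyphs, key=lambda g: g.x)
    avg_width = sum(g.width for g in ordered) / len(ordered)
    out = [ordered[0].char]
    for prev, cur in zip(ordered, ordered[1:]):
        # Distance from the right edge of the previous glyph to this one.
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * avg_width:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


line = [Glyph("H", 0, 5), Glyph("i", 5, 5), Glyph("y", 14, 5), Glyph("o", 19, 5)]
print(join_glyphs(line))  # prints "Hi yo"
```

A single fixed ratio is only a guess: justified text, kerning and mixed font sizes can all move the right threshold, which is why the software has to guess rather than know.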