Unicode is an international encoding standard for use with different languages and scripts, by which each letter, digit, or symbol is assigned a unique numeric value that applies across different platforms and programs.
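
As a rough illustration of the "unique numeric value" idea above, the sketch below prints the code point and official character name for a few sample characters. It assumes Python 3 and its standard `unicodedata` module; the helper name `describe` and the sample characters are hypothetical choices made for this example, not something taken from the posts listed here.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point (U+XXXX) and Unicode name of one character."""
    # ord() gives the numeric value Unicode assigns to the character.
    code_point = ord(ch)
    # unicodedata.name() looks up the official name; fall back if none exists.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{code_point:04X} {name}"


# Sample characters (illustrative only): a letter, a digit, an accented
# letter and a currency symbol, written as escapes to avoid encoding issues.
samples = ["A", "9", "\u00e9", "\u20ac"]

for ch in samples:
    print(describe(ch))

# The mapping works in both directions: chr() turns a number back into a character.
assert chr(0x20AC) == "\u20ac"
```

The same numeric value identifies the character on every platform; how that number is stored as bytes (UTF-8, UTF-16, and so on) is a separate encoding step.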


Problems with using non-standard characters from Unicode 3.0

Recently I have been looking at an issue for one of our potential clients. The text extraction was not working correctly due to an...
Kieran France
1 min read

How is text stored in a PDF file?

Text is defined in PDF files by a Font object and a set of TJ commands. So you will see something like this in...
Mark Stephens
1 min read