Unicode is an international encoding standard for use with different languages and scripts, by which each letter, digit, or symbol is assigned a unique numeric value that applies across different platforms and programs.
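
As a small illustration of the "unique numeric value" idea, here is a minimal Python sketch that looks up the code point and official name of a few characters. The helper name `describe` and the sample characters are assumptions chosen for this example; only the standard-library `ord` and `unicodedata.name` calls are used.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a single character's code point in U+XXXX form plus its Unicode name."""
    # ord() gives the numeric value (code point) assigned to the character.
    # unicodedata.name() returns the official name, or the fallback if it has none.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# One letter, one digit, and one symbol (illustrative choices).
samples = ["A", "7", "€"]
descriptions = [describe(ch) for ch in samples]

for line in descriptions:
    print(line)
```

The same character maps to the same number on any platform or program. How that number is stored as bytes (UTF-8, UTF-16, and so on) is a separate encoding choice.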


Problems with using non-standard characters from Unicode 3.0

Recently I have been looking at an issue for one of our potential clients. The text extraction was not working correctly due to an...
Kieran France
1 min read

Understanding the PDF file Format – PDF Text extraction…

This post was written in response to a request about how PDF text extraction works. If you have a specific PDF question, please feel...
Mark Stephens
1 min read