The Extraction category covers extracting fonts, images and other content from PDF, HTML5, SVG and other formats.

Extraction

Why PDF to HTML conversion does not work very…

When people convert PDF files into HTML files, they tend to be disappointed with the results. The main reason for this tends to be...
Mark Stephens
1 min read

Understanding the PDF file format – Text, shapes and…

I have recently been looking at an issue for a potential client which required generating different views of the page. This is...
Mark Stephens
1 min read

What text format and style information is in a…

Because PDF is very much an output and display format, it does not contain much text formatting information such as paragraph breaks and spaces...
Mark Stephens
39 sec read

Why is PDF text extraction problematic?

PDF text is a subject that causes a lot of confusion. People look at PDF files, and they are a fantastic way to present content. If...
Mark Stephens
1 min read