Tagged PDF

How to extract text from a PDF as Markdown

Some PDF files can be “tagged,” which means they contain information about the structure of the file. This structure is embedded as metadata within...
Jacob Collins
1 min read

How to extract text from PDF files as YAML…

This blog post shows you how to convert a PDF to YAML by extracting marked content with structure tags. What is a structured PDF? Some...
Jacob Collins
47 sec read

How to convert PDF files to ePUB

Not all PDFs are created equal. Some go beyond simple visual layouts and include internal tags that describe the document’s structure. These are known...
Jacob Collins
1 min read

How to extract text from a PDF as JSON

Some PDF files can be “tagged,” which means they contain information about the structure of the file. This structure is embedded as metadata within...
Jacob Collins
1 min read