Text is defined in the PDF file format as a display value (normally what you see onscreen) and an extraction value. It is useful to have two values because some characters are displayed differently from what you extract (for example, the fl ligature is one glyph onscreen but two characters in extracted text).
But did you know you can set an additional value to give the actual text enclosed in a Tj command? If you are using Marked Content, its dictionary can contain an /ActualText entry, which will be used in place of whatever is shown in the Tj command; the Tj value is ignored for extraction. So if you want to extract something different from what is displayed (or just ensure that the value comes out exactly as you want), it is a useful feature. What would you use it for?
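To make this concrete, here is a minimal sketch in Python of how an extractor might prefer /ActualText over the Tj value. The content stream is a made-up example, the `extract_text` function and the `TOKEN` pattern are hypothetical names, and the regex handles only simple literal strings with no escapes, nesting, or hex/UTF-16 encoding. It is an illustration of the idea, not how any particular PDF library does it.

```python
import re

# Hypothetical, simplified content stream: the Tj inside the marked-content
# sequence shows "X", but /ActualText says the text to extract is "fl".
CONTENT_STREAM = """BT
/F1 12 Tf
(A ) Tj
/Span << /ActualText (fl) >> BDC
(X) Tj
EMC
( B) Tj
ET"""

# Assumption: only three token shapes matter for this sketch.
TOKEN = re.compile(
    r"/Span\s*<<\s*/ActualText\s*\((?P<actual>[^)]*)\)\s*>>\s*BDC"  # start of marked content
    r"|(?P<emc>\bEMC\b)"                                            # end of marked content
    r"|\((?P<shown>[^)]*)\)\s*Tj"                                   # ordinary show-text operator
)


def extract_text(stream: str) -> str:
    """Return extracted text, using /ActualText in place of enclosed Tj values."""
    out = []
    inside_actual_text = False
    for match in TOKEN.finditer(stream):
        if match.group("actual") is not None:
            # Emit the replacement text once, at the start of the sequence.
            out.append(match.group("actual"))
            inside_actual_text = True
        elif match.group("emc"):
            inside_actual_text = False
        elif not inside_actual_text:
            # A Tj outside any /ActualText sequence is extracted as shown.
            out.append(match.group("shown"))
        # A Tj inside the sequence is ignored for extraction.
    return "".join(out)


if __name__ == "__main__":
    print(extract_text(CONTENT_STREAM))
```

The displayed glyph "X" never reaches the extracted text; only the /ActualText value does, while the Tj commands outside the BDC ... EMC pair are extracted unchanged.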
This post is part of our “Understanding the PDF File Format” series. In each article, we aim to take a specific PDF feature and explain it in simple terms. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips, so click here to visit our series index!