Our first problem is getting the shape information out of a PDF file. You need to know the following to solve this problem.
1. The PDF file format is based on PostScript.
2. Each page includes a block of code to draw it, and that code paints THREE types of shape, named here after the commands that paint them (F, S, B). F (written f in most streams) is a filled shape, S is a stroked shape and B is both stroked and filled.
3. Other PostScript-style commands set up other values (color, stroke, size and clipping in the GraphicsState). Each command in the stream affects all subsequent commands, so you need to decode the whole stream to get the correct data.
4. Shapes can include lines, rectangles and complex structures.
5. The co-ordinates are PDF co-ordinates (by default the origin is at the bottom left of the page, with y increasing upwards), so they may need converting before you can use them.
There are 2 ways to solve this problem, and you will need to parse the file in each case:
1. Write out the shapes as the PDF is decoded (or include a callback so that users can track them). Some PDF libraries offer this feature, or you could hack it into one of the Open Source PDF libraries out there.
2. Convert the PDF into a format from which this information can be extracted (for example HTML, SVG or EPS). An image is not suitable because the shapes become an integrated part of the rendered image.
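To make point 5 concrete, here is a minimal Java sketch of converting a shape from PDF co-ordinates to the top-left-origin co-ordinates used by most image and screen APIs. The class and method names (PdfToImageCoords, pdfToTopLeft) are made up for this example, and it assumes an unrotated page whose page box starts at (0,0); a real page may need its CropBox offset and rotation applied as well.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class PdfToImageCoords {

    /**
     * Maps default PDF user space (origin bottom-left, y up) to image space
     * (origin top-left, y down). Assumes no page rotation and a page box at (0,0).
     */
    static AffineTransform pdfToTopLeft(double pageHeightPts, double scale) {
        AffineTransform at = new AffineTransform();
        at.scale(scale, -scale);         // applied second: scale and flip the y axis
        at.translate(0, -pageHeightPts); // applied first: put the top edge at y = 0
        return at;
    }

    public static void main(String[] args) {
        // A 100 x 50 rectangle near the top of a 792-point-high (US Letter) page.
        Rectangle2D pdfRect = new Rectangle2D.Double(72, 692, 100, 50);
        Rectangle2D imageRect = pdfToTopLeft(792, 1.0)
                .createTransformedShape(pdfRect).getBounds2D();
        System.out.println(imageRect);
    }
}
```

With these numbers, the rectangle whose lower-left corner is at (72, 692) in PDF space ends up with its top-left corner at (72, 50) in image space, and keeps its 100 x 50 size.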
I personally would use option 1.
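To show what option 1 looks like in principle, here is a rough Java sketch of a decoder that reports each shape through a callback as it is painted. It is not how any particular library does it: ShapeListener and MiniContentStreamReader are names invented for this example, the stream is assumed to be already decompressed, and only a handful of commands are handled (m, l, re, h, w, q, Q, f, S, B, n). Curves, colors, clipping and the transformation matrix (cm) are all ignored, so treat it as an illustration of why the whole stream has to be decoded in order.

```java
import java.awt.Shape;
import java.awt.geom.Path2D;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical callback invented for this sketch (not a JPedal interface). */
interface ShapeListener {
    void onShape(String paintOp, Shape shape, double lineWidth);
}

final class MiniContentStreamReader {

    static void read(String stream, ShapeListener listener) {
        List<Double> args = new ArrayList<>();        // operands seen since the last command
        Deque<Double> saved = new ArrayDeque<>();     // q/Q stack (line width only here)
        Path2D.Double path = new Path2D.Double();     // path currently being built
        double lineWidth = 1.0;                       // PDF default line width

        for (String tok : stream.trim().split("\\s+")) {
            if (tok.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                args.add(Double.parseDouble(tok));    // a number: keep it as an operand
                continue;
            }
            switch (tok) {
                case "m": path.moveTo(args.get(0), args.get(1)); break;
                case "l": path.lineTo(args.get(0), args.get(1)); break;
                case "re": {                          // x y width height
                    double x = args.get(0), y = args.get(1), w = args.get(2), h = args.get(3);
                    path.moveTo(x, y);
                    path.lineTo(x + w, y);
                    path.lineTo(x + w, y + h);
                    path.lineTo(x, y + h);
                    path.closePath();
                    break;
                }
                case "h": path.closePath(); break;
                case "w": lineWidth = args.get(0); break;
                case "q": saved.push(lineWidth); break;
                case "Q": if (!saved.isEmpty()) lineWidth = saved.pop(); break;
                case "f": case "S": case "B":         // paint: report a copy, then start afresh
                    listener.onShape(tok, new Path2D.Double(path), lineWidth);
                    path.reset();
                    break;
                case "n": path.reset(); break;        // end path without painting
                default: break;                       // everything else is skipped in this sketch
            }
            args.clear();
        }
    }

    public static void main(String[] argv) {
        String stream = "q 2 w 10 10 100 50 re S Q 0 0 m 20 0 l 20 20 l h f";
        read(stream, (op, shape, width) ->
                System.out.println(op + " width=" + width + " bounds=" + shape.getBounds2D()));
    }
}
```

In the sample stream the rectangle is stroked with a line width of 2, but the triangle that follows is filled after Q has restored the default width of 1. Reading the f command on its own would not tell you that, which is the reason the state has to be tracked from the start of the stream.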
How we would solve it with our software
Here is my solution:
1. Download the JPedal trial jar.
2. JPedal has several custom interfaces so that users can add callbacks into their code.
3. The ShapeTracker interface looks like an exact match for this problem.
4. There is a commented-out section in our ConvertPagesToImages example. If you copy this code into your IDE, you can try it and adapt it to your exact needs. [link]
/**
 * code to track shapes
 */
org.jpedal.external.ShapeTracker myShapeTracker = new TestShapeTracker();
decode_pdf.addExternalHandler(myShapeTracker, org.jpedal.external.Options.ShapeTracker);
5. Remember we have a support page if you need any help or further details.