Java PDF Blog

What are Hyperlinks in PDF files?

jpedal

Hyperlinks are external, clickable links which appear on web pages and other documents and let you go to web pages or download files. PDF files can include hyperlinks too, and they are stored as annotations. Here are two examples of the raw data from inside a PDF file:

12 0 obj<</Subtype/Link/Rect[ 205.39 637.21 320.54 651.01] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://www.yahoo.com/) >>>>endobj

13 0 obj<</Subtype/Link/Rect[ 201.4 609.61 304.55 623.41] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://www.cnn.com/) >>>>endobj
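Even without a PDF library you can see how these fields line up. The plain-Java sketch below pulls the object number, the Rect and the URI out of a raw object string like the two above. It is only an illustration under strong assumptions: the object is uncompressed and written on one line, the URI is a literal string with no parentheses inside it, and the class and method names (RawLinkParser, parse) are hypothetical. A real PDF library such as JPedal or iText parses these objects properly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawLinkParser {

    // Hypothetical holder for the values pulled out of one Link annotation
    static final class LinkInfo {
        final int objectNumber;
        final int generation;
        final float[] rect; // x1 y1 x2 y2 in PDF user space
        final String uri;   // null if the link uses a non-URI action

        LinkInfo(int objectNumber, int generation, float[] rect, String uri) {
            this.objectNumber = objectNumber;
            this.generation = generation;
            this.rect = rect;
            this.uri = uri;
        }
    }

    // "12 0 obj" -> object number 12, generation 0
    private static final Pattern OBJ = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj");
    // /Subtype/Link (whitespace between the two names is allowed)
    private static final Pattern LINK = Pattern.compile("/Subtype\\s*/Link\\b");
    // /Rect[ x1 y1 x2 y2 ]
    private static final Pattern RECT = Pattern.compile(
            "/Rect\\s*\\[\\s*([-+.\\d]+)\\s+([-+.\\d]+)\\s+([-+.\\d]+)\\s+([-+.\\d]+)\\s*\\]");
    // /URI(...) as a literal string; assumes no parentheses inside the address
    private static final Pattern URI_STRING = Pattern.compile("/URI\\s*\\(([^)]*)\\)");

    /** Returns null if the text does not look like a simple Link annotation object. */
    static LinkInfo parse(String raw) {
        Matcher obj = OBJ.matcher(raw);
        Matcher rect = RECT.matcher(raw);
        if (!LINK.matcher(raw).find() || !obj.find() || !rect.find()) {
            return null;
        }
        float[] coords = new float[4];
        for (int i = 0; i < 4; i++) {
            coords[i] = Float.parseFloat(rect.group(i + 1));
        }
        Matcher uri = URI_STRING.matcher(raw);
        String address = uri.find() ? uri.group(1) : null;
        return new LinkInfo(Integer.parseInt(obj.group(1)), Integer.parseInt(obj.group(2)),
                coords, address);
    }

    public static void main(String[] args) {
        String raw = "12 0 obj<</Subtype/Link/Rect[ 205.39 637.21 320.54 651.01] /BS<</W 0>>/F 4"
                + "/A<</Type/Action/S/URI/URI(http://www.yahoo.com/) >>>>endobj";
        LinkInfo info = parse(raw);
        // the indirect reference other objects would use to point at this annotation
        System.out.println(info.objectNumber + " " + info.generation + " R -> " + info.uri);
    }
}
```

Run on the first sample object, it prints `12 0 R -> http://www.yahoo.com/`, i.e. the object reference you would need in order to edit or delete that link.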

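First of all, the link is identified by its subtype (/Link). The /Rect entry defines the area of the page it applies to (using standard PDF co-ordinates), and clicking inside this box opens the link. The address itself is held in the /A (action) dictionary, where /S/URI marks this as a URI action and /URI gives the web address to open.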
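The "click inside the box" check can be sketched in plain Java. Two details are worth showing. The PDF specification lets a rectangle be given by any two diagonally opposite corners, so the sketch normalises the Rect first. And PDF user space has its y axis pointing up from the bottom of the page, while screen and Java2D coordinates usually point down from the top. The class and method names, the 792-point (US Letter) page height and the 1:1 scale with no rotation or MediaBox offset are all assumptions for illustration.

```java
public class LinkHitTest {

    /** Reorders [x1 y1 x2 y2] so that x1 <= x2 and y1 <= y2. */
    static float[] normalise(float[] rect) {
        return new float[] {
            Math.min(rect[0], rect[2]), Math.min(rect[1], rect[3]),
            Math.max(rect[0], rect[2]), Math.max(rect[1], rect[3])
        };
    }

    /** True if the point (x, y), given in PDF user space, lies inside the link's Rect. */
    static boolean contains(float[] rect, float x, float y) {
        float[] r = normalise(rect);
        return x >= r[0] && x <= r[2] && y >= r[1] && y <= r[3];
    }

    /**
     * Converts a y value measured down from the top of the page (screen style)
     * into PDF user space. Assumes scale 1, no rotation and a MediaBox starting at 0,0.
     */
    static float screenYToPdfY(float screenY, float pageHeight) {
        return pageHeight - screenY;
    }

    public static void main(String[] args) {
        float[] yahooRect = {205.39f, 637.21f, 320.54f, 651.01f}; // Rect of object 12
        float pageHeight = 792f;            // assumed page height in points
        float clickX = 250f, clickY = 150f; // hypothetical click, measured from the top
        boolean hit = contains(yahooRect, clickX, screenYToPdfY(clickY, pageHeight));
        System.out.println(hit ? "open http://www.yahoo.com/" : "no link here");
    }
}
```

With the assumed page height, the click at (250, 150) maps to y = 642 in PDF space, which falls inside the yahoo.com box, so the program prints `open http://www.yahoo.com/`.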

Because hyperlinks are annotations, it is relatively easy to edit or delete them with a tool such as iText, as long as you know their object reference numbers. So here is some code that uses JPedal to extract these PDF object details from a PDF file.

How to view Hyperlink Data in JPedal

PdfDecoder decodePdf = new PdfDecoder(false);

decodePdf.openPdfFile(file_name);

/**
 * form code here
 */

//loop over every page (JPedal page numbers start at 1)
for (int ii = 1; ii <= decodePdf.getPageCount(); ii++) {
    PdfArrayIterator annotListForPage = decodePdf.getFormRenderer().getAnnotsOnPage(ii);

    if (annotListForPage != null && annotListForPage.getTokenCount() > 0) { //can have empty lists

        while (annotListForPage.hasMoreTokens()) {

            //get ID of annot which has already been decoded and get actual object
            String annotKey = annotListForPage.getNextValueAsString(true);

            Object rawObj = decodePdf.getFormRenderer().getCompData().getRawForm(annotKey);
            if (rawObj == null) {
                //no match found
                System.out.println("no match on " + annotKey);
            } else {

                //each PDF annot object - extract data from it
                FormObject annotObj = (FormObject) rawObj;

                int subtype = annotObj.getParameterConstant(PdfDictionary.Subtype);

                if (subtype == PdfDictionary.Link) {
                    System.out.println("link object is " + annotKey);

                    //clickable area in PDF co-ordinates (x1 y1 x2 y2)
                    float[] coords = annotObj.getFloatArray(PdfDictionary.Rect);
                    if (coords != null) {
                        System.out.println("Rect= " + coords[0] + " " + coords[1] + " " + coords[2] + " " + coords[3]);
                    }

                    //the web address is in the A (action) subobject
                    PdfObject aData = annotObj.getDictionary(PdfDictionary.A);
                    if (aData != null && aData.getNameAsConstant(PdfDictionary.S) == PdfDictionary.URI) {
                        String text = aData.getTextStreamValue(PdfDictionary.URI);
                        System.out.println("text=" + text);
                    }
                }
            }
        }
    }
}

//release the file when finished
decodePdf.closePdfFile();
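Once you know a link's object number and Rect from the code above, changing where it points means rewriting that object's /URI entry. A library such as iText does this for you and also updates the file's cross-reference table; the plain-Java sketch below only shows what the rewritten object text looks like, including the escaping that PDF literal strings need for backslashes and parentheses. It copies the /BS <</W 0>> (zero-width, i.e. invisible, border) and /F 4 (Print flag) entries from the examples in this post. The class and method names and the example.com address are hypothetical, and the output alone is not a valid edit of a PDF file, because no xref offsets are updated.

```java
import java.util.Locale;

public class LinkObjectWriter {

    /** Escapes a value for a PDF literal string: backslash, ( and ) each get a backslash. */
    static String escapeLiteral(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Builds a Link annotation object in the same layout as the raw examples above. */
    static String linkObject(int objNum, float[] rect, String uri) {
        return String.format(Locale.ROOT,
                "%d 0 obj<</Subtype/Link/Rect[ %.2f %.2f %.2f %.2f] /BS<</W 0>>/F 4"
                        + "/A<</Type/Action/S/URI/URI(%s) >>>>endobj",
                objNum, rect[0], rect[1], rect[2], rect[3], escapeLiteral(uri));
    }

    public static void main(String[] args) {
        // re-point object 13 (the cnn.com link) at a hypothetical address containing parentheses
        float[] rect = {201.4f, 609.61f, 304.55f, 623.41f};
        System.out.println(linkObject(13, rect, "http://www.example.com/a_(b)"));
    }
}
```

The escaping step matters because an unescaped ")" inside the address would end the string early and corrupt the rest of the dictionary.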