Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Hyperlinks in PDF files (and how to edit them)


Hyperlinks are external, clickable links which appear on web pages and in other documents and take you to a web page or download a file. PDF files can include hyperlinks, and they are stored as Annotations. Here are 2 examples of the raw data from inside a PDF file:

12 0 obj<</Subtype/Link/Rect[ 205.39 637.21 320.54 651.01] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://www.yahoo.com/) >>>>endobj

13 0 obj<</Subtype/Link/Rect[ 201.4 609.61 304.55 623.41] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://www.cnn.com/) >>>>endobj

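First of all, the Link is identified by its subtype (/Link). The Rect defines the area of the page which it applies to (using standard PDF co-ordinates). Clicking on this box will cause the link to open. The /A entry holds the action to perform: its /S value of /URI marks it as a URI action, and the /URI string is the address that is opened.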
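To make the structure concrete, here is a minimal sketch in plain Java that pulls the subtype, Rect and URI out of one of the raw objects above and tests whether a point falls inside the clickable box. It works on the uncompressed text only, assumes the URI string contains no escaped or nested parentheses, and the class and method names (`LinkAnnotationSketch`, `isLink`, `rect`, `uri`, `contains`) are made up for illustration. A real PDF needs a proper parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkAnnotationSketch {

    // "/Subtype/Link", allowing optional whitespace between the two names
    private static final Pattern LINK = Pattern.compile("/Subtype\\s*/Link\\b");

    // "/Rect[ x1 y1 x2 y2 ]", capturing the four numbers
    private static final Pattern RECT = Pattern.compile(
            "/Rect\\s*\\[\\s*([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s*\\]");

    // "/URI(...)", capturing the literal string (assumes no nested or escaped parentheses)
    private static final Pattern URI = Pattern.compile("/URI\\s*\\(([^)]*)\\)");

    static boolean isLink(String raw) {
        return LINK.matcher(raw).find();
    }

    // Returns the four Rect values, or null if no Rect entry is found
    static float[] rect(String raw) {
        Matcher m = RECT.matcher(raw);
        if (!m.find()) {
            return null;
        }
        float[] r = new float[4];
        for (int i = 0; i < 4; i++) {
            r[i] = Float.parseFloat(m.group(i + 1));
        }
        return r;
    }

    // Returns the URI string, or null if the object has no /URI(...) entry
    static String uri(String raw) {
        Matcher m = URI.matcher(raw);
        return m.find() ? m.group(1) : null;
    }

    // True if the point (x, y), in PDF page co-ordinates, lies inside the Rect
    static boolean contains(float[] r, float x, float y) {
        return x >= Math.min(r[0], r[2]) && x <= Math.max(r[0], r[2])
                && y >= Math.min(r[1], r[3]) && y <= Math.max(r[1], r[3]);
    }

    public static void main(String[] args) {
        String raw = "12 0 obj<</Subtype/Link/Rect[ 205.39 637.21 320.54 651.01] /BS<</W 0>>/F 4"
                + "/A<</Type/Action/S/URI/URI(http://www.yahoo.com/) >>>>endobj";

        if (isLink(raw)) {
            float[] r = rect(raw);
            System.out.println("Rect= " + r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
            System.out.println("URI= " + uri(raw));
            System.out.println("click at (250, 640) hits link: " + contains(r, 250f, 640f));
        }
    }
}
```

The min/max in `contains` is there so the test does not depend on which two opposite corners the Rect happens to list.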

Because hyperlinks are Annotations, it is relatively easy to edit or delete them using a tool like iText. You just need to know the object reference numbers. So here is some code to extract the PDF object details from a PDF file.

PdfDecoder decodePdf = new PdfDecoder(false);

decodePdf.openPdfFile(file_name);

/**
 * form code here
 */

//walk the annotation list of every page (page numbers start at 1)
for(int ii=1;ii<decodePdf.getPageCount()+1;ii++){
    PdfArrayIterator annotListForPage = decodePdf.getFormRenderer().getAnnotsOnPage(ii);

    if(annotListForPage!=null && annotListForPage.getTokenCount()>0){ //can have empty lists

        while(annotListForPage.hasMoreTokens()){

            //get ID of annot which has already been decoded and get actual object
            String annotKey=annotListForPage.getNextValueAsString(true);

            Object rawObj=decodePdf.getFormRenderer().getCompData().getRawForm(annotKey);
            if(rawObj==null){
                //no match found
                System.out.println("no match on "+annotKey);
            }else{

                //each PDF annot object: extract data from it
                FormObject annotObj=(FormObject)rawObj;

                int subtype=annotObj.getParameterConstant(PdfDictionary.Subtype);

                //only Link annotations are of interest here
                if(subtype==PdfDictionary.Link){
                    System.out.println("link object is "+annotKey);
                    float[] coords=annotObj.getFloatArray(PdfDictionary.Rect);
                    System.out.println("Rect= "+coords[0]+" "+coords[1]+" "+coords[2]+" "+coords[3]);

                    //text in A subobject
                    PdfObject aData=annotObj.getDictionary(PdfDictionary.A);
                    if(aData!=null && aData.getNameAsConstant(PdfDictionary.S)==PdfDictionary.URI){
                        String text=aData.getTextStreamValue(PdfDictionary.URI);
                        System.out.println("text="+text);
                    }
                }
            }
        }
    }
}
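Editing needs the object reference numbers, so it can help to turn the printed key into numbers. The sketch below assumes the annotKey strings have the usual PDF indirect-reference form, such as `12 0 R`. That is an assumption, as the output is not shown here, and the `ObjectRefSketch` class and its `parse` method are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjectRefSketch {

    // "N G R": object number, generation number, then the letter R
    // (assumed key format; digits are capped at 9 so parseInt cannot overflow)
    private static final Pattern REF =
            Pattern.compile("\\s*(\\d{1,9})\\s+(\\d{1,9})\\s+R\\s*");

    // Returns {objectNumber, generation}, or null if the key is not of the form "N G R"
    static int[] parse(String key) {
        if (key == null) {
            return null;
        }
        Matcher m = REF.matcher(key);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] ref = parse("12 0 R");
        if (ref != null) {
            System.out.println("object " + ref[0] + ", generation " + ref[1]);
        }
    }
}
```

Returning null for anything that does not match keeps the caller in control of how to treat unexpected keys.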


2 Replies to “Hyperlinks in PDF files (and how to edit them)”

  1. Hi Mark,

    I have some issue in URI action included in pdf file to be redirected to any website. The URI added will be opened as soon as we open the pdf. When I observe the pdf in a pdf exposure tool, I see below formation of URI action.
    “””
    <>
    “””
    My first question is that, when I open this pdf in Chrome/any browser, this URI is opened in the same tab. So is there a way that the URI opens in new tab? How can that be done?

    Second question is, when we open pdf files containing URI action in pdf viewer tools, it gives security alert if we want to allow this website or we want to block it. I don’t want this security alert. How can we do this? [Can we programmatically add the external website in the trusted list so that it doesn’t ask, however I am working in Python? Or any other way?]

    Thanks in advance. Can you suggest something asap?
