Hyperlinks are external, clickable links which appear on web pages and other documents and allow you to go to other web pages or download files. PDF files can include hyperlinks, and they are stored as Annotations. Here are 2 examples of the raw data from inside a PDF file:
12 0 obj<</Subtype/Link/Rect[ 205.39 637.21 320.54 651.01] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://www.yahoo.com/) >>>>endobj
13 0 obj<</Subtype/Link/Rect[ 201.4 609.61 304.55 623.41] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://www.cnn.com/) >>>>endobj
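First of all, the Link is identified by its subtype (/Link). The Rect defines the area of the page it applies to, given as two opposite corners (here lower-left x, lower-left y, upper-right x, upper-right y) in standard PDF co-ordinates, where the origin is usually the bottom-left of the page. Clicking inside this box opens the link. The action itself is in the /A dictionary: /S/URI marks it as a URI action, and the /URI string holds the address to open. /BS<</W 0>> sets the border width to 0, so no box is drawn around the link.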
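To make that structure concrete, here is a minimal Java sketch that pulls the object reference, subtype, Rect and URI out of a raw link object like the two above, and tests whether a click point falls inside the Rect. It uses plain regular expressions and assumes a simple, uncompressed object with a literal-string URI (no escaped parentheses, hex strings, indirect references or compressed object streams), so it is an illustration of the layout rather than a PDF parser. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: a regex-based reader for simple,
// uncompressed /Link objects. It is NOT a general PDF parser.
public class RawLinkParser {

    // "12 0 obj" at the start of the object: object number and generation number
    private static final Pattern OBJ_HEADER = Pattern.compile("^\\s*(\\d+)\\s+(\\d+)\\s+obj");
    // "/Subtype/Link": \b stops it matching longer names that start with "Link"
    private static final Pattern LINK = Pattern.compile("/Subtype\\s*/Link\\b");
    // "/Rect[ x1 y1 x2 y2]"
    private static final Pattern RECT = Pattern.compile(
            "/Rect\\s*\\[\\s*([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s*\\]");
    // "/URI(...)": the literal string holding the link target
    private static final Pattern URI_STRING = Pattern.compile("/URI\\s*\\(([^)]*)\\)");

    /** Returns the indirect reference, e.g. "12 0 R", or null if there is no header. */
    public static String objectRef(String raw) {
        Matcher m = OBJ_HEADER.matcher(raw);
        return m.find() ? m.group(1) + " " + m.group(2) + " R" : null;
    }

    /** True if the object declares /Subtype /Link. */
    public static boolean isLink(String raw) {
        return LINK.matcher(raw).find();
    }

    /** Returns {x1, y1, x2, y2} in PDF user-space units, or null if absent. */
    public static float[] rect(String raw) {
        Matcher m = RECT.matcher(raw);
        if (!m.find()) {
            return null;
        }
        float[] r = new float[4];
        for (int i = 0; i < 4; i++) {
            r[i] = Float.parseFloat(m.group(i + 1));
        }
        return r;
    }

    /** Returns the target of a /URI action, or null if absent. */
    public static String uri(String raw) {
        Matcher m = URI_STRING.matcher(raw);
        return m.find() ? m.group(1) : null;
    }

    /** True if point (x, y) lies inside the Rect; the corners may be given in either order. */
    public static boolean contains(float[] r, float x, float y) {
        return x >= Math.min(r[0], r[2]) && x <= Math.max(r[0], r[2])
            && y >= Math.min(r[1], r[3]) && y <= Math.max(r[1], r[3]);
    }

    public static void main(String[] args) {
        String raw = "12 0 obj<</Subtype/Link/Rect[ 205.39 637.21 320.54 651.01] /BS<</W 0>>/F 4"
                + "/A<</Type/Action/S/URI/URI(http://www.yahoo.com/) >>>>endobj";
        float[] r = rect(raw);
        System.out.println(objectRef(raw));          // 12 0 R
        System.out.println(isLink(raw));             // true
        System.out.println(uri(raw));                // http://www.yahoo.com/
        System.out.println(contains(r, 250f, 640f)); // true: inside the box
        System.out.println(contains(r, 100f, 640f)); // false: left of the box
    }
}
```

The object reference it prints (12 0 R) is the kind of number an editing tool needs in order to change or delete a specific link.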
Because hyperlinks are Annotations, it is relatively easy to edit or delete them using a tool like iText. You just need to know their object reference numbers (for example, 12 0 for the first object above). Here is some code to extract the PdfObject details from a PDF file.
How to view Hyperlink Data in JPedal
// Uses the JPedal classes PdfDecoder, PdfArrayIterator, FormObject, PdfObject and PdfDictionary;
// import them from the packages provided by your JPedal version.
PdfDecoder decodePdf = new PdfDecoder(false);
// openPdfFile() declares a checked exception, so call this from a method that handles or declares it
decodePdf.openPdfFile(file_name);

// form code here

// loop over every page (JPedal page numbers start at 1)
for (int ii = 1; ii <= decodePdf.getPageCount(); ii++) {
    PdfArrayIterator annotListForPage = decodePdf.getFormRenderer().getAnnotsOnPage(ii);
    if (annotListForPage != null && annotListForPage.getTokenCount() > 0) { // lists can be empty
        while (annotListForPage.hasMoreTokens()) {
            // get the ID of an annot which has already been decoded, then fetch the actual object
            String annotKey = annotListForPage.getNextValueAsString(true);
            Object rawObj = decodePdf.getFormRenderer().getCompData().getRawForm(annotKey);
            if (rawObj == null) {
                // no match found
                System.out.println("no match on " + annotKey);
            } else {
                // each PDF annot object: extract data from it
                FormObject annotObj = (FormObject) rawObj;
                int subtype = annotObj.getParameterConstant(PdfDictionary.Subtype);
                if (subtype == PdfDictionary.Link) {
                    System.out.println("link object is " + annotKey);
                    float[] coords = annotObj.getFloatArray(PdfDictionary.Rect);
                    if (coords != null && coords.length == 4) { // guard against a missing or malformed Rect
                        System.out.println("Rect= " + coords[0] + " " + coords[1] + " " + coords[2] + " " + coords[3]);
                    }
                    // the URI text is in the A (action) subobject
                    PdfObject aData = annotObj.getDictionary(PdfDictionary.A);
                    if (aData != null && aData.getNameAsConstant(PdfDictionary.S) == PdfDictionary.URI) {
                        String text = aData.getTextStreamValue(PdfDictionary.URI);
                        System.out.println("text=" + text);
                    }
                }
            }
        }
    }
}
decodePdf.closePdfFile();
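The Rect values printed above are in PDF user space, where the origin is normally the bottom-left corner of the page and y increases upwards. To highlight a link on a rendered page image (origin top-left, y increasing downwards), the y axis has to be flipped and the values scaled. Below is a minimal sketch, assuming an unrotated page whose crop box starts at (0,0) and an image rendered at `scale` pixels per PDF unit; the class and method names are hypothetical and not part of JPedal, and the 792-unit page height (US Letter, 11 in x 72) is just an example.

```java
import java.util.Locale;

// Hypothetical helper: maps a PDF link Rect onto a rendered page image.
// Assumes no page rotation and a crop box whose origin is (0,0).
public class LinkRectToImage {

    /** Returns {x, y, width, height} in image pixels, with y measured from the top. */
    public static float[] toImageBox(float[] rect, float pageHeight, float scale) {
        // Rect corners may be given in either order, so normalise them first
        float llx = Math.min(rect[0], rect[2]);
        float urx = Math.max(rect[0], rect[2]);
        float lly = Math.min(rect[1], rect[3]);
        float ury = Math.max(rect[1], rect[3]);
        return new float[] {
            llx * scale,                // left edge is unchanged apart from scaling
            (pageHeight - ury) * scale, // top edge: distance of the upper y from the top of the page
            (urx - llx) * scale,        // width
            (ury - lly) * scale         // height
        };
    }

    public static void main(String[] args) {
        // Rect of the yahoo.com link shown earlier, on a 792-unit-high page rendered at 100%
        float[] box = toImageBox(new float[] {205.39f, 637.21f, 320.54f, 651.01f}, 792f, 1f);
        System.out.println(String.format(Locale.ROOT, "x=%.2f y=%.2f w=%.2f h=%.2f",
                box[0], box[1], box[2], box[3])); // x=205.39 y=140.99 w=115.15 h=13.80
    }
}
```

For a rotated page, or a crop box that does not start at (0,0), the rotation and origin offset would also need to be applied before scaling.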
Hi Mark,
I have an issue with a URI action included in a PDF file that redirects to a website. The URI is opened as soon as we open the PDF. When I inspect the PDF in a PDF inspection tool, I see the URI action below.
“””
<>
“””
My first question: when I open this PDF in Chrome or any other browser, the URI opens in the same tab. Is there a way to make it open in a new tab, and how can that be done?
Second question: when we open PDF files containing a URI action in PDF viewer tools, they show a security alert asking whether to allow or block this website. I don't want this security alert. How can we avoid it? [Can we programmatically add the external website to the trusted list so that it doesn't ask? I am working in Python. Or is there another way?]
Thanks in advance. Can you suggest something asap?
That does not look like a valid value, so it would not work. If you want to programmatically edit PDFs, I would recommend a tool like iText.