Jacob Collins Jacob is the JPedal Product Lead and specialises in PDF creation and manipulation. He also develops Salesforce backend systems and contributes to marketing and support. Outside work, he’s a 1900‑rated chess player, guitarist, and French learner.

How to translate PDF files in Java (Tutorial)

In this tutorial I will walk through a worked example of building a PDF translator with our PDF toolkit, JPedal, and the Translator library. The result converts the text of a PDF document from one language to another (in this case, English to Chinese).

You can get a copy of JPedal here.

Extracting text

First, we will need to extract the text from the document so that it can be passed to a translation API.

JPedal offers several ways to extract text, depending on what you need. I am going to use the paragraph estimation feature so that we can translate one paragraph at a time.

We can do this by decoding the file with PdfDecoder and calling the getParagraphAreasAs2dArray() method.


Next, we need to convert our paragraph rectangles from X,Y,W,H format (a corner plus width and height) to X0,Y0,X1,Y1 format (two opposite corners) so that we can pass them to the grouping algorithm, which extracts the text.
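As a minimal sketch of that conversion, the helper below turns an {x, y, width, height} array into {x0, y0, x1, y1} corners. The class and method names (`RectConverter`, `toCorners`, `toCornersAll`) are hypothetical and not part of JPedal. The code assumes that (x, y) is the corner with the smallest coordinates, so check which corner and argument order your JPedal version expects before passing the result to the text extractor.

```java
/** Hypothetical helper (not part of JPedal): converts rectangle formats. */
final class RectConverter {

    private RectConverter() { }

    /**
     * Converts {x, y, width, height} to {x0, y0, x1, y1}.
     * Assumes (x, y) is the corner with the smallest coordinates.
     */
    static int[] toCorners(final int[] rect) {
        if (rect == null || rect.length != 4) {
            throw new IllegalArgumentException("Expected {x, y, width, height}");
        }
        final int x0 = rect[0];
        final int y0 = rect[1];
        final int x1 = rect[0] + rect[2]; // x + width
        final int y1 = rect[1] + rect[3]; // y + height
        return new int[] {x0, y0, x1, y1};
    }

    /** Converts every rectangle in a 2D array, one row per paragraph. */
    static int[][] toCornersAll(final int[][] rects) {
        final int[][] corners = new int[rects.length][];
        for (int i = 0; i < rects.length; i++) {
            corners[i] = toCorners(rects[i]);
        }
        return corners;
    }
}
```

Keeping the conversion in one place makes it easy to adjust if the extractor expects the top edge before the bottom edge, which is a common source of empty results when working in PDF coordinates.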


Now, for each paragraph, we can extract the words.
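Once the words for a paragraph have been extracted, they need to be joined into a single string before translation. The sketch below assumes the extractor returns a flat list in which every word is followed by four coordinate values (word, x0, y0, x1, y1). That layout is an assumption, so verify it against the JPedal documentation for the method you call. The `ParagraphText` class and `joinWords` method are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not part of JPedal): rebuilds paragraph text from a word list. */
final class ParagraphText {

    // Assumed layout: each entry is a word followed by 4 coordinate values.
    private static final int STRIDE = 5;

    private ParagraphText() { }

    /** Joins every word in the flat list with single spaces, skipping the coordinates. */
    static String joinWords(final List<String> wordList) {
        if (wordList == null) {
            return ""; // treat a missing list as an empty paragraph
        }
        final StringJoiner paragraph = new StringJoiner(" ");
        for (int i = 0; i < wordList.size(); i += STRIDE) {
            paragraph.add(wordList.get(i));
        }
        return paragraph.toString();
    }
}
```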


Learn more about extracting text.

Translating the text

Second, we need to connect to a translation API to get the translated text.

I have chosen to use Translator because it is easy to use and works well, but you could use any library.
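Because the translation library is interchangeable, one option is to hide it behind a small interface. The sketch below is illustrative only: `ParagraphTranslator` and `TranslationStep` are hypothetical names, not part of Translator or JPedal, and the real library call would go inside your implementation of `translate`. Empty paragraphs are passed through untranslated so the results stay aligned with the paragraph rectangles.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction: wraps whichever translation library you choose. */
interface ParagraphTranslator {
    String translate(String text, String sourceLang, String targetLang);
}

/** Hypothetical helper that translates paragraphs one at a time. */
final class TranslationStep {

    private TranslationStep() { }

    static List<String> translateAll(final List<String> paragraphs,
                                     final ParagraphTranslator translator,
                                     final String sourceLang,
                                     final String targetLang) {
        final List<String> translated = new ArrayList<>(paragraphs.size());
        for (final String paragraph : paragraphs) {
            if (paragraph == null || paragraph.trim().isEmpty()) {
                // Keep list positions aligned with the paragraph rectangles.
                translated.add("");
            } else {
                translated.add(translator.translate(paragraph, sourceLang, targetLang));
            }
        }
        return translated;
    }
}
```

For testing without network access, a stub such as `(text, from, to) -> "[" + to + "] " + text` can stand in for the real translator. The language codes you pass (for example "en" and "zh") depend on the library you wrap.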


Annotating text

Finally, we need to insert the translated text as an annotation which overlays each paragraph on the page.

We can use JPedal’s PdfManipulator class to efficiently perform bulk edits to a PDF file.


Once all the annotations have been added and we are outside the loop, we can apply the queued edits and write the result to the file.


Learn more about manipulating PDF documents.

Results

Before
[Image: awjune page 1]

After
[Image: awjune page 1 translated]

You can find the complete source code for this on our GitHub profile.


