5 Things Firefox 19’s new JavaScript PDF Viewer (PDF.js) Cannot Do Yet

Having spent nearly 3 years building our own PDF to HTML5/SVG/Android converter (which itself relies on our development of a Java PDF library), we know how hard this conversion is and we have a grudging respect for any others trying it. In particular we have been interested in the development of PDF.js…

Exactly one week ago today, Mozilla released Firefox 19, unveiling PDF.js as the new default PDF Viewer. I’d like to preface this article by saying what an impressive job they have done in creating a PDF Viewer in JavaScript. While there are many things that it can do, here are a few things that it can’t, and the reasons why.

1. Save PDF as HTML5

What PDF.js does is take the PDF file, decode and extract the content, and display it as HTML5 with some eye candy and controls around the edge. Wouldn’t it be great if you could save that HTML5 version and put it on your website instead of the PDF?

One of the downsides of the PDF file format is that it’s not particularly web friendly. Whilst it’s possible to create PDF files with Fast Web View, the majority of PDF files don’t have it, and the user is left to download the entire PDF file (which could be very large), regardless of how many pages they actually view. There’s an opportunity here to alleviate the issue by storing the HTML5 version on a page by page basis.

One of the things that PDF does particularly well is store very complex documents using very little data. The “markup” (I use this term loosely) is very well optimised. Unfortunately, HTML’s is not, nor is it suited (in any way) to the fine control of text positioning and layout that PDF offers. What this adds up to is (depending on how you optimise) very large file sizes.
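To give a rough feel for where the extra bytes come from, here is a minimal sketch that emits one absolutely positioned div per run of text and compares the markup length with the visible text length. The `TextFragment` shape, the function names and the coordinates are assumptions made up for illustration; this is not how PDF.js or our converter generates output.

```typescript
// Hypothetical shape for one positioned run of text (not a PDF.js type).
interface TextFragment {
  text: string;
  left: number; // px from the left edge of the page
  top: number;  // px from the top edge of the page
}

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Emit one absolutely positioned <div> per fragment, one per line of output.
function toPositionedHtml(fragments: TextFragment[]): string {
  return fragments
    .map(f =>
      `<div style="position:absolute;left:${f.left}px;top:${f.top}px">${escapeHtml(f.text)}</div>`)
    .join("\n");
}

// Ratio of generated markup length to visible text length.
function markupOverhead(fragments: TextFragment[]): number {
  const textLength = fragments.reduce((sum, f) => sum + f.text.length, 0);
  return textLength === 0 ? 0 : toPositionedHtml(fragments).length / textLength;
}
```

With only a couple of characters per div, the positioning markup is many times longer than the text it carries, which is the trade-off described above.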

There’s a penalty if you want an immaculate representation of the PDF file, and at the same time there are a lot of savings to be had if you are prepared to compromise. In PDF.js’s case, it’s probably best that the content stays as PDF – it doesn’t matter how large your converted files are if they are stored locally.

The screenshot below shows an example of this. It demonstrates the HTML markup required to display a short phrase as the PDF intended. (Höchste Zeit für einen Original VELUX Aussenrollladen.)

[Screenshot: the HTML markup generated for the phrase above]

For reference, here’s how it looks in the PDF file:

[Image: the same phrase as it appears in the PDF file]

Unfortunately Velux have not paid me for this…

2. Copy and Paste Text from PDF files legibly

Following on from the point above, my tests show that PDF files very commonly contain finely positioned (or spaced) text, and reproducing that positioning takes many divs. When you copy text that spans multiple divs, a new line is inserted for each div. The phrase above should occupy 4 lines when pasted, but it actually comes out over 22, with a couple of characters on each.
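One way a viewer or converter could avoid this is to group fragments by their vertical position before building the copied text. The sketch below contrasts the per-div behaviour described above with such a grouping. The `PositionedText` shape, the function names and the 2px tolerance are assumptions for illustration, and it assumes spaces are already part of the fragment text; it is not what PDF.js does.

```typescript
// Hypothetical shape for one positioned run of text (not a PDF.js type).
interface PositionedText {
  text: string;
  left: number; // px from the left edge of the page
  top: number;  // px from the top edge of the page
}

// What the copy behaviour described above amounts to: one line per div.
function naiveCopy(fragments: PositionedText[]): string {
  return fragments.map(f => f.text).join("\n");
}

// Group fragments whose tops lie within `tolerance` px of the first fragment
// on the current line, then read each line from left to right.
function copyByLine(fragments: PositionedText[], tolerance: number = 2): string {
  const byTop = [...fragments].sort((a, b) => a.top - b.top);
  const lines: { top: number; items: PositionedText[] }[] = [];
  for (const fragment of byTop) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(fragment.top - current.top) <= tolerance) {
      current.items.push(fragment);
    } else {
      lines.push({ top: fragment.top, items: [fragment] });
    }
  }
  return lines
    .map(line =>
      [...line.items].sort((a, b) => a.left - b.left).map(f => f.text).join(""))
    .join("\n");
}
```

A fixed tolerance is a simplification: real documents mix font sizes, superscripts and rotated text, so a production heuristic would need to be more careful.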

3. Edit & Save The HTML5 Version

This is a similar groan to point 1, except that I will focus on the ability to edit. It is well known that if you make a typing error in a document you have saved as a PDF, and you have lost the original, correcting it is generally a huge pain. There are many software packages that attempt to let you edit PDF files, but none that are particularly good.

A typing error is not the only reason you may want to edit what’s in a PDF file; there are many use cases. A personal favourite technique that we have seen used to “delete” from a PDF is to draw a white box over some sensitive text and resave the PDF file. It’s reminiscent of using plain text to display a Captcha and JavaScript to distort it to make it difficult to read…

Regardless, saving out as HTML5 is the perfect opportunity to allow you to edit any text or images, and then share a widely compatible HTML5 version of your content. However, the text that you think you can see, and highlight, and copy/paste is actually a very good illusion.

If you were to edit the text, you would notice that visually nothing changes, yet when you select and copy, you get the updated text. This is because, to ensure a perfect visual representation, PDF.js actually uses invisible text overlaid on the text drawn as an image.

The underlying cause is the huge issue that is font conversion. PDF allows for a very wide spectrum of fonts, but web browsers are definitely not font friendly. To display fonts from PDF files in a web browser, they need to be converted to a supported font format, which is fraught with difficulty as each browser has its own expectations of how a font should be defined.

You can see this optimisation in action if you highlight text, and focus on another window. Here’s the same example from earlier:

[Image: what you see vs what the font actually looks like]

4. Layout the pages differently (e.g. magazine style)

It’s quite common for publishers to save magazines and books as PDF files that are intended to be viewed like an open book or magazine, with even-numbered pages on the left and odd-numbered pages on the right.

Here’s a nice example:

[Image: example magazine spread; the original image linked to an HTML5 version]

Unfortunately, PDF.js only has one viewing mode – a single column.
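The pairing described above (even-numbered pages on the left, odd-numbered pages on the right) is simple to compute. Here is a minimal sketch, assuming 1-based page numbers and a cover page that sits alone on the right; the `Spread` type and `toSpreads` function are hypothetical names, not part of PDF.js.

```typescript
// One two-page spread; null marks an empty side.
type Spread = { left: number | null; right: number | null };

// Pair pages so that even numbers fall on the left and odd numbers on the
// right. Page 1 is odd, so it opens the document alone on the right.
function toSpreads(pageCount: number): Spread[] {
  const spreads: Spread[] = [];
  if (pageCount < 1) {
    return spreads;
  }
  spreads.push({ left: null, right: 1 });
  for (let left = 2; left <= pageCount; left += 2) {
    const right = left + 1 <= pageCount ? left + 1 : null;
    spreads.push({ left, right });
  }
  return spreads;
}
```

A viewer with a magazine mode would then render each spread as a single row, leaving a blank half wherever a side is null.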

5. Support Interactive Elements

As previously mentioned, PDF is a very powerful format, and it allows for lots of interactivity through elements such as forms, audio and video. These are all things that HTML5 can also do well, so perhaps we will see them in a future Firefox release?

If these features are of interest to you, we are the creators of a PDF to HTML5 converter with many of these features that may be worth taking for a spin. You can find out more about it and try it online here.

What do you think of the PDF.js viewer?

This post is part of our “SVG Article Index”. In these articles, we aim to help you build knowledge and understand SVG.

Leon is a Developer at IDRsolutions, focusing mainly on development of the PDF to HTML5/SVG converter. He was a speaker at JavaOne 2012, co-presenting a session titled 'Lessons Learned in Writing a PDF-to-JavaFX Converter for NetBeans'.

Leon Atherton

2 thoughts on “5 Things Firefox 19’s new JavaScript PDF Viewer (PDF.js) Cannot Do Yet”

  1. Frisian

    PDF.js is far from perfect. Aside from the deficiencies you listed above it tends to omit umlauts ( bel und h chst rgerlich!).
    Yet, when you’re in a corporate environment, where the Acrobat Reader isn’t updated as regularly as it should, it’s a godsend. Firefox 18 and older had to be persuaded to display PDF files when you’re stuck with Acrobat Reader 9 with an extra mouse-click. Not anymore! A real projection by technics! (put that into Google translator ;-))

    • I think PDF.js is an impressive achievement and a great advertisement for how powerful JavaScript and HTML5 can be, and it’s great that there’s now a default PDF reader in Firefox, but I’m not sure I agree with Mozilla’s decision to overwrite the existing default viewer without first asking the user. It’s fantastic considering it’s in JavaScript, but it doesn’t alter the fact that it’s still a downgrade from something like Adobe Reader. As you say, it’s far from perfect. There are lots of basic things it does poorly (have you tried printing a PDF from it?).
