Why convert PDF magazines to HTML5? – Part 6. Load quickly and save on bandwidth

One of the things that PDF does really well is portability – so well, in fact, that it is part of the name (Portable Document Format). But a lot has changed since PDF was conceived in 1991, and in the age of the smartphone the definition of portability has grown.

With high-quality graphics, video or a large number of pages, a PDF file can quickly become very large. PDF handles this very well – if you ask for page 500, the viewer looks up the location of page 500 in the file’s index and goes straight to it, skipping over pages 1 to 499. It is much the same as using a contents page to find a topic – if you want to read page 10, you don’t have to read pages 1 to 9 first.
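As a rough illustration of the index idea, here is a toy Java sketch. The class name PageIndexSketch and its layout (a page number mapped to a byte offset and length) are invented for this example; the real PDF cross-reference structure is considerably more involved.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy model of index-based page lookup. Not the real PDF file layout. */
public class PageIndexSketch {
    private final byte[] file;                                   // the whole "document"
    private final Map<Integer, int[]> index = new HashMap<>();   // page -> {offset, length}

    public PageIndexSketch(String... pages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < pages.length; i++) {
            byte[] bytes = pages[i].getBytes(StandardCharsets.UTF_8);
            index.put(i + 1, new int[] {out.size(), bytes.length}); // record where the page starts
            out.write(bytes, 0, bytes.length);
        }
        file = out.toByteArray();
    }

    /** Jumps straight to the requested page without reading the pages before it. */
    public String readPage(int pageNumber) {
        int[] entry = index.get(pageNumber);
        if (entry == null) {
            throw new IllegalArgumentException("No such page: " + pageNumber);
        }
        return new String(file, entry[0], entry[1], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PageIndexSketch doc = new PageIndexSketch("page one", "page two", "page three");
        System.out.println(doc.readPage(3));
    }
}
```

The lookup only works once both the index and the bytes it points to are available, which is why the same approach is awkward when the file arrives as a stream.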

This is fine when the PDF file is on your computer, but when viewing PDF from a website it’s a little more tricky because the content is streamed from start to end, and to gain access to the index you need to have the whole file. It’s a bit like you needing to know how to bake a cake and me handing you the recipe book one page at a time. You need to wait for me to give you all the pages in the book before you are able to access the index and find out which page you need to turn to.

Adobe added a “Fast Web View” feature that allows a viewer to display some pages while the rest are loaded, but this is of limited use if the PDF has not been set up to use this feature. And what about if you just want to view a couple of the pages? Regardless of how much you need, the whole file will still get downloaded.

If the readers of your magazine have slow internet connections or mobile phones and tablets with expensive data plans, this can be a big issue.

Wouldn’t it be great if your reader could download only the pages they actually read, and if you could create bespoke versions at different quality or zoom levels to optimise for devices with smaller screens such as mobile phones?

These are all features that we have thought about and included. Why not try out some sample magazines on our free online PDF to HTML5 converter? We even have a mode that makes the output look like a magazine and allows you to turn the pages using your mouse (or your finger on mobiles and tablets). You can see it in action here.

EC Example

This article is part 6 of a series where we talk about the advantages of publishing your PDF magazines online as HTML5. Click here to visit the index and see more advantages.

If you’re a first-time reader, or simply want to be notified when we post new articles and updates, you can keep up to date by social media (Twitter, Facebook and Google+) or the Blog RSS.

Drawing Java Components without displaying them.

Recently I needed to draw a PDF page to a provided Graphics object. Since we already draw the page to a Panel, this is not hard to achieve: all we need to do is pass the Graphics object into the Panel’s paint method. The problem we ran into is that we need to do this without displaying the Panel itself.

For most of the page’s content this is no problem and the output is correct. Form components, on the other hand, are not so simple. By default our form components are Swing components added to the Panel so that they appear at the correct point on the page. Where we have custom appearances we can override the display of the components.

When we pass in a provided Graphics object, most of the page is rendered correctly, but some form components are not: they appear as empty grey boxes on the page. I spent some time looking into this and learned several important lessons.

  • Overriding the appearance of a component allows the form component to be drawn on the Graphics object correctly.
  • Form components that use the standard appearance are not drawn correctly when they are not visible on the screen.
  • A Swing component’s appearance only seems to be created, or made drawable, once the component is made displayable.
  • Components are only made displayable in certain circumstances.

So, what is the difference between displayable and visible? A visible component is one that is currently visible on the screen. A displayable component is one that has been added to a containment hierarchy that has been made displayable. A hierarchy is made displayable either by calling pack() on the ancestor window or by making the ancestor visible.

So, to draw the PDF page to a Graphics object and have all the form components appear correctly without displaying the pages on screen, we need to add the form components to a dummy frame. By adding the form components to a JFrame and then calling pack() on the JFrame, we mark the form components and any child components as displayable.

Problem solved.
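A minimal sketch of the dummy-frame approach might look like the following. The class and method names (OffscreenFormSketch, renderOffscreen) are hypothetical and this is not our library’s actual code; a plain JPanel with a JTextField stands in for a page with a form component.

```java
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JTextField;

public class OffscreenFormSketch {

    /** Paints the panel into an image without ever showing a window. */
    public static BufferedImage renderOffscreen(JPanel page) {
        JFrame dummy = new JFrame();          // never made visible
        dummy.getContentPane().add(page);
        dummy.pack();                         // frame and all descendants become displayable
        try {
            BufferedImage image = new BufferedImage(
                    Math.max(1, page.getWidth()),
                    Math.max(1, page.getHeight()),
                    BufferedImage.TYPE_INT_RGB);
            Graphics2D g = image.createGraphics();
            page.paint(g);                    // draw the panel and its children
            g.dispose();
            return image;
        } finally {
            dummy.dispose();                  // release the frame's native resources
        }
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("A JFrame cannot be created in a headless environment.");
            return;
        }
        JPanel page = new JPanel();
        JTextField field = new JTextField("Form value", 12);
        page.add(field);
        System.out.println("Displayable before pack(): " + field.isDisplayable());
        BufferedImage image = renderOffscreen(page);
        System.out.println("Rendered " + image.getWidth() + "x" + image.getHeight());
    }
}
```

Production code would normally do this work on the Swing event dispatch thread; that is left out here to keep the sketch short.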

Saving your settings in our online PDF to HTML5 and SVG converter

Our online PDF converter now has quite a few options. You can output as HTML5 or SVG, there are realtext and image modes, lots of different layout modes, scaling and lots of optional features….

So we thought it would be really helpful if you could save your settings and reload them later. This makes it very easy to set up once and then reuse the saved configuration at a later date. The data is stored in a simple text file which you can save to your computer and even pass on to others. You can have any number of saved configurations, which you can reload when you need them.

save configuration

If you use the page turning mode for your HTML5 pages, you will also notice that we have added a slick new look to it.

In fact, you may want to keep a close eye on the online PDF converter as the next few months are going to see some very cool new features appearing. We hope you like them…

 

iOS and HTML5: Gotcha with Absolute Positioning

One of the aims of our PDF to HTML5 converter and all of its various view modes (all 9 of them) was to make viewing PDF files easy and platform independent, so that the user only needs a relatively modern web browser to view them.

Because we designed the output to be used by the browser, you can also select and search the text using your browser’s default tools. This free functionality normally works well in all web browsers across all platforms, even on Android devices, as you can see in the images in this post.

Example of our HTML output viewed on Android

However, this sadly isn’t true in the current versions of Safari and Chrome on iOS. Neither currently supports the latest CSS to as great a degree as browsers on other mobile devices, and as a result of its bizarre selection engine it is very difficult, and often impossible, to select text on pages containing complicated CSS (explained below).

We recently had a customer ask why they couldn’t select the text of our output on their iPad, which struck us as an odd question. The default mode for our output has had selectable text for as long as I can recall, so my initial thought was that the user may simply have been unfamiliar with how to select text on an iPad. However, we checked to be sure and were surprised to find that the text wasn’t selectable.

This was puzzling because, as I mentioned before, the text has always been selectable. It is, after all, just text within a div tag in the HTML, and we were sure it had worked previously.

Example of searching our HTML output in Google Chrome

I went over our current output, found some older output that worked, and looked at the differences between the two.

Visually they looked almost identical, with a few improvements in character spacing in our current version and a different background colour.

Structurally, the newer version differs quite a lot from the older one. In our older versions we placed the text within div tags under our parent jpedal div, with styling like so:

<body style="background-color: rgb(55,55,65);">
<div id="jpedal" style="position:relative; width: 984px; margin: 0 auto;">

<!-- Shared CSS values -->
<style type="text/css" >
.t {
	position:absolute;
	white-space:nowrap;
	overflow:visible;
	z-index:1;
}

.tr {
	-webkit-transform-origin: left top;
	-ms-transform-origin: left top;
	-moz-transform-origin: left top;
	-o-transform-origin: left top;
}
</style>

<!-- Inline CSS values -->
<style type="text/css" >

#t1_1 {
	left:90px;
	top:60px;
	FONT-SIZE: 60px;
	FONT-FAMILY: CataneoBT-Regular1;
	color:rgb(0,85,149);
}

#t2_1 {
	-webkit-transform:matrix(0.97,0,-0.2,0.97,114, 181);
	-ms-transform:matrix(0.97,0,-0.2,0.97,114, 181);
	-moz-transform:matrix(0.97,0,-0.2,0.97,114, 181);
	-o-transform:matrix(0.97,0,-0.2,0.97,114, 181);
	FONT-SIZE: 21px;
	FONT-FAMILY: IGNACK-RaleighBT-Roman1;
	color:rgb(35,32,32);
}

#t3_1 {
	-webkit-transform:matrix(0.97,0,-0.2,0.97,350, 212);
	-ms-transform:matrix(0.97,0,-0.2,0.97,350, 212);
	-moz-transform:matrix(0.97,0,-0.2,0.97,350, 212);
	-o-transform:matrix(0.97,0,-0.2,0.97,350, 212);
	FONT-SIZE: 13px;
	FONT-FAMILY: IGNACK-RaleighBT-Roman1;
	color:rgb(35,32,32);
}
</style>

<!-- Any embedded fonts defined here -->
<style type="text/css" >
@font-face {
	font-family: CataneoBT-Regular1;
	src: url("01/fonts/CataneoBT-Regular.woff");
}

@font-face {
	font-family: IGNACK-RaleighBT-Roman1;
	src: url("01/fonts/IGNACK-RaleighBT-Roman.woff");
}
</style>

<!-- Text defined here and setup in CSS -->
<div id="t1_1" class="t">Some things never change</div>
<div id="t2_1" class="t tr">Never trust a dog to watch your food.</div>
<div id="t3_1" class="t tr">�</div>

Example of searching our HTML output in Firefox

We simply apply the correct styling and letter spacing to each element via its class and ID attributes.

In that older output, every text div carried class=”t” (a CSS class holding rules common to all of our text), and values such as the font family were repeated in the CSS for each div’s ID. To reduce this repetition we introduced several parent divs, which reduce file size and make our CSS easier to understand.

Below you can see an example of the current output and its structure (note: as with the previous example, this is just a snippet of the relevant parts of our output):

<body style="background-color:#919191;">
<div id="jpedal" style="position:relative; width: 984px; height: 1179px; overflow: hidden; margin: 0 auto; box-shadow: 0 2px 6px rgba(100, 100, 100, 0.5);">

<!-- Begin shared CSS values -->
<!--[if lt IE 9]><style type="text/css">.text div div{zoom: 25%;}</style><![endif]-->
<style type="text/css" >
.text {
	position: absolute;
	-webkit-transform-origin: top left;
	-moz-transform-origin: top left;
	-o-transform-origin: top left;
	-ms-transform-origin: top left;
	-webkit-transform: scale(0.25);
	-moz-transform: scale(0.25);
	-o-transform: scale(0.25);
	-ms-transform: scale(0.25);
	z-index: 1;
}
.text div div {
	position:absolute;
	white-space:nowrap;
	overflow:visible;
}
</style>
<!-- End shared CSS values -->

<!-- Begin inline CSS -->
<style type="text/css" >

#t1_1{left:360px;top:240px;}
#t2_1{-webkit-transform:matrix(0.97,0,-0.2,0.97,456, 724);-ms-transform:matrix(0.97,0,-0.2,0.97,456, 724);-moz-transform:matrix(0.97,0,-0.2,0.97,456, 724);-o-transform:matrix(0.97,0,-0.2,0.97,456, 724);}
#t3_1{-webkit-transform:matrix(0.97,0,-0.2,0.97,1400, 848);-ms-transform:matrix(0.97,0,-0.2,0.97,1400, 848);-moz-transform:matrix(0.97,0,-0.2,0.97,1400, 848);-o-transform:matrix(0.97,0,-0.2,0.97,1400, 848);}
#t4_1{left:1456px;top:848px;}

#t2_1,#t3_1 {
	-webkit-transform-origin: left top;
	-ms-transform-origin: left top;
	-moz-transform-origin: left top;
	-o-transform-origin: left top;
}

.s2_1{
	FONT-SIZE: 84px;
	FONT-FAMILY: IGNACK-RaleighBT-Roman1;
	color: rgb(35,32,32);
}

.s1_1{
	FONT-SIZE: 240px;
	FONT-FAMILY: CataneoBT-Regular1;
	color: rgb(0,85,149);
}

.s3_1{
	FONT-SIZE: 52px;
	FONT-FAMILY: IGNACK-RaleighBT-Roman1;
	color: rgb(35,32,32);
}

</style>
<!-- End inline CSS -->

<!-- Begin embedded font definitions -->
<style type="text/css" >

@font-face {
	font-family: CataneoBT-Regular1;
	src: url("index/fonts/CataneoBT-Regular.woff");
}

@font-face {
	font-family: IGNACK-RaleighBT-Roman1;
	src: url("index/fonts/IGNACK-RaleighBT-Roman.woff");
}

</style>
<!-- End embedded font definitions -->

<!-- Begin text definitions (Positioned/styled in CSS) -->
<div class="text">
<div class="s1_1">
<div id="t1_1">Some things never change</div>
</div>
<div class="s2_1">
<div id="t2_1">Never trust a dog to watch your food.</div>
</div>
<div class="s3_1">
<div id="t3_1">�</div>

Example of Searching our HTML output on Safari

This shortened our output considerably: not having to output the font-family per ID and class=”t” per element saves a lot of characters, which makes large converted files containing a lot of similar text smaller.

However, nesting these absolutely positioned elements appears to be what causes the issue on iOS!

If you’re on an iPad, I’ve created a simple example here that attempts to show the difference. Pressing the button switches between text that can easily be selected on iOS and text that cannot.

This probably isn’t intended behaviour and may well be a bug with iOS!

One solution we’ve come up with is to change the output on the page, when it is viewed on iOS, to something whose text can be selected. Of course this affects the performance of our output on iOS devices, which isn’t the best compromise. My personal hope is that this issue is rectified within iOS itself so that other developers don’t have to encounter this oddity.

Have you had any difficulties with selecting text on iOS or other web browsers? We’d love to hear them and how you solved them!

PDF teasers – how would you handle this stack problem?

This article arose as a result of debugging a customer file which was not displaying properly. There are many PDF files out there which do not actually meet the spec so we spend a lot of time tweaking our library to allow for all these ‘interesting’ cases.

The PDF file format has a stack system so that you can save the current graphics state, make some changes and restore it later. In the PDF stream you will see this as the q and Q commands. Here is an example:

q  //save graphics state on the stack
1 0 0 1 130.32 117.601 cm //move the origin
/X7 Do //draw an image or execute some commands
Q //restore graphics state from the stack

This code saves the current graphics state on the stack, changes the co-ordinates, does something and then restores the original values.

Calls can even be nested, so you can have:
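To make the save/change/restore cycle concrete, here is a small Java sketch. It tracks only the transformation matrix, whereas the real graphics state holds much more (colours, clipping, line settings and so on), and the names GraphicsStateSketch, save, concat and restore are chosen purely for illustration.

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.Deque;

/** Models q, cm and Q for the transformation matrix only. */
public class GraphicsStateSketch {
    private final Deque<AffineTransform> saved = new ArrayDeque<>();
    private AffineTransform ctm = new AffineTransform();   // starts as the identity

    /** q: push a copy of the current matrix. */
    public void save() {
        saved.push(new AffineTransform(ctm));
    }

    /** cm: combine the six operands with the current matrix. */
    public void concat(double a, double b, double c, double d, double e, double f) {
        ctm.concatenate(new AffineTransform(a, b, c, d, e, f));
    }

    /** Q: pop the most recently saved matrix (an unmatched Q is ignored). */
    public void restore() {
        if (!saved.isEmpty()) {
            ctm = saved.pop();
        }
    }

    public AffineTransform ctm() {
        return new AffineTransform(ctm);
    }

    public static void main(String[] args) {
        GraphicsStateSketch gs = new GraphicsStateSketch();
        gs.save();                                   // q
        gs.concat(1, 0, 0, 1, 130.32, 117.601);      // 1 0 0 1 130.32 117.601 cm
        System.out.println("Inside: " + gs.ctm());
        gs.restore();                                // Q
        System.out.println("Identity after restore: " + gs.ctm().isIdentity());
    }
}
```

Ignoring an unmatched Q is a design choice made here for tolerance of broken files, not something the format requires.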

q //save orig state
//something
q //save new state
//something
Q //restore new state
Q //restore orig state

It is a very powerful feature.

The Do command can also execute a stream of commands, which may itself save and restore the state, like this:

q  //save stack
1 0 0 1 130.32 117.601 cm
/X7 Do //execute these commands

     q //save state
     7.92 0 0 7.92 0 -0.001 cm //move position
     0 0 0 rg //set color
     BI /W 34 /IM true /D [1 0] /BPC 1 /H 34 ID //draw image

     //end of stream of commands (we pushed value onto stack but did not use)

Q //oh dear!!!

What do I do? Do I use the value pushed by the sub-routine or the value pushed before the sub-routine?

The answer really depends on whether there is a single, global stack or whether the sub-routine has its own stack….

Update: The answer is that the sub-routine effectively has its own stack, so any values left on it should be ignored. Did you get it right?
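One way to get this behaviour with a single stack is to remember how deep the stack was when the sub-routine started and throw away anything above that mark when it ends. The Java sketch below does this; it is a simplified illustration rather than our library’s code, the state is reduced to a String label, and the names FormStackSketch and runForm are made up. Resetting the state itself on exit is an assumption here, based on a form being treated as if it were wrapped in its own save and restore.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FormStackSketch {
    private final Deque<String> saved = new ArrayDeque<>();
    private String state = "orig";
    private int floor = 0;   // entries below this depth belong to the caller

    public void set(String newState) { state = newState; }

    public String state() { return state; }

    /** q: push the current state. */
    public void save() { saved.push(state); }

    /** Q: pop, but never take an entry that belongs to the caller. */
    public void restore() {
        if (saved.size() > floor) {
            state = saved.pop();
        }
    }

    /** Runs the commands behind a Do as if they had their own stack. */
    public void runForm(Runnable formCommands) {
        int outerFloor = floor;
        String stateOnEntry = state;
        floor = saved.size();
        try {
            formCommands.run();
        } finally {
            while (saved.size() > floor) {
                saved.pop();              // discard values the form left behind
            }
            floor = outerFloor;
            state = stateOnEntry;         // assumed: the form cannot leak state changes
        }
    }

    public static void main(String[] args) {
        FormStackSketch gs = new FormStackSketch();
        gs.save();                        // q   (saves "orig")
        gs.set("moved");                  // cm
        gs.runForm(() -> {                // /X7 Do
            gs.save();                    //   q with no matching Q
            gs.set("inner");
        });
        gs.restore();                     // Q
        System.out.println(gs.state());   // prints: orig
    }
}
```

The final Q therefore restores the value pushed before the sub-routine, matching the answer in the update.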
