Author Archives: Mark Stephens

About Mark Stephens

Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX. He also enjoys speaking at conferences and has been a speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Saving your settings in our online PDF to HTML5 and SVG converter

Our online PDF converter now has quite a few options. You can output HTML5 or SVG, choose between realtext and image modes, and pick from lots of different layout modes, scaling settings and optional features…

So we thought it would be really helpful if you could save your settings and reload them later. This makes it very easy to set up once and then reuse the saved configuration at a later date. The data is stored in a simple text file which you can save to your computer and even pass on to others. You can have any number of saved configurations which you can then reload when you need them.


If you use the page turning mode for your HTML5 pages, you will also notice that we have added a slick new look to it.

In fact, you may want to keep a close eye on the online PDF converter as the next few months are going to see some very cool new features appearing. We hope you like them…

 


PDF teasers – how would you handle this stack problem?

This article arose as a result of debugging a customer file which was not displaying properly. There are many PDF files out there which do not actually meet the spec so we spend a lot of time tweaking our library to allow for all these ‘interesting’ cases.

The PDF file format has a stack system so that you can save the current graphics state, make some changes and restore it later. In the PDF stream you will see this as the q (save) and Q (restore) commands. Here is an example:

q  % save graphics state
1 0 0 1 130.32 117.601 cm  % move the coordinate system
/X7 Do  % draw an image or execute some commands
Q  % restore graphics state

This code saves the graphics state, makes a change to the coordinates, does something and then restores the original values.

Calls can even be nested, so you can have:

q  % save original state
% something
q  % save new state
% something
Q  % restore new state
Q  % restore original state

It is a very powerful feature.
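
To make the save and restore behaviour concrete, here is a minimal Java sketch of a graphics state stack. The class and method names (GraphicsState, GraphicsStateStack, save, restore, concat) are our own for illustration and are not taken from any particular library. A real graphics state also holds far more than the transformation matrix tracked here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, cut-down graphics state: only the transformation matrix. */
final class GraphicsState {
    // Current transformation matrix as [a b c d e f].
    final double[] ctm;

    GraphicsState(double[] ctm) { this.ctm = ctm.clone(); }

    GraphicsState copy() { return new GraphicsState(ctm); }
}

final class GraphicsStateStack {
    private final Deque<GraphicsState> saved = new ArrayDeque<>();
    // Start with the identity matrix.
    private GraphicsState current = new GraphicsState(new double[] {1, 0, 0, 1, 0, 0});

    /** q: push a copy of the current state. */
    void save() { saved.push(current.copy()); }

    /** Q: pop the most recently saved state. An unmatched Q is ignored here. */
    void restore() {
        if (!saved.isEmpty()) current = saved.pop();
    }

    /** cm: concatenate a matrix with the current one (new = given x current). */
    void concat(double a, double b, double c, double d, double e, double f) {
        double[] m = current.ctm;
        current = new GraphicsState(new double[] {
            a * m[0] + b * m[2],        a * m[1] + b * m[3],
            c * m[0] + d * m[2],        c * m[1] + d * m[3],
            e * m[0] + f * m[2] + m[4], e * m[1] + f * m[3] + m[5]
        });
    }

    GraphicsState current() { return current; }

    int depth() { return saved.size(); }
}

final class GraphicsStateDemo {
    public static void main(String[] args) {
        GraphicsStateStack gs = new GraphicsStateStack();
        gs.save();                               // q
        gs.concat(1, 0, 0, 1, 130.32, 117.601);  // cm
        System.out.println("x offset inside q/Q: " + gs.current().ctm[4]);
        gs.restore();                            // Q
        System.out.println("x offset after Q:    " + gs.current().ctm[4]);
    }
}
```

Because save() pushes a copy and restore() pops the latest one, nested q/Q pairs unwind in reverse order, which is exactly the nesting shown above.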

The Do command can also run a stream of commands of its own, and those commands can include saving and restoring the graphics state, like this:

q  % save graphics state
1 0 0 1 130.32 117.601 cm
/X7 Do  % execute these commands

     q  % save state
     7.92 0 0 7.92 0 -0.001 cm  % scale and move position
     0 0 0 rg  % set fill color
     BI /W 34 /IM true /D [1 0] /BPC 1 /H 34 ID  % draw image

     % end of stream of commands (we pushed a value onto the stack but did not use it)

Q  % oh dear!!!

What do I do? Do I use the value pushed by the sub-routine or the value pushed before the sub-routine?

The answer depends on whether there is a single, global stack or whether the sub-routine has its own stack…

Update: The answer is that the sub-routine effectively has its own stack, so any values left on it should be ignored. Did you get it right?
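
One way to get the "own stack" behaviour is to remember how deep the stack was when the sub-routine started and throw away anything it leaves behind. The Java sketch below is an illustration under our own assumptions, not how any particular library does it: ContentInterpreter is a hypothetical class, the state is reduced to a simple label, and any token other than q, Q or Do just stands in for "something changed the state".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy interpreter: only q, Q and Do are understood. */
final class ContentInterpreter {
    private final Deque<String> saved = new ArrayDeque<>();
    private String state = "page";
    // Entries at or below this depth belong to the caller and must not be popped.
    private int floor = 0;

    /** Runs the operators in ops; a "Do" runs the given sub-routine stream. */
    void run(List<String> ops, List<String> subRoutine) {
        for (String op : ops) {
            switch (op) {
                case "q":
                    saved.push(state);
                    break;
                case "Q":
                    if (saved.size() > floor) state = saved.pop();
                    break;
                case "Do":
                    runSubRoutine(subRoutine);
                    break;
                default:
                    state = op; // stand-in for any state-changing command
            }
        }
    }

    private void runSubRoutine(List<String> subRoutine) {
        int outerFloor = floor;
        String stateOnEntry = state;
        floor = saved.size();            // the sub-routine cannot pop below this
        run(subRoutine, List.of());
        while (saved.size() > floor) {   // discard values it pushed but never used
            saved.pop();
        }
        state = stateOnEntry;            // the caller carries on where it left off
        floor = outerFloor;
    }

    String state() { return state; }

    int depth() { return saved.size(); }
}

final class ContentInterpreterDemo {
    public static void main(String[] args) {
        ContentInterpreter interpreter = new ContentInterpreter();
        // Outer stream: q, cm, Do, Q. The sub-routine has a q with no matching Q.
        interpreter.run(List.of("q", "cm-outer", "Do", "Q"),
                        List.of("q", "cm-inner", "rg"));
        System.out.println(interpreter.state() + " / depth " + interpreter.depth());
    }
}
```

With this approach the final Q restores the value pushed before the sub-routine, and the unmatched value pushed inside it never leaks out.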

If you’re a first-time reader, or simply want to be notified when we post new articles and updates, you can keep up to date by social media (Twitter, Facebook and Google+) or the Blog RSS.


Where do your PDF objects start in a PDF file?

In theory this is a really easy question to answer. There is a reference table to all the PDF objects in the file giving you the binary offset to the start byte of the PDF object. So if I have a PDF reference table which looks like this….

xref
6153 30
0000000016 00000 n
0000004052 00000 n
0000004206 00000 n
0000004691 00000 n
0000004730 00000 n
0000004845 00000 n
0000005665 00000 n
0000006418 00000 n
0000007221 00000 n
0000007946 00000 n
0000008730 00000 n
0000009275 00000 n
0000009768 00000 n
0000010025 00000 n
0000010605 00000 n
0000010875 00000 n
0000011349 00000 n
0000012081 00000 n
0000012618 00000 n
0000012880 00000 n
0000013448 00000 n
0000014283 00000 n
0000014904 00000 n
0000015265 00000 n
0000044743 00000 n
0000051663 00000 n
0000062772 00000 n
0000066596 00000 n
0000003791 00000 n
0000000914 00000 n

trailer
<<908F0712C6BF4DCDBB6825BD22FB3D57>]/Prev 53534925/XRefStm 3791>>
startxref
0
%%EOF

The first line (6153 30) tells me the table covers 30 objects, numbered from 6153, so I would find object 6153 at 16 bytes from the start of the file

6153 0 obj
<>
endobj

and object 6154 at 4052 bytes from the start of the file.

6154 0 obj
<>/Metadata 2037 0 R/Pages 2023 0 R/StructTreeRoot 2039 0 R/Type/Catalog/ViewerPreferences<>>>
endobj

Things became more complicated with compressed objects, where objects can be embedded inside binary streams (allowing you to make the file smaller). So object 6154 might be embedded in compressed object data attached to object 2086. This is why you cannot see all the PDF objects inside a PDF file if you open it in a text editor.

Where it gets very messy, however, is that Adobe Acrobat does not enforce the rules about where an object starts (and generally adjusts to allow for errors). So you could find that

at 16 bytes from the start of the file, you see

<>
endobj

and 4052 bytes from the start of the file.

<>/Metadata 2037 0 R/Pages 2023 0 R/StructTreeRoot 2039 0 R/Type/Catalog/ViewerPreferences<>>>
endobj

I have recently seen several tools that do this and because the files work in Adobe Acrobat, they assume they are writing out ‘correct’ PDF files. This makes life very hard for us developers!

As with a lot of things in the PDF file format, there are clearly laid down rules but they are not enforced. So this is where your PDF objects should start, but you need to be aware that the values may not be totally correct. How much error do you allow for with the PDF file specification?
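
As a rough illustration of both halves of the problem, the Java sketch below first reads the offsets out of a reference table section, and then checks whether an object really starts where the table says, stepping backwards a little if it does not. XrefSketch and its methods are hypothetical names of our own, the text passed to parseSubsection is assumed to start at the "first count" line (after the xref keyword), and searching only backwards within a fixed window is just one possible answer to the "how much error" question.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class XrefSketch {

    /** Parses one table section: a "first count" line followed by count entries. */
    static Map<Integer, Long> parseSubsection(String text) {
        String[] lines = text.trim().split("\\R");
        String[] header = lines[0].trim().split("\\s+");
        int first = Integer.parseInt(header[0]);
        int count = Integer.parseInt(header[1]);
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String[] entry = lines[1 + i].trim().split("\\s+");
            if (entry[2].equals("n")) {   // "n" marks an object in use, "f" a free one
                offsets.put(first + i, Long.parseLong(entry[0]));
            }
        }
        return offsets;
    }

    /**
     * Returns the offset at which "objNum gen obj" actually starts, looking at the
     * claimed offset first and then up to window bytes before it, or -1 if absent.
     */
    static long locate(byte[] file, int objNum, int gen, long claimed, int window) {
        byte[] header = (objNum + " " + gen + " obj").getBytes(StandardCharsets.ISO_8859_1);
        long lowest = Math.max(0, claimed - window);
        long start = Math.min(claimed, (long) file.length - header.length);
        for (long pos = start; pos >= lowest; pos--) {
            if (matchesAt(file, (int) pos, header)) return pos;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos, byte[] header) {
        if (pos < 0 || pos + header.length > data.length) return false;
        // Avoid matching the tail of a longer number, such as "16153 0 obj".
        if (pos > 0 && !Character.isWhitespace(data[pos - 1])) return false;
        for (int i = 0; i < header.length; i++) {
            if (data[pos + i] != header[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Long> offsets =
            parseSubsection("6153 2\n0000000016 00000 n\n0000004052 00000 n\n");
        System.out.println(offsets);

        byte[] file = "%PDF-1.7\n%abcde\n6153 0 obj\n<<>>\nendobj\n"
            .getBytes(StandardCharsets.ISO_8859_1);
        // Pretend the table points just past the "6153 0 obj" line.
        long claimed = 27;
        System.out.println(locate(file, 6153, 0, claimed, 32));
    }
}
```

A small window keeps the check cheap, but a file that is badly wrong may still need a full scan for every "obj" keyword to rebuild the table.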


Version 5 release – Swing and JavaFX

In Part 5 of a series of articles regarding the JPedal Version 5.X update we explore the big changes that this update will bring.

A lot has changed in the Java world since we started writing our Java PDF library. As developers we try to steer a careful middle course between not changing anything and updating to remain current. With Java versions, that means testing on the latest Java versions (currently Java 7 and you will see we have been testing with Nashorn) and supporting previous versions (currently Java 6 which was released in 2006 and is no longer supported by Oracle).

When we created the original JPedal, it was a pure viewer so we created it to extend JPanel. This provided a clean, simple component we called PdfDecoder. We built a totally configurable Viewer around this but clients can still use PdfDecoder directly if they prefer.

This worked very well for its purpose but we have an increasing number of users using it for lots of other purposes which do not require Swing (PDF to image conversion, text extraction and search, etc). We have also altered the way the code works internally so it makes much less use of Swing Forms than the original.

With our version 5.0 release, we needed to change some features for XFA and enhanced JavaScript support. We decided to take the opportunity to look at what other major revisions might be worthwhile.

So we have created a new version (PdfDecoderServer). This does not include any of the Swing functionality (no printing – which needs Swing – or viewing and no PDF to Graphics2D rendering).

You can still use PdfDecoder for everything but if you just want these ‘server’ functions, PdfDecoderServer is available (and should be quicker and use less memory). Both PdfDecoder and PdfDecoderServer now implement PdfDecoderInt with common methods in the Interface.
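
The general shape of this design can be sketched in a few lines of Java. Please note that every name below (PageSource, ServerSource, ViewerSource, textOf and so on) is made up for illustration and is not the PdfDecoderInt API. The point is only that code written against the shared interface does not care whether it is handed the Swing component or the version with no Swing in it.

```java
import javax.swing.JPanel;

/** Hypothetical shared interface, standing in for the common methods. */
interface PageSource {
    int pageCount();
    String textOf(int page); // pages are numbered from 1 in this sketch
}

/** Hypothetical server-style implementation: no Swing types involved. */
final class ServerSource implements PageSource {
    private final String[] pages;

    ServerSource(String... pages) { this.pages = pages.clone(); }

    @Override public int pageCount() { return pages.length; }

    @Override public String textOf(int page) { return pages[page - 1]; }
}

/** Hypothetical viewer-style implementation: a Swing component with the same methods. */
final class ViewerSource extends JPanel implements PageSource {
    private final PageSource delegate;

    ViewerSource(PageSource delegate) { this.delegate = delegate; }

    @Override public int pageCount() { return delegate.pageCount(); }

    @Override public String textOf(int page) { return delegate.textOf(page); }
}

final class TextDump {
    /** Works with either implementation because it only sees the interface. */
    static String dump(PageSource source) {
        StringBuilder out = new StringBuilder();
        for (int page = 1; page <= source.pageCount(); page++) {
            if (page > 1) out.append('\n');
            out.append(source.textOf(page));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump(new ServerSource("first page", "second page")));
    }
}
```

Existing code that uses the Swing component directly keeps working, while new server-side code can depend on the interface alone.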

We did consider renaming PdfDecoder to something like PdfSwingDecoder, but this would mean customers need to alter their code extensively. So we have kept it as it was.

Oracle have made it clear that the future of Display in Java is JavaFX. Swing will continue in Java but will not see updates. We have already started adding in JavaFX features to our Viewer (the PageFlow mode shows off what JavaFX can do). We have some more in the pipeline for later releases this year. And obviously we might create a PdfDecoderJavaFX in the future. Let’s just say we have some very exciting future plans and we are always interested in any customer suggestions…

In the meantime, we hope we have steered that middle course of keeping up to date, keeping good backward compatibility and allowing for future enhancements in our monthly releases.


Which languages should have examples when documenting a web service?

One of the great things about web services is that they remove the language issue. You can write your code in whatever language you see fit, and other developers can use their language of choice. Web services provide a quick and easy end to the language wars – everyone wins… Or do they?

While the principles of web services are generic, each language has its own features and developers like to see examples in their own language.

When we released our web service to convert PDF to HTML5 or SVG via the cloud, we initially produced examples in Java and PHP as the 2 most popular requests. That still leaves a large number of possible options (Ruby, DotNet, Objective C, etc), hence this open question.

So what languages do you think are being used for Web services, and which one should we document next? What do you want to see?
