Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

PDF to HTML5 conversion – Hyphen is a special character


A dash or hyphen is a special character in HTML5 and needs to be treated with care. It is not just a character but also marks a point where the line may break, and this affects the width the browser reports for the div element. So if you have the div element

<div>party games</div>

a browser will give you the width of a single line of text, 11 characters long.

But

<div>party-games</div>

returns a width of 6 characters by default (the length of “party-”), because the browser assumes the text can be wrapped after the hyphen. If we are trying to adjust the text to get a best fit, this causes a lot of problems. In the screenshot you can see what can happen.

The HTML page contains a div element with a hyphen

The single div element contains the text “Der 63-jährigeCano, dessenrichtiger” but as far as the width is concerned, the div contents are “Der 63-“. So if we try to adjust the content to fit the space, we get a mess, with the text wrapped onto the next line. Not pretty 🙁

The solution is to break this into 2 divs (with the hyphen at the end of the first value)

and it looks much better!
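As a rough sketch of the split described above, the markup might change along these lines. The text is the example from this post; the `display:inline-block` style is an assumption, used here only to keep both divs on one line, and is not taken from the converter's real output.

```html
<!-- Before: one div. The hyphen lets the browser treat "Der 63-" as a unit of its own. -->
<div>Der 63-jährigeCano, dessenrichtiger</div>

<!-- After: two divs, split immediately after the hyphen.
     display:inline-block is an assumption for this sketch; a converter
     would more likely position each div itself. -->
<div style="display:inline-block">Der 63-</div><div style="display:inline-block">jährigeCano, dessenrichtiger</div>
```

Each div now holds text with no break point after a hyphen inside it, so its width can be measured and adjusted on its own.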

I think it can be improved still further, but that is for another post…




2 Replies to “PDF to HTML5 conversion – Hyphen is a special…”

  1. Hi,

    I was using jpedal2html 5.13b16 version to convert pdf into html.
    After converting from pdf to html, all data after the hyphen are lost in the converted html.
    I wonder whether this is a known issue in the software version I am using? What is the latest version available? Was it resolved in later versions?

    Thanks
    Shiva


IDRsolutions Ltd 2022. All rights reserved.