Mark Stephens Mark has been working with Java and PDF since 1999 and is a big NetBeans fan. He enjoys speaking at conferences. He has an MA in Medieval History and a passion for reading.

PDF to HTML5 conversion – Hyphen is a special character

A dash or hyphen is a special character in HTML5 and needs to be treated with care. It is not just a character: it also marks a point where the line may break, and this affects the width the browser reports for the div element. So if you have the div element

<div>party games</div>

a browser will give you the width of a single line of text 11 characters long.

But

<div>party-games</div>

returns a width of 6 characters by default, because the browser assumes the text can be wrapped after the hyphen. If we are trying to adjust the text to get the best fit, this causes a lot of problems. The screenshot shows what can happen.

The HTML page contains a div element with a hyphen

The single div element contains the text “Der 63-jährigeCano, dessenrichtiger” but as far as the width is concerned, the div contents are “Der 63-”. So if we try to adjust the content to fit the space, we get a mess, with the text wrapped onto the next line. Not pretty 🙁
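The character counts described above can be mimicked with a few lines of Java. This is only a sketch of the rule as this post describes it (measure up to and including the first hyphen), not a model of real browser layout; the class and method names are hypothetical.

```java
class HyphenWidth {

    // Hypothetical helper: the number of characters the post says is
    // measured, i.e. everything up to and including the first hyphen,
    // or the whole text when there is no hyphen.
    static int measuredChars(String text) {
        int hyphen = text.indexOf('-');
        return hyphen < 0 ? text.length() : hyphen + 1;
    }

    public static void main(String[] args) {
        System.out.println(measuredChars("party games")); // 11
        System.out.println(measuredChars("party-games")); // 6
    }
}
```

A check like this could flag text runs that need special handling before any best-fit adjustment is attempted.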

The solution is to break this into 2 divs (with the - at the end of the first div)

and it looks much better!
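A minimal sketch of that split in Java, assuming the text of one div is available as a plain string. The class and method names are hypothetical and this is not the converter's actual code; HTML escaping and the positioning of each div are left out.

```java
import java.util.ArrayList;
import java.util.List;

class HyphenSplitter {

    // Cuts the text after every hyphen, so each piece ends with the
    // hyphen that caused the cut (the last piece may have none).
    static List<String> splitAfterHyphens(String text) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '-') {
                parts.add(text.substring(start, i + 1));
                start = i + 1;
            }
        }
        // Keep whatever follows the last hyphen, if anything.
        if (start < text.length()) {
            parts.add(text.substring(start));
        }
        return parts;
    }

    // Wraps each piece in its own div. No escaping or styling is done.
    static String toDivs(String text) {
        StringBuilder html = new StringBuilder();
        for (String part : splitAfterHyphens(text)) {
            html.append("<div>").append(part).append("</div>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // prints <div>party-</div><div>games</div>
        System.out.println(toDivs("party-games"));
    }
}
```

Because no piece has anything after its hyphen, each div can be measured and fitted on its own.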

I think it can be improved still further, but that is for another post…




2 Replies to “PDF to HTML5 conversion – Hyphen is a special…”

  1. Hi,

    I was using jpedal2html version 5.13b16 to convert PDF into HTML.
    After converting, all data after the hyphen is lost in the converted HTML.
    I wonder whether this is a known issue in the software version I am using? What is the latest version available? Was it resolved in later versions?

    Thanks
    Shiva


IDRsolutions Ltd 2021. All rights reserved.