Old 04-07-2014, 04:17 PM   #4
arspr
Preventing line wraps around dashes in Spanish dialogues

Because dashes are line-wrap points in HTML, dialogues in Spanish ebooks can end up broken in ugly places.

Example in one line:
Code:
—Bla, Bla, Bla, —John said—. More bla, bla, bla.
Wrong:
Code:
—Bla, Bla, Bla, —John said
—. More bla, bla, bla.
Wrong:
Code:
—Bla, Bla, Bla, —
John said—. More 
bla, bla, bla.
Right:
Code:
—Bla, Bla, Bla, —John 
said—. More bla, bla, bla.

The two searches below wrap each dash and its partner word in a <span> with a class of your choice (in my example, just <span class="nw">).

Then add the following CSS definition for this class:
Code:
.nw { white-space: nowrap; display: inline-block; text-indent: 0em;}
and you will have prevented the wrong wrapping in Spanish books.

Edit notes. Explanation of the workaround for RMSDK:
Spoiler:
The CSS class above is a modification of my original one, which only included white-space: nowrap;. But in the latest versions of RMSDK the white-space property has stopped working. (It worked, and still works, on my old Sony PRS-650.)

But in an ebook I was recently reading, I found that wrapping inside formulas was prevented by enclosing them in a <span> with display: inline-block; text-indent: 0em;. So I decided to add this method to my previous one, and now it also works in newer versions of RMSDK (for example, a Kobo Aura H2O with firmware 3.15.0).

The no-wrap effect is actually obtained through the display: inline-block; part. But if the protected <span> starts a new line, it inherits the text-indent value of its parent <p>. That is why the text-indent: 0em; setting is also added.



First S&R
Search:
Code:
\x20(—|–|&mdash;|&ndash;)([^ <]+)( |</p>|</div>)
Replace:
Code:
\x20<span class="nw">\1\2</span>\3
Second S&R
Search:
Code:
\x20([^ >]+)(—|–|&mdash;|&ndash;)(\.|\.\.\.|,|;|:|…|&hellip;)?\x20
Replace:
Code:
\x20<span class="nw">\1\2\3</span>\x20
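For anyone who would rather script the two S&Rs than run them in an editor, here is a minimal Python sketch. It assumes that Python's re module treats these particular patterns the same way the editor's regex engine does, and the function name protect_dashes is hypothetical (it is not part of any tool). The replacements use a literal space because Python does not accept \x20 in a replacement string.

```python
import re

# Dash alternatives, copied from the two searches above.
DASH = r"(—|–|&mdash;|&ndash;)"

# First S&R: an opening dash glued to the word that follows it.
FIRST = re.compile(r"\x20" + DASH + r"([^ <]+)( |</p>|</div>)")

# Second S&R: a closing dash glued to the word before it,
# plus an optional punctuation mark right after the dash.
SECOND = re.compile(
    r"\x20([^ >]+)" + DASH + r"(\.|\.\.\.|,|;|:|…|&hellip;)?\x20"
)


def protect_dashes(html: str) -> str:
    """Apply both replacements, first S&R first (hypothetical helper)."""
    html = FIRST.sub(
        lambda m: ' <span class="nw">' + m.group(1) + m.group(2)
        + "</span>" + m.group(3),
        html,
    )
    html = SECOND.sub(
        # Group 3 is optional, so fall back to an empty string.
        lambda m: ' <span class="nw">' + m.group(1) + m.group(2)
        + (m.group(3) or "") + "</span> ",
        html,
    )
    return html


if __name__ == "__main__":
    line = "—Bla, Bla, Bla, —John said—. More bla, bla, bla."
    print(protect_dashes(line))
    # —Bla, Bla, Bla, <span class="nw">—John</span> <span class="nw">said—.</span> More bla, bla, bla.
```

The very first dash of the line is left alone because there is no space in front of it, so only the two inner dashes get a span.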
Additional usage notes
Spoiler:
  • Yes, you need both S&Rs, run in that order.
  • Do not forget to add the CSS style, or the spans will have no effect.
  • As you can see, they look for dashes and only dashes (as Unicode characters or as named entities). Some horribly formatted books use minus signs, which these searches won't catch.
  • The Case Sensitive and Dot All settings are probably irrelevant, but I have them OFF.
  • Because of the [^ <]+ and [^ >]+ parts of the searches, they are completely safe to use. I mean they won't catch and destroy code like:
    Code:
    —Bla, Bla, Bla, —<b>John</b> <i>said</i>—. More bla, bla, bla.
    They will just ignore it. You will never get something wrong like:
    Code:
    —Bla, Bla, Bla, <span class="nw">—<b>John</span></b> <i><span class="nw">said</i>—.</span> More bla, bla, bla.
    You'll have to fix these situations manually.
  • Using them where dashes are used as sentence or word separators is also safe:
    Code:
    First sentence—Second sentence.
    This situation, pretty common in English books, is also ignored.
  • As hinted in another thread, I've used \x20 for the leading and trailing spaces the regexes need, in order to make them clearly visible.
  • Obviously there's no point in adding a <span> around the very first starting dash and word, and these searches don't do that.
  • A strange situation that I remember running into once or twice: if some CSS rule is set directly on <span> tags, it will also be applied to the newly created spans. I remember suffering a
    Code:
    span {font-size: 1.3em;}
    which I had to override with
    Code:
    .nw {font-size: 1em; white-space: nowrap;}
    without losing it where it was originally applied.
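The "safe to use" notes above can be checked with a small Python sketch. This is only an approximation under the assumption that Python's re module matches these patterns like the editor's engine; apply_both is a hypothetical helper name, and the last sample line is made up for the test.

```python
import re

# Same two searches as in the post.
DASH = r"(—|–|&mdash;|&ndash;)"
FIRST = re.compile(r"\x20" + DASH + r"([^ <]+)( |</p>|</div>)")
SECOND = re.compile(
    r"\x20([^ >]+)" + DASH + r"(\.|\.\.\.|,|;|:|…|&hellip;)?\x20"
)


def apply_both(html: str) -> str:
    """Run the first S&R and then the second one (hypothetical helper)."""
    html = FIRST.sub(
        lambda m: ' <span class="nw">' + m.group(1) + m.group(2)
        + "</span>" + m.group(3),
        html,
    )
    return SECOND.sub(
        lambda m: ' <span class="nw">' + m.group(1) + m.group(2)
        + (m.group(3) or "") + "</span> ",
        html,
    )


# Lines the searches are expected to leave untouched.
SAMPLES = [
    # Inline tags next to the dashes: skipped, to be fixed by hand.
    "—Bla, Bla, Bla, —<b>John</b> <i>said</i>—. More bla, bla, bla.",
    # Dash used as a sentence or word separator.
    "First sentence—Second sentence.",
    # Very first dash of a paragraph (made-up sample).
    "<p>—Bla, bla.</p>",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        status = "unchanged" if apply_both(sample) == sample else "CHANGED"
        print(status, "|", sample)
```

All three samples should be reported as unchanged, which matches the notes: tagged words are skipped rather than mangled, and separator dashes without a surrounding space never match.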

Last edited by arspr; 05-07-2015 at 02:53 PM. Reason: New .nw CSS definition - Workaround for RMSDK