Old 10-29-2015, 07:28 AM   #1
crankypants
Hmm.
Posts: 124
Karma: 2016606
Join Date: Oct 2015
Device: Android 4.2 Google Play Reader
How to make regex to replace 2 spaces between words, with one space?

Sigil 0.8.7 on Windows 8.

When I paste a text file into Sigil it does lots of formatting for me, which is great. But I often end up with 2 spaces between words, and in the Sigil preview window these 2 spaces are not collapsed into 1 space, which I thought XHTML would do, just like HTML.

So, I want to replace 2 spaces (or more) that are between words, with one single space. Example where _ is a space.

Code:
Good__morning_today.
Should end up as:

Code:
Good_morning_today.
This should not be affected.

Code:
_____<p>This is the beginning of a paragraph.</p>
Old 10-29-2015, 08:41 AM   #2
DiapDealer
Grand Sorcerer
Posts: 27,758
Karma: 198099188
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
First off: Sigil Preview should render multiple space characters as a single space. That's the way all (X)HTML works. If it doesn't, it means that 1) the double spaces are inside a <pre> tag, which indicates all whitespace is to be preserved; 2) same as #1, but "pre" is assigned through CSS (Adobe products are notorious for this); or 3) the space characters are special non-breaking Unicode characters. A fourth scenario is that spaces are being converted to &nbsp; entities when pasting with formatting. Look in Code View to check.
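
To narrow down which of those cases applies, it can help to count the suspicious characters in the raw source. This is only a rough Python sketch, meant to be run outside Sigil on text copied from Code View; the function name `find_space_culprits` and the sample string are made up for illustration, and the patterns are deliberately simple.

```python
import re

def find_space_culprits(source: str) -> dict:
    """Count things that can make 'double spaces' survive rendering."""
    return {
        # U+00A0 NO-BREAK SPACE present as a literal character
        "literal_nbsp": source.count("\u00a0"),
        # The same character written as a named or decimal entity
        "nbsp_entities": len(re.findall(r"&nbsp;|&#160;", source)),
        # <pre> elements, where all whitespace is preserved
        "pre_tags": len(re.findall(r"<pre\b", source, flags=re.IGNORECASE)),
        # CSS that switches on whitespace preservation
        "css_white_space_pre": len(re.findall(r"white-space\s*:\s*pre", source)),
        # Runs of two or more ordinary spaces
        "plain_double_spaces": len(re.findall(r" {2,}", source)),
    }

sample = "<p>Good&nbsp; morning\u00a0 today.  Fine.</p>"
print(find_space_culprits(sample))
```

If only `plain_double_spaces` is non-zero and there is no <pre> or "white-space: pre" in play, the preview should already be collapsing the spaces.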

But all that aside ... the Captain Overkill in me would use something like:
Code:
(*UCP)\b[^\S\p{Zl}\p{Zp}\n\r\t]{2,}\b
and replace what it matches with a single space character.
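
For anyone who wants to try the idea outside Sigil, here is a rough Python port. It is a sketch under assumptions, not the same expression: Python's `re` module has no (*UCP) or \p{...}, so the Unicode line and paragraph separators are listed explicitly as U+2028 and U+2029, and the function name `collapse_inner_spaces` is made up.

```python
import re

# Python port of the expression above. \b is already Unicode-aware for str
# patterns, and U+2028/U+2029 stand in for \p{Zl} and \p{Zp}.
INNER_SPACES = re.compile(r"\b[^\S\n\r\t\u2028\u2029]{2,}\b")

def collapse_inner_spaces(text: str) -> str:
    """Replace 2+ whitespace characters between two words with one space."""
    return INNER_SPACES.sub(" ", text)

print(collapse_inner_spaces("Good  morning today."))
print(repr(collapse_inner_spaces("     <p>This is the beginning of a paragraph.</p>")))
```

The indentation before <p> is left alone because there is no word boundary on either side of it.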

But that's just me.

And even that's still not going to work for situations like:

Code:
Sometimes_punctuation_like_this,__will_screw_things_up.
In that case, I'd use something like:
Code:
(*UCP)(\b|\p{P})[^\S\p{Zl}\p{Zp}\n\r\t]{2,}(\b|\p{P})
I'd run that after the initial find and replace. (No replace expression given for that one, by the way. It complicates things, because the match can now include punctuation that has to be kept.)
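
One way the punctuation-aware version could be completed, sketched in Python. Two things here are my assumptions and not from the post: the replacement `\1 \2`, which puts back whatever the two groups captured, and the small ASCII punctuation set that stands in for \p{P} (Python's `re` does not support it). The function name is made up.

```python
import re

# Assumption: a small ASCII punctuation set stands in for \p{P}.
PUNCT = r"[,.;:!?]"
# Whitespace, minus newlines, returns, tabs and U+2028/U+2029.
SPACES = r"[^\S\n\r\t\u2028\u2029]{2,}"
PATTERN = re.compile(rf"(\b|{PUNCT}){SPACES}(\b|{PUNCT})")

def collapse_spaces_near_punctuation(text: str) -> str:
    # Groups 1 and 2 hold any punctuation the match consumed, so put it back
    # on either side of the single replacement space.
    return PATTERN.sub(r"\1 \2", text)

print(collapse_spaces_near_punctuation(
    "Sometimes punctuation like this,  will screw things up."))
```

As with the original, this is worth stepping through one match at a time instead of running as a blind replace all.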

Though that may not always achieve the desired result--depending on the text. The bottom line is: don't blindly do a replace all. Step through each instance and verify the replace.

The first expression basically looks for word boundaries (\b, made Unicode-aware by (*UCP)) and for two or more consecutive whitespace characters (not including any newlines, returns, tabs, or Unicode paragraph/line separators) between them.

** And yes ... that's "NOT not whitespace" logic in there.
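
The double negative is easier to see by testing single characters against the class. A small Python sketch, using the simpler class without the \p{...} parts; the variable names are made up, and the results are for Python's `re`, which may not agree with Sigil's engine on every exotic character.

```python
import re

# [^\S...] reads as "not (non-whitespace)", i.e. whitespace,
# minus the characters listed after \S.
SPACE_CLASS = re.compile(r"[^\S\n\r\t]")

candidates = {
    "space": " ",
    "no-break space": "\u00a0",
    "newline": "\n",
    "carriage return": "\r",
    "tab": "\t",
    "letter a": "a",
}
for name, ch in candidates.items():
    print(f"{name}: {bool(SPACE_CLASS.fullmatch(ch))}")
```

In Python the space and the no-break space match; the newline, return, tab and letter do not.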

People who think they don't have to worry about any possible unicode characters or punctuation issues could probably get away with:
Code:
\b[^\S\n\r\t]{2,}\b
** None of my regexes will work if the whitespace is being achieved with HTML entities.

Last edited by DiapDealer; 10-29-2015 at 09:11 AM.
Old 10-29-2015, 11:29 AM   #3
Turtle91
A Hairy Wizard
Posts: 3,143
Karma: 18843349
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Then there is the very simplified version:

Search: "  " (two spaces)
Replace: " " (one space)

When you save, it'll put all the organizing white space back in there so your paragraphs all line up.
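
A quick Python sketch of what the plain replace does, for comparison. The function name is made up. Note that a single pass turns three spaces into two, so the replace has to be repeated until nothing is left to find, and that it also flattens leading indentation.

```python
def collapse_double_spaces(text: str) -> str:
    # One pass turns three spaces into two, so repeat until none are left.
    while "  " in text:
        text = text.replace("  ", " ")
    return text

print(collapse_double_spaces("Good  morning   today."))
print(repr(collapse_double_spaces("     <p>This is the beginning of a paragraph.</p>")))
```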
Old 10-29-2015, 11:43 AM   #4
DiapDealer
Grand Sorcerer
But that doesn't meet the OP's requirement that the solution must leave indentation alone. Yes, pretty-print will put indentation back when you save, but what if the user has unchecked the Clean on Save/Open option(s) or disabled pretty-print altogether? (Did I mention Captain Overkill?)

Last edited by DiapDealer; 10-29-2015 at 11:46 AM.
Old 10-29-2015, 11:51 AM   #5
theducks
Well trained by Cats
Posts: 30,203
Karma: 57978778
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
right-click Reformat HTML: clean source

will put back any missing code indentation.
The space before opening block tags (<p, <div, <h#, ...) is for humans, just like the blank lines between them.

Code:
<p>stuff</p><p>more stuff</p>
displays no differently if on 2 lines, with or without indents.
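
A small Python check of the same point, as a sketch only: it parses a compact and an indented version of that markup and compares the paragraph text. The wrapping <body> element and the `paragraphs` helper are made up for the example, and it only shows that the paragraph content is identical, not how any particular reader renders it.

```python
import xml.etree.ElementTree as ET

compact = "<body><p>stuff</p><p>more stuff</p></body>"
indented = """<body>
  <p>stuff</p>

  <p>more stuff</p>
</body>"""

def paragraphs(source: str) -> list:
    """Return the text of each <p>, ignoring whitespace between elements."""
    return [p.text for p in ET.fromstring(source).iter("p")]

print(paragraphs(compact) == paragraphs(indented))
```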