Old 07-27-2014, 03:55 AM   #1
AlanHK
Guru
Posts: 677
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
s&r for paired tags

I've got a book file full of code like:

Code:
<p class="calibre2"><span class="none2">blah blah blah</span></p>

Is there a way I can remove these spans (the "none2" ones) to get

Code:
<p class="calibre2">blah blah blah</p>
without messing up any other spans?

I can remove the opening tags with a simple s&r, but then I would have orphaned </span> tags, and I could not just delete every </span> without screwing up other spans.


I could change "<span class="none2">" to "<span>" and neuter them, but I really hate to leave junk code in the file.


-- PS, I know what regexes are, and have written some simple ones, but parsing HTML is a bit hairy.

Last edited by AlanHK; 07-27-2014 at 04:23 AM.
Old 07-27-2014, 05:18 AM   #2
Doitsu
Grand Sorcerer
Posts: 5,640
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
If the spans are not nested the following simple regex should do the trick:

Find:<span class="none2">(.*?)</span>
Replace:\1
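
For anyone who wants to try this outside Sigil first, here is a minimal sketch in Python. It assumes Python's `re` module treats this pattern the same way Sigil's regex engine does, and both sample strings are made up for illustration. The second sample shows what happens when the "not nested" condition does not hold.

```python
import re

# Doitsu's Find pattern from the post above; the sample strings are invented.
PATTERN = re.compile(r'<span class="none2">(.*?)</span>')

def strip_none2(html):
    """Drop each none2 span, keeping whatever the lazy group captured."""
    return PATTERN.sub(r"\1", html)

flat = '<p class="calibre2"><span class="none2">blah blah blah</span></p>'
nested = '<p><span class="none2">one <span class="sc">two</span> three</span></p>'

flat_result = strip_none2(flat)
nested_result = strip_none2(nested)

print(flat_result)    # the span is gone, the paragraph is intact
print(nested_result)  # the inner span now swallows " three" as well
```

In the nested sample the lazy `(.*?)` stops at the first `</span>` it meets, which belongs to the inner span, so the inner span ends up closed by the outer span's tag.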
Old 07-27-2014, 07:01 AM   #3
Tex2002ans
Wizard
Posts: 2,303
Karma: 12126963
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Doitsu View Post
If the spans are not nested the following simple regex should do the trick:
This needs to be stressed. Quite often span tags ARE nested, and you might accidentally cause a lot of damage if you just do a large "Replace All".

(I have done it many times, and didn't notice until later, when I was doing a few cleaning passes and wondering "why the heck is this entire paragraph in smallcaps?".)

Always save versions of your EPUBs when doing larger edits like this.

For nested tags, you really just need something that can actually PARSE HTML, and not just Regex.
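
As a rough illustration of what "something that can actually parse HTML" could look like, here is a sketch using Python's standard-library `html.parser`. This is not a Sigil feature; the class name `SpanStripper` and the helper `strip_spans` are hypothetical names invented for this example. It keeps a stack of open spans so that each `</span>` is paired with the right opening tag, and it only drops spans whose class attribute is exactly the target.

```python
from html.parser import HTMLParser

class SpanStripper(HTMLParser):
    """Re-emit markup unchanged, minus spans whose class equals `target`."""

    def __init__(self, target):
        # convert_charrefs=False so entities like &amp; are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.target = target
        self.out = []
        self.drop_stack = []  # one True/False entry per currently open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            drop = dict(attrs).get("class") == self.target
            self.drop_stack.append(drop)
            if drop:
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags never open a span, so just copy them.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.drop_stack and self.drop_stack.pop():
            return  # this </span> closes a span we dropped
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

def strip_spans(html, target="none2"):
    parser = SpanStripper(target)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

sample = '<p class="calibre2"><span class="none2">one <span class="sc">two</span> three</span></p>'
result = strip_spans(sample)
print(result)
```

The sketch assumes reasonably well-formed, lowercase markup such as the XHTML inside an EPUB, and it rewrites end tags in lowercase. Try it on a copy of the file, not the original.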
Old 07-27-2014, 07:33 AM   #4
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
you could probably regex out only the ones that are adjacent to the P tags
find
<p class="calibre2"><span class="none2">(.*?)</span></p>
replace
<p class="calibre2">\1</p>

but take a backup first- this code will go wrong if you have nested spans!
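
A quick way to see where this goes wrong is to run it on a couple of invented paragraphs and then check the result. The sketch below is Python, assuming its `re` module handles this pattern like Sigil does; `spans_balanced` is a hypothetical helper added only as a cheap sanity check, not a real validator. In this sketch the sample that breaks is a paragraph holding two sibling spans.

```python
import re

# cybmole's Find/Replace pair from the post above.
FIND = re.compile(r'<p class="calibre2"><span class="none2">(.*?)</span></p>')
REPLACE = r'<p class="calibre2">\1</p>'

def spans_balanced(html):
    """True if no </span> appears before its <span ...> and none stay open."""
    depth = 0
    for match in re.finditer(r"<(/?)span\b", html):
        depth += -1 if match.group(1) else 1
        if depth < 0:
            return False
    return depth == 0

wrapped = '<p class="calibre2"><span class="none2">blah blah blah</span></p>'
siblings = '<p class="calibre2"><span class="none2">one</span> <span class="sc">two</span></p>'

wrapped_result = FIND.sub(REPLACE, wrapped)
siblings_result = FIND.sub(REPLACE, siblings)

for text in (wrapped_result, siblings_result):
    print(spans_balanced(text), text)
```

Running a check like this over the whole file after a "Replace All" on a backup copy gives an early warning before the damage shows up on a reader.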
Old 07-27-2014, 08:50 AM   #5
AlanHK
Guru
Posts: 677
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Quote:
Originally Posted by Doitsu View Post
If the spans are not nested the following simple regex should do the trick:

Find:<span class="none2">(.*?)</span>
Replace:\1

That should work, thanks.


Quote:
Originally Posted by Tex2002ans View Post
For nested tags, you really just need something that can actually PARSE HTML, and not just Regex.
Well, Sigil can parse HTML. It highlights the tag pairs, for instance. I was hoping there were some options hidden away that I could use to do this. Too bad it doesn't give users more HTML-aware s&r than generic regex.

Is there any HTML code editor that does stuff like this?
Old 07-27-2014, 09:30 AM   #6
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
If you want an HTML editor, try the freeware Notepad++, but don't expect it to understand ebook structure.
The calibre editor is your other go-to solution: it is in ongoing development, and you can post enhancement requests.

NB the spans may look ugly but they are mostly harmless - the book will render OK if you just leave them be!
Old 07-27-2014, 10:41 AM   #7
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Find:
Code:
<span class="none2">((?:(?!<span).)*?)</span>
Replace:
Code:
\1
Using a negative lookahead we search for the LACK of a nested span, followed by any character, then repeat.

Matches nested tags as long as only the outer tag is a span. But you can be more specific if you want, by changing the lookahead.

http://regular-expressions.info/completelines.html
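
To make the pieces easier to follow, here is the same Find pattern written out in Python's verbose regex mode, one piece per line. It is a sketch that assumes Python's `re` behaves like Sigil's engine for this pattern; the sample strings and the function name `strip_simple_none2` are invented for illustration.

```python
import re

NO_NESTED = re.compile(r'''
    <span[ ]class="none2">   # the opening tag to remove ([ ] is a literal space here)
    (                        # group 1: the content to keep
      (?:                    #   repeat this unit...
        (?!<span)            #     assert that "<span" does NOT start at this position
        .                    #     then consume one character
      )*?                    #   ...as few times as possible
    )
    </span>                  # the first closing tag after that content
''', re.VERBOSE)

def strip_simple_none2(html):
    return NO_NESTED.sub(r"\1", html)

flat = '<span class="none2">blah</span>'
italic_inside = '<span class="none2">one <i>two</i> three</span>'
span_inside = '<span class="none2">one <span class="sc">two</span> three</span>'
none2_inside = '<span class="sc">one <span class="none2">two</span> three</span>'

for sample in (flat, italic_inside, span_inside, none2_inside):
    print(strip_simple_none2(sample))
```

A none2 span that contains another span never matches, so it is left alone for a later pass, while a none2 span that contains only other tags such as `<i>` is still removed.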

Last edited by eschwartz; 07-27-2014 at 10:50 AM.
Old 07-27-2014, 12:56 PM   #8
theducks
Well trained by Cats
Posts: 30,441
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Nested spans are a pain, and if you START out with the code in Post 2 you will have a disaster, because that is only safe with a simple span like the one you show (which is IMHO unnecessary, except as a conversion simplifier).
process
Code:
<p class="calibre2 none2">blah blah blah</p>
should work the same.

I am going to give eschwartz's REGEX a try.
Old 07-27-2014, 01:33 PM   #9
AlanHK
Guru
Posts: 677
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Quote:
Originally Posted by cybmole View Post
if you want a HTML editor try freeware notepad++, but don't expect it to understand ebook structure.
That's what I do want. I'm surprised after all these years there isn't something that does. (HTML, not just ebooks).

I use Ultraedit for my text. But I was hoping for more than a text editor that highlights.


Quote:
Originally Posted by theducks View Post
I am going to give eschwartz's REGEX a try.
Googled it, "No results found for "eschwartz's REGEX" "

??
Old 07-27-2014, 02:46 PM   #10
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
If you wait long enough and use the calibre editor, its developer will have a scripting function that should be able to do multiple tests for different cases. The rest of us know from bitter experience with Sigil, which has no global undo, how hard it is to consider all cases with regex (regular expressions). You can use the simple version proposed earlier, but the only way to do it safely is to use it one find at a time, so you can see when it vacuums up more text than you intended.

Look up 2-3 posts above and the regex that eschwartz proposed is there showing find and replace.
Old 07-27-2014, 03:16 PM   #11
AlanHK
Guru
Posts: 677
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Quote:
Originally Posted by mrmikel View Post
If you wait long enough and use the calibre editor, its developer will have a scripting function that should be able to do multiple tests for different cases. The rest of us know from bitter experience with Sigil, which has no global undo, how hard it is to consider all cases with regex (regular expressions).
It was just my impression that Sigil is (was?) more the code-editing tool, while calibre is more the GUI one.

Quote:
Originally Posted by mrmikel View Post
You can use the simple version proposed earlier, but the only way to do it safely is to use it one find at a time, so you can see when it vacuums up more text than you intended.
Since it's almost every paragraph in a book, that's a few thousand cases. One at a time isn't an option.

Anyway, I worked it out by first finding and fixing the spans I wanted to keep (as it happens, one) and then could delete the rest with a clear conscience.


Quote:
Originally Posted by mrmikel View Post
Look up 2-3 posts above and the regex that eschwartz proposed is there showing find and replace.
Duh. I thought it was some kind of software. I didn't register the names next to posts.
Old 07-27-2014, 03:45 PM   #12
PeterT
Grand Sorcerer
Posts: 12,733
Karma: 75000000
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
You might also check out the forked version of the ePub Clean plugin for calibre, which has some support for removing SPANs.
Old 07-27-2014, 04:14 PM   #13
signum
Zealot
Posts: 119
Karma: 64428
Join Date: Aug 2011
Device: none
If nested spans are a possibility, I like to use a search pattern similar to post #2, except I replace the stuff inside the parentheses with ([^<]*). This says to match any string of characters up to, but not including, a less-than sign. If the immediately following characters are not </span>, the entire pattern fails and no replacement is done. Otherwise, the replacement stays the same. In my experience, this leaves only a handful of paragraphs to be dealt with in another way, often by hand.
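
Here is a small sketch of that pattern in Python, again assuming its `re` module matches Sigil's behaviour here. The three sample paragraphs are invented, and `strip_text_only` is a hypothetical helper that also counts how many none2 spans are left over for the "by hand" pass.

```python
import re

# Post #2's pattern with the group swapped for ([^<]*), as described above.
TEXT_ONLY = re.compile(r'<span class="none2">([^<]*)</span>')
OPENING = re.compile(r'<span class="none2">')

def strip_text_only(html):
    """Return the cleaned text and the number of none2 spans still present."""
    cleaned = TEXT_ONLY.sub(r"\1", html)
    return cleaned, len(OPENING.findall(cleaned))

book = (
    '<p><span class="none2">plain text</span></p>\n'
    '<p><span class="none2">one <i>two</i> three</span></p>\n'
    '<p><span class="none2">line one\nline two</span></p>\n'
)

cleaned, leftover = strip_text_only(book)
print(cleaned)
print("none2 spans left to handle another way:", leftover)
```

Two things show up in the samples: a span containing any tag at all, even `<i>`, is skipped, and because `[^<]` also matches line breaks, a span whose text wraps onto a second line is still handled.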
Old 07-27-2014, 09:05 PM   #14
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by signum View Post
If nested spans are a possibility, I like to use a search pattern similar to post #2, except I replace the stuff inside the parentheses with ([^<]*). This says to match any string of characters up to, but not including, a less-than sign. If the immediately following characters are not </span>, the entire pattern fails and no replacement is done. Otherwise, the replacement stays the same. In my experience, this leaves only a handful of paragraphs to be dealt with in another way, often by hand.
That is why I prefer using a negative lookahead -- it catches that too.
Old 07-28-2014, 10:26 AM   #15
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
@eschwartz--

Quote:
<span class="none2">((?:(?!<span).)*?)</span>
1. Can you explain how the negative look ahead works, including breaking down the pieces of the RE?

2. Many times when I'm cleaning an epub, removing unneeded 'class=" ... " ' in <span class="...">, I'll eventually end up with a lot of <span>.....</span> constructs. It appears that your RE is better than the more simplistic RE I was using to just remove them.

Thanks
All times are GMT -4. The time now is 12:43 PM.

