MobileRead Forums > E-Book Software > Calibre > Editor
Old 04-16-2014, 03:42 AM   #1
DrChiper
Bookish
Posts: 969
Karma: 1807784
Join Date: Jun 2011
Device: PC, t1, t2, t3, aura 2 v1, clara HD, Libra 2, Libra Color, Nxtpaper 11
Question regex behaving greedier?

I am trying to remove unwanted spans of the form:
Code:
<span class="font1 font2 something">interesting text</span>
and used to use the following regex in find:
Code:
find: <span(.*)>(.*)</span>
replace: \2
However, this no longer works: the match now extends all the way to the terminating <p><crlf>, or to a second closing </span> if one occurs before the terminating <p><crlf>.

It seems that the regex now behaves more greedily than before (while dotall is *not* enabled). What am I missing?
Old 04-16-2014, 03:59 AM   #2
jbacelar
Interested in the matter
Posts: 421
Karma: 426094
Join Date: Dec 2011
Location: Spain, south coast
Device: Pocketbook InkPad 3
find: <span.*?>(.*?)</span>
replace: \1
Old 04-16-2014, 08:31 AM   #3
DrChiper
Bookish
@jbacelar: Thanks, but no, that is not the solution. A combination of '*' (zero or more) and '?' (zero or one) together does not make sense here.
I just tried my original regex within calibre 1.31 and it works as intended. However, it does not seem to work in calibre v1.32, which is odd.
Old 04-16-2014, 08:37 AM   #4
Skeeve
Zealot
Posts: 142
Karma: 669192
Join Date: Nov 2013
Device: Kindle 4.1.1 no touch
I don't get what you want to express. Could you post some examples?
Old 04-16-2014, 08:46 AM   #5
kovidgoyal
creator of calibre
Posts: 44,542
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
* means match zero or more times, matching as many as possible, i.e. be greedy

*? means match zero or more times, matching as few as necessary, i.e. don't be greedy
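
The difference can be sketched in plain Python. This is only an illustration: it uses Python's `re` module as a stand-in for the calibre editor's search, and the sample line is made up, not taken from the book in question.

```python
import re

# Hypothetical sample line with two junk spans on one line.
html = '<p><span class="font1">first</span> and <span class="font2">second</span></p>'

greedy = re.compile(r'<span(.*)>(.*)</span>')  # pattern from post #1
lazy = re.compile(r'<span.*?>(.*?)</span>')    # pattern from post #2

# Greedy: the first (.*) grabs as much of the line as it can and only gives
# characters back until the rest of the pattern fits, so one match covers
# everything from the first <span to the last </span>.
greedy_result = greedy.sub(r'\2', html)

# Lazy: each quantifier stops at the first position that lets the match
# succeed, so each span is matched and unwrapped on its own.
lazy_result = lazy.sub(r'\1', html)

print(greedy_result)  # <p>second</p>
print(lazy_result)    # <p>first and second</p>
```

With the greedy pattern the text "first" and the word "and" are lost, because they end up inside group 1 and are thrown away by the replacement.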
Old 04-16-2014, 12:04 PM   #6
Skeeve
Zealot
If I understood correctly, he's saying that

Code:
<span(.*)>(.*)</span>
would also match

Code:
<span>something<p><crlf>
Old 04-16-2014, 12:08 PM   #7
DrChiper
Bookish
I checked, and it appears the text is very badly formatted: </span> is sometimes missing, which causes the weird behavior of selecting unexpectedly large sections. That my original regex worked in the past for other texts may just have been pure luck.

Oh well, at least I learned a new regex trick and refreshed my knowledge of "lazy quantifiers" versus "greedy quantifiers". Thanks all!
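
For anyone hitting the same thing, here is a rough sketch of why a missing </span> produces oversized matches even with lazy quantifiers. It uses Python's `re` module as a stand-in for the editor's search, and the sample line is invented for illustration.

```python
import re

lazy = re.compile(r'<span.*?>(.*?)</span>')

# Hypothetical badly formed line: the first span is never closed.
broken = '<p><span class="font1">unclosed text <span class="font2">closed text</span> tail</p>'

# The match starts at the unclosed span and runs to the first </span> it can
# find, which actually belongs to the second span.
matched_text = lazy.search(broken).group(0)
cleaned = lazy.sub(r'\1', broken)

# A cheap sanity check before running such a replace: compare tag counts.
opened = len(re.findall(r'<span\b', broken))
closed = broken.count('</span>')

print(matched_text)
print(cleaned)
print(opened, closed)
```

Here the replace removes the wrong opening tag and leaves `<span class="font2">` without a closing tag, so comparing the number of opening and closing tags first is a quick way to spot files where this will go wrong.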

Last edited by DrChiper; 04-16-2014 at 12:11 PM.
Old 04-16-2014, 01:24 PM   #8
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Just remember: using regex to remove only certain span|div tags will usually blow up in your face wherever spans|divs are nested.
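
A small sketch of that failure, using Python's `re` module as a stand-in for the editor's search. The class names "junk" and "keep" and the sample line are made up for illustration.

```python
import re

# Goal: unwrap only spans with class "junk" and leave "keep" spans alone.
junk_span = re.compile(r'<span class="junk">(.*?)</span>')

# Hypothetical line where a wanted span is nested inside an unwanted one.
nested = '<p><span class="junk">a <span class="keep">b</span> c</span></p>'

# The lazy (.*?) stops at the first </span>, which closes the inner "keep"
# span, not the outer "junk" span.
result = junk_span.sub(r'\1', nested)

print(result)  # <p>a <span class="keep">b c</span></p>
```

The output is still balanced, which makes the damage easy to miss: the "keep" span now wraps "b c" instead of just "b", because the regex paired the outer opening tag with the inner closing tag.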
Old 04-17-2014, 04:14 AM   #9
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
The modify epub plugin is going in interesting places regarding junk spans. Might be worth a look...

Using a negative lookahead you can avoid nesting issues; there is an example by me in that thread.
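
The example from that thread is not reproduced here. As a general sketch of the technique, assuming Python's `re` module as a stand-in for the editor's search, a negative lookahead can restrict the match to innermost spans, meaning spans whose content contains no other span tag. The names `INNERMOST_SPAN` and `unwrap_all_spans` are hypothetical.

```python
import re

# (?:(?!</?span\b).)* consumes one character at a time, but only while the
# text at that position is not the start of an opening or closing span tag.
# The pattern can therefore only match a span with no span tags inside it.
INNERMOST_SPAN = re.compile(r'<span\b[^>]*>((?:(?!</?span\b).)*)</span>')

def unwrap_all_spans(text):
    """Unwrap spans from the inside out until none are left to match."""
    while True:
        text, count = INNERMOST_SPAN.subn(r'\1', text)
        if count == 0:
            return text

nested = '<p><span class="outer">a <span class="inner">b</span> c</span></p>'

first_pass = INNERMOST_SPAN.sub(r'\1', nested)
fully_unwrapped = unwrap_all_spans(nested)

print(first_pass)       # <p><span class="outer">a b c</span></p>
print(fully_unwrapped)  # <p>a b c</p>
```

Each pass pairs an opening tag only with its own closing tag, so nesting no longer causes mismatches. An opening tag that is never closed simply does not match and is left in place.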

Last edited by eschwartz; 04-17-2014 at 04:17 AM.