04-16-2014, 03:42 AM | #1 |
Bookish
Posts: 969
Karma: 1807784
Join Date: Jun 2011
Device: PC, t1, t2, t3, aura 2 v1, clara HD, Libra 2, Libra Color, Nxtpaper 11
|
regex behaving greedier?
I am trying to remove unwanted spans of the form:
Code:
<span class="font1 font2 something">interesting text</span>
Code:
find: <span(.*)>(.*)</span>
replace: \2
It seems that regex now behaves greedier than before (while dotall is *not* enabled). What am I missing? |
04-16-2014, 03:59 AM | #2 |
Interested in the matter
Posts: 421
Karma: 426094
Join Date: Dec 2011
Location: Spain, south coast
Device: Pocketbook InkPad 3
|
find: <span.*?>(.*?)</span>
replace: \1 |
04-16-2014, 08:31 AM | #3 |
Bookish
Posts: 969
Karma: 1807784
Join Date: Jun 2011
Device: PC, t1, t2, t3, aura 2 v1, clara HD, Libra 2, Libra Color, Nxtpaper 11
|
@jbacelar: Thanks, but no, that is not the solution. A combination of '*' (zero or more) and '?' (zero or one) together does not make sense here.
I just tried my original regex within calibre 1.31 and it works as intended. However, it does not seem to work in calibre v1.32, which is odd. |
04-16-2014, 08:37 AM | #4 |
Zealot
Posts: 142
Karma: 669192
Join Date: Nov 2013
Device: Kindle 4.1.1 no touch
|
I don't get what you want to express... Could you post some examples?
|
04-16-2014, 08:46 AM | #5 |
creator of calibre
Posts: 44,542
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
* - means match zero or more, matching as many as possible, i.e. be greedy
*? - means match zero or more, matching as few as necessary, i.e. don't be greedy |
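To make the difference concrete, here is a minimal sketch using Python's standard `re` module as a stand-in for the calibre editor's regex engine (an assumption; the editor may use a different engine, though `*` and `*?` behave the same way for this case). The sample line is hypothetical and not taken from the thread.

```python
import re

# Hypothetical sample line containing two spans (not from the thread).
line = '<span class="a">one</span> and <span class="b">two</span>'

# Greedy pattern from post #1 and lazy pattern from post #2.
greedy = re.compile(r'<span(.*)>(.*)</span>')
lazy = re.compile(r'<span.*?>(.*?)</span>')

# Greedy: the first group runs on to the last '>' that still allows a match,
# so the whole line is treated as one span and only "two" survives.
greedy_result = greedy.sub(r'\2', line)

# Lazy: each span is matched on its own, so both texts are kept.
lazy_result = lazy.sub(r'\1', line)

print(greedy_result)
print(lazy_result)
```

With one span per line both patterns give the same result, which may be why the greedy version appeared to work on other texts.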
04-16-2014, 12:04 PM | #6 |
Zealot
Posts: 142
Karma: 669192
Join Date: Nov 2013
Device: Kindle 4.1.1 no touch
|
If I understood correctly, he's saying that
Code:
<span(.*)>(.*)</span>
Code:
<span>something<p><crlf> |
04-16-2014, 12:08 PM | #7 |
Bookish
Posts: 969
Karma: 1807784
Join Date: Jun 2011
Device: PC, t1, t2, t3, aura 2 v1, clara HD, Libra 2, Libra Color, Nxtpaper 11
|
I checked, and it appears the text is very badly formatted: </span> is sometimes missing, which causes the weird behavior of selecting unexpectedly large sections. That my original regex worked in the past for other texts may just have been pure luck.
Oh well, at least I learned a new regex trick and refreshed my knowledge of "lazy quantifiers" versus "greedy quantifiers". Thanks all!
Last edited by DrChiper; 04-16-2014 at 12:11 PM. |
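The effect of a missing </span> can be reproduced with a small sketch. It assumes Python's standard `re` module as a stand-in for the editor's engine, and the markup below is a made-up example, not the actual book text.

```python
import re

# Hypothetical malformed markup: the first span is never closed.
html = ('<p><span class="x">broken</p>'
        '<p>keep this</p>'
        '<p><span class="y">ok</span></p>')

lazy = re.compile(r'<span.*?>(.*?)</span>')

# Even the lazy pattern has to run on until it finds *some* </span>,
# which here belongs to a different span two paragraphs later.
m = lazy.search(html)
matched = m.group(0)
captured = m.group(1)

print(matched)
print(captured)
```

The captured group swallows the intervening paragraphs and the opening tag of the second span, so the replacement leaves an orphaned `<span class="y">` behind.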
04-16-2014, 01:24 PM | #8 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Just remember: using regex to remove only certain span or div tags will usually blow up in your face wherever spans or divs are nested.
|
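A minimal sketch of the nesting problem, assuming Python's standard `re` module and a hypothetical snippet with the class names `junk` and `keep` (both invented for illustration):

```python
import re

# Hypothetical nested markup (not from the thread).
html = '<span class="junk"><span class="keep">text</span> tail</span>'

# Intent: strip only the "junk" span and keep its contents untouched.
pattern = re.compile(r'<span class="junk">(.*?)</span>')
result = pattern.sub(r'\1', html)

print(result)
```

The pattern pairs the outer opening tag with the inner closing tag, so the "keep" span ends up wrapping " tail" as well. The output is still well-formed, which makes this kind of damage easy to miss.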
04-17-2014, 04:14 AM | #9 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
The modify epub plugin is going in interesting places regarding junk spans. Might be worth a look...
Using a negative lookahead you can avoid nesting issues; there is an example by me in that thread.
Last edited by eschwartz; 04-17-2014 at 04:17 AM. |
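The example from that other thread is not reproduced here, so the following is only a sketch of one common way to use a negative lookahead for this, not necessarily the pattern eschwartz posted. It assumes Python's standard `re` module; the markup, the `innermost` pattern, and the `unwrap_all_spans` helper are hypothetical.

```python
import re

# Hypothetical nested markup (not from the thread).
html = '<span class="junk"><span class="keep">text</span> tail</span>'

# Match a span only if its content contains no other span tag.
# (?!</?span\b) is checked before every character of the content,
# so the match can never cross an opening or closing span tag.
innermost = re.compile(r'<span[^>]*>((?:(?!</?span\b).)*)</span>')

# One pass unwraps only the innermost span; tag pairing stays intact.
first_pass = innermost.sub(r'\1', html)

def unwrap_all_spans(text):
    """Repeat the substitution until no span is left to unwrap."""
    while True:
        text, count = innermost.subn(r'\1', text)
        if count == 0:
            return text

fully_unwrapped = unwrap_all_spans(html)

print(first_pass)
print(fully_unwrapped)
```

Because each pass only touches spans without nested spans, opening and closing tags are always paired correctly; repeating the pass peels the nesting one level at a time.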
|