
Go Back   MobileRead Forums > E-Book Software > Sigil

Old 05-09-2023, 04:49 PM   #1
Vanguard3000
Groupie
Posts: 152
Karma: 474196
Join Date: Jan 2011
Location: Ottawa
Device: Kobo Aura H2O
Regex help: Find instances spanning several paragraphs

I've come across this on several occasions and am curious whether there's a good regex for it. A typical example would be selecting an entire <div> or <blockquote>, which could contain any number of paragraphs, such as:

Code:
<div class="foo">
<p>aaa</p>
<p>bbb</p>
[...]
<p>ccc</p>
</div>
Currently, I'd need to do something like:

Code:
<div class="foo">    <p>(.*?)</p>    </div>
then:
Code:
<div class="foo">    <p>(.*?)</p>    <p>(.*?)</p>    </div>
and so on. Ideally I'd like to do:
Code:
<div class="foo">(.*?)</div>
but it doesn't work, I assume due to too much whitespace, returns, etc.

So, I feel like one of the regex settings ought to allow searches to skip whitespace or whatever and this is probably an easy fix, but I'm not sure what it might be.

Also, for the record, I do have the TagMechanic plugin which helps in most situations like this, but in some cases it would be nice for me to be able to iterate through all instances with a regular F&R process.
Old 05-10-2023, 07:14 AM   #2
The_book
Zealot
Posts: 100
Karma: 10
Join Date: Aug 2019
Device: none
Sigil's regex has a 'dot all' option that lets the dot match all characters, including '\n'. With this option you can search across lines.
But using `<div class="foo">(.*?)</div>` is not a good idea, because there are nested situations like `div ... div ... /div ... /div`, where the match ends at the first closing tag rather than the matching one.
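To illustrate the dot-all behaviour described above, here is a minimal sketch in Python. Python's `re` module is not the engine Sigil uses, so treat this only as a demonstration of the general semantics; the sample markup is the example from the first post with the `[...]` filled by one paragraph. The inline `(?s)` form is a common way to switch dot-all on from inside a pattern; whether you prefer it over the checkbox in Sigil is a matter of taste, and you should verify it in your own Sigil version.

```python
import re

# Sample markup modelled on the first post (hypothetical content).
xhtml = """<div class="foo">
<p>aaa</p>
<p>bbb</p>
<p>ccc</p>
</div>"""

pattern = r'<div class="foo">(.*?)</div>'

# Without dot-all, "." refuses to cross a newline, so the search fails.
no_dotall = re.search(pattern, xhtml)

# With dot-all, "." also matches "\n" and the whole block is found.
with_dotall = re.search(pattern, xhtml, flags=re.DOTALL)

# The inline flag (?s) enables dot-all from within the pattern itself.
inline = re.search(r'(?s)' + pattern, xhtml)

print(no_dotall)             # None
print(with_dotall.group(1))  # the three paragraphs, newlines included
```

The failure in the first search is not about "too much whitespace" as such: it is only the newline characters that the dot skips by default.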
Old 05-10-2023, 07:21 AM   #3
Turtle91
A Hairy Wizard
Posts: 3,115
Karma: 18727091
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Yes, the dot all option works nicely in Sigil.

Find: <div class="foo">\s*(.*?)\s*</div>

I add the \s* to swallow any whitespace between the div tags and the paragraphs. I don't want it replicated on top of whatever spacing I add in the Replace: line.

As The_book mentioned, be aware of nested divs...this will capture anything inside your foo div.
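A rough Python sketch of both points, the `\s*` trimming and the nested-div caveat, might look like this. It uses Python's `re` rather than Sigil's own engine, the `(?s)` prefix stands in for the dot-all option, and the sample strings and the blockquote replacement are made up for illustration.

```python
import re

# Find pattern from the post, with (?s) standing in for the dot-all option.
FIND = r'(?s)<div class="foo">\s*(.*?)\s*</div>'

# Flat case: the \s* on either side keeps the surrounding whitespace
# out of the captured group.
flat = '<div class="foo">\n  <p>aaa</p>\n  <p>bbb</p>\n</div>'
trimmed = re.search(FIND, flat).group(1)

# The replacement supplies its own spacing around the captured group.
replaced = re.sub(FIND, r'<blockquote>\n\1\n</blockquote>', flat)

# Nested case: the lazy match stops at the FIRST </div>, which closes
# the inner div, so the outer div is cut short.
nested = ('<div class="foo">\n<div class="bar">\n<p>aaa</p>\n</div>\n'
          '<p>bbb</p>\n</div>')
cut_short = re.search(FIND, nested).group(0)

print(repr(trimmed))
print(replaced)
print(cut_short)
```

In the nested case the paragraph `<p>bbb</p>` and the real closing tag are left outside the match, which is why stepping through hits one at a time is the safer way to use this pattern.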
Old 05-10-2023, 10:13 AM   #4
Vanguard3000
Thanks, the dot all option seems to be what I was looking for. I think I had those regex settings largely the way they were by trial and error after getting some weirdness finding two <i></i> blocks in the same paragraph (though that's probably a minimal match issue).

As I mentioned, with regexes encompassing larger potential hits like this I typically iterate through them one at a time since as you say there is a lot of room for error. So between the hammer (TagMechanic) and the scalpel (dotall search) I think I should have my bases covered.

Thanks to you both!
Old 05-10-2023, 10:22 AM   #5
DiapDealer
Grand Sorcerer
Posts: 27,577
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Vanguard3000 View Post
So between the hammer (TagMechanic) and the scalpel (dotall search) I think I should have my bases covered.
As the author of TagMechanic, it is of course my opinion that you have those labels reversed!

But I'm not mad.
Old 05-10-2023, 11:03 AM   #6
Vanguard3000
How so? Unless I'm overlooking a feature, the only way to limit what it finds is by selecting fewer files, right? So if I wanted to (for example) change all <div class="foo"></div> to blockquotes, it's all-or-nothing, at least within the files I've selected?

Whereas, with a dotall search I can do a find without replacing to be taken to a hit, replace, and immediately verify the results. I can also add to the regex if I want to do something like altering the line immediately below the div block.
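The find-then-replace-one-hit workflow described here can be approximated outside Sigil as well. The following is only a sketch in Python, assuming the same Find pattern as earlier in the thread with `(?s)` standing in for the dot-all option; the sample text and the blockquote replacement are hypothetical.

```python
import re

FIND = r'(?s)<div class="foo">\s*(.*?)\s*</div>'
REPLACE = r'<blockquote>\n\1\n</blockquote>'

# Two separate foo divs with an ordinary paragraph between them.
xhtml = (
    '<div class="foo">\n<p>aaa</p>\n</div>\n'
    '<p>between</p>\n'
    '<div class="foo">\n<p>bbb</p>\n</div>\n'
)

# "Find" without replacing: list every hit so each can be inspected.
hits = [m.group(1) for m in re.finditer(FIND, xhtml)]

# "Replace" only the first hit, leaving the rest for later review.
after_first = re.sub(FIND, REPLACE, xhtml, count=1)

# One hit is still waiting to be checked.
remaining = len(re.findall(FIND, after_first))

print(hits)
print(after_first)
print(remaining)
```

Limiting each step to a single replacement keeps a bad match from propagating through the whole file before anyone has looked at it.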

Don't get me wrong - your TagMechanic plugin has been a massive help for me! I've recently embarked on a massive library overhaul and it's expedited this process immensely.
Old 05-10-2023, 01:46 PM   #7
DiapDealer
TagMechanic parses html in order to edit, delete, and modify tags. It knows which closing tag goes with which opening tag so that it can make correct edits to snarls of nested tags. It will not get confused in nested situations. Nor will it get "greedy" as regex is prone to do. TagMechanic is a subtle, HTML-aware tool that can be used with precision to change, clean up or delete html tags. Regex is a blunt instrument that knows nothing about the markup it is trying to match/replace.

I like regex--use it all the time. But that doesn't change the fact that turning it loose on the kind of xhtml that needs to be properly parsed to be safely edited is like throwing a bag of hammers at a box of nails and a pile of lumber and hoping a chair gets made.

I learned a long time ago to use regex where it makes sense (and there's tons of places it does). But don't try to use it to parse markup. It will eventually let you down.

I'm not saying that regex might not be the better choice for your situation. I'm just saying that a precision tool used to make smart, safe, changes to convoluted/nested html cannot be accurately described as a "hammer."
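The difference between matching text and parsing markup can be sketched with Python's standard `html.parser`. To be clear, this is not how TagMechanic is implemented; the class name `FooDivFinder` and the sample markup are invented for illustration, and the only point is that a parser can pair each closing tag with its own opening tag by keeping a stack.

```python
import re
from html.parser import HTMLParser


class FooDivFinder(HTMLParser):
    """Record (start_line, end_line) of each <div class="foo">, nesting-aware."""

    def __init__(self):
        super().__init__()
        self.open_divs = []  # stack: start line for foo divs, None for other divs
        self.spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.open_divs.append(self.getpos()[0] if "foo" in classes else None)

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs:
            start_line = self.open_divs.pop()  # pairs this </div> with its opener
            if start_line is not None:
                self.spans.append((start_line, self.getpos()[0]))


xhtml = """<div class="foo">
<div class="bar">
<p>aaa</p>
</div>
<p>bbb</p>
</div>"""

finder = FooDivFinder()
finder.feed(xhtml)
finder.close()

# The lazy regex ends at the first </div> it sees, which is the inner one.
m = re.search(r'(?s)<div class="foo">(.*?)</div>', xhtml)
regex_end_line = xhtml.count("\n", 0, m.end()) + 1

print(finder.spans)    # parser: the foo div runs from line 1 to line 6
print(regex_end_line)  # regex: the match already ends on line 4
```

The stack is what the regex lacks: it has no way to know that the first `</div>` belongs to the inner `bar` div.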

Tags
find & replace, regex

