10-07-2024, 01:48 PM | #1 |
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
|
Regex: match opening div?
Is there a way using regex to match an opening and closing div?
For example, you have <div class="whatever"> that you'd like to delete, along with its closing </div>, but there are other divs before this one's closing </div> (i.e., nested divs). <div class="whatever">(.*?)</div> will find the targeted opening div, but will match the first closing </div> it finds rather than the one associated with the opening tag. You could deal with this by using Diap's Editing Toolbag, but I'm curious whether there's a way to do it with pure regex. |
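To make the problem concrete, here is a small sketch in Python of the behaviour described above. The sample markup and variable names are made up for illustration. It shows that the lazy pattern stops at the first </div>, while a greedy pattern overshoots as soon as a sibling div follows.

```python
import re

# Hypothetical sample: the target div contains a nested div and is
# followed by an unrelated sibling div.
html = (
    '<div class="whatever">'
    '<div class="inner">nested</div>'
    '<p>tail</p>'
    '</div>'
    '<div>sibling</div>'
)

# Lazy quantifier: stops at the FIRST </div>, which closes the inner div.
lazy = re.search(r'<div class="whatever">(.*?)</div>', html, re.DOTALL)
lazy_body = lazy.group(1)

# Greedy quantifier: runs on to the LAST </div> in the text instead.
greedy = re.search(r'<div class="whatever">(.*)</div>', html, re.DOTALL)
greedy_body = greedy.group(1)

print(lazy_body)
print(greedy_body)
```

Neither quantifier counts how many divs were opened in between, which is why a plain pattern cannot pair the opening tag with its own closing tag.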
10-12-2024, 01:07 AM | #2 |
Junior Member
Posts: 1
Karma: 10
Join Date: Apr 2024
Device: Paperwhite
|
What I understand is that you want to remove the <div class="whatever"> tag and its matching </div> tag (and only these two tags) with just one regex. Is that it?
Here's a regex that does it: https://regex101.com/r/NDUBOz/1 With only one regex, it's complicated. The regex above uses recursion into capture group 2, written (?2), to traverse the nested tags starting from the div tag with class "whatever". Group 3, which contains the recursion, is atomic, written (?>...), to avoid catastrophic backtracking. The next step will be to beautify the files again. A more elegant solution would be to write a regex-function: the regex selects the group of divs and passes the selection to a Python function that has the logic needed to find the matching </div>. |
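The function-based idea mentioned above could look roughly like the following sketch. It is not the regex from the link, and it is not wired into any editor. The names `unwrap_div` and `DIV_TAG` are hypothetical, and it assumes the class attribute is written exactly as class="whatever" with double quotes. Instead of recursion, it counts div depth until the balance returns to zero.

```python
import re

# Matches any opening or closing div tag; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r'<(/?)div\b[^>]*>', re.IGNORECASE)


def unwrap_div(html, class_name):
    """Remove the first <div class="class_name"> and its matching </div>.

    Only the two tags are removed; everything between them is kept.
    """
    opener = re.search(
        rf'<div\b[^>]*\bclass="{re.escape(class_name)}"[^>]*>', html
    )
    if opener is None:
        return html

    depth = 1  # we are inside the target div
    for tag in DIV_TAG.finditer(html, opener.end()):
        if tag.group(0).endswith('/>'):
            continue  # a self-closing <div/> does not change the depth
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            # tag is the closing </div> that balances the opener
            return (
                html[:opener.start()]
                + html[opener.end():tag.start()]
                + html[tag.end():]
            )
    return html  # unbalanced markup: leave the text untouched


sample = (
    '<p>a</p><div class="whatever"><div class="inner">x</div>'
    '<p>y</p></div><div>z</div>'
)
result = unwrap_div(sample, "whatever")
print(result)
```

Note that Python's standard `re` module has no recursion syntax such as (?2), so a depth counter like this is one way to get the same pairing with the standard library alone.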
10-12-2024, 01:10 AM | #3 |
Bibliophagist
Posts: 40,549
Karma: 157444380
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
The other (and safer, IMHO) option is to use Diap's Editing Toolbag. It traverses the HTML code, which lets you safely remove nested divs and spans without the risk of munging your code with a regex.
|
10-12-2024, 01:30 AM | #4 |
Running with scissors
Posts: 1,557
Karma: 14325282
Join Date: Nov 2019
Device: none
|
What DNSB said. There's also a version of it for Sigil as well as the Calibre editor.
I have a Kobo, download Kindle books from Amazon, and use Calibre to convert them to EPUB. When it converts a MOBI to EPUB, many of them end up using divs for the paragraphs instead of p tags. I use Diap's gizmo in Sigil to convert them to p tags. |
10-12-2024, 07:43 AM | #5 |
Zealot
Posts: 148
Karma: 1451628
Join Date: Jul 2021
Device: N/A
|
Nice regex, Biblos!
As everyone said (including the OP), it's easier to do that with Diap's Editing Toolbag, even more so since the recent v. 0.5.0 (there's no longer any need to edit the code to add tags that aren't predefined in the list). But, as Biblos said, his regex can be very useful when you want to pass the whole expression enclosed in the targeted div to a regex-function, so you can process the text, but only inside specific classes of specific tags. Examples: remove italics in the text of the footnotes, turn bold into italic in the picture captions, change <i> to <i class="quotes"> in some author's quotes, or put that author's name in small caps (or replace one word with another) but only in those quotes, etc. I even think that this regex should go in the sticky post "Saved Search/Regex Functions". Last edited by lomkiri; 10-12-2024 at 07:49 AM. |
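As an illustration of the first use case listed above (removing italics only inside footnotes), here is a rough standalone sketch. The class name "footnote" and the helpers `find_close`, `transform_inside` and `strip_italics` are all hypothetical, and the class attribute is assumed to be written exactly as class="footnote". In an editor this logic would sit inside a regex-function; that wiring is not shown here.

```python
import re

# Matches any opening or closing div tag; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r'<(/?)div\b[^>]*>', re.IGNORECASE)


def find_close(html, start):
    """Return the match for the </div> balancing a div opened just before start."""
    depth = 1
    for tag in DIV_TAG.finditer(html, start):
        if tag.group(0).endswith('/>'):
            continue  # a self-closing <div/> does not change the depth
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            return tag
    return None  # unbalanced markup


def transform_inside(html, class_name, func):
    """Apply func to the content of every div with the given class."""
    opener_re = re.compile(
        rf'<div\b[^>]*\bclass="{re.escape(class_name)}"[^>]*>'
    )
    out, pos = [], 0
    while True:
        opener = opener_re.search(html, pos)
        if opener is None:
            break
        closer = find_close(html, opener.end())
        if closer is None:
            break  # stop rather than guess on unbalanced markup
        out.append(html[pos:opener.end()])                    # text up to and including the opening tag
        out.append(func(html[opener.end():closer.start()]))   # transformed content
        out.append(closer.group(0))                           # the matching </div>
        pos = closer.end()
    out.append(html[pos:])  # everything after the last processed div
    return ''.join(out)


def strip_italics(text):
    """Drop <i> and </i> tags but keep the text they wrapped."""
    return re.sub(r'</?i\b[^>]*>', '', text)


page = (
    '<p><i>keep</i></p>'
    '<div class="footnote"><div><i>drop</i></div> note</div>'
    '<p><i>keep too</i></p>'
)
cleaned = transform_inside(page, "footnote", strip_italics)
print(cleaned)
```

Swapping `strip_italics` for another function would cover the other cases in the list, such as changing <i> to <i class="quotes">, while text outside the targeted divs stays untouched.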
10-12-2024, 08:23 AM | #6 |
Member
Posts: 12
Karma: 10
Join Date: Oct 2008
Device: sony
|
Can someone please point me in the direction of Diap's Editing Toolbag for Sigil?
|
10-12-2024, 08:55 AM | #7 | |
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
|
Quote:
|
|
10-12-2024, 08:58 AM | #8 | |
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
|
Quote:
|
|
10-12-2024, 09:01 AM | #9 | |
Member
Posts: 12
Karma: 10
Join Date: Oct 2008
Device: sony
|
Quote:
|
|
10-12-2024, 09:02 AM | #10 | |
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
|
Quote:
|
|
|