Old 10-07-2024, 01:48 PM   #1
foosion
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
Regex: match opening div?

Is there a way using regex to match an opening and closing div?

For example, you have a <div class="whatever"> that you'd like to delete, along with its closing </div>, but there are other divs before this one's closing </div> (i.e., nested divs). <div class="whatever">(.*?)</div> will find the targeted opening div, but will match the first closing </div> it finds rather than the one associated with the opening.

You could deal with this by using Diap's Editing Toolbag, but I'm curious whether there's a way to do it with pure regex.
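
To make the problem concrete, here is a small sketch in plain Python (standard `re` module). The sample markup and the variable names are made up for illustration; only the pattern comes from the question.

```python
import re

# Hypothetical sample: the targeted div contains a nested div.
html = (
    '<div class="whatever">'
    '<div class="inner">nested</div>'
    '<p>still inside</p>'
    '</div>'
)

# The lazy quantifier stops at the first </div> it can reach,
# which here belongs to the nested div, not to the targeted one.
naive = re.compile(r'<div class="whatever">(.*?)</div>', re.DOTALL)
match = naive.search(html)

print(match.group(1))
```

The captured text ends right after "nested", so the `<p>` and the real closing tag are left outside the match.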
Old 10-12-2024, 01:07 AM   #2
Biblos
Junior Member
Posts: 1
Karma: 10
Join Date: Apr 2024
Device: Paperwhite
What I understand is that you want to remove the <div class="whatever"> tag and its matching </div> tag (and only these 2 tags) with just one regex. Is that it?
Here's a regex that does it:
https://regex101.com/r/NDUBOz/1

With only one regex, it's complicated. The regex above uses recursion into capture group 2, written (?2), to traverse the nested tags starting from the div tag with class "whatever". Group 3, which contains the recursion, is atomic, written (?>...), to avoid catastrophic backtracking. The next step is to beautify the files again.

A more elegant solution would be to write a regex-function: the regex selects the group of divs and passes the selection to a Python function that has the tools needed to find the matching </div>.
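
As a rough illustration of the function-based idea (not the regex behind the link above), here is a standalone sketch in plain Python. Python's built-in `re` module has no recursion, so the sketch counts nesting depth instead. The names `DIV_TAG` and `unwrap_div` are hypothetical, and it assumes the class is written exactly as `class="..."` with double quotes.

```python
import re

# Matches either an opening <div ...> tag or a closing </div> tag.
DIV_TAG = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)


def unwrap_div(html, class_name):
    """Remove the first <div class="class_name"> and its matching </div>."""
    opener = re.compile(
        r'<div\b[^>]*\bclass="%s"[^>]*>' % re.escape(class_name),
        re.IGNORECASE,
    )
    start = opener.search(html)
    if start is None:
        return html  # no such div: nothing to do

    depth = 1  # we are inside the targeted div
    for tag in DIV_TAG.finditer(html, start.end()):
        text = tag.group()
        if text.endswith("/>"):
            continue  # self-closing <div/> does not change the depth
        depth += -1 if text.startswith("</") else 1
        if depth == 0:
            # 'tag' is the </div> paired with the targeted opener.
            inner = html[start.end():tag.start()]
            return html[:start.start()] + inner + html[tag.end():]
    return html  # unbalanced markup: leave it untouched


sample = (
    '<body><div class="whatever"><div class="a"><p>x</p></div>'
    '<p>y</p></div><div>z</div></body>'
)
print(unwrap_div(sample, "whatever"))
```

Only the two targeted tags disappear; the nested div and the sibling div after it are kept as they were.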
Old 10-12-2024, 01:10 AM   #3
DNSB
Bibliophagist
Posts: 40,549
Karma: 157444380
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
The other (and safer, IMHO) option is to use Diap's Editing Toolbag. It traverses the HTML code, so nested divs and spans can be removed safely, without the risk of a regex munging your code.
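
For readers curious what "traversing the HTML" can look like, here is a minimal sketch using Python's standard `html.parser`. It is not taken from the plugin and makes no claim about how the Toolbag works internally; `DivUnwrapper` and `unwrap_divs` are hypothetical names. Unlike a single regex, the parser tracks open divs on a stack, so each closing tag is paired with its own opener.

```python
from html.parser import HTMLParser


class DivUnwrapper(HTMLParser):
    """Re-emit the markup, minus every <div> wrapper of one class."""

    def __init__(self, class_name):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.class_name = class_name
        self.out = []
        self.div_stack = []  # one flag per open <div>: True = drop its tags

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            drop = self.class_name in classes
            self.div_stack.append(drop)
            if drop:
                return  # skip the targeted opening tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack and self.div_stack.pop():
            return  # skip the </div> paired with a dropped opener
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def unwrap_divs(html, class_name):
    parser = DivUnwrapper(class_name)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


cleaned = unwrap_divs(
    '<div class="whatever"><div class="keep"><p>a &amp; b</p></div></div>',
    "whatever",
)
print(cleaned)
```

The sketch only handles the constructs shown (tags, text, entities, comments); a real tool has to cope with doctypes, processing instructions and malformed markup as well.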
Old 10-12-2024, 01:30 AM   #4
hobnail
Running with scissors
Posts: 1,557
Karma: 14325282
Join Date: Nov 2019
Device: none
What DNSB said. There's also a version of it for Sigil as well as the Calibre editor.

I have a Kobo, download Kindle books from Amazon, and use Calibre to convert them to EPUB. When Calibre converts a MOBI to EPUB, many of the books end up using divs for the paragraphs instead of p tags. I use Diap's gizmo in Sigil to convert them to p tags.
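
A rough sketch of that kind of div-to-p conversion, for the simple case only: it rewrites divs that contain no other div, so wrapper divs are left alone. This is my own assumption about a reasonable rule, not a description of what the plugin does, and `LEAF_DIV` and `divs_to_paragraphs` are hypothetical names.

```python
import re

# A "leaf" div: opening tag, then any text that never starts another
# <div or </div tag, then the closing tag.
LEAF_DIV = re.compile(
    r"<div\b([^>]*)>((?:(?!</?div\b).)*)</div>",
    re.DOTALL | re.IGNORECASE,
)


def divs_to_paragraphs(html):
    """Turn innermost <div>...</div> pairs into <p>...</p>, keeping attributes."""
    return LEAF_DIV.sub(r"<p\1>\2</p>", html)


converted = divs_to_paragraphs(
    '<div class="c1"><div class="p">First.</div>'
    '<div class="p">Second <i>one</i>.</div></div>'
)
print(converted)
```

The outer `c1` div survives because it contains other divs, while the two paragraph-like divs become p tags.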
Old 10-12-2024, 07:43 AM   #5
lomkiri
Zealot
Posts: 148
Karma: 1451628
Join Date: Jul 2021
Device: N/A
Nice regex, Biblos!

As everyone said (including the OP), it's easier to do that with Diap's Editing Toolbag, even more so since the recent v. 0.5.0 (there is no longer any need to edit the code to add tags that aren't predefined in the list).

But, as Biblos said, his regex can be very useful when one wants to pass the whole expression enclosed in the targeted div to a regex-function, so that one can process the text, but only inside specific classes of specific tags.

Examples: remove italics from the text of the footnotes, turn bold into italic in the picture captions, change <i> to <i class="quotes"> in some author's quotes, or put that author's name in small caps (or replace one word with another) but only within those quotes, etc.
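
A minimal sketch of the first example (removing italics inside footnotes only). The `replace(...)` signature is the one the calibre manual gives for function mode in the editor's search; everything else is an assumption for illustration: the class name `footnote`, the `ITALIC_TAGS` helper, and the simple non-nested `FOOTNOTE` pattern, which stands in for a regex that selects the whole targeted div.

```python
import re

# Opening or closing <i> tags (but not <img>, <input>, ...).
ITALIC_TAGS = re.compile(r"</?i\b[^>]*>", re.IGNORECASE)


def replace(match, number, file_name, metadata, dictionaries, data,
            functions, *args, **kwargs):
    # match.group(0) is assumed to be the whole targeted <div>...</div>;
    # text outside the match is never touched.
    return ITALIC_TAGS.sub("", match.group(0))


# Standalone demo outside calibre. This stand-in pattern only works
# because the hypothetical footnote div below has no nested div.
FOOTNOTE = re.compile(r'<div class="footnote">.*?</div>', re.DOTALL)
html = '<p><i>Keep</i></p><div class="footnote"><p>See <i>Title</i>.</p></div>'
result = FOOTNOTE.sub(
    lambda m: replace(m, 1, "ch1.xhtml", None, None, {}, {}), html
)
print(result)
```

The italics in the body paragraph stay; only those inside the footnote div are stripped. The other examples (bold to italic, adding a class, small caps) would follow the same shape with a different substitution inside `replace`.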

I even think that this regex should go in the sticky post "Saved Search/Regex Functions"

Last edited by lomkiri; 10-12-2024 at 07:49 AM.
Old 10-12-2024, 08:23 AM   #6
dearleuk
Member
Posts: 12
Karma: 10
Join Date: Oct 2008
Device: sony
Can someone please point me in the direction of Diap's Editing Toolbag for Sigil?
Old 10-12-2024, 08:55 AM   #7
foosion
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
Quote:
Originally Posted by dearleuk View Post
Can someone please point me in the direction of Diap's Editing Toolbag for Sigil?
https://www.mobileread.com/forums/sh...d.php?t=270639
Old 10-12-2024, 08:58 AM   #8
foosion
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
Quote:
Originally Posted by Biblos View Post
What I understand is that you want to remove the <div class="whatever"> tag and its matching </div> tag (and only these 2 tags) with just one regex. Is that it?
Here's a regex that does it:
https://regex101.com/r/NDUBOz/1

With only one regex, it's complicated. The regex above uses recursion into capture group 2, written (?2), to traverse the nested tags starting from the div tag with class "whatever". Group 3, which contains the recursion, is atomic, written (?>...), to avoid catastrophic backtracking. The next step is to beautify the files again.

A more elegant solution would be to write a regex-function: the regex selects the group of divs and passes the selection to a Python function that has the tools needed to find the matching </div>.
Very impressive! I didn't think it was possible.
Old 10-12-2024, 09:01 AM   #9
dearleuk
Member
Posts: 12
Karma: 10
Join Date: Oct 2008
Device: sony
Quote:
Originally Posted by foosion View Post
Thanks very much
Old 10-12-2024, 09:02 AM   #10
foosion
Evangelist
Posts: 449
Karma: 41524
Join Date: Sep 2011
Device: Kobo Libra 2 & Clara BW
Quote:
Originally Posted by DNSB View Post
The other (and safer, IMHO) option is to use Diap's Editing Toolbag. It traverses the HTML code, so nested divs and spans can be removed safely, without the risk of a regex munging your code.
For others: there's a general discussion of regex and of Diap's tools for Sigil and the Calibre editor, covering nested divs and other purposes, at https://www.mobileread.com/forums/sh...65#post4457665 (especially after the first few posts).
All times are GMT -4. The time now is 08:06 AM.

