#721
Enthusiast
Posts: 30
Karma: 10
Join Date: Mar 2019
Location: Slovenia
Device: PocketBoot Inkpad 3
Any idea on how to capture uppercase words with special diacritic characters, like Ū Ṃ Ḥ Ū etc.?
I tried the following, but it doesn't work. I want to capture uppercase words with 2 or more characters.
Code:
([[:upper:]]{2,})
#723
Enthusiast
Posts: 30
Karma: 10
Join Date: Mar 2019
Location: Slovenia
Device: PocketBoot Inkpad 3
@BeckyEbook, thank you!
#724
Grand Sorcerer
Posts: 27,660
Karma: 195154104
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Also remember that \p{Lu} and \p{Ll} can be used to match any uppercase (and, correspondingly, lowercase) letter in any language without requiring the (*UCP) switch (in Sigil's PCRE regex engine).
\p{L} matches any letter (Unicode or otherwise) and \P{L} matches anything NOT a letter. So (\p{Lu}{2,}) should theoretically do the same thing, i.e. capture runs of two or more uppercase letters (not near a machine to verify syntax). See the Unicode Categories section of https://www.regular-expressions.info/unicode.html for more categories.
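For anyone who wants to sanity-check the \p{Lu}{2,} idea outside Sigil, here is a minimal Python sketch. Python's built-in re module does not understand \p{Lu}, so this is only a stand-in under that assumption: it tests each character's Unicode category with unicodedata.category() instead of using a regex. The function name uppercase_runs and the sample sentence are made up for illustration. It normalizes to NFC first, so a capital typed as base letter + combining mark is treated as one precomposed uppercase letter.

```python
import unicodedata


def uppercase_runs(text, min_len=2):
    """Return runs of at least min_len consecutive uppercase (category Lu) letters."""
    # NFC merges e.g. "M" + combining dot below into the single letter U+1E42.
    text = unicodedata.normalize("NFC", text)
    runs, current = [], []
    # The trailing space is a sentinel that flushes a run ending at the last character.
    for ch in text + " ":
        if unicodedata.category(ch) == "Lu":
            current.append(ch)
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    return runs


# Hypothetical sample text; the lone "A" is too short to count.
print(uppercase_runs("the ŪṂḤ sutta, vol. A"))
```

Like the regex, this finds runs anywhere, including inside a mixed-case word, so it is a way to test the character class rather than a replacement for Sigil's search.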
#725
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
oh.... wow. 49 pages over the course of ten years?! well, this Regex newbie's got a lot of reading homework, it seems.
#726
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Okay, after reading the "<i>, <em> or <span> for italics" thread from 2020, and then reading the "Extended <head> chapter: NOT necessary?" 2017 thread linked therein [and paying particular attention to Tex2002ans' posting about the underlying purposes for <em> and <i> <em>therein</em>],
I've figured out that
Code:
<span class="italics">([^>]+)</span>
I'm happy to do the legwork and the trial-and-error to learn what works. I guess my search skills need an update, too, because the results I am turning up don't seem to work for me.
[edit] Okay, I THINK I found it, but it was hit-or-miss, because it seemed that everything was for JavaScript/C#/VB.NET/PHP/Ruby/etc.
Code:
<em>\g<1></em>
Okay, next question: is this a kludge and there's a better way? Or is this correct? Thanks, y'all! [/edit]
Last edited by CubGeek; 08-18-2022 at 03:22 PM.
#727
A Hairy Wizard
Posts: 3,123
Karma: 18727091
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
That's pretty advanced stuff!
I go pretty easy... and it seems to work so far...
find: <i>(.*?)</i>
replace: <em>\1</em>
or
find: <span class="italics">(.*?)</span>
replace: <em>\1</em>
etc.
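The two find/replace pairs above can be tried outside Sigil with a short Python sketch. This uses Python's re module rather than Sigil's PCRE engine, so treat it as an approximation; the sample HTML is made up. In Python both \1 and \g<1> refer to capture group 1 in the replacement string, which is shown here by using one form in each step.

```python
import re

# Hypothetical sample markup containing both kinds of italics.
html = '<p>A <i>quick</i> test with <span class="italics">two</span> styles.</p>'

# Lazy (.*?) stops at the first closing tag, so each pair is handled on its own.
step1 = re.sub(r'<i>(.*?)</i>', r'<em>\1</em>', html)

# \g<1> is Python's explicit spelling of "group 1"; it behaves like \1 here.
step2 = re.sub(r'<span class="italics">(.*?)</span>', r'<em>\g<1></em>', step1)

print(step2)  # <p>A <em>quick</em> test with <em>two</em> styles.</p>
```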
#728
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
#729
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
The easiest way to do it is to use DiapDealer's fantastic "TagMechanic" plugin.
I explained how to install Sigil plugins in this 2021 post. And I gave step-by-step instructions on how to use TagMechanic here:
That will help mass convert your <span class="italics"> -> <i> or <em>.
It will be much safer than trying to use Regular Expressions, because regex can't safely handle complicated cases of <span>s inside of <span>s.
Quote:
Replace: <i>\1</i>
You see the parentheses you wrapped around your stuff? That's called a "Capture Group".
Explanation of the Find
Let's break it down into each piece:
It's saying:
Now when you're Replacing, you can use \1 to get "Group #1".
Explanation of the Replace
- - -
Side Note: If you have more complicated regex, you can get up to 9 capture groups!
\1, \2, \3, [...], \9
But at that point, it's probably smarter to split your search/replaces into smaller pieces.
- - -
Side Note #2: If you want some more Regex tricks, I just wrote a post a few months ago here:
which linked to some of my other posts over the years. I break down + color-coordinate many of the ones I use.
Quote:
Easier/Safer to use Tag Mechanic though. :P
Quote:
where I explained differences between <i> + <em> even further.
Last edited by Tex2002ans; 08-18-2022 at 11:12 PM.
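The caveat near the top of this post, that regex can't safely handle <span>s inside of <span>s, can be seen in a small Python sketch. The markup is a made-up example and Python's re module stands in for Sigil's engine, so this only illustrates the general failure mode, not what TagMechanic does internally.

```python
import re

# Hypothetical markup: an italics span nested inside an outer span.
html = '<span class="normal">An <span class="italics">example</span> here.</span>'

# Lazy (.*?) stops at the FIRST </span>, which closes the inner span,
# not the outer one the pattern started from.
pattern = r'<span class="normal">(.*?)</span>'
m = re.search(pattern, html)
print(m.group(1))  # An <span class="italics">example

# Replacing on that match leaves mismatched tags behind.
broken = re.sub(pattern, r'<em>\1</em>', html)
print(broken)  # <em>An <span class="italics">example</em> here.</span>
```

The regex has no notion of which closing tag belongs to which opening tag, which is why a tag-aware tool is the safer choice for nested markup.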
#730
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
Quote:
Quote:
Quote:
#731
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Code:
<p class="normal"><span class="normal">This is an <span class="italics">example</span>.<sup><span class="tiny">1</span></sup></span></p>
Regular Expressions would get completely confused with the 3 different </span>s, where TagMechanic would be able to figure out which </span> connects with which one.
Of course, with clean code, this wouldn't be a problem, but in real life there are always these crazy examples that creep up... and it comes to bite you in the butt later when you already accidentally did a "Replace All" 3 hours ago!
Quote:
You can also use those in FINDs as well! For example, one of the tricks I use is:
Double Word Check
Find: (\b[a-z]+) (\1\b)
Replace: \1
This grabs a lowercase word + looks for it again:
How does it work? It uses a few tricks:
Shove all that in GROUP 1.
Shove all that in GROUP 2.
Now, when you replace, you're only replacing with GROUP 1, meaning that duplicated word never makes it:
- - -
Usage Note: You do have to be careful of false positives though, so NEVER do a "Replace All". Always do a one-by-one check.
There shouldn't ever be too many "doubles" within your book, but they're an extremely common typo that's very hard to catch. (Usually the human brain just skips right over them.)
- - -
Quote:
Glad to see someone benefited from all those in-depth discussions.
Last edited by Tex2002ans; 08-19-2022 at 02:12 PM.
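The Double Word Check from this post can be tried in Python as a rough sketch. The Find and Replace strings are the ones given above; the sample sentences are made up, and Python's re module is assumed to behave like Sigil's engine for this particular pattern. Listing the hits before replacing mirrors the one-by-one advice in the Usage Note.

```python
import re

# Find: (\b[a-z]+) (\1\b)   Replace: \1
DOUBLE_WORD = re.compile(r'(\b[a-z]+) (\1\b)')

line = "She went to the the store and and bought tea."

# Review the hits first instead of blindly replacing everything.
matches = [m.group(0) for m in DOUBLE_WORD.finditer(line)]
print(matches)  # ['the the', 'and and']

# Keeping only group 1 drops the duplicated word.
fixed = DOUBLE_WORD.sub(r'\1', line)
print(fixed)  # She went to the store and bought tea.

# The trailing \b prevents "the there" from matching, but a legitimate
# double such as "had had" is still flagged: a false positive.
false_positive = DOUBLE_WORD.search("he had had enough of the there")
print(false_positive.group(0))  # had had
```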
#732
Resident Curmudgeon
Posts: 74,913
Karma: 131375774
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Use <i> and <b> and forget <em> and <strong> ever existed.
#733
Grand Sorcerer
Posts: 27,660
Karma: 195154104
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Drop it, Jon. Your preferences are not really relevant to the conversation at hand.
#734
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
After reading threads that spanned (ha! <span>ned!
#735
Connoisseur
Posts: 52
Karma: 10
Join Date: Sep 2021
Location: Upstate NY, USA
Device: iPad Pro, Kindle basic
Quote:
So, if my learning how to properly show varying types of emphasis helps convey nuances for someone who's relying on a screen-reader or similar (on the very infinitesimal chance they access something that I put together), then it was time well-spent.