08-21-2023, 06:18 AM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
Search for unicode character (ranges)
Hello,
I've got an epub with multiple weird Unicode characters and wanted to use a regex in the epub editor to get rid of them. Example of text in the epub: 𝒷.𝓬𝑶𝐦

According to my research, these characters should be the following Unicode code points: \u1D4B7 \u002E \u1D4EC \u1D476 \u1D426

But no matter what I try, the search function never matches those characters. I've opened a text file in the epub editor, put "\u1D4B7" into the search box and set the mode to "Regex". When searching, nothing is found. If I search for "[\u1D400-\u1D4FF]", then all normal characters (a-zA-Z) are listed as matches. What is the logic behind this?

My intention was to search for something like "[\u1D400-\u1D4FF\u002E]{4,20}" and replace it with nothing. Can you please give me a hint on how to accomplish this?

Regards
Azraelo |
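A likely explanation for both symptoms: in Python regex syntax, `\u` takes exactly four hex digits. calibre's search is built on a Python regex engine (the third-party `regex` module, as far as I know), which should parse these escapes the same way; the sketch below uses the standard `re` module to show what the two patterns from the question actually mean.

```python
import re

# \u in a pattern consumes exactly four hex digits.
# So r"\u1D4B7" means U+1D4B followed by a literal "7", not U+1D4B7.
wrong_single = re.compile(r"\u1D4B7")
print(wrong_single.search("\U0001D4B7"))           # None: 𝒷 is not found
print(bool(wrong_single.fullmatch("\u1D4B" "7")))  # True

# r"[\u1D400-\u1D4FF]" is read as U+1D40, the range "0"..U+1D4F, and "F".
# That middle range covers ASCII from "0" upward, including a-z and A-Z.
wrong_range = re.compile(r"[\u1D400-\u1D4FF]")
print(wrong_range.findall("abcXYZ"))  # ['a', 'b', 'c', 'X', 'Y', 'Z']
```

That second pattern is why every ordinary letter shows up as a match: the "range" silently became `0` to U+1D4F.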
08-21-2023, 07:22 AM | #2 |
Evangelist
Posts: 498
Karma: 2267928
Join Date: Nov 2015
Device: none
|
Try \U instead.
|
08-21-2023, 09:43 AM | #3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
If I try "\U1D4B7", I get the error message:
calibre, version 6.21.0 ERROR: Invalid regex: The regular expression you entered is invalid: \U1D4B7 with error: incomplete escape \U1D4B7 at position 2
So this doesn't really help. |
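The "incomplete escape" error fits Python's rule that `\U` must be followed by exactly eight hex digits, so a five-digit code point like 1D4B7 needs leading zeros. A minimal sketch with the standard `re` module, assuming calibre's engine applies the same rule (the error text suggests it does):

```python
import re

sample = "\U0001D4B7.\U0001D4EC\U0001D476\U0001D426"  # 𝒷.𝓬𝑶𝐦

# \U expects exactly eight hex digits, so the five-digit form is rejected
# with an "incomplete escape" error, the same kind calibre reports.
try:
    re.compile(r"\U1D4B7")
except re.error as exc:
    print("rejected:", exc)

# Padding the code point to eight digits gives a valid escape.
padded = re.compile(r"\U0001D4B7")
print(bool(padded.search(sample)))  # True
```

In the calibre search box that would mean typing `\U0001D4B7` in Regex mode; I haven't checked this in 6.21.0 specifically.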
08-21-2023, 02:37 PM | #4 |
Bibliophagist
Posts: 39,837
Karma: 154147706
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Did you try escaping the \, e.g. \\U1D4B7? Most regex implementations use the backslash as a special character, though I'm not sure if calibre allows that format for a Unicode character.
Alternatively, just copy/paste the bscr character into the search box. Interestingly, while I can see the character when entering the message, it does not show when I post the message, and if I quote the OP's message it still doesn't show.
Last edited by DNSB; 08-21-2023 at 02:43 PM. |
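Why doubling the backslash probably can't work: in a regex, `\\` is an escaped backslash, so the pattern looks for the typed-out text rather than the character. Pasting the character does work, but only for that one exact character. A small sketch with Python's `re` module, assuming calibre behaves the same way:

```python
import re

sample = "\U0001D4B7.\U0001D4EC\U0001D476\U0001D426"  # 𝒷.𝓬𝑶𝐦

# r"\\U1D4B7" matches the seven literal characters "\U1D4B7",
# not the character U+1D4B7.
escaped = re.compile(r"\\U1D4B7")
print(escaped.search(sample))             # None
print(bool(escaped.search(r"\U1D4B7")))   # True: only the typed-out text

# A pasted character is an ordinary literal in the pattern.
pasted = re.compile("\U0001D4B7")
print(bool(pasted.search(sample)))        # True
```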
08-21-2023, 04:59 PM | #5 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
|
08-21-2023, 05:01 PM | #6 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
|
08-21-2023, 05:03 PM | #7 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
If I just quote it's OK
If I copy and paste it makes entire post bad? 𝒷.𝓬𝑶𝐦 OK in preview |
08-21-2023, 05:04 PM | #8 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Baffling!
. . |
08-21-2023, 05:05 PM | #9 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Baffling!
𝒷.𝓬𝑶𝐦 𝒷.𝓬𝑶𝐦 This time "Go Advanced" |
08-21-2023, 05:06 PM | #10 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
So it's messed up unless you "Go Advanced" to see preview.
A site bug. |
08-21-2023, 05:28 PM | #11 |
Wizard
Posts: 1,336
Karma: 6700864
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
It is strange how searching for "Mathematical Script Small B" \u1D4B7 won't work, but \u00A0 for a no-break space will. (Calibre, Win10) |
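The difference is probably the number of hex digits: U+00A0 fits the four-digit `\u` form, while U+1D4B7 lies above U+FFFF and needs the eight-digit `\U` form. `cp_escape` below is a hypothetical helper (not part of calibre) that picks the right form for any code point:

```python
import re

# Hypothetical helper: build a regex escape for one code point.
# Code points up to U+FFFF fit the four-digit \u form; larger ones need \U.
def cp_escape(cp: int) -> str:
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    return f"\\U{cp:08X}"

print(cp_escape(0x00A0))   # \u00A0      (no-break space)
print(cp_escape(0x1D4B7))  # \U0001D4B7  (MATHEMATICAL SCRIPT SMALL B)

# Both escapes compile and match their character.
print(bool(re.search(cp_escape(0x00A0), "10\u00A0km")))   # True
print(bool(re.search(cp_escape(0x1D4B7), "\U0001D4B7")))  # True
```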
08-21-2023, 05:41 PM | #12 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
Unfortunately, searching for "\\U1D4B7" didn't find anything.
I created this ebook from a website which randomly inserts its domain into the text and replaces the characters with similar-looking Unicode characters. This is also done randomly, so the combination is never the same, and copying those characters directly into the search box won't work for more than one occurrence. Because of this, I want to use a Unicode range in a regex to catch all those strange characters and remove them at once. |
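A sketch of the range-based removal in Python, assuming the lookalikes all come from the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF; the U+1D400 to U+1D4FF range from the first post covers only part of it) and that the injected domain only mixes those letters with ASCII dots. The sample sentence is made up for illustration. Requiring a lookalike letter at both ends keeps ordinary periods and ellipses untouched:

```python
import re

# Made-up sample: a sentence with one injected lookalike domain.
text = "He opened the door\U0001D4B7.\U0001D4EC\U0001D476\U0001D426 and left."

# Mathematical Alphanumeric Symbols block, written with eight-digit \U escapes.
MATH = r"\U0001D400-\U0001D7FF"

# A run that starts and ends with a lookalike letter and may contain dots
# in between, so a lone ASCII "." or "..." is never matched.
junk = re.compile(rf"[{MATH}][{MATH}.]*[{MATH}]")

print(junk.sub("", text))  # He opened the door and left.
```

Typed into calibre's search box in Regex mode, with an empty replace field, the same pattern would read `[\U0001D400-\U0001D7FF][\U0001D400-\U0001D7FF.]*[\U0001D400-\U0001D7FF]`. I haven't tested that in calibre itself, so try it on a copy of the book first; a `{4,20}` length limit like the one in the first post could be added if shorter runs are legitimate text.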
08-21-2023, 05:50 PM | #13 |
the rook, bossing Never.
Posts: 12,274
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Spelling Check?
|
08-21-2023, 06:23 PM | #14 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2023
Device: Kobo Clara HD
|
Nice idea, I hadn't thought of this.
Unfortunately, there are hundreds of variations of the URL, and the spell check doesn't seem to offer a batch edit for similar words. If I used that tool, I would still have to edit hundreds of entries manually. |
08-21-2023, 06:54 PM | #15 | |
null operator (he/him)
Posts: 20,955
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|
|