Saved Search/Regex Functions - Page 4

DyckBook · 10-19-2021, 04:02 PM

Quote:

Originally Posted by kovidgoyal

That's because builtin functions are not simple standalone functions you can learn from; they use other calibre code. However, if you want to see them, look in function_replace.py in the calibre source code.

Thanks Kovid, I'll work on that.

greenskye · 04-02-2022, 02:10 PM

I'm looking for a method to convert numbers that use the European format with a comma as the decimal separator (e.g. 1.000,95) to the US format (e.g. 1,000.95).

Is this achievable with regex or via a search function?

lomkiri · 04-02-2022, 05:28 PM

Assuming all numbers are in European format (none in US format):

Code:

find:
\d[,.\d]{2,}(?![^<>{}]*[>}])
function:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs): 
    return match.group(0).replace('.', '§').replace(',', '.').replace('§',',')

Warning: if you have a mixture of numbers in both formats (US and European), they will be switched. In that case, you'll have to refine the selection to catch only the European ones.

Note: every number with 3 or more positions will be caught (e.g. 1,2 or 1.2, since the separator counts as a position). To be more selective, change "{2,}" to the minimum length you want minus 1, e.g. "{4,}" to catch numbers starting from 5 positions (1.000 or 12,45).

Note: integers such as 100 or 234000 will be caught, but they won't be changed.

Warning: numbers followed by 3 dots will be wrongly transformed: "They were 20..." will give "They were 20,,,".
It's wise to change them to an ellipsis (…) before applying the conversion:
(\d)\.{3} ==> \1\u2026
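
For trying the expression outside calibre, here is a minimal sketch in plain Python. It assumes the standard re module rather than the editor's own search engine, and the names swap_separators, fix_ellipsis and convert are hypothetical helpers, not calibre functions. It reproduces the behaviour described above, including the "20..." pitfall and the ellipsis pre-pass.

```python
import re

# Find expression from the post: a digit followed by at least two more
# digits, dots or commas, skipped when the match sits inside a tag or {}.
FIND = re.compile(r'\d[,.\d]{2,}(?![^<>{}]*[>}])')

def swap_separators(text):
    # Park the dots on a placeholder (§) so the two replacements
    # cannot overwrite each other.
    return text.replace('.', '§').replace(',', '.').replace('§', ',')

def fix_ellipsis(text):
    # Pre-pass from the post: a digit followed by three dots gets a real
    # ellipsis character, so the dots are no longer part of the match.
    return re.sub(r'(\d)\.{3}', '\\1\u2026', text)

def convert(text):
    # Hypothetical stand-in for running "Replace all" in the editor.
    return FIND.sub(lambda m: swap_separators(m.group(0)), text)

print(convert('<p>Total: 1.000,95</p>'))
print(convert('They were 20...'))
print(convert(fix_ellipsis('They were 20...')))
```

Running the pre-pass first leaves "They were 20…" untouched by the conversion, because "20" alone is shorter than the three positions the expression requires.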

greenskye · 04-04-2022, 02:20 PM

Thanks so much, it worked great!

I ended up using

Code:

find 1: (\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])

find 2: \$(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])

The updated regex fixed the problem with trailing "." matches. I did a replace all with "find 1", and then ran it again with "find 2" to revert any US currencies accidentally caught. I couldn't figure out how to exclude them in the first place (it kept matching part of the currency number).
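
The two-pass approach can be checked outside calibre with a short Python sketch. It assumes the standard re module and the same separator-swapping function for both passes; the names FIND_1, FIND_2, swap and two_pass are hypothetical.

```python
import re

# Shared tail of both expressions: groups of 1-3 digits, each followed by
# a separator, then at least one digit, and not inside a tag or {}.
TAIL = r'(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])'
FIND_1 = re.compile(TAIL)
FIND_2 = re.compile(r'\$' + TAIL)

def swap(match):
    # Swap dots and commas, using § as a temporary placeholder.
    return match.group(0).replace('.', '§').replace(',', '.').replace('§', ',')

def two_pass(text):
    # Pass 1 converts every number, including US currency amounts.
    text = FIND_1.sub(swap, text)
    # Pass 2 swaps the $ amounts a second time, which restores them.
    return FIND_2.sub(swap, text)

print(two_pass('1.000,95 and $1,000.95'))
print(two_pass('They were 20...'))
```

Because the expression requires a digit after the last separator, a number followed by dots, as in "They were 20...", is no longer matched.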

lomkiri · 04-04-2022, 05:55 PM

Quote:

Originally Posted by greenskye

The updated regex fixed the problem with trailing "." matches

Mmmh, yes, of course, I forgot this case :-/. Good you thought about it :-)

Quote:

I did a replace all with "find 1", and then ran it again with "find 2" to revert any US currencies accidentally caught. I couldn't figure out how to exclude them in the first place (it kept matching part of the currency number).

This one seems to work for excluding groups beginning with $ (and for not selecting inside tags):

Code:

(\$(?:\d{1,3}[.,])+)(*SKIP)(*F)|(<[^<>]*)(*SKIP)(*F)|(?:\d{1,3}[.,])+\d{1,}

Another way would be to catch the currency symbol in the regex; then it's easy to make the selection inside the function:

Code:

find:
\$?(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])
function:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    m = match.group(0)
    # Leave amounts that start with $ (already in US format) untouched
    if m[0] == '$':
        return m
    # Swap the separators, using § as a temporary placeholder
    return m.replace('.', '§').replace(',', '.').replace('§', ',')
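
A minimal sketch of this second variant in plain Python, assuming the standard re module (which does not support the (*SKIP)(*F) verbs used in the first variant). The callback is reduced to a single match argument, and the convert wrapper is a hypothetical stand-in for "Replace all".

```python
import re

# The optional leading $ is captured so the callback can decide what to do.
FIND = re.compile(r'\$?(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')

def replace(match):
    m = match.group(0)
    if m[0] == '$':
        # US currency amount: return it unchanged.
        return m
    # Anything else is treated as European and gets its separators swapped.
    return m.replace('.', '§').replace(',', '.').replace('§', ',')

def convert(text):
    return FIND.sub(replace, text)

sample = '<p data-n="1.000,5">1.000,95 costs $1,000.95</p>'
print(convert(sample))
```

The number inside the attribute is left alone by the lookahead, the plain number is converted, and the $ amount passes through the callback unchanged, so a single pass is enough.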

mobilis · 09-12-2022, 01:34 AM

I am making an ebook from saved and pdfunite'd PDF pages, and there are scads of things like this:

<p class="calibre1">13/72</p>

<p class="calibre1">14/93</p>

that I want to remove.

How can I?

theducks · 09-12-2022, 09:33 AM

Quote:

Originally Posted by mobilis

I am making an ebook from saved and pdfunite'd PDF pages, and there are scads of things like this:

<p class="calibre1">13/72</p>

<p class="calibre1">14/93</p>

that I want to remove.

How can I?

REGEX is your buddy (there are a few REGEX tutorials here at MR; that is how I learned. BTW, calibre uses the PCRE flavor of REGEX).

Code:

<p class="calibre1">\d+\/\d+</p>

\d+ matches 1 or more digits in a row.
\/ is just an escaped / (it might not be needed, but does not hurt).
Escaping an item removes its special meaning, so it is treated exactly as it LOOKS.
I left the rest as an exact match to act as the trigger.

E.g. "The cup was 3/4 full." would not match.
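
To see what the expression removes, here is a minimal sketch in plain Python, assuming the standard re module and the calibre1 class shown in the thread; strip_page_markers is a hypothetical helper name. The slash is left unescaped, since it has no special meaning in a Python pattern.

```python
import re

# A whole paragraph that contains nothing but digits/digits.
PAGE_MARKER = re.compile(r'<p class="calibre1">\d+/\d+</p>')

def strip_page_markers(html):
    # Delete each marker paragraph and leave everything else untouched.
    return PAGE_MARKER.sub('', html)

html = (
    '<p class="calibre1">13/72</p>\n'
    '<p class="calibre1">The cup was 3/4 full.</p>'
)
print(strip_page_markers(html))
```

The second paragraph survives because the surrounding tags only match when the fraction is the paragraph's entire content.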

mobilis · 09-14-2022, 02:56 AM

Quote:

Originally Posted by theducks

REGEX is your buddy (there are a few REGEX tutorials here at MR; that is how I learned. BTW, calibre uses the PCRE flavor of REGEX).

Code:

<p class="calibre1">\d+\/\d+</p>

\d+ matches 1 or more digits in a row.
\/ is just an escaped / (it might not be needed, but does not hurt).
Escaping an item removes its special meaning, so it is treated exactly as it LOOKS.
I left the rest as an exact match to act as the trigger.

E.g. "The cup was 3/4 full." would not match.

THANK YOU!!

alekseiminko · 02-19-2023, 06:57 PM

Where in calibre can I set up auto-correction for reading aloud, so that the voice reads the full word (for example "article") rather than its abbreviation?

alekseiminko · 02-20-2023, 02:58 PM

I mean classic substitutions (simple replacement of one string with another) or regular expressions (RegExp), used to set the stress when reading by voice and to expand abbreviations when reading by voice, for example vs → versus.

JSWolf · 02-20-2023, 03:11 PM

Quote:

Originally Posted by alekseiminko

Where in calibre can I set up auto-correction for reading aloud, so that the voice reads the full word (for example "article") rather than its abbreviation?

You already asked this in another thread.

alekseiminko · 02-20-2023, 04:37 PM

Please drop the link.

DNSB · 02-20-2023, 07:18 PM

Quote:

Originally Posted by alekseiminko

I mean classic substitutions (simple replacement of one string with another) or regular expressions (RegExp), used to set the stress when reading by voice and to expand abbreviations when reading by voice, for example vs → versus.

You would have to edit the book to make those changes such as versus for vs. As for changing emphasis, good luck with that. There are good reasons that most authors prefer using people to create audiobooks since even the best of the current automated readers are not all that great.

alekseiminko · 02-20-2023, 08:03 PM

The Librera app on Android has such a function for voice reading: text-to-speech substitution is used to change the way the engine pronounces certain words, to skip certain characters when reading, or to set the correct stress marks.

kovidgoyal · 02-20-2023, 09:56 PM

No, there is no such function.




