06-18-2016, 05:00 AM | #1 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
function mode: surprising count
Hi
I tried some functions within an EPUB: one that is included in the help file (hyphen) and another that was custom-built for me by a Spanish friend from MR. Both work very well and are quite helpful. However, they seem to suffer from the same small defect.
When the function has finished, the Calibre editor displays a small window announcing that the job is done and offers to show me where the replacements took place. It also announces the number of occurrences replaced. This is where the "error" appears: each time, for each function, the reported number of occurrences was wrong. In both cases the count looks out of touch with reality.
- For the hyphens, it reported over 2000 occurrences replaced when it really replaced about 30 (the text had been proofread and I had deliberately introduced mistakes).
- For the other function, it consistently reported 1386 occurrences (even if the search was expanded), but when I checked later with a plain regex it had in fact replaced 1802 occurrences.
Last edited by roger64; 06-18-2016 at 05:10 AM. |
06-18-2016, 05:17 AM | #2 |
creator of calibre
Posts: 44,529
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I need an example file and find expression and function with which to reproduce the issue.
|
06-18-2016, 05:19 AM | #3 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
OK I'll do that but maybe tomorrow. Sorry
|
06-18-2016, 07:45 AM | #4 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
I had to prepare it for another book. I attach a reduced version of a French book. As you can check first, it now contains:
- 0 nnbsp (\u202F)
- 371 hyphen-minus (-) + 10 wrong ones set in chapter 2: total 381

1. The first function is about nnbsp. Search:
Code:
    >[^\n<]*?<
Function:
Code:
    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        # Mark every spot that needs a narrow no-break space with "@",
        # collapse doubled markers, then turn "@" into \u202f
        return (match.group()
                .replace("'", "’")
                .replace(">— ", ">—@").replace(">—", ">—@")
                .replace(" !", "@!").replace("!", "@!")
                .replace(" ?", "@?").replace("?", "@?")
                .replace(" ;", "@;").replace(";", "@;")
                .replace(" :", "@:").replace(":", "@:")
                .replace("« ", "«@").replace("«", "«@")
                .replace(" »", "@»").replace("»", "@»")
                .replace("@@", "@")
                .replace("@", "\u202f"))
But if you check with a regex there are now 650 nnbsp.

2. The second function is hyphen (coupled with the French dictionary). Search:
Code:
    >[^<>]+<
Function:
Code:
    import regex
    from calibre import replace_entities
    from calibre import prepare_string_for_xml

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

        def replace_word(wmatch):
            # Try to remove the hyphen and replace the words if the resulting
            # hyphen free word is recognized by the dictionary
            without_hyphen = wmatch.group(1) + wmatch.group(2)
            if dictionaries.recognized(without_hyphen):
                return without_hyphen
            return wmatch.group()

        # Search for words split by a hyphen
        text = replace_entities(match.group()[1:-1])  # Handle HTML entities like &amp;
        corrected = regex.sub(r'(\w+)\s*-\s*(\w+)', replace_word, text, flags=regex.VERSION1 | regex.UNICODE)
        return '>%s<' % prepare_string_for_xml(corrected)  # Put back required entities
But only 15 were done: the 10 wrong ones in chapter 2 were corrected, plus the following 5: frou-frou (2), en-tête, par-dessus, porte-manteaux. These last five should not have been corrected, but that is another story.
Last edited by roger64; 06-18-2016 at 10:17 AM. |
06-18-2016, 07:55 AM | #5 |
creator of calibre
Posts: 44,529
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Ah, I see the source of your confusion. The number reported is the number of locations in the text that matched the search expression, not the number of replacements. There is no way to know what "replacement" means in function mode.
So if your search expression matches 10 places in the text, and in each place your function does 2 replacements inside the matched text, the number reported will be 10, not 20. Even in a normal search and replace, for example if you search for a and replace it with a, the number reported will be some thousands, but the number of actual replacements will be zero, because replacing a by a does not actually change anything. |
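The distinction between matches and actual changes can be reproduced outside the editor. This is a minimal sketch using Python's standard `re.subn`, which returns the new string together with the number of matches it substituted; it is only a stand-in for the editor's counter, not calibre's own code, and the sample strings and the `fix` helper are made up for illustration.

```python
import re

text = "banana and papaya"

# Replacing "a" with "a" changes nothing, yet every match is counted.
result, match_count = re.subn("a", "a", text)
print(match_count)     # 7 matches reported
print(result == text)  # True: nothing actually changed

# A function-style replacement: one match, several changes inside it.
def fix(match):
    # Hypothetical helper: two different substitutions inside one match.
    return match.group().replace("!", "?").replace(";", ":")

result2, match_count2 = re.subn(r">[^<>]+<", fix, "<p>Hi! Wait; what!</p>")
print(match_count2)    # 1 match reported
print(result2)         # <p>Hi? Wait: what?</p>  (three characters changed)
```

In both cases the count says how often the search expression matched, which is all the caller of the function can observe.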
06-18-2016, 10:24 AM | #6 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for your explanation. Let's say that I have been confused by a confusing report.
What's the purpose of reporting such a huge number if it's useless? I understand that if I replace a with a, I will get an absurd result to an absurd question. But that is not the case here. Furthermore, "remplacé 1 occurrence" ("replaced 1 occurrence") can only be understood to mean that one occurrence has been replaced. And if you deal with large numbers, there is no easy way to check that the reported figure bears no relation to the actual number of replacements. Last edited by roger64; 06-18-2016 at 10:38 AM. Reason: edit done now! ;) |
06-18-2016, 10:53 AM | #7 |
creator of calibre
Posts: 44,529
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Answer me this:
If you replace abc with xby, have you made one replacement or two? I.e., have you replaced a with x and c with y, or have you replaced abc with xby? Now look back at what I said: it is not possible to know how many replacements are performed in function mode, and the number reported is the number of matches of the search expression. You should be able to connect the rest of the dots yourself. |
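The ambiguity in that question can be made concrete with a few lines of Python. This is only an illustrative sketch: the two "views" below are assumptions about what a user might mean by a replacement, not anything the editor computes.

```python
before = "abc"
after = "xby"

# View 1: the search expression "abc" matched once and was replaced as a whole.
matches = before.count("abc")

# View 2: compare position by position and count the characters that differ.
changed_chars = sum(1 for old, new in zip(before, after) if old != new)

print(matches)        # 1
print(changed_chars)  # 2 (a became x, c became y; b is untouched)
```

Both answers are defensible, which is why only the match count can be reported reliably.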
06-18-2016, 03:12 PM | #8 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
I hope I get it right this time: as far as functions are concerned, the match counts displayed, like the one in the screenshot above, bear no relation to the number of replacements that the user is looking for when using the function.
Thank you for providing such a powerful tool. Last edited by roger64; 06-18-2016 at 03:14 PM. |
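For anyone who wants both figures, a function can keep its own tally. The sketch below reuses the hyphen function's signature and counts matches and joined words separately in the `data` dict, which, as far as I understand the calibre manual, persists between calls during one Replace All. Everything else is an assumption made so the example runs outside calibre: `StubDictionaries` is a hypothetical stand-in for the editor's `dictionaries` object, the standard `re` module replaces `regex`, entity handling is left out, and the driver call at the bottom imitates what the editor would do.

```python
import re

class StubDictionaries:
    """Hypothetical stand-in for the `dictionaries` object calibre passes in."""
    def __init__(self, words):
        self.words = set(words)

    def recognized(self, word):
        return word in self.words

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # One call per match of the search expression
    data["matches"] = data.get("matches", 0) + 1

    def replace_word(wmatch):
        without_hyphen = wmatch.group(1) + wmatch.group(2)
        if dictionaries.recognized(without_hyphen):
            # Count only the words that were really joined
            data["joined"] = data.get("joined", 0) + 1
            return without_hyphen
        return wmatch.group()

    text = match.group()[1:-1]
    corrected = re.sub(r"(\w+)\s*-\s*(\w+)", replace_word, text)
    return ">%s<" % corrected

# Stand-in driver: calibre itself would supply these arguments.
html = "<p>Un exem-ple avec porte-plume.</p><p>Rien ici.</p><p>Un mot cou-pé.</p>"
data = {}
dictionaries = StubDictionaries(["exemple", "coupé"])
result = re.sub(
    r">[^<>]+<",
    lambda m: replace(m, 0, "chapter.xhtml", None, dictionaries, data, {}),
    html,
)

print(data["matches"])  # 3 text spans matched the search expression
print(data["joined"])   # 2 words actually had their hyphen removed
print(result)
```

Here the search expression matches three text spans but only two words are changed, the same kind of gap as in the reports discussed in this thread.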
|