Old 06-18-2016, 05:00 AM   #1
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
function mode: surprising count

Hi

I tried some functions within an EPUB: one that is included in the help file (hyphen) and another that was custom-built for me by a Spanish friend from MR. Both work well and are quite helpful.

However, they seem to share the same small defect: when the function has finished, the calibre editor displays a small window announcing that the job is done and offering to show me where the replacements took place. It also announces the number of occurrences replaced. This is where the "error" appears.

Each time, for each function, the reported number of occurrences was wrong. In both cases, the count seems to bear no relation to what was actually replaced.
- For the hyphens, it reported over 2000 occurrences replaced when it really replaced about 30 (the text had been proofread and I had introduced deliberate mistakes).
- For the other function, it consistently reported 1386 occurrences (even when the search was expanded), but when I checked later with a plain regex it had in fact replaced 1802 occurrences.

Last edited by roger64; 06-18-2016 at 05:10 AM.
Old 06-18-2016, 05:17 AM   #2
kovidgoyal
creator of calibre
Posts: 44,529
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I need an example file, together with the find expression and the function, with which to reproduce the issue.
Old 06-18-2016, 05:19 AM   #3
roger64
Wizard
OK, I'll do that, but maybe tomorrow. Sorry.
Old 06-18-2016, 07:45 AM   #4
roger64
Wizard
Hi

I had to prepare it using another book. I am attaching a reduced version of a French book. As you can verify beforehand, it currently contains:
- 0 nnbsp (\u202F)
- 371 hyphen-minus (-) plus 10 wrong ones that I placed in chapter 2, for a total of 381

1. - The first function is about nnbsp
search:
Code:
>[^\n<]*?<
function text:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace("'","’").replace(">— ",">—@").replace(">—",">—@").replace(" !","@!").replace("!","@!").replace(" ?","@?").replace("?","@?").replace(" ;","@;").replace(";","@;").replace(" :","@:").replace(":","@:").replace("« ","«@").replace("«","«@").replace(" »","@»").replace("»","@»").replace("@@","@").replace("@","\u202f")
For me, it reports 536 replacements done.
But if you check with a regex, there are now 650 nnbsp.

2. - The second function is hyphen (coupled with French dictionary)

search:
Code:
>[^<>]+<
function text
Code:
import regex
from calibre import replace_entities
from calibre import prepare_string_for_xml

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    def replace_word(wmatch):
        # Try to remove the hyphen and replace the words if the resulting
        # hyphen free word is recognized by the dictionary
        without_hyphen = wmatch.group(1) + wmatch.group(2)
        if dictionaries.recognized(without_hyphen):
            return without_hyphen
        return wmatch.group()

    # Search for words split by a hyphen
    text = replace_entities(match.group()[1:-1])  # Handle HTML entities like &amp;
    corrected = regex.sub(r'(\w+)\s*-\s*(\w+)', replace_word, text, flags=regex.VERSION1 | regex.UNICODE)
    return '>%s<' % prepare_string_for_xml(corrected)  # Put back required entities
After it ran, it reported 996 replacements.
But only 15 were made: the 10 wrong hyphens in chapter 2 were corrected, along with the following 5: frou-frou (2), en-tête, par-dessus, porte-manteaux. These last five should not have been changed, but that is another story.
Attached Files
File Type: epub Gloriette v2.epub (426.1 KB, 189 views)

Last edited by roger64; 06-18-2016 at 10:17 AM.
Old 06-18-2016, 07:55 AM   #5
kovidgoyal
creator of calibre
Ah, I see the source of your confusion. The number reported is the number of locations in the text that matched the search expression, not the number of replacements. There is no way to know what "replacement" means in function mode.

So if your search expression matches 10 places in the text and, in each place, your function makes 2 replacements inside the matched text, the number reported will be 10, not 20.
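The difference between the two numbers can be reproduced outside the editor. The following is only a minimal sketch using Python's standard `re` module, not calibre's own code; the sample text and the simplified `replace` function are assumptions made for illustration.

```python
import re

NNBSP = "\u202f"  # narrow no-break space

# Hypothetical sample text, not taken from the attached book.
text = "<p>Quoi ? Non !</p>\n<p>Oui ; bien</p>"

inner_edits = []  # one entry per match: how many edits were made inside it


def replace(match):
    # Swap the plain space before ? ! ; : for a narrow no-break space.
    fixed, n = re.subn(r" ([?!;:])", NNBSP + r"\1", match.group())
    inner_edits.append(n)
    return fixed


# subn returns the new text and the number of matches of the search expression.
result, match_count = re.subn(r">[^\n<]*?<", replace, text)

print("matches of the search expression:", match_count)
print("edits made inside those matches:", sum(inner_edits))
```

With this sample the search expression matches twice, while three spaces are swapped inside those two matches, so a count of matches and a count of edits cannot agree.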

Even in a normal search and replace, for example if you search for

a

and replace with

a

The number reported will be in the thousands, but the number of actual replacements will be zero, because replacing a with a does not actually change anything.
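To see the a-for-a case in isolation, here is a small sketch with the standard `re` module. The helper `substitute_and_count` is a hypothetical name, and counting "changed" matches is just one possible definition of a replacement, not something the editor does.

```python
import re


def substitute_and_count(pattern, replacement, text):
    """Return (new_text, matches, changed), where changed counts only the
    matches whose replacement differs from the matched text."""
    changed = 0

    def wrapper(match):
        nonlocal changed
        new = match.expand(replacement)
        if new != match.group():
            changed += 1
        return new

    new_text, matches = re.subn(pattern, wrapper, text)
    return new_text, matches, changed


sample = "a banana and a papaya"  # hypothetical sample text
same_text, matches, changed = substitute_and_count("a", "a", sample)
print(matches, "matches,", changed, "of them changed the text")
```

Every a is matched, yet none of the matches changes the text, so the match count stays high while the text is left exactly as it was.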
Old 06-18-2016, 10:24 AM   #6
roger64
Wizard
Thanks for your explanation. Let's say that I was confused by a confusing report.

What is the purpose of reporting such a huge number if it is useless?

I understand that if I replace a with a, I will get an absurd answer to an absurd question. But that is not the case here. Furthermore, "remplacé 1 occurrence" ("replaced 1 occurrence") can only be understood as meaning that one occurrence has been replaced. And when dealing with large numbers, there is no easy way to see that the reported figure bears no relation to the actual number of replacements.
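One possible way to get a tally of your own is to keep it in the `data` dictionary that is passed to the function and print it at the end. The sketch below assumes the `call_after_last_match` hook described in the calibre manual for function mode, which calls the function once more with `match` set to `None`. The simplified punctuation rule and the `run_like_editor` driver are hypothetical and exist only so the example can run outside the editor.

```python
import re

NNBSP = "\u202f"  # narrow no-break space


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: report our own tally.
        print("matches:", data.get("matches", 0),
              "nnbsp inserted:", data.get("inserted", 0))
        return
    original = match.group()
    # Simplified rule (an assumption): put a nnbsp before ! ? ; :
    fixed = re.sub(r" ?([!?;:])", NNBSP + r"\1", original)
    data["matches"] = data.get("matches", 0) + 1
    data["inserted"] = (data.get("inserted", 0)
                        + fixed.count(NNBSP) - original.count(NNBSP))
    return fixed


# Ask to be called once more, with match=None, after the last match.
replace.call_after_last_match = True


def run_like_editor(pattern, func, text):
    """Hypothetical stand-in for the editor's search loop (testing only)."""
    data = {}
    counter = 0

    def wrapper(m):
        nonlocal counter
        counter += 1
        return func(m, counter, "chapter.xhtml", None, None, data, {})

    result = re.sub(pattern, wrapper, text)
    if getattr(func, "call_after_last_match", False):
        func(None, counter, "chapter.xhtml", None, None, data, {})
    return result, data


result, data = run_like_editor(r">[^\n<]*?<", replace,
                               "<p>Quoi ? Non !</p>\n<p>Oui ; bien</p>")
```

Only the `replace` function and the line that sets `call_after_last_match` would be pasted into the editor. The tally counts whatever the function chooses to count, which is exactly the decision the editor cannot make on its own.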
Attached Thumbnails
Name: occurrence.png (14.6 KB)


Last edited by roger64; 06-18-2016 at 10:38 AM. Reason: edit done now! ;)
Old 06-18-2016, 10:53 AM   #7
kovidgoyal
creator of calibre
Answer me this:

If you replace abc with xby, have you made one replacement or two? That is, have you replaced a with x and c with y, or have you replaced abc with xby?

Now look back at what I said: it is not possible to know how many replacements are performed in function mode, and the number reported is the number of matches of the search expression. You should be able to connect the rest of the dots yourself.
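The abc/xby question can be made concrete with Python's standard `difflib` module. This is only an illustrative sketch: counting runs of differing characters is one arbitrary definition among several, and `count_changed_runs` is a hypothetical helper name.

```python
import difflib


def count_changed_runs(before, after):
    """Count the separate runs of characters that differ between two strings."""
    opcodes = difflib.SequenceMatcher(None, before, after).get_opcodes()
    return sum(1 for tag, *_ in opcodes if tag != "equal")


runs = count_changed_runs("abc", "xby")
print("whole-match view: 1 replacement; character-run view:", runs)
```

Seen as a whole match, abc became xby once. Seen character by character, two separate runs changed. Neither answer is more correct than the other, which is why the editor reports the number of matches.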
Old 06-18-2016, 03:12 PM   #8
roger64
Wizard
I hope I have it right this time: as far as functions are concerned, the match counts displayed, like the one in the screenshot above, bear no relation to the number of replacements that the user is actually making with the function.
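Because the figure is a count of matches, a narrower search expression might bring it closer to the number of places that could actually change. The sketch below, using the standard `re` module, compares the hyphen search from earlier in the thread with a hypothetical narrower pattern that only matches text containing a hyphen between word characters. Both the sample text and the narrow pattern are assumptions and have not been tested in the editor.

```python
import re

# Hypothetical sample text, not taken from the attached book.
text = "<p>un porte - manteau</p>\n<p>rien ici</p>\n<p>frou-frou</p>"

# Search expression used with the hyphen function in this thread.
broad = re.compile(r">[^<>]+<")
# Hypothetical narrower variant: requires word, hyphen, word inside the text.
narrow = re.compile(r">[^<>]*\w\s*-\s*\w[^<>]*<")

broad_count = len(broad.findall(text))
narrow_count = len(narrow.findall(text))

print("broad search matches:", broad_count)
print("narrow search matches:", narrow_count)
```

The narrow count is still a count of matches, not of corrections: a match such as frou-frou may be left untouched by the dictionary check, so the two figures can still differ.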

Thank you for providing such a powerful tool.

Last edited by roger64; 06-18-2016 at 03:14 PM.