03-04-2016, 12:02 PM | #16 |
Well trained by Cats
Posts: 30,443
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
03-05-2016, 09:12 AM | #17 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
This approach is very promising and interesting, as it makes it possible to correct some recurrent OCR mistakes (after a dictionary check). In French, we have a list of hundreds of words that deserve to be checked after OCR. With such an approach, of course, you can't avoid false positives, which means we should not correct these words automatically but just highlight them, to speed up manual checking later. To give one example: "trame" is a correct but fairly rare word, and 98% of the time it should have been "traine". It makes sense to highlight it. However, I do not know how to write a single entry. Could some kind soul write a code example of this Calibre function that would let me highlight "trame"? |
03-05-2016, 09:55 AM | #18 |
creator of calibre
Posts: 44,542
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Simply modify the function to wrap the word in some markup instead of correcting it, so instead of replacing the word trame with its correction, replace it with <span class="mistake">trame</span>
Then you can use another search to jump to these words one by one and correct them or not, as you like. |
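For readers who want to see the suggestion above spelled out, here is a minimal sketch rather than a definitive implementation. It assumes the search expression is something like `\b\w+\b` (or simply `\btrame\b`), and it uses the standard function-mode signature quoted elsewhere in this thread. The names `SUSPECT_WORDS` and `highlight_all` are hypothetical; `highlight_all` only emulates the editor with Python's `re` module so the function can be tried outside calibre.

```python
import re

# Hypothetical list of words that are valid but usually OCR errors
SUSPECT_WORDS = {'trame'}


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    word = match.group()
    # Wrap suspect words in a span for later review; return every other word unchanged
    if word.lower() in SUSPECT_WORDS:
        return '<span class="mistake">%s</span>' % word
    return word


def highlight_all(text):
    # Stand-alone emulation of a "Replace all" run (assumption: search expression \b\w+\b)
    return re.sub(r'\b\w+\b',
                  lambda m: replace(m, 1, 'demo.html', None, None, {}, {}),
                  text)
```

A second search for `<span class="mistake">` then steps through the flagged words one at a time. Note that a broad search such as `\b\w+\b` also visits tag and attribute names, so a listed word that happens to appear inside markup would be wrapped too; searching for the specific word in text only is safer.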
03-05-2016, 08:14 PM | #19 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
@kovidgoyal
Thanks for your neat idea which is so simple that I understood it. Now to try it. Last edited by roger64; 03-05-2016 at 08:17 PM. |
03-11-2016, 08:02 AM | #20 |
Casual Member
Posts: 5
Karma: 10
Join Date: Mar 2016
Location: UK
Device: Kindle paperwhite
|
Posting
If that's aimed at me, I'd be happy to oblige; Regex Functions seems an apt tag. But I'm new to this, and the suggestion goes way over my head. Can you step through it?
|
03-11-2016, 01:39 PM | #21 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You just repeat your post in that sticky thread.
@theducks has already performed the moderator administration duties I asked for. |
03-11-2016, 01:50 PM | #22 | |
null operator (he/him)
Posts: 20,997
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|
03-11-2016, 02:14 PM | #23 |
Well trained by Cats
Posts: 30,443
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
03-11-2016, 03:34 PM | #24 |
null operator (he/him)
Posts: 20,997
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Moderator Notice
Merge done |
05-08-2017, 11:54 PM | #25 |
Fanatic
Posts: 526
Karma: 32158
Join Date: Feb 2012
Device: Onyx Boox Leaf
|
Dear you guys,
Please help me with a regex function to move the matches found by a search to an "endnotes.html" file in the book. I picked up some code from this forum, which looks like this. Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Append each match to an external text file, then delete it from the book
    with open(r'D:\endnotes.txt', 'a', encoding='utf-8') as endnotes:
        endnotes.write(match.group() + '\n')
    return ''

replace.file_order = 'spine'
What I would like to do is have them written directly to endnotes.html (or even have the file created if it is not there already). Thanks. |
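One possible direction, offered as a sketch under stated assumptions and not as a confirmed answer to the question: collect the matches in the `data` dict, which persists for the whole Replace All run, and write a complete HTML page once at the end by setting `replace.call_after_last_match = True` (the function is then called one extra time with `match` set to `None`). The output path `OUTPUT_PATH` and the helper `build_endnotes_page` are hypothetical, and this version writes the page to a file outside the book; it would still have to be imported into the ePub by hand.

```python
import os
import tempfile

# Hypothetical output location outside the book; change it to suit
OUTPUT_PATH = os.path.join(tempfile.gettempdir(), 'endnotes.html')


def build_endnotes_page(notes):
    # Assemble a minimal XHTML page; the notes are inserted as found (they are already markup)
    body = '\n'.join(notes)
    return ('<?xml version="1.0" encoding="utf-8"?>\n'
            '<html xmlns="http://www.w3.org/1999/xhtml">\n'
            '<head><title>Endnotes</title></head>\n'
            '<body>\n%s\n</body>\n</html>\n' % body)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: write out everything collected so far
        with open(OUTPUT_PATH, 'w', encoding='utf-8') as f:
            f.write(build_endnotes_page(data.get('notes', [])))
        return ''
    # Remember the matched text and remove it from its original place
    data.setdefault('notes', []).append(match.group())
    return ''


replace.file_order = 'spine'            # process files in reading order
replace.call_after_last_match = True    # request the final call with match=None
```

Because the file is opened in write mode, each run replaces the previous page instead of appending to it, which is a deliberate difference from the snippet quoted in the post.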
02-23-2018, 09:17 AM | #26 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Tagging selected foreign words
If we manage to wrap this Code:
<span xml:lang="xx" lang="xx">foreign</span>
The prerequisite is to first write a list.txt of these selected words, which can easily be selected and copied from the spellchecker panel of unrecognized words (to which they all belong). For the purpose of this thread, we assume that such a list is available.
Help requested
I am looking for a function that I could run on an ePub in the Calibre editor and which would work through this list in order, wrapping the above span around each occurrence of these foreign words. Last edited by roger64; 02-23-2018 at 09:26 AM. |
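A rough sketch of what such a function might look like, with several assumptions stated up front: the list lives in a UTF-8 file with one word per line at the hypothetical path `LIST_PATH`, the language code `LANG` is the `xx` placeholder from the post, and the search expression is the one stored in `SEARCH`, which tries to skip text inside tags. The helper `wrap_all` only emulates the editor with Python's `re` module for testing outside calibre.

```python
import re

LANG = 'xx'              # placeholder language code taken from the post
LIST_PATH = 'list.txt'   # hypothetical path: one foreign word per line, UTF-8

# Assumed search expression: whole words that are not inside a tag
SEARCH = r'\b\w+\b(?![^<>]*>)'


def load_words(path):
    # Read the word list once, ignoring blank lines
    with open(path, encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Cache the list in the data dict so the file is read only once per run
    if 'words' not in data:
        data['words'] = load_words(LIST_PATH)
    word = match.group()
    if word in data['words']:
        return '<span xml:lang="%s" lang="%s">%s</span>' % (LANG, LANG, word)
    return word


def wrap_all(text, data):
    # Stand-alone emulation of a "Replace all" run using SEARCH
    return re.sub(SEARCH,
                  lambda m: replace(m, 1, 'demo.html', None, None, data, {}),
                  text)
```

This handles single words only; multi-word names in the list would need a different search expression. Running it twice would nest the spans, so it is best applied once to a fresh copy.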
02-09-2020, 09:00 PM | #27 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Search and replace excluding the headers
I am trying to insert old-style numerals in an ePub. For this I need to wrap a span around these characters, like this: Code:
(\d+)
<span class="smcp" >\1</span>
Could a function do it? Last edited by roger64; 02-09-2020 at 09:17 PM. |
02-10-2020, 02:17 AM | #28 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Code:
(<p>.*?)(\d+)(.*?</p>)
Code:
\1<span class="smcp" >\2</span>\3
Code:
(<p>.*?[^\>\d])(\d+)(.*?</p>) |
|
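As an alternative to the plain search-and-replace expressions above, a function can wrap every number in a paragraph in one pass, since those expressions capture only one number per paragraph on each run. The sketch below is an assumption-laden illustration, not the method given in the reply: the search expression in `SEARCH` matches whole `<p>` elements only, so headings are never passed to the function, and the hypothetical pattern `DIGITS_OUTSIDE_TAGS` skips digits that sit inside a tag such as an `id` attribute. The helper `wrap_numbers` emulates the editor with Python's `re` module.

```python
import re

# Assumed search expression: one whole paragraph per match (headings never match)
SEARCH = r'<p\b[^>]*>.*?</p>'

# Runs of digits that are not inside a tag (so attribute values are left alone)
DIGITS_OUTSIDE_TAGS = re.compile(r'\d+(?![^<>]*>)')


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # match.group() is a complete paragraph; wrap each number found in its text
    return DIGITS_OUTSIDE_TAGS.sub(r'<span class="smcp">\g<0></span>', match.group())


def wrap_numbers(text):
    # Stand-alone emulation of a "Replace all" run using SEARCH
    return re.sub(SEARCH,
                  lambda m: replace(m, 1, 'demo.html', None, None, {}, {}),
                  text)
```

The function is not idempotent: a second run would wrap the already wrapped numbers again. Paragraphs that span several lines would also need the editor's dot-all option (or `re.DOTALL` in the emulation).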
02-10-2020, 07:05 AM | #29 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
@davidfor
Thanks very much for your help. I'll try it. |
07-02-2020, 08:34 PM | #30 |
Nameless Being
|
Regex-function code
In regex-function mode "create/edit" brings up a dialogue box with two sections: 1. function name and 2. Code.
Function name has a drop-down list of about a dozen functions. I thought that choosing a built-in function would populate the code panel with the appropriate Python code so I could learn from it, but it does not change from the default:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return ''
There are a few examples in the User Manual, but not for all of the built-in functions, which would be handy. Where might I find the code for those functions and others? Is there, perhaps, a thread like the saved searches thread? If this is some secret that a novice shouldn't know, please let me know. I don't want to accidentally cross the beams and annihilate the universe. |
Tags |
conversion, errors, function, ocr, spelling |
|