11-18-2015, 02:46 PM | #136 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser
I think I found it. I changed the (\w+) to (\w*|\s) and it seems to work. Code:
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w*|\s)(ο\'\)|ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
|
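The effect of that change can be checked in isolation. Below is a minimal sketch, assuming a single OCR cluster (οι) and leaving out the tag-skipping lookaheads; `OLD` and `NEW` are names made up for this illustration, not plugin identifiers. With `\w+` the cluster needs at least one word character on each side, so a word that begins with it is never matched, whereas `\w*|\s` also accepts an empty or whitespace neighbour.

```python
import re

# Hypothetical, shortened versions of the pattern before and after the change.
OLD = re.compile(r"(\w+)(οι)(\w+)")
NEW = re.compile(r"(\w*|\s)(οι)(\w*|\s)")

word = "οιρα"  # OCR output for "ώρα": the cluster sits at the very start

old_match = OLD.search(word)   # no word character before the cluster
new_match = NEW.search(word)   # group 1 is allowed to be empty
new_groups = new_match.groups()

print(old_match)    # None
print(new_groups)   # ('', 'οι', 'ρα')
```

The callback then receives an empty group 1, so the rebuilt word is simply the replacement character followed by group 3.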
11-18-2015, 06:57 PM | #137 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
Can someone check (and confirm) whether the plugin skips the first line of IncorrectWords (and of the custom list)?
For example, the word is not fixed when IncorrectWords looks like this: Code:
WTiat|What
It is fixed when IncorrectWords looks like this: Code:
WTiat|What
WTiat|What
|
11-18-2015, 09:11 PM | #138 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
I used to scan my books and did the same thing in Notepad++.
It's a lot of work, but the errors differ from device to device and from OCR to OCR. |
11-19-2015, 01:27 AM | #139 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser
I changed a few things in the CorrectText calls. It works better now, but I still get some unnecessary fixes (4 out of 200), which I can handle with the customised word list (for example ω and φυλάξοι, which are spelled correctly but are not the words in the text; I attach the custom text file). All the Fix functions now look for the misspelled character anywhere in the word. I also added a fix for φ, which comes out of the OCR as "η>|«ρ|ηι|<ρ|4>|ιρ". Code:
def IsFixP(m):
    """
    FIXES Π
    This function examines a word to see whether the misspelled Π
    character needs to be fixed.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original text if that is already in the dictionary, or if
    the corrected word is not; otherwise it returns the word with Π fixed
    """
    FixP=m.group(1)+"Π"+m.group(3)
    FixP2=m.group(1)+m.group(2)+m.group(3)
    if spell(FixP2):
        return(m.group(1)+m.group(2)+m.group(3))
    elif spell(FixP):
        print("FixP: ",FixP2, " changed to ", FixP)
        return (m.group(1)+'Π'+m.group(3))
    else:
        return(m.group(1)+m.group(2)+m.group(3))

def IsFixE(m):
    """
    FIXES έ
    This function examines a word to see whether the misspelled έ
    character (read as ύ) needs to be fixed.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original text if that is already in the dictionary, or if
    the corrected word is not; otherwise it returns the word with έ fixed
    """
    FixE=m.group(1)+"έ"+m.group(2)
    FixE2=m.group(1)+"ύ"+m.group(2)
    if spell(FixE2):
        return(m.group(1)+"ύ"+m.group(2))
    elif spell(FixE):
        print("FixE: ",FixE2, " changed to ", FixE)
        return(m.group(1)+"έ"+m.group(2))
    else:
        return(m.group(1)+"ύ"+m.group(2))

def IsFixO(m):
    """
    This function examines a word to see whether the misspelled
    (ιό|οί|ιο|οι) characters need to be fixed.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original text if that is already in the dictionary, or if
    the corrected word is not; otherwise it returns the word with ώ fixed
    """
    FixO=m.group(1)+"ώ"+m.group(3)
    FixO2=m.group(1)+m.group(2)+m.group(3)
    if spell(FixO2):
        return(m.group(1)+m.group(2)+m.group(3))
    elif spell(FixO):
        print("FixΏ: ",FixO2, " changed to ", FixO)
        return(m.group(1)+"ώ"+m.group(3))
    else:
        return(m.group(1)+m.group(2)+m.group(3))

def IsFixW(m):
    """
    This function examines a word to see whether the misspelled
    (ιό|οί|ιο|οι) characters need to be fixed.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original text if that is already in the dictionary, or if
    the corrected word is not; otherwise it returns the word with ω fixed
    """
    FixW=m.group(1)+"ω"+m.group(3)
    FixW2=m.group(1)+m.group(2)+m.group(3)
    if spell(FixW2):
        return(m.group(1)+m.group(2)+m.group(3))
    #elif spell(FixW2):
    #    return(m.group(1)+m.group(2)+m.group(3))
    elif spell(FixW):
        print("FixΩ: ",FixW2, " changed to ", FixW)
        return(m.group(1)+"ω"+m.group(3))
    else:
        return(m.group(1)+m.group(2)+m.group(3))

def IsFixF(m):
    """
    This function examines a word to see whether the misspelled
    ((ρ|χρ|η>|«ρ|ηι|<ρ|4>|ιρ) characters need to be fixed.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original text if that is already in the dictionary, or if
    the corrected word is not; otherwise it returns the word with φ fixed
    """
    FixF=m.group(1)+"φ"+m.group(3)
    FixF2=m.group(1)+m.group(2)+m.group(3)
    if spell(FixF2):
        return(m.group(1)+m.group(2)+m.group(3))
    elif spell(FixF):
        print("FixΦ: ",FixF2, " changed to ", FixF)
        return(m.group(1)+"φ"+m.group(3))
    else:
        return(m.group(1)+m.group(2)+m.group(3))
Code:
if useHunspellDict=="Yes":
    #Fixes Π in words that are misspelled
    CorrectText("Π fixes",r"(\w*|\s)(Ιΐ|1\ Ι|1\ Ι|1Ι|1I|ΓΙ|Γΐ|ΙΙ|II|Ι\ Ι|ΓΤ|ΙΊ|Ιί)[ ]?(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixP)
    #Fixes έ in words that are misspelled
    CorrectText("έ fixes",r"(\w+|\s)ύ(\w+|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixE)
    #Fixes ώ in words that are misspelled
    CorrectText("ώ fixes",r"(\w*|\s)(οί\)|νο'\)|α\)|οδ|οό|ιυ|άί|ο5|ο'\)|ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
    #Fixes ω in words that are misspelled
    CorrectText("ω fixes",r"(\w*|\s)(οί\)|νο'\)|α\)|οδ|οό|ιυ|άί|ο5|ο'\)|ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
    #Fixes φ in words that are misspelled
    CorrectText("φ fixes",r"(\w*|\s)(\(ρ|χρ|η>|«ρ|ηι|<ρ|4>|ιρ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixF)
Calib, would it be possible at some point to move the Greek fixes into a separate .py file, so that I don't have to keep modifying your HTMLProcessor all the time?
Last edited by gipsy; 11-19-2015 at 02:12 AM. Reason: add some more searches |
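The IsFix functions in the post above all follow one pattern: rebuild the word twice (as scanned, and with the candidate letter), keep the scanned form if the dictionary knows it, otherwise accept the candidate only if the dictionary knows that. A minimal self-contained sketch of that idea follows. Everything in it is an assumption made for illustration: `spell()` is a stand-in backed by a tiny word set rather than Hunspell, `make_fixer`, `fix_text` and `KNOWN_WORDS` are invented names, the cluster list is shortened, and the lookaheads that skip HTML tags are omitted.

```python
import re

# Stand-in for the plugin's dictionary lookup (assumption: the real one uses Hunspell).
KNOWN_WORDS = {"ώρα", "οικία"}

def spell(word):
    return word in KNOWN_WORDS

def make_fixer(replacement):
    """Build an re.sub callback that swaps group 2 for `replacement`
    only when the dictionary prefers the corrected word."""
    def fixer(m):
        original = m.group(1) + m.group(2) + m.group(3)
        candidate = m.group(1) + replacement + m.group(3)
        # The \s alternative can capture a neighbouring space into group 1 or 3,
        # so this sketch strips whitespace before the lookup.
        if spell(original.strip()):
            return original
        if spell(candidate.strip()):
            return candidate
        return original
    return fixer

# Shortened cluster list; the patterns in the post are much longer.
PATTERN = re.compile(r"(\w*|\s)(οί|ιο|οι)(\w*|\s)")
fix_omega = make_fixer("ώ")

def fix_text(text):
    return PATTERN.sub(fix_omega, text)

print(fix_text("οιρα"))     # ώρα    (candidate is in the dictionary)
print(fix_text("οικία"))    # οικία  (scanned form is already a word)
print(fix_text("η οιρα"))   # η ώρα  (leading space kept in the output)
```

Using one factory instead of five near-identical functions is only a design choice for the sketch; it also shows why false positives occur, since any candidate that happens to be a dictionary word is accepted even if it is not the word the text intended.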
11-21-2015, 09:58 AM | #140 | ||
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Plugin updated to version 0.2.0.0.3
I have updated the plugin in the first post in this thread.
Updates The update fixes the bug reported by ovinio: Quote:
Other queries Quote:
@gipsy: To be honest, I don't see myself having time to think about reorganising the functions into at least two files and debugging the new files. |
||
11-30-2015, 02:34 PM | #141 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Updated to version V0.2.0.0.4
As Sigil version 0.9.1 has been released, I have reinstated the ability of this plugin to load a named css style sheet; I had disabled this feature because Sigil version 0.9.0 had a bug that caused this feature of the plugin to corrupt ePub books. The bug has been fixed in Sigil version 0.9.1.
Warning: Do not use this version of the plugin with Sigil 0.9.0 as the bug in Sigil 0.9.0 will corrupt your ePub. |
12-04-2015, 07:58 AM | #142 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
With Sigil 0.9.1, version 0.2.0.0.4 of the plugin doesn't run well.
It gets stuck at "The selected html files do not contain span tags". If I leave it for a while and hit Close, it makes some changes and I get Status: success. |
12-04-2015, 02:06 PM | #143 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@gipsy: Thanks for the bug report. I haven't experienced this problem. I will look into it.
|
12-04-2015, 03:02 PM | #144 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser, maybe it has to do with the built-in dictionary.
I changed the name of the Greek dictionary (in HTMLProcessor.py), and the plugin did not report that the dictionary was missing. With the name changed, the plugin runs smoothly. After that I swapped the built-in dictionary for a different one and it got stuck again. EDIT: Tested on Windows 8.1 and Windows 10. I checked, and the 2.0.0.2 release of the plugin runs on both with Sigil 0.9.1. Last edited by gipsy; 12-04-2015 at 03:04 PM. |
12-08-2015, 02:04 PM | #145 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@gipsy: I have not been able to reproduce the fault. Please let me know which options you are ticking in the plugin, whether you are processing header tags, and whether you are using any options for chapter headings.
Thanks. |
12-08-2015, 03:29 PM | #146 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser, I uninstalled Sigil and the plugin, and also deleted all related folders.
Then I reinstalled them both. For the following tests I tick only "Process Greek characters only" and "Use in-build dictionary Greek*". I don't process header tags or chapters, and I don't select any files for the automatic/manual spell check.
I attach a test epub and the el_GR Hunspell dictionary so that someone can test it, for example on Windows 7. Last edited by gipsy; 12-08-2015 at 03:56 PM. |
12-09-2015, 02:04 PM | #147 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Updated to version V0.2.0.0.4A
@Gipsy: Thanks for providing the test code and the Greek components of the Hunspell dictionary. I traced the error to the code that "Fixes Π in words that are misspelled". There was a round bracket missing from the code. I have added this in and now the plugin works properly in the tests that I have been running.
I have updated the plugin in the first post for this thread. |
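The post does not say whether the missing bracket was inside a regular expression or in the surrounding Python. Assuming it was in a pattern, a quick check like the following sketch can catch it before the plugin runs; `find_broken_patterns` and the two sample patterns are made up for this illustration and are not part of the plugin.

```python
import re

def find_broken_patterns(patterns):
    """Return {name: error message} for every pattern that fails to compile."""
    broken = {}
    for name, pattern in patterns.items():
        try:
            re.compile(pattern)
        except re.error as exc:
            broken[name] = str(exc)
    return broken

# Two shortened, hypothetical patterns; the second lacks the ")" that
# should close its second group.
PATTERNS = {
    "balanced": r"(\w*|\s)(II|ΙΙ)[ ]?(\w*|\s)",
    "missing bracket": r"(\w*|\s)(II|ΙΙ[ ]?(\w*|\s)",
}

broken = find_broken_patterns(PATTERNS)
for name, message in broken.items():
    print(name, "->", message)
```

Compiling every pattern once at start-up turns this kind of typo into an immediate, named error.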
12-09-2015, 02:28 PM | #148 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
Works fine now CalibUser.
Thanks! |
12-21-2015, 02:37 AM | #149 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
Here are a few suggested additions to your truncated words list for apostrophes facing the wrong direction. I also made two other small changes to suit my preferences: I added (?i) to ignore case, which catches instances where the first letter has been capitalised, and I prefer to look for a punctuation or space character at the end.
Code:
(?i)[ ]?‘(ad|at|appen|ard|ave|bout|bye|cause|cept|cos|cuz|couse|eard|em|er|e|ee|ell|fraid|fore|im|is|isself|gainst|less|mongst|neath|nough|nother|nuff|ome|ow|ope|oney|orse|puter|round|scuse|spect|scaped|sides|specially|tween|taint|til|tis|twas|twere|twould|twill|un)([\p{P}|\s])
|
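To try the idea outside Sigil, here is a rough sketch using only Python's standard `re` module. It is a variation, not the plugin's code: `re` has no `\p{P}` (the third-party `regex` module does), so the sketch assumes "any non-word character or end of text" is close enough, uses a lookahead instead of a capture group for the terminator, and works on a short excerpt of the word list. `TRUNCATED` and `fix_apostrophes` are invented names.

```python
import re

# Short excerpt of the suggested list, for illustration only.
TRUNCATED = ["bout", "cause", "em", "tis", "twas"]

# Opening quote, a truncated word, then a non-word character or end of text.
PATTERN = re.compile(
    r"‘(" + "|".join(TRUNCATED) + r")(?=\W|$)",
    re.IGNORECASE,
)

def fix_apostrophes(text):
    """Replace a wrong-facing ‘ before a truncated word with ’."""
    return PATTERN.sub(lambda m: "’" + m.group(1), text)

print(fix_apostrophes("Tell ‘em ‘Twas late."))   # Tell ’em ’Twas late.
print(fix_apostrophes("He said ‘emerald’."))     # unchanged: "em" is not a whole word here
```

The terminator check is what keeps a genuine opening quote in front of a longer word (such as ‘emerald’) from being turned around.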
12-21-2015, 02:29 PM | #150 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Thanks, Steadyhands.
I will incorporate the code in the next version of the plugin. |