09-23-2015, 01:22 PM | #61 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@gipsy: I will look at the code needed to cover <span style="font-variant:small-caps;"> when I get time.
I am not sure about the hyphen problem, as I don't understand Greek characters. The code works by taking each hyphenated word from the ePub, removing the hyphen, and then checking whether the word without the hyphen exists in the dictionary. If it does, the method returns the non-hyphenated word; otherwise the original text is left as it was. I don't know why this is not working for Greek words. The plugin only reads one dictionary; otherwise I would suspect a conflict between the English and Greek dictionaries. |
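For readers following along, the hyphen check described above can be sketched roughly as follows. This is not the plugin's actual code: the word set, the simplified pattern and the names spell(), remove_hyphen() and fix_hyphens() are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for the plugin's dictionary file.
WORDS = {"acknowledge", "\u03a0\u039f\u039b\u03a5"}  # "acknowledge", Greek "ΠΟΛΥ"

def spell(word):
    """Return True if the word is in the (hypothetical) dictionary."""
    return word in WORDS

def remove_hyphen(m):
    """Join both halves if the joined word is known, else keep the match as it is."""
    joined = m.group(1) + m.group(2)
    return joined if spell(joined) else m.group(0)

# Simplified pattern: word characters, optional space, hyphen, optional space, word characters.
HYPHENATED = re.compile(r"(\w+)[ ]?-[ ]?(\w+)")

def fix_hyphens(text):
    """Remove hyphens only where the result is a dictionary word."""
    return HYPHENATED.sub(remove_hyphen, text)

print(fix_hyphens("I acknow-ledge the well-known rule"))
```

In Python 3, \w matches Greek letters as well as Latin ones, so a sketch like this treats both alphabets the same way as long as the joined word is in the dictionary.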
09-23-2015, 03:26 PM | #62 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser
I aim to minimize the spelling errors... Forget the hyphens... Let's say that in the epub we have "acknovvledge" instead of "acknowledge"... I looked at the code, and you have: Code:
HyphenRemoved=m.group(1)+m.group(2)
Could it be changed to something like this? Code:
FixWord=m.group(1)+'w'+m.group(2)
Thanks |
09-23-2015, 04:04 PM | #63 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
The style "font-variant:small-caps" is not recognised by all ePub readers. This style produces capitalised text that is slightly smaller than the main text. The plugin has been updated to include an option for processing span tags, "Change to UPPER", that changes text with the style "font-variant:small-caps" to upper case. It also has another option for processing this style, "Change to small UPPER". This is described in more detail in the updated ePub manual for this plugin. The updated plugin and manual are in the first post in this thread.
@gipsy: You suggested correcting errors such as "acknovvledge" to "acknowledge" by splitting the misspelt word into two groups; it is more straightforward to use a regular expression to replace "acknovvledge" with "acknowledge". You could correct this error using this plugin by adding the code:
CorrectText("Changed acknovvledge to acknowledge", "acknovvledge", "acknowledge") |
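Outside the plugin, a direct replacement like this comes down to a single substitution. The sketch below is only an illustration: correct_text() is a made-up helper that imitates the argument order of the CorrectText() call above, and it takes the text as an extra parameter, which may not be how the plugin does it.

```python
import re

def correct_text(message, pattern, replacement, text):
    """Apply one substitution and report how many matches were changed."""
    new_text, count = re.subn(pattern, replacement, text)
    print(f"{message}: {count}")
    return new_text

sample = "We acknovvledge the error and acknovvledge the fix."
fixed = correct_text("Changed acknovvledge to acknowledge",
                     "acknovvledge", "acknowledge", sample)
print(fixed)
```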
09-23-2015, 04:17 PM | #64 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
Not only with acknovvledge, Calib :P That was an example in English.
In Greek there are a few errors like that, where a word has a "ύ" instead of an "έ" or an "ο" instead of a "σ". In my last epub I had 60+ errors like that when I spellchecked it. Is this the code that searches for the hyphens? Code:
CorrectText("Hyphens removed",r"(?s)(\w+)[ ]?-[ ]?(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsHyphenated)
Thanks again
EDIT: Another example in Greek... I had the following regex to fix some errors. Code:
Find:(ΓΙ|Γΐ|ΙΙ|II|I\ I|I\ Ι|ΓΤ|ΙΊ|Ιί)
Replace:Π
Last edited by gipsy; 09-23-2015 at 04:23 PM. |
09-24-2015, 01:47 AM | #65 |
Grand Sorcerer
Posts: 5,636
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
|
@gipsy: Since you rely so much on the PCRE regex flavor used by Sigil, you may want to look into Saved Searches (Tools > Saved Searches), if you can live with the fact that Saved Searches won't give you detailed feedback.
Create a group, e.g. Greek, and add all the regexes that you need. Later on, simply open the Saved Searches dialog box, select the group heading and click Replace All.
@CalibUser: You may want to look into porting the regular expressions in your plugin to the new Python regex library, which offers some features that even PCRE lacks, among them Levenshtein-distance-based fuzzy matches and case-insensitive Unicode matches. (The Python regex library will be included in the upcoming Sigil 0.8.9 release.) You may also want to consider loading all regular expressions from a Saved Search group. This will allow users to easily customize your plugin. (Saved Searches are stored in sigil_searches.ini.)
Last edited by Doitsu; 09-24-2015 at 01:49 AM. |
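To make the idea of a Levenshtein-distance fuzzy match concrete, here is a plain-Python sketch. It does not use the regex library's own fuzzy-matching syntax; levenshtein(), close_matches() and the word list are made up for illustration.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca with cb
        previous = current
    return previous[-1]

def close_matches(word, dictionary, max_distance=2):
    """Return the dictionary words within max_distance edits of word, sorted."""
    return sorted(w for w in dictionary if levenshtein(word, w) <= max_distance)

print(close_matches("acknovvledge", {"acknowledge", "knowledge"}))
```

A misspelling where one letter is swapped for another, as in the Greek examples earlier in the thread, is one edit away from the correct word, so a small max_distance would already catch it.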
09-24-2015, 01:56 AM | #66 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@Doitsu I have the saved searches. But you must not replace them all at once, because you end up with more spelling errors instead of fewer! You have to go through the matches one by one.
But if the fixes include a dictionary check, you can replace them all, and it saves you time |
09-24-2015, 04:42 PM | #67 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
I think I figured out how to make the regex work with the WordDictionary...
I tested it with an epub of 4-5 words that I intentionally misspelled, and it worked. But I'm going to test it a few more times on a full epub first! @CalibUser, is the code correct? Code:
def FixP(m):
    """
    This function examines a word to see whether it is required to fix
    the Π character that is misspelled.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original expression if the checked word is not in the
    dictionary, otherwise it returns the word with the Π fixed
    """
    FixP='Π'+m.group(2)
    if spell(FixP):
        print("FixP removed from: ", FixP)
        return(FixP)
    else:
        return(m.group(0))

#Fixes Π in words that are misspelled
if dictExists == True:
    CorrectText("Π fixes",r"(ΓΙ|Γΐ|ΙΙ|II|I\ I|I\ Ι|ΓΤ|ΙΊ|Ιί)[ ]?(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", FixP) |
09-25-2015, 11:47 AM | #68 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@gipsy: If I have understood your code correctly, the search expression will find a word that is preceded by the character(s) in the search pattern and send the match to the function FixP(). Let's call the part after those characters the test word.
FixP() puts 'Π' in front of the test word to form a new word (call it the check word) and then checks whether the check word is in the dictionary. If it is, the function returns the check word; otherwise it returns the original matched text unchanged. E.g. if the text being examined is IITest, your code will send IITest to FixP(). If ΠTest is in the dictionary then your code will return ΠTest; otherwise it will return IITest. If this is what you wanted the code to do then it is correct.
@Doitsu: Thanks for the link relating to the new regex library for Python. I like the idea of importing regular expressions from a saved search group into the plugin; I originally developed this plugin because I had several different groups of saved searches and I wanted to run them all together. I will be looking into all your suggestions, time permitting. |
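The IITest walkthrough above can be tried outside Sigil with a small stand-alone sketch. The one-word dictionary, the shortened search pattern and the names spell() and fix_p() are assumptions for illustration; the plugin's own dictionary loading and CorrectText() are not reproduced here.

```python
import re

PI = "\u03a0"  # Greek capital pi, Π

# Hypothetical dictionary containing a single word, ΠTest.
WORDS = {PI + "Test"}

def spell(word):
    """Return True if the word is in the (hypothetical) dictionary."""
    return word in WORDS

def fix_p(m):
    """Return Π + test word if that is a dictionary word, else the match unchanged."""
    check_word = PI + m.group(2)
    return check_word if spell(check_word) else m.group(0)

# Shortened pattern: Greek ΓΙ, Greek ΙΙ or Latin II, an optional space, then the test word.
pattern = re.compile("(\u0393\u0399|\u0399\u0399|II)[ ]?(\\w+)")

print(pattern.sub(fix_p, "IITest"))   # check word ΠTest is in the dictionary
print(pattern.sub(fix_p, "IIOther"))  # check word ΠOther is not, so nothing changes
```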
10-01-2015, 03:06 AM | #69 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
I was trying to correct the Π fixes from my last post here, because I noticed that in Greek there are words that are correct both with and without the correction (for example, "ΓΙΟΥ" and "ΠΟΥ" are both correct Greek words).
The code from my last post changes the word ΓΙΟΥ to ΠΟΥ. I changed the code to this: Code:
############FIXES Π###########
def FixP(m):
    FixP=m.group(1)+m.group(2)
def FixP2(m):
    FixP2='Π'+m.group(2)
    """
    This function examines a word to see whether is required to fix
    the Π character that is misspelled.
    It is called by a regular expression function (re.sub) in FixCommonErrors()
    It returns the original expression if the checked word is not in the
    dictionary, otherwise it returns the word without the Π fixed
    """
    if spell(FixP):
        return(FixP)
    else:
        print("FixP removed from: ", FixP2)
        return(FixP2)
Code:
#Fixes Π in words that are misspelled
if dictExists == True:
    CorrectText("Π fixes",r"(ΓΙ|Γΐ|ΙΙ|II|I\ I|I\ Ι|ΓΤ|ΙΊ|Ιί|ΓΊ)(\w+)[ ]?(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", FixP)
    CorrectText("Π fixes",r"(ΓΙ|Γΐ|ΙΙ|II|I\ I|I\ Ι|ΓΤ|ΙΊ|Ιί|ΓΊ)(\w+)[ ]?(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", FixP2)
if not html == html_orig:
    bk.writefile(id, html) #If the text has changed then write the amended text to the book
But when I try the plugin with the above code, it doesn't return FixP or FixP2; it leaves the text blank (and it says that 3 Π's were corrected). Can you help me solve it, please? I have about 4-5 fixes like this if I manage to find a solution. I attach an epub and a WordDictionary with test material :P Thanks |
10-01-2015, 01:15 PM | #70 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@gipsy: I will try to find some time this weekend to look at your file.
|
10-01-2015, 01:20 PM | #71 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I have made three more updates to the ePub Tidy tool. The latest version will:
1. Allow the user to use a customised list of words that need to be corrected.
2. Allow the user to rename <h...> tags in selected html sections and strip out all other tags.
3. Change the following html codes to a single character: ‘ ’ “ ” —
The update is in the first post in this thread, with an updated instruction manual that explains how to use the new version. |
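As a rough illustration of item 3, replacing html codes with single characters could look like the sketch below. It assumes the codes are the named entities &lsquo;, &rsquo;, &ldquo;, &rdquo; and &mdash; (the post only shows the resulting characters), and replace_entities() is a made-up name, not a function from the plugin.

```python
# Assumed mapping from named entities to the characters listed in item 3.
ENTITIES = {
    "&lsquo;": "\u2018",  # left single quotation mark
    "&rsquo;": "\u2019",  # right single quotation mark
    "&ldquo;": "\u201c",  # left double quotation mark
    "&rdquo;": "\u201d",  # right double quotation mark
    "&mdash;": "\u2014",  # em dash
}

def replace_entities(html):
    """Replace only the listed entities, leaving markup-critical ones such as &amp; alone."""
    for entity, char in ENTITIES.items():
        html = html.replace(entity, char)
    return html

print(replace_entities("<p>&ldquo;Hello&rdquo; &amp; goodbye</p>"))
```

A targeted mapping is used here rather than html.unescape() because unescaping everything would also turn &amp; and &lt; into raw characters and could break the XHTML.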
10-01-2015, 02:51 PM | #72 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser
When I try to add it from Manage Plugins in Sigil I get an "Error: Plugin not a valid Sigil plugin." I simply extracted it to the plugin folder to check it :P |
10-01-2015, 02:58 PM | #73 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@gipsy: Odd - it worked on my Windows 7 PC. I will check it again. I spotted an error in your code:
def FixP(m):
    FixP=m.group(1)+m.group(2)
You cannot use an equals sign to return the value of a function; the assignment only creates a local variable, so I think your function will return 'None'. To return the groups in your expression from a function, you need to use:
return( m.group(1)+m.group(2) ) |
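The difference can be seen in a few lines outside the plugin. This is a stand-alone sketch with made-up function names and a simplified pattern. It relies on the fact that CPython's re.sub treats a None result from the replacement function as an empty string, which would explain why the matched words came out blank.

```python
import re

def no_return(m):
    # Assignment only: the function ends without a return statement,
    # so Python returns None to the caller.
    joined = m.group(1) + m.group(2)

def with_return(m):
    # Explicit return: the joined text is handed back to re.sub.
    return m.group(1) + m.group(2)

pattern = re.compile(r"(\w+)-(\w+)")

blank = pattern.sub(no_return, "a co-operate b")    # the match is replaced by nothing
kept = pattern.sub(with_return, "a co-operate b")   # the match is replaced by the joined word
print(repr(blank), repr(kept))
```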
10-01-2015, 03:10 PM | #74 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
I was trying to do something like...
if m.group(1)+m.group(2) is spell then return m.group(1)+m.group(2)
else if 'Π'+m.group(2) is spell then return Π+m.group(2)
else return m.group(0)
But I can only copy-paste code, I don't have any coding knowledge :P I will try your suggestion |
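The three-way check outlined above can be written as one function. The sketch below is only one possible reading of it: the three-word dictionary, the shortened two-group pattern and the names spell() and fix_p() are assumptions, and it has not been tried inside the plugin.

```python
import re

PI = "\u03a0"         # Π
GI = "\u0393\u0399"   # ΓΙ (Greek gamma + iota)
OY = "\u039f\u03a5"   # ΟΥ
ANO = "\u0391\u039d\u03a9"  # ΑΝΩ

# Hypothetical dictionary: ΓΙΟΥ, ΠΟΥ and ΠΑΝΩ.
WORDS = {GI + OY, PI + OY, PI + ANO}

def spell(word):
    """Return True if the word is in the (hypothetical) dictionary."""
    return word in WORDS

def fix_p(m):
    """Keep a word that is already correct, else try Π, else leave the match alone."""
    original = m.group(1) + m.group(2)
    if spell(original):
        return original
    candidate = PI + m.group(2)
    if spell(candidate):
        return candidate
    return m.group(0)

# Shortened pattern: Greek ΓΙ, Greek ΙΙ or Latin II, followed by the rest of the word.
pattern = re.compile("(" + GI + "|\u0399\u0399|II)(\\w+)")

print(pattern.sub(fix_p, GI + OY))    # ΓΙΟΥ is a word, so it is kept
print(pattern.sub(fix_p, GI + ANO))   # ΓΙΑΝΩ is not a word, ΠΑΝΩ is
```

Every branch ends in a return, so the function never hands None back to re.sub.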
10-01-2015, 03:11 PM | #75 |
Addict
Posts: 202
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
The plugin should work now - there was an error in the filename, which did not match the XML file in the plugin.
|