Old 04-06-2014, 10:20 PM   #1
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Saved Search/Regex Functions

If anyone has useful Saved Searches they would like to share, you can share them in this thread.

Generic rules to fix common problems, for example.
Or just anything clever and cool which you are proud of and want to admire.



NOTE: To make it easier to read, it would be nice if all Search & Replace fields were wrapped in the
[CODE]content goes here[/CODE]
tags.

Also, you can export the saved search as a .json, and upload it here in a zipped folder.

Moderator Notice
This thread has been made a sticky, and unlike most other sticky threads, this one is open to all who have a useful saved Search/Replace they wish to share. Do not use this thread to ask any questions. Start a new thread. Posts that don't belong here will be deleted or moved, but you are encouraged to post if you have something to share.

Please add a descriptive title to each post and explain what your Saved Search accomplishes.

Last edited by DoctorOhh; 04-08-2014 at 02:14 AM. Reason: added some formatting/sharing guidelines
Old 04-07-2014, 08:14 AM   #2
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Finding and joining broken paragraphs

([a-z])</p>whatever else is in the middle here<p>([a-z])

Replace with \1space\2

With case sensitive ticked.

Doesn't get absolutely everything, but can be used very quickly.

Potshots from people who actually know regex are welcome. I just guess and see if it works!
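For anyone who wants to try the idea outside the editor, here is a minimal Python sketch. It assumes the "whatever else is in the middle" part is optional whitespace followed by an opening <p> that may carry attributes; that is a guess about the markup, so adjust it for your book. The name join_broken_paragraphs is made up for this example.

```python
import re

# Assumption: between the two paragraphs there is only whitespace, and the
# next <p> may carry attributes such as class="calibre1".
BROKEN_PARAGRAPH = re.compile(r'([a-z])</p>\s*<p[^>]*>([a-z])')

def join_broken_paragraphs(html):
    """Join a paragraph ending in a lowercase letter to a following
    paragraph that starts with a lowercase letter."""
    # "\1space\2" from the post, written with a literal space.
    return BROKEN_PARAGRAPH.sub(r'\1 \2', html)

sample = '<p>The quick brown</p>\n<p class="calibre1">fox jumped.</p>'
joined = join_broken_paragraphs(sample)
print(joined)
```

Python's re module is case sensitive unless told otherwise, which corresponds to ticking "case sensitive" in the editor.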
Old 04-07-2014, 08:18 AM   #3
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Uncapitalized letters after period and quote

\.” [a-z]

Case sensitive ticked.

There is no easy replacement in this case, but at least you can find the problem. Remove the quote from the search to find uncapitalized sentences that do not follow a quote.
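A plain search-and-replace cannot change case, but a replacement function can. The following is a rough Python sketch of that idea, not something from the post above; capitalize_after_quote is a made-up name. Check each hit before accepting it, since a lowercase letter after a closing quote is sometimes correct.

```python
import re

# Group 1 keeps the period, closing quote and space; group 2 is the
# lowercase letter that probably should be a capital.
AFTER_QUOTE = re.compile(r'(\.” )([a-z])')

def capitalize_after_quote(text):
    """Upper-case a lowercase letter that follows a period, quote and space."""
    return AFTER_QUOTE.sub(lambda m: m.group(1) + m.group(2).upper(), text)

fixed = capitalize_after_quote('“Stop.” he turned away.')
print(fixed)
```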
Old 04-07-2014, 05:17 PM   #4
arspr
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
Preventing line wraps around dashes in Spanish dialogues

As dashes are wrap points in HTML, dialogues in Spanish ebooks can look terrible.

Example in one line:
Code:
—Bla, Bla, Bla, —John said—. More bla, bla, bla.
Wrong:
Code:
—Bla, Bla, Bla, —John said
—. More bla, bla, bla.
Wrong:
Code:
—Bla, Bla, Bla, —
John said—. More 
bla, bla, bla.
Right:
Code:
—Bla, Bla, Bla, —John 
said—. More bla, bla, bla.

The following two searches wrap each dash and the word attached to it in a <span> with a specified class (in my example, <span class="nw">).

Then add the following CSS definition for this class:
Code:
.nw { white-space: nowrap; display: inline-block; text-indent: 0em;}
and you will have prevented the wrong wrapping in Spanish books.

Edit notes. Explanation of the workaround for RMSDK:
Spoiler:
The previous CSS class is a modification of my original one which only included the white-space: nowrap; code. But in latest versions of RMSDK the white-space property has stopped working. (It worked, and works, in my old Sony PRS-650).

But in an ebook I was recently reading I found that they prevented the wrapping inside formulas enclosing them in a <span> with display: inline-block; text-indent: 0em;. So I just decided to add this method to my previous one. And then it also works in newer versions of RMSDK (Kobo Aura H2O with firmware 3.15.0 as example).

The no-wrapping effect is actually obtained through the display: inline-block; part. But if this protected <span> started a new line it would inherit the text-indent value its parent <p> had. Because of that behaviour, the text-indent: 0em; setting is also added.



First S&R
Search:
Code:
\x20(—|–|&mdash;|&ndash;)([^ <]+)( |</p>|</div>)
Replace:
Code:
\x20<span class="nw">\1\2</span>\3
Second S&R
Search:
Code:
\x20([^ >]+)(—|–|&mdash;|&ndash;)(\.|\.\.\.|,|;|:|…|&hellip;)?\x20
Replace:
Code:
\x20<span class="nw">\1\2\3</span>\x20
Additional usage notes
Spoiler:
  • Yes, you need both S&R and in that order.
  • Do not forget to add the CSS style, or the spans will have no effect.
  • As you can see, they look for dashes and only dashes (as Unicode characters or as named entities). Some horribly formatted books use minus signs, which these searches won't catch.
  • The Case Sensitive and Dot All settings are probably irrelevant, but I have both switched off.
  • Because of the [^ <]+ and [^ >]+ parts of the Searches they are completely safe to use. I mean they won't catch and destroy code like:
    Code:
    —Bla, Bla, Bla, —<b>John</b> <i>said</i>—. More bla, bla, bla.
    They will just ignore it. You will never get something wrong like:
    Code:
    —Bla, Bla, Bla, <span class="nw">—<b>John</span></b> <i><span class="nw">said</i>—.</span> More bla, bla, bla.
    You'll have to fix such cases manually.
  • Using them where dashes are used as sentence or word separators is also safe:
    Code:
    First sentence—Second sentence.
    This situation, pretty common in English books, is also ignored.
  • As hinted in another thread, I've used \x20 for the leading and trailing spaces needed in the regexes, in order to make them clearly visible.
  • Obviously there's no point in adding a <span> around the very first starting dash and word, and these searches don't do that.
  • Strange situation that I remember having found once or twice. If there's some kind of CSS setting directly on <span> tags then it will be also applied to the newly created tags. I remember suffering a
    Code:
    span {font-size: 1.3em;}
    which I had to override with
    Code:
    .nw {font-size: 1em; white-space: nowrap;}
    while not losing where it was being originally applied.
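To see what the two searches do to the example line, here is a small Python sketch that applies them in the same order. It is only an illustration: protect_dialogue_dashes is a made-up name, and Python's re module does not accept \x20 in a replacement string, so a literal space is used there instead.

```python
import re

DASH = r'(—|–|&mdash;|&ndash;)'

# First S&R: a dash that opens a word, e.g. " —John ".
FIRST = re.compile(r'\x20' + DASH + r'([^ <]+)( |</p>|</div>)')
# Second S&R: a dash that closes a word, with optional punctuation after it.
SECOND = re.compile(
    r'\x20([^ >]+)' + DASH + r'(\.|\.\.\.|,|;|:|…|&hellip;)?\x20')

def protect_dialogue_dashes(html):
    """Wrap dash+word pairs in <span class="nw"> so they cannot be split."""
    html = FIRST.sub(r' <span class="nw">\1\2</span>\3', html)
    # Group 3 is optional, so build the replacement in a function.
    return SECOND.sub(
        lambda m: ' <span class="nw">%s%s%s</span> '
        % (m.group(1), m.group(2), m.group(3) or ''),
        html)

sample = '—Bla, Bla, Bla, —John said—. More bla, bla, bla.'
wrapped = protect_dialogue_dashes(sample)
print(wrapped)
```

The line with <b> and <i> tags from the usage notes passes through this sketch unchanged, as described above.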

Last edited by arspr; 05-07-2015 at 03:53 PM. Reason: New .nw CSS definition - Workaround for RMSDK
Old 05-21-2014, 11:46 AM   #5
Zajora
Junior Member
Posts: 1
Karma: 10
Join Date: May 2014
Device: Kindle Keyboard
I have created a fair number of regex fixes. I make changes to them every so often, so I'll probably edit this post if I do. I'd use code tags, but they take up too much room. If a regex should be replaced with a space, it will say "space". If there is nothing, then (logically) it should be replaced with nothing.

Of course, there is no guarantee any of these will work properly. I always check them a number of times before doing replace all, since there are a TON of ways eBooks can have formatting that wrecks these regexes.

Scenario: Apostrophes have been replaced with double quotes
Match: (?<=\w)(“|”)(?=\w)
Replace:

Scenario: There is a linebreak in the middle of a character's dialogue
Match: (?<=“[^”]*)</p>\s*<p[^>]*>(?!“)
Replace: space

Scenario: A tag closes, is followed by 0+ spaces or newlines, is then reopened and is then followed by a lowercase letter
Match: </(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>(?=[a-z])
Match: (?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>
Replace: space
Notes: The second one is an alternate, which I think is better, but I'm not 100% sure it covers all the cases of the former.
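As an illustration of the first Match above, here is a short Python sketch. The (?P<tag>...) and (?P=tag) named-group syntax works the same way in Python's re module; merge_split_tags is a made-up name, and the sample markup is invented for the example.

```python
import re

# A closing tag, optional whitespace, then the same tag reopened with
# attributes, followed by a lowercase letter.
SPLIT_TAG = re.compile(r'</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>(?=[a-z])')

def merge_split_tags(html):
    """Replace the close/reopen pair with a single space."""
    return SPLIT_TAG.sub(' ', html)

merged = merge_split_tags('<p class="a">The cat</p>\n<p class="a">sat down.</p>')
print(merged)
```

Note that the pattern expects a space and at least one attribute in the reopened tag, so a bare <p> is left alone.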

Scenario: "LL" Ligatures have been replaced with a single "L".
Match: (l (?=(y|s|ed|ey|ion|en|ar|ars|er|ow|et|owed|enge|age|enging|ected|egal|ections|ect|apse|ular|op|owing|ocks|ied|ier|ies|ing|ingly|ered|icit|est)(\W)))|(l (?![(–<-])(?=\W))|(?<=’)l(?=\W)|(?<= (wi|du|a|we|te|sma|ca|sti|fu|fa|chi|sha|wa|pha|se|bi|ha|ki|pu|ce|ba|ski|hi|fi|fe|he|ro|ta|i|sme|bri|sta|we))l(?=\W)
Replace: ll
Notes: This regex doesn't really work that well, but it's faster than doing it manually. I would recommend using the spellcheck afterwards and catching the most common ones. This regex is actually a bunch of individual ones chained together by ORs (|) so it's easier to see what's doing what.

Scenario: More than 1 space in a row
Match: (?<=\S) {2,}(?=\S)
Replace: space

Scenario: There are tags (which may be nested) that are either empty or just have a number in them
Match: (<[^/>]*>)+\s*\d*\s*(</[^>]*>)+
Replace:
Notes: This may remove things you'd like to keep, such as scenebreaks/whitespace, or the chapter links.

Scenario: There's a linebreak or spaces before a closing tag
Match: (?<![".!?>*”“…~’])</(?P<tag>\w+)>\s*<(?P=tag) [^/>]+>
Replace:

For future use:

Scenario:
Match:
Replace:

Last edited by Zajora; 05-21-2014 at 11:51 AM.
Old 05-22-2014, 09:03 PM   #6
Section8
Addict
Posts: 256
Karma: 2092424
Join Date: Oct 2011
Location: Arlington, TX
Device: Kindle PW4, Moon+ Reader on a cheap Android tablet
I have a nook, and the only real regexes I've written are for fixing stylesheets to work around its margin bug: if "publisher defaults" are disabled, the nook doesn't handle the css "margin" setting. I've been using these to convert all 4 forms of "margin" to the equivalent margin-top, margin-right, etc. These were written for Sigil, but I *think* they work in the calibre editor.

First: find margin:
Find: margin *:

Convert margin: a (single value):
Find: margin *: *([^\s;]+)(\s*(;|}))
Replace: margin-top: \1; margin-right: \1; margin-bottom: \1; margin-left: \1\2

Convert margin: a b (2 values)
Find: margin *: *([^\s;]+) +([^\s;]+)([\s]*(;|}))
Replace: margin-top: \1; margin-right: \2; margin-bottom: \1; margin-left: \2\3

Convert margin: a b c (3 values)
Find: margin *: *([^\s;]+) +([^\s;]+) +([^\s;]+)([\s]*(;|}))
Replace: margin-top: \1; margin-right: \2; margin-bottom: \3; margin-left: \2\4

Convert margin: a b c d (4 values)
Find: margin *: *([^\s;]+) +([^\s;]+) +([^\s;]+) +([^\s;]+)(\s*(;|}))
Replace: margin-top: \1; margin-right: \2; margin-bottom: \3; margin-left: \4\5
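The four conversions could also be done in one pass with a replacement function. Below is a minimal Python sketch under the assumption that the stylesheet is available as a string; expand_margin is a hypothetical helper, not part of Sigil or calibre. Declarations containing "!" (such as !important) are left untouched to keep the sketch simple.

```python
import re

# "margin:" not preceded by a word character or hyphen, then the values up
# to (but not including) the closing ";" or "}".
MARGIN = re.compile(r'(?<![\w-])margin\s*:\s*([^;}]+?)\s*(?=[;}])')

def expand_margin(css):
    """Expand the margin shorthand into the four long-hand properties."""
    def repl(match):
        values = match.group(1).split()
        if '!' in match.group(1) or not 1 <= len(values) <= 4:
            return match.group(0)  # leave anything unusual alone
        top = values[0]
        right = values[1] if len(values) > 1 else top
        bottom = values[2] if len(values) > 2 else top
        left = values[3] if len(values) > 3 else right
        return ('margin-top: %s; margin-right: %s; '
                'margin-bottom: %s; margin-left: %s'
                % (top, right, bottom, left))
    return MARGIN.sub(repl, css)

expanded = expand_margin('p { margin: 1em 2em; }')
print(expanded)
```

The value order follows the replacements above: with two or three values, margin-left takes the second value.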
Old 06-19-2014, 03:17 PM   #7
user743
Addict
Posts: 243
Karma: 44444
Join Date: Mar 2014
Device: Kindle PW2 special offers removed by Amazon for FREE
Switch script links to HTML links.
Search:
Code:
<script>	AddIndex\("(.+?)", (".+?"), ".+?"\); </script>
Replace:
Code:
<a href=\2>\1</a>
Mode: Regex, not case sensitive, Dot All ticked.
Change the double quotes to single quotes if necessary.
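Here is the same conversion as a small Python sketch, for testing the pattern outside the editor. It assumes the first AddIndex argument is the link text and the second is the target, which is what the replacement implies; the whitespace around the call is loosened to \s*, and script_links_to_anchors is a made-up name.

```python
import re

# IGNORECASE and DOTALL mirror "not case sensitive" and "dot all".
SCRIPT_LINK = re.compile(
    r'<script>\s*AddIndex\("(.+?)", (".+?"), ".+?"\);\s*</script>',
    re.IGNORECASE | re.DOTALL)

def script_links_to_anchors(html):
    """Turn an AddIndex(...) script call into an ordinary link."""
    # Group 2 still carries its double quotes, so href=\2 is already quoted.
    return SCRIPT_LINK.sub(r'<a href=\2>\1</a>', html)

converted = script_links_to_anchors(
    '<script> AddIndex("Chapter 1", "ch1.html", "x"); </script>')
print(converted)
```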
Old 12-23-2014, 03:46 AM   #8
dmonasse
Member
Posts: 23
Karma: 10
Join Date: Apr 2014
Location: Paris
Device: ipad 2, Ubuntu
A regex function to number a (mathematical) ebook

The search and replace tool with regex functions is really fantastic. My small company builds mathematical ebooks from LaTeX sources. One of my problems when converting such books is that LaTeX auto-numbers chapters, sections, subsections and theorem-like assertions (theorems, propositions, lemmas, definitions, corollaries and so on). I would like to reproduce that numbering in my ebook.

A solution is the following:

1) Converting from latex, I put chapters, sections, subsections and assertions in a <div> tag with a html5 data-type attribute. For example, a latex section
Code:
\section{History of the Fermat-Wiles theorem}
is converted into
Code:
<div class="section" data-type="section">History of the Fermat-Wiles theorem</div>
and
Code:
\begin{theorem}Abracadabra\end{theorem}
is converted into
Code:
<div class="theorem" data-type="theorem">Abracadabra</div>
Note: I can't use the class attribute to denote the type of the div because Calibre's conversion from HTML to ePub modifies these attributes, and class="theorem" may be changed into class="pcalibre25". That's the reason for the data-type attribute.

2) After conversion from LaTeX to HTML (not so easy!) and from HTML to ePub (easy with Calibre), I number the whole book with the Calibre editor using the search and replace tool with a regex function.
The search pattern I use is:
Code:
<div.*?data-type="(chapter|section|subsection|theorem|proposition|lemma|definition|corollary)"[^>]*>
and the regex function may be:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if number==1: #initialization of the counts
        data['chapter']=0
        data['section']=0
        data['subsection']=0
        data['assertion']=0
    the_type=match.group(1)
    if the_type=='chapter': # begins a chapter, reinitialize the counts
        data['section']=0
        data['subsection']=0
        data['assertion']=0
        data['chapter']+=1
        return match.group()+"<span class='chapter_num'>Chapter "+str(data['chapter'])+".</span> "
    elif the_type=='section': # begins a section, reinitialize the subsection count
        data['subsection']=0
        data['section']+=1
        return match.group()+"<span class='section_num'>Section "+str(data['section'])+".</span>" 
    elif the_type=='subsection':
        data['subsection']+=1
        return match.group()+"<span class='subsection_num'>Subsection "+str(data['section'])+"."+str(data['subsection'])+".</span>"
    else: # this is an assertion
        data['assertion']+=1
        return match.group()+"<span class='assertion_num'>Assertion "+str(data['chapter'])+"."+str(data['assertion'])+".</span>"
    return ''

replace.file_order = 'spine'
Adapt the code to your needs; this is only an example. It would be nicer to replace "Assertion" with "Theorem", "Proposition", "Lemma", "Corollary" or "Definition" (very easy to do starting from the "the_type" variable). I obtain the following numbering:
Code:
Chapter 1
     Section 1
         Subsection 1.1
             Assertion 1.1
             Assertion 1.2
         Subsection 1.2
            Assertion 1.3
     Section 2
         Subsection 2.1
             Assertion 1.4
             Assertion 1.5
         Subsection 2.2
            Assertion 1.6
Chapter 2
     Section 1
         Subsection 1.1
             Assertion 2.1
             Assertion 2.2
         Subsection 1.2
            Assertion 2.3
     Section 2
         Subsection 2.1
             Assertion 2.4
             Assertion 2.5
Hope this helps. Any improvement is welcome (including corrections to my English).
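One way to get "Theorem", "Lemma" and so on instead of "Assertion" is to build the label from the matched type. The sketch below is plain Python for trying this outside calibre, not the editor function itself: number_book is a made-up name, the counters live in a local dict rather than calibre's data argument, and the search uses [^>]*? where the original uses .*? so that a match cannot run across tags.

```python
import re

HEADING = re.compile(
    r'<div[^>]*?data-type="(chapter|section|subsection|theorem|proposition'
    r'|lemma|definition|corollary)"[^>]*>')

def number_book(html):
    """Insert a numbered label after every typed <div> opening tag."""
    counts = {'chapter': 0, 'section': 0, 'subsection': 0, 'assertion': 0}

    def repl(match):
        the_type = match.group(1)
        if the_type == 'chapter':  # a new chapter resets everything below it
            counts.update(section=0, subsection=0, assertion=0)
            counts['chapter'] += 1
            css, number = 'chapter', str(counts['chapter'])
        elif the_type == 'section':
            counts['subsection'] = 0
            counts['section'] += 1
            css, number = 'section', str(counts['section'])
        elif the_type == 'subsection':
            counts['subsection'] += 1
            css = 'subsection'
            number = '%d.%d' % (counts['section'], counts['subsection'])
        else:  # theorem-like assertions share one counter per chapter
            counts['assertion'] += 1
            css = 'assertion'
            number = '%d.%d' % (counts['chapter'], counts['assertion'])
        label = '%s %s.' % (the_type.capitalize(), number)
        return "%s<span class='%s_num'>%s</span> " % (match.group(), css, label)

    return HEADING.sub(repl, html)

sample = ('<div data-type="chapter">Primes</div>'
          '<div data-type="theorem">Abracadabra</div>'
          '<div data-type="lemma">Hocus pocus</div>')
numbered = number_book(sample)
print(numbered)
```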
Old 07-17-2015, 03:20 AM   #9
senhal
Connoisseur
Posts: 82
Karma: 25684
Join Date: Sep 2014
Device: Kindle NT
I'm trying to write some regexes in order to have something similar to pepito cleaner (an OpenOffice plugin), plus some other searches for typical OCR errors (some of them are intended for the Italian language).

So, import and test the attached regexes and let me know; my goal is to correct and improve them.
Any suggestions are welcome.

Explanations:
  • Words inside [ ] in the saved search name suggest what the replace button will do, so [del] will delete something, [man] needs a manual intervention, [space] will replace something with a space and so on: just make a copy of your ebook and try...
  • The regex called "ADE verify" finds all the characters that Adobe Digital Editions doesn't show: if the search finds something, you'll need to embed fonts for ADE to display the text in full.
  • Please forgive my English translations of the regex names; help me to improve them too.
Attached Files
File Type: zip saved_search_v08.zip (6.3 KB, 1303 views)
Old 03-03-2016, 01:22 PM   #10
Arjayem
Casual Member
Posts: 5
Karma: 10
Join Date: Mar 2016
Location: UK
Device: Kindle paperwhite
Scanning OCR Errors

Errors produced by scanning text seem to follow a predictable pattern, such as seU for sell, iUness for illness, or bom for born, but nevertheless aren't corrected by the automatic scanning software. So I created a function for the calibre editor to fix those I found most often. You'll also find that I've corrected some American spellings; depending on your dictionary, these won't actually be wrong.

The code is based on the Calibre example that tidies up hyphens.

You'll need to enter the following in the Find field: >.*?<

Here's the function. Because Python relies on indentation, you may need to adjust the whitespace a little before Python accepts the code:

Code:
import regex
from calibre import replace_entities
from calibre import prepare_string_for_xml

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    def replace_word(wmatch):
        # Check if the current word exists in the dictionary
        CheckThisSpelling = wmatch.group(1)
        if dictionaries.recognized(CheckThisSpelling) == True:   
            return wmatch.group()
        else:
        #	else try to correct it - remove American spelling
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("or", "our") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)         
            NewSpelling = CheckThisSpelling + '~'
            NewSpelling = NewSpelling.replace("or~", "our") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
            NewSpelling = CheckThisSpelling + '~'
            NewSpelling = NewSpelling.replace("ors~", "our") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)    
        #	else try to correct it - remove American spelling
            NewSpelling = CheckThisSpelling + '~'
            NewSpelling = NewSpelling.replace("er~", "re") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("er", "re") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            else:
              NewSpelling = NewSpelling.replace("ree", "re") 
              if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                                    
            NewSpelling = CheckThisSpelling + '~'
            NewSpelling = NewSpelling.replace("ers~", "res") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            NewSpelling = CheckThisSpelling + '~'
            NewSpelling = NewSpelling.replace("nse~", "nce") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
        #	else try to correct it - remove American spelling
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("l", "ll") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("l", "ll",1) 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("l", "~",2) 
            NewSpelling = NewSpelling.replace("~", "l",1)
            NewSpelling = NewSpelling.replace("~", "ll",1)                       
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                                                 
        #	else try to correct it - remove American spelling
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("ll", "l") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("ll", "l",1) 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("ll", "~",2) 
            NewSpelling = NewSpelling.replace("~", "ll",1)
            NewSpelling = NewSpelling.replace("~", "l",1)                       
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)               
         #
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("U", "li") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("U", "ll") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)            
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("h", "li") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("H", "li") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("h", "li",1) 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("H", "li",1) 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)  
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("h", "~",2) 
            NewSpelling = NewSpelling.replace("~", "h",1)
            NewSpelling = NewSpelling.replace("~", "li",1)              
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("H", "~",2) 
            NewSpelling = NewSpelling.replace("~", "H",1)
            NewSpelling = NewSpelling.replace("~", "li",1)   
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                         
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("im", "un") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("l", "ll") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
         #
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("imi", "um") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)              
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("m", "rn") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("m", "in") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2) 
         #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("m", "hi") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)  
          #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("mn", "um") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)           
          #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("nm", "run") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)
          #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("nmi", "rum") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                                                                                                           
          #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("bn", "lm") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                                                                                                            
          #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("ii", "h") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                                                                                                            
          #	else try to correct it 
            NewSpelling = CheckThisSpelling
            NewSpelling = NewSpelling.replace("ii", "u") 
            if dictionaries.recognized(NewSpelling) == True:   
                return NewSpelling +  wmatch.group(2)                                                                                                            
         #	
         #	else try to correct it 
            if CheckThisSpelling == 'Fd':
                return " I'd" +  wmatch.group(2)  
            if CheckThisSpelling == 'Fve':
                return " I've" +  wmatch.group(2)
            if CheckThisSpelling == 'Fm':
                return " I'm" +  wmatch.group(2)
            if CheckThisSpelling == 'Fll':
                return " I'll" +  wmatch.group(2) 
            if CheckThisSpelling == 'youVe':
                return " you've" +  wmatch.group(2)
            if CheckThisSpelling == 'YouVe':
                return " You've" +  wmatch.group(2)                   
         #	
         #	else try to correct it 
            if CheckThisSpelling == 'wren\'t':
                return " weren't" +  wmatch.group(2)              

         #	
         #	else try to correct it 
            if CheckThisSpelling == '&':
                return ' ' + chr(38) +  wmatch.group(2)  
            # Otherwise try ">" as a misreading of "y"
            NewSpelling = CheckThisSpelling.replace(">", "y")
            if dictionaries.recognized(NewSpelling):
                return NewSpelling + wmatch.group(2)
            # Otherwise try "j&", "i&" and "l&" as misreadings of "fi"
            NewSpelling = CheckThisSpelling.replace("j&", "fi")
            if dictionaries.recognized(NewSpelling):
                return NewSpelling + wmatch.group(2)
            NewSpelling = CheckThisSpelling.replace("i&", "fi")
            if dictionaries.recognized(NewSpelling):
                return NewSpelling + wmatch.group(2)
            # Start again from the original word, as the other attempts do
            NewSpelling = CheckThisSpelling.replace("l&", "fi")
            if dictionaries.recognized(NewSpelling):
                return NewSpelling + wmatch.group(2)
                                                                              
        return wmatch.group()
        #return wmatch.group() + '1' + wmatch.group(1) + '2' + wmatch.group(2) + '3' + NewSpelling
    # Search for words 
    text = replace_entities(match.group()[1:-1])  # Handle HTML entities like &amp;
    corrected = regex.sub(r'\s*([\w\>\&[[a-z]\'[a-z]]]*)([\s*\.\?\,\"\;])', replace_word, text, flags=regex.VERSION1 | regex.UNICODE)
    return '>%s<' % prepare_string_for_xml(corrected)  # Put back required entities
GOOD LUCK & HOPE IT'S OF SOME USE
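
For anyone who wants to try the idea outside Calibre, here is a minimal table-driven sketch of the same "try a fix, keep it only if the dictionary accepts it" logic. It is not the code from this post: `recognized`, `KNOWN_WORDS`, `fix_word` and `fix_text` are hypothetical names, the tiny word set merely stands in for Calibre's `dictionaries.recognized()`, and the pattern is a simplified standard-library `re` version of the search expression as I read it (no leading `\s*`, and `+` rather than `*`, so the fixes are returned without a leading space).

```python
import re

# Stand-in word list (assumption). Inside the Calibre editor the function
# would call dictionaries.recognized() instead of this helper.
KNOWN_WORDS = {"you", "find", "first", "music", "the", "fire"}


def recognized(word):
    """Hypothetical replacement for Calibre's dictionary lookup."""
    return word.lower() in KNOWN_WORDS


# Whole-word fixes, taken from the if-chain in the post.
WHOLE_WORD_FIXES = {
    "Fd": "I'd", "Fve": "I've", "Fm": "I'm", "Fll": "I'll",
    "youVe": "you've", "YouVe": "You've", "wren't": "weren't",
}

# (misread, intended) pairs, tried in the same order as in the post.
SUBSTRING_FIXES = [("ii", "u"), (">", "y"), ("j&", "fi"), ("i&", "fi"), ("l&", "fi")]

# Group 1: a run of word characters, '>', '&' or apostrophes.
# Group 2: the single whitespace or punctuation character that ends it.
WORD = re.compile(r"([\w>&']+)([\s*.?,\";])")


def fix_word(match):
    word, tail = match.group(1), match.group(2)
    if recognized(word):
        return match.group()                 # already a good word
    if word in WHOLE_WORD_FIXES:
        return WHOLE_WORD_FIXES[word] + tail
    for bad, good in SUBSTRING_FIXES:
        candidate = word.replace(bad, good)
        # Accept a fix only if it changed the word and the result is known.
        if candidate != word and recognized(candidate):
            return candidate + tail
    return match.group()                     # unknown word: leave untouched


def fix_text(text):
    return WORD.sub(fix_word, text)


print(fix_text("Fve heard the miisic, >ou j&nd i&re l&rst."))
```

Keeping the fixes in two tables means a new OCR misreading is one extra entry rather than another four-line block, and every substring attempt starts from the original word.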

Last edited by Arjayem; 03-04-2016 at 05:53 AM.
Old 03-03-2016, 02:06 PM   #11
PeterT
Grand Sorcerer
Posts: 12,930
Karma: 76440364
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
You might like to wrap your code in [code] .... [/code] tags to preserve spacing and indentation.
Old 03-04-2016, 05:07 AM   #12
Arjayem
Casual Member
Posts: 5
Karma: 10
Join Date: Mar 2016
Location: UK
Device: Kindle paperwhite
The code sample came via Notepad. I keep a copy in a .txt file because I've already wiped one version in Calibre with the Remove button, which is unforgiving and sits right next to the Edit button. It would be nice to see that design addressed.
Old 03-04-2016, 09:40 AM   #13
phossler
Wizard
Posts: 1,085
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
When posting code or similar, the [Go Advanced] button at the bottom of the window shows more options.

One is the [#] icon, which adds the CODE tags.

Just paste or type between them and it formats nicely.
Old 03-04-2016, 10:36 AM   #14
theducks
Well trained by Cats
Posts: 30,542
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
You can just type ANY tag pair if you know it. Lowercase entry is even permitted (it is converted to uppercase on posting).

But I really wish the MR (software section) forums that commonly get code and error logs defaulted to the 'Advanced' (or forum-appropriate) tool buttons.
Old 03-04-2016, 11:55 AM   #15
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
We have a sticky thread you can post this in.




@theducks, would you mind fixing the thread title for that sticky? I think it predated Function-Replace mode.

"Saved Search" ==> "Saved Search/Regex Functions"
