08-12-2014, 06:38 AM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2014
Device: Kindle Touch
|
About saved searches and regex
I'm a Sigil user, and I'm testing the new Calibre editor to evaluate it as a replacement for Sigil.
About saved searches: I found a bug in the interface. In the saved searches window, if you have a saved search with a very long regex in the "Find" field, selecting it in the saved searches list makes the main window resize automatically and become too wide. If you then try to resize the window, it sometimes disappears, and you must close and reopen the editor to make it visible again. The other problem is that the regex engine doesn't support PCRE expressions, as Sigil's does. PCRE is much more powerful than Python's "re" regexps, and some PCRE features have no equivalent in "re". Is there any possibility of improving the regex support, maybe with a library (like regex or python-pcre), to get PCRE support in searches? Thanks for your program and your support. Last edited by Carpatos; 08-12-2014 at 06:40 AM. |
08-12-2014, 06:42 AM | #2 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The only useful feature that exists in Sigil that does not exist in the editor's regex engine is the \U operator, and the editor will be getting a function mode for search and replace that will be far more powerful and general than that.
And note that the editor does not use python's re module. It uses https://pypi.python.org/pypi/regex |
08-12-2014, 08:13 AM | #3 |
Grand Sorcerer
Posts: 28,045
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I've certainly not found the regex module calibre's editor uses to be lacking any real functionality (with the exception of the \U \L \u \l \E replacement operators you mentioned) compared to Sigil's PCRE engine. I do still miss the \K operator at times, but the addition of variable-length lookbehinds makes it possible to replicate \K's versatility anyway.
|
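For readers who have neither \K nor variable-length lookbehinds available, a common stand-in (a different technique from the lookbehind approach mentioned above) is to capture the prefix and put it back in the replacement. A minimal sketch using Python's standard re module; the sample pattern and the function name renumber are made up for illustration:

```python
import re

# PCRE:  chapter\s+\K\d+   -> only the digits are part of the match.
# Stand-in: capture the prefix in group 1 and re-emit it unchanged
# from the replacement function.
PATTERN = re.compile(r"(chapter\s+)\d+", re.IGNORECASE)


def renumber(text, new_number):
    """Replace the number after 'chapter', keeping the prefix as written."""
    return PATTERN.sub(lambda m: m.group(1) + str(new_number), text)
```

The prefix can be of any length here, which is the main thing \K is usually wanted for.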
08-12-2014, 08:49 AM | #4 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Incidentally, do the case change operators in Sigil's regex engine do an ascii swap [a-z] <-> [A-Z] or do they support full unicode case folding?
|
08-12-2014, 09:28 AM | #5 |
Grand Sorcerer
Posts: 28,045
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
They seem to do at least some basic Unicode case folding. I don't know if it's full support, though.
In a search for the character 'á', all of the following capture it:
Code:
(á)
(\x{00E1})
(\p{Ll})
But I've no idea if it's limited to a certain subset of Unicode (or if it handles all the ways á can be represented). |
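On "all the ways á can be represented": the same visible character can be stored precomposed (U+00E1) or as a plain a followed by a combining acute accent (U+0301). The sketch below uses only Python's standard library, not Sigil's or calibre's engine, to show that a pattern for the precomposed form misses the decomposed one unless the text is normalized first. The helper name matches_a_acute is made up for illustration.

```python
import re
import unicodedata

PRECOMPOSED = "\u00e1"                                   # á as one code point
DECOMPOSED = unicodedata.normalize("NFD", PRECOMPOSED)   # 'a' + U+0301

A_ACUTE = re.compile("\u00e1")


def matches_a_acute(text, normalize=False):
    """Search for precomposed á, optionally normalizing the text to NFC first."""
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return A_ACUTE.search(text) is not None
```

With normalize=True both spellings are found; without it only the precomposed one is.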
08-12-2014, 10:17 AM | #6 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That's better than I expected. I always assumed that ICU in Sigil was present only for WebKit, but perhaps the regex engine also uses it.
|
08-14-2014, 05:34 AM | #7 |
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2014
Device: Papyre 613
|
I'm another Sigil user trying to move to the Calibre editor.
I'm testing the regex module because I have more than 300 regexes (some of them quite complex) for correcting Spanish texts. As mentioned in the initial post, there is a problem with the sizes of the saved searches and warning windows when the search string is long.
As I said, I have a lot of regular expressions, so in my opinion it would be useful to be able to group and nest them in the saved searches window to keep them organized. Another thing I miss is having a saved search expression appear automatically in the main window's find/search area when I use it. Sometimes it is necessary to make changes to one of them, and currently the only way is to open the saved searches window, edit the expression, copy and paste it, and then modify it. Would it also be possible to add a "count all" button to this find/search area?
Regarding the regex engine, I've noticed two main differences: \K (which I can work around using variable-length lookbehinds) and the conditional structure (?(condition)Then|Else); the latter is an important limitation compared with PCRE. \p properties are well supported, but \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter) only work correctly if the "case sensitive" option is checked (I don't know if this is the expected behaviour). I've tried (?f) with no success.
As mentioned in the thread, \U \L \E don't work, but they are Sigil commands, not PCRE. Anyway, a similar option would be very useful in the Calibre editor, because a lowercase letter after a period is a very frequent mistake in texts, and at the moment that replacement is not possible.
On another topic, scanned text sometimes includes &shy; (soft hyphen), a hidden hyphen that you can't see (at least in Sigil), and the only way to remove it is a regex search for \xAD. Here the problem is not with the regex engine but with the file preview panel, where it appears as a dot.
&#8203; (zero-width space) behaves similarly: it is also shown as a dot, and it's another hidden character, used inside very long words so that the paragraph can break and the text doesn't exceed the screen boundary on readers. Here the regex \x{200B} is not allowed. Thank you very much for your program and support. Last edited by Papirus; 08-14-2014 at 07:03 AM. |
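Two of the clean-ups described in this post can be sketched with a replacement function in plain Python. This is only an illustration using the standard re module, not the editor's own search code; the function names are made up, and the capitalization rule is deliberately naive (it would also capitalize after abbreviations such as "etc.").

```python
import re

# U+00AD is the soft hyphen, U+200B the zero-width space. Python's re module
# spells them \xad / \u00ad and \u200b; it does not accept the PCRE-style
# \x{200B} escape.
INVISIBLE = re.compile("[\u00ad\u200b]")

# A word character that directly follows a period and one whitespace character.
AFTER_PERIOD = re.compile(r"(?<=\.\s)\w")


def strip_invisible(text):
    """Remove soft hyphens and zero-width spaces."""
    return INVISIBLE.sub("", text)


def capitalize_after_period(text):
    """Uppercase the first letter after '. ' using Unicode-aware str.upper()."""
    return AFTER_PERIOD.sub(lambda m: m.group().upper(), text)
```

Because str.upper() is Unicode-aware, accented letters such as é are handled without any extra work.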
08-14-2014, 08:38 AM | #8 | |||
Grand Sorcerer
Posts: 28,045
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Code:
(a)?b(?(1)c|d) Quote:
Quote:
Last edited by DiapDealer; 08-14-2014 at 09:46 AM. |
|||
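The pattern quoted in the post above is a conditional keyed on a capture group: if group 1 (the optional a) took part in the match, c must follow b; otherwise d must. A small sketch for checking which strings it accepts, using Python's standard re module, which accepts the same group-reference syntax; the helper name accepts is made up here:

```python
import re

# (?(1)c|d): "if group 1 matched, require c, else require d".
COND = re.compile(r"(a)?b(?(1)c|d)")


def accepts(text):
    """Return True if the whole string matches the conditional pattern."""
    return COND.fullmatch(text) is not None
```

So "abc" and "bd" are accepted, while "abd" and "bc" are not.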
08-14-2014, 10:52 AM | #9 | |
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
|
Quote:
There's a young (and still short) thread about useful S&R settings in this very same subforum: https://www.mobileread.com/forums/sho...d.php?t=237181 |
|
08-14-2014, 12:58 PM | #10 | |
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2014
Device: Papyre 613
|
Quote:
Imagine a text in which you want to find Roman numerals in order to set them in small caps. ([ivxlcdm]+) could be one possibility, with <small>\1</small> as the replacement. That expression will return Roman numerals as well as other words formed from these characters, e.g. clic, CD, ill, id, lid, livid, isolated letters, etc. In Spanish some are probably even more frequent: thousand (mil) and, mainly, my (mi). So if I use the above expression I will of course get Roman numerals, but also hundreds of matches that are not Roman numerals. The only way I know to get around this is the conditional structure (?(condition)then|else). It would be more or less as follows (in fact the expression needs refining): Code:
(?i)(?(?=[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid) |(?<=\PL)([ivxlcdm]+)(?=\PL)) |
|
08-14-2014, 01:00 PM | #11 | |
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2014
Device: Papyre 613
|
Quote:
|
|
08-14-2014, 01:10 PM | #12 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@Papirus: Thankfully, calibre's editor is written in Python, so you don't have to create that unreadable gunk to do something like change the case of Roman numerals. Using the upcoming function mode, all you need is
Find: (?i)\b([ivclm]+)\b
Replace:
Code:
def replace(match, context):
    word = match.group()
    # {set of common words} stands for a real set of words to leave untouched
    if word.lower() not in {set of common words}:
        word = word.upper()
    return word
|
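The replace function above leaves its set of common words unspecified and depends on the editor's function mode, which was not yet released at the time of this post. The same idea can be tried outside the editor with re.sub and a callback. In this sketch the stop list is hypothetical (it reuses the false positives mentioned earlier in the thread), the character class is widened to [ivxlcdm] as in Papirus's pattern, and the callback takes only the match object rather than the (match, context) signature shown in the post.

```python
import re

# Hypothetical stop list, built from the false positives named in the thread.
COMMON_WORDS = {"mi", "mil", "clic", "cd", "id", "ill", "lid", "livid"}

# Whole words made up only of Roman-numeral letters, in either case.
ROMAN = re.compile(r"\b[ivxlcdm]+\b", re.IGNORECASE)


def replace(match):
    """Uppercase the matched word unless it is a known ordinary word."""
    word = match.group()
    if word.lower() not in COMMON_WORDS:
        word = word.upper()
    return word


def upcase_roman(text):
    """Apply replace() to every candidate Roman numeral in text."""
    return ROMAN.sub(replace, text)
```

Keeping the exceptions in a set rather than in the pattern means new false positives can be added without touching the regex.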
08-14-2014, 01:39 PM | #13 | |
Grand Sorcerer
Posts: 28,045
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
I think the disconnect here is terminology. The calibre editor's regex module supports if|then|else conditionals: that part is not up for debate. I think where you may be running into problems is that it doesn't support conditionals using lookarounds. So instead of a conditional like:
Code:
(?(?=regex)then|else)
you would write:
Code:
(?=regex)then|(?!regex)else
So in essence, your:
Code:
(?i)(?(?=[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid) |(?<=\PL)([ivxlcdm]+)(?=\PL))
becomes:
Code:
(?i)(?=[clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid) |(?![clv]?i?d|[cm]m|[dm]i|clic|lcd|(m|ci)?v?il|[mdcl]|ill|livid)(?<=\PL)([ivxlcdm]+)(?=\PL)
Last edited by DiapDealer; 08-14-2014 at 03:16 PM. |
|
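The rewrite described above, (?(?=regex)then|else) expressed as (?=regex)then|(?!regex)else, is mechanical enough to generate. A minimal sketch with Python's standard re module; the helper name conditional and the toy sub-patterns are made up for illustration, and the equivalence is only claimed for conditions that don't rely on capture groups set inside the lookahead.

```python
import re


def conditional(cond, then, otherwise):
    """Build '(?=cond)then|(?!cond)otherwise' wrapped in a non-capturing group.

    The wrapping group keeps the alternation from leaking into whatever
    pattern text surrounds the result.
    """
    return rf"(?:(?={cond}){then}|(?!{cond}){otherwise})"


# Toy example: if the next character is a digit, require exactly three digits;
# otherwise require exactly two lowercase ASCII letters.
TOY = re.compile(conditional(r"\d", r"\d{3}", r"[a-z]{2}"))
```

The condition is written twice in the resulting pattern, which is the price of doing without a lookaround conditional.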
08-14-2014, 01:42 PM | #14 | |
Grand Sorcerer
Posts: 28,045
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
|
|
08-14-2014, 01:54 PM | #15 |
creator of calibre
Posts: 44,572
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
In fact, just for giggles, since the function mode is actually arbitrarily powerful, you could do this:
Code:
from calibre.gui2.tweak_book import dictionaries

def replace(match, context):
    word = match.group()
    if not dictionaries.recognized(word):
        word = word.upper()
    return word
|
|