#1 |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
|
I am experimenting with search & replace in calibre.
Conversion of txt files often leaves too many line breaks, even with heuristic processing. When a </p><p class="whatever"> line break has lowercase letters on both sides, it can always be removed and replaced with a space. Searching for '[a-z]</p><p class="whatever">[a-z]' finds all of those breaks, but when I replace with '[a-z] [a-z]', the conversion also replaces the letter before and after the tags with the literal string '[a-z]', which is not what I intended. I have to include '[a-z]' in the search string to find the correct line breaks. If I don't, I'll delete every line break in the book, which in my opinion makes it pretty much unreadable. Is there a way to include the letters before and after the line break in the search string but exclude them from the substitution? TIA.
Last edited by G2B; 03-02-2018 at 11:55 AM.
#2 | |
Addict
Posts: 281
Karma: 7724454
Join Date: Sep 2017
Location: Bethesda, MD, USA
Device: Kobo Aura H20, Kobo Clara HD
|
Quote:
Search for:
Code:
([a-z])</p><p class="whatever">([a-z])
Replace with:
Code:
\1 \2
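For anyone who wants to test the idea outside calibre, here is a minimal sketch using Python's re module, on the assumption that calibre's search & replace follows Python-style regular expressions. The sample text and the helper names join_with_groups and join_with_lookarounds are made up for illustration. The second helper shows lookarounds as a possible alternative: the letters are required but are not part of the match, so only the tags get replaced.

```python
import re

# Made-up sample: one break in mid-sentence, one real paragraph break.
SAMPLE = ('the quick brown</p><p class="whatever">fox. '
          'The end.</p><p class="whatever">Next.')


def join_with_groups(html):
    # Parentheses capture the two letters; \1 and \2 put them back
    # on either side of the space.
    pattern = r'([a-z])</p><p class="whatever">([a-z])'
    return re.sub(pattern, r'\1 \2', html)


def join_with_lookarounds(html):
    # Lookbehind and lookahead check for the letters without consuming
    # them, so the replacement is just a single space.
    pattern = r'(?<=[a-z])</p><p class="whatever">(?=[a-z])'
    return re.sub(pattern, ' ', html)


if __name__ == '__main__':
    print(join_with_groups(SAMPLE))
    print(join_with_lookarounds(SAMPLE))
```

Both helpers join only the mid-sentence break and leave the break after "The end." alone, because a full stop is not in [a-z].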
#3 |
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
|
Search for this:
([a-z])</p>\s+<p class="whatever">([a-z])
The \s+ in the middle matches the whitespace (the line break) between the two lines and is necessary to find what you're looking for.
Replace with this:
\1 \2
You might also want to do two searches and drop the ([a-z]) at the end for the second one, though, because you'll find there are a lot of lines that need to be joined where the second line doesn't start with a lowercase letter.
Last edited by deback; 03-09-2018 at 07:44 PM.
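Whether \s+ or \s* is the better choice depends on whether the converted file always has whitespace between the two tags. A small Python sketch of the difference, assuming Python-style regular expressions and made-up sample strings (TIGHT, SPLIT and the helper join are illustrative names only):

```python
import re

TIGHT = 'some text</p><p class="whatever">more text'    # tags touch
SPLIT = 'some text</p>\n<p class="whatever">more text'  # newline between tags

# \s+ needs at least one whitespace character between the tags.
PLUS = re.compile(r'([a-z])</p>\s+<p class="whatever">([a-z])')
# \s* also accepts tags with nothing between them.
STAR = re.compile(r'([a-z])</p>\s*<p class="whatever">([a-z])')


def join(pattern, html):
    # Put the captured letters back with a single space between them.
    return pattern.sub(r'\1 \2', html)


if __name__ == '__main__':
    for name, pattern in (('\\s+', PLUS), ('\\s*', STAR)):
        print(name, repr(join(pattern, TIGHT)), repr(join(pattern, SPLIT)))
```

With \s+ the TIGHT sample is left untouched, while \s* joins both samples.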
#4 |
Wizard
Posts: 2,097
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
|
I recommend using the editor for finding split paragraphs and line feeds. This is the base search and replace I use:
Search:
Code:
</p>\s*<p[^>]+>([a-z])
Replace:
Code:
\1
For extra line feeds I use a regex-function.
Search:
Code:
<p class="(.*?)">(.*?)</p>|<div class="(.*?)">(.*?)</div>
Function:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Swap every line feed inside the matched element for a space.
    return match.group().replace('\n', ' ')
bernie
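The regex-function can be tried outside the editor with plain Python. This is only a sketch under two assumptions: the re.DOTALL flag stands in for whatever setting lets (.*?) span line feeds (I believe that is the editor's "Dot all" option), and run_outside_calibre is a hypothetical wrapper that passes None for the arguments calibre would normally supply.

```python
import re

# Same pattern as in the post. DOTALL is an assumption: without it,
# (.*?) cannot cross the line feeds the function is meant to remove.
PATTERN = re.compile(
    r'<p class="(.*?)">(.*?)</p>|<div class="(.*?)">(.*?)</div>',
    re.DOTALL)


def replace(match, number, file_name, metadata, dictionaries, data,
            functions, *args, **kwargs):
    # Swap every line feed inside the matched element for a space.
    return match.group().replace('\n', ' ')


def run_outside_calibre(html):
    # Hypothetical stand-in for function mode: the placeholders are
    # enough here because replace() only uses the match object.
    return PATTERN.sub(
        lambda m: replace(m, 1, None, None, None, None, None), html)


if __name__ == '__main__':
    sample = '<p class="a">one\ntwo</p>\n<div class="b">three\nfour</div>'
    print(run_outside_calibre(sample))
```

Line feeds between elements are left alone, since only the text inside each matched p or div is passed to the function.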
#5 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
|
Quote:
I tried that, and even though your string only shows [a-z] lowercase letters, the editor also takes me to sentence line breaks that start with [A-Z] capital letters.
There seems to be something wrong with my calibre. Heuristic processing splits pages instead of removing line breaks. I have uninstalled and reinstalled, but the same problems persist.
Last edited by G2B; 03-08-2018 at 03:22 PM.
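One possible explanation, offered as an assumption rather than a diagnosis: if the search runs case-insensitively (for example because a case-sensitive option is not ticked), [a-z] matches capital letters too. Python's re module shows the effect with its re.IGNORECASE flag; the sample text and the helper finds_break are made up for illustration.

```python
import re

PATTERN = r'</p>\s*<p[^>]+>([a-z])'
HTML = 'end.</p>\n<p class="x">Start of a sentence'


def finds_break(html, ignore_case):
    # With IGNORECASE, the class [a-z] also accepts A-Z.
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(PATTERN, html, flags) is not None


if __name__ == '__main__':
    print(finds_break(HTML, ignore_case=False))
    print(finds_break(HTML, ignore_case=True))
```

If that is what is happening, making the search case-sensitive should stop it from landing on breaks that start with a capital letter.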
#6 |
Well trained by Cats
Posts: 30,062
Karma: 57259778
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
That happens because you did not include anything that limits what the start of the match can be.
I check for a letter or a comma, optionally followed by either kind of closing quote. This still fails in a few cases, such as Mr., or where the letters are upper case, like A.M. I fix those few by hand.
<sample clipped from Sigil's saved search file, which includes additional escapes>
Code:
70\Name=Cleanup/Joins/Join to lower
70\Find="([[:alpha:],][\"\x201d]*)</p>\\s*<p\\b[^>]*>([a-z\x201c\"])"
70\Replace=\\1 \\2
71\Name=Cleanup/Joins/Join to upper
71\Find="([[:alpha:],]\x201d*)</p>\\s*<p\\b[^>]*>([\"\x201c]*[A-Z])"
71\Replace=\\1 \\2
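For readers who want to try these two searches in Python, here is a rough adaptation, not a straight copy. The assumptions: the saved-search file's extra escaping is removed, and because Python's re module has no [[:alpha:]] class, the group (?:[^\W\d_]|,) is used as a stand-in for "a letter or a comma". The names JOIN_TO_LOWER, JOIN_TO_UPPER and join are illustrative only.

```python
import re

# "Join to lower": a letter or comma, optional closing quotes, the
# paragraph break, then a lowercase letter or an opening quote.
JOIN_TO_LOWER = re.compile(
    r'((?:[^\W\d_]|,)["\u201d]*)</p>\s*<p\b[^>]*>([a-z\u201c"])')

# "Join to upper": a letter or comma, optional closing curly quotes,
# the break, then optional opening quotes and a capital letter.
JOIN_TO_UPPER = re.compile(
    r'((?:[^\W\d_]|,)\u201d*)</p>\s*<p\b[^>]*>(["\u201c]*[A-Z])')


def join(pattern, html):
    # Keep both captured ends and put one space between them.
    return pattern.sub(r'\1 \2', html)


if __name__ == '__main__':
    print(join(JOIN_TO_LOWER, 'he said,</p>\n<p class="x">and left'))
    print(join(JOIN_TO_UPPER, 'met John</p><p>Smith today'))
```

A break that follows a full stop is not touched by either pattern, which is the point of limiting what may come before the closing tag.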
#7 |
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
|
|
#8 |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
|
|