05-19-2014, 08:56 PM | #1 |
Lovin' the e-life!!!
Posts: 31
Karma: 2044130
Join Date: Jul 2009
Location: Pacific Northwet
Device: iPad Mini
|
cleaning up extra returns formatting mess
I know this is a frequently discussed problem: extra returns, and returns in the middle of paragraphs.
I'm a speed reader, and choppy pages are a pet peeve! I searched and found several good suggestions for cleaning these up (e.g., using Find & Replace in Word, with ^p to find paragraph returns). This works fantastically in most cases; however, what "replace code" can I use when the return symbol is a little crooked down arrow rather than the normal paragraph symbol used in Word? I converted an .lrf to an .rtf file in Calibre, and all the returns, including the extra ones, use the arrow symbol. I tried cut and paste, but it doesn't register. Is there a secret code for the arrow, like ^p for paragraph? |
05-20-2014, 12:57 AM | #2 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Why not try the Edit Book feature in calibre? Once you get the hang of it, you will never look back. It is infinitely more controllable.
|
05-20-2014, 02:58 AM | #3 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
I think it is &crarr; but I may be remembering wrong. Try either ^l or ^m.
Helen |
05-20-2014, 03:44 AM | #4 | |
Wizard
Posts: 3,455
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
Quote:
You are not looking for MS Word; you are looking for "regular expressions". Word has only limited abilities compared to other tools, like Calibre. With regular expressions you can say:
Begin group one: '\('
Find one character that is not from this list: [.?!"']
End of group one: '\)'
Followed by an end-of-line symbol: \n
Replace with the contents of group one followed by a space: '\1 '
In regular expression syntax that is something like:
substitute/\([^.?!"']\)\n/\1 /
Unfortunately there are several dialects of RE, so you will have to look it up in the documentation. For example, "begin a group that I will later refer to as '\1'" (or '\2' and so on for a second or third group) is sometimes '\(' and sometimes just '('.
You see, most line breaks that do not follow one of [.?!"'] are not at the end of a paragraph. This is very quick and dirty, but it can clean an OCRed book of unwanted line breaks with 99% accuracy.
Regular expressions can look very intimidating if you just look at a complex one, but they are well worth learning. Calibre and many other advanced tools support them, and you can start with very simple ones and gradually write more and more complex REs. They will still be relatively difficult to read, because the metacharacter set is very dense so that they can fit inside "search" and "replace" fields, but they become much easier to write after a bit of practice. |
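For anyone who wants to try the idea outside a search box, here is a minimal sketch of the same substitution in Python, assuming the text has been saved as plain text with \n line endings. The function name unwrap_lines is made up for illustration, and Python's dialect writes the group as '(' rather than '\('. One addition that is not in the post above: the newline itself is also excluded from the character class, so an empty line is never treated as the character before a break.

```python
import re

# A line break is considered "unwanted" when the character before it is not
# sentence-ending punctuation or a closing quote. The "\n" inside the class is
# an addition for this sketch, so empty lines are not matched as that character.
BROKEN_LINE = re.compile(r"([^.?!\"'\n])\n")


def unwrap_lines(text: str) -> str:
    """Replace each unwanted line break with a single space."""
    # \1 puts back the character captured by group one, followed by a space.
    return BROKEN_LINE.sub(r"\1 ", text)


if __name__ == "__main__":
    sample = "It was a dark and\nstormy night.\nThe rain fell"
    print(unwrap_lines(sample))
```

Like the original pattern, this is quick and dirty: a paragraph that happens to end without punctuation will be glued to the next one, so it is worth checking the result on a copy of the file first.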
|
05-20-2014, 04:29 AM | #5 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Moved to the "Workshop" forum.
|
05-20-2014, 09:23 AM | #6 | |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Quote:
@momtodogs: If you want to clean the OCR mess out of Word documents, I suggest looking at my add-in (see signature). It catches a lot of OCR mistakes and repairs them either automatically or manually. It works in several steps, one of which is a large list of search-and-replace (S&R) requests (an example list is available). It also handles things like smartening punctuation and missing dialogue marks (and many other things). As a bonus, you can even export the result to ePUB. |
|