Old 05-19-2014, 08:56 PM   #1
momtodogs
Question cleaning up extra returns formatting mess

I know this is a frequently discussed problem: extra returns, and returns in the middle of paragraphs.

I'm a speed reader and choppy pages are a pet peeve!

I searched and found several good suggestions for cleaning these up (e.g., using Find & Replace in Word, with ^p to find paragraph returns).

This works fantastically in most cases. However, what "replace code" can I use when the paragraph return symbol is a little crooked down arrow rather than the normal paragraph symbol used in Word?

I converted an .lrf to an .rtf file in Calibre, and all the returns, including the extra ones, use the arrow symbol. I tried cut & paste, but it doesn't register.

Is there a secret code for the arrow, like ^p for paragraph?
Old 05-20-2014, 12:57 AM   #2
eschwartz
Why not try the Edit Book feature in calibre? Once you get the hang of it, you will never look back. It is infinitely more controllable.
Old 05-20-2014, 02:58 AM   #3
speakingtohe
I think it is &crarr; but I may be remembering wrong. Try either ^l or ^m.

Helen
Old 05-20-2014, 03:44 AM   #4
kacir
Quote:
Originally Posted by momtodogs View Post
I searched and found several good suggestions for cleaning these up (i.e. using Find & Replace in Word, using ^p for finding paragraph returns.)

This works fantastically in most cases; however, what "replace code" can I use when the paragraph return symbol is a little crooked down arrow and not the normal paragraph symbol used in Word.

I converted an .lrf to an .rtf file in Calibre, and all the returns, including the extra ones, use the arrow symbol. I tried cut&paste, but it doesn't register.

Is there a secret code for the arrow, like ^p for paragraph?
In MSWord, open the search and replace dialog, press the options button to expand it into a much larger panel, and then select the "Special" button at the bottom. It shows the list of codes, including ^l for the manual line break.

You are not looking for MSWord, you are looking for "Regular Expressions". Word has only limited abilities compared to other tools, like Calibre. With regular expressions you can say:
Begin group one: '\('
Find one character that is not in this list: [.?!"']
End group one: '\)'
Followed by an end-of-line symbol: \n
Replace with the contents of group one followed by a space: '\1 '
In regular expression syntax that is something like:
substitute/\([^.?!"']\)\n/\1 /
Unfortunately there are several dialects of RE, so you will have to look it up in the documentation of your tool. For example, "begin a group that I will later refer to as '\1'" (or '\2' and so on, if it is the second or third group) is sometimes '\(' and sometimes just '('.
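
As a rough illustration of the rule above in one specific dialect, here is a minimal sketch using Python's re module, where a group is written with plain '(' and ')' and the replacement refers to it as '\1'. The function name join_broken_lines and the sample text are made up for the example; the same pattern would need adjusting for other tools.

```python
import re

# Group one: a single character that is NOT one of . ? ! " '
# followed by a newline. Python writes groups as ( ... ), not \( ... \).
MID_PARAGRAPH_BREAK = re.compile(r"""([^.?!"'])\n""")


def join_broken_lines(text: str) -> str:
    """Replace a line break that follows an ordinary character with a space."""
    # \1 puts back the character captured by group one, then adds a space.
    return MID_PARAGRAPH_BREAK.sub(r"\1 ", text)


if __name__ == "__main__":
    sample = "It was a dark and\nstormy night.\nThe rain fell."
    print(join_broken_lines(sample))
```

The break after "and" is joined because it follows an ordinary letter, while the break after "night." is left alone because it follows a full stop.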



You see, most of the line breaks that do not follow one of [.?!"] are not at the end of a paragraph.
This is very quick and dirty, but it can clean an OCRed book of unwanted line breaks with 99% accuracy.
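
For the "extra returns" half of the problem, one possible extension of this heuristic is sketched below in Python. It assumes that a blank line marks a real paragraph boundary, which is my assumption and not something the converted file guarantees; the function name clean_returns is hypothetical.

```python
import re


def clean_returns(text: str) -> str:
    """Collapse extra returns, then join line breaks inside paragraphs."""
    # Normalise Windows and old Mac line endings to plain \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more consecutive returns (possibly with stray spaces or tabs
    # between them) become exactly one blank line.
    text = re.sub(r"\n[ \t]*(?:\n[ \t]*)+", "\n\n", text)
    # A single return that does not follow . ? ! " ' and is not part of a
    # blank line is treated as a mid-paragraph break and becomes a space.
    text = re.sub(r"""([^.?!"'\n])\n(?!\n)""", r"\1 ", text)
    return text


if __name__ == "__main__":
    sample = "The quick brown\nfox jumped.\n\n\n\nIt landed\nsafely."
    print(clean_returns(sample))
```

Like the original rule, this will wrongly join lines that are meant to stay separate but end without punctuation, such as verse or a heading directly followed by text, so it is worth checking the result by eye.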

Regular expressions can look very intimidating if you just look at a complex one, but they are well worth learning. Calibre and many other advanced tools support them, and you can start with very simple ones and gradually write more and more complex REs. They will still be relatively difficult to read, because the metacharacter set is very dense so that expressions fit inside the "search" and "replace" fields, but they become much easier to write after a bit of practice.
Old 05-20-2014, 04:29 AM   #5
HarryT
Moved to the "Workshop" forum.
Old 05-20-2014, 09:23 AM   #6
Toxaris
Quote:
Originally Posted by kacir View Post
You are not looking for MSWord, you are looking for "Regular Expressions". Word has only limited abilities comparing to other tools, like Calibre. With the

...<snip>...

You see, most of the linebreaks that do not follow: [.?!"] are not at the end of paragraph.
This is very quick and dirty, but can clean an OCRed book from unwanted line breaks with 99% accuracy

<snip>...
You are apparently not aware of the 'Use wildcards' option in Word's S&R. It enables RegEx-style searches. The example you give is possible within Word without a problem. Heck, even the syntax is almost identical. If you know RegEx, you can work with wildcard search in Word...

@momtodogs: If you want to clean OCR mess out of Word documents, I suggest looking at my add-in (see signature). It catches a lot of OCR mistakes and lets you repair them either automatically or manually. It works in several steps, one of which is a large list of S&R requests (an example list is available). It also handles things like smartening punctuation and missing dialogue marks (and many other things). As a bonus, you can even export the result to ePUB.