01-10-2017, 10:18 PM   #1
tankervin
Removing Line breaks using regex in PDF when converting

I have a PDF file that ends up with unnecessary line breaks when converted to EPUB. Heuristic processing doesn't remove them, even if I set it to 1. So I thought of using a regex to replace those breaks with a blank.

Example 1
paying their own</p>
<p class="calibre1">money

Example 2
wrong.</p>
<p class="calibre1">“Who did this....

I can write a regex to match the lines that don't end with '.':
[^\.]</p>\n<p class="calibre1">

but the match also includes the last character before the closing tag (i.e. the "n" from "own" in the first example), so that character would be replaced as well.

Is there any way to match the string without removing that last character?
01-11-2017, 03:00 AM   #2
itimpi
Have you tried something like:
</p>$^<p class="calibre1">
If I have got my RE correct, this should look for a line ending with the closing </p> tag followed by a line starting with the opening <p> tag. You would want to replace all of this with a single space.
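
As a rough illustration of what this replacement does, here is a minimal sketch in Python's `re` module rather than in calibre itself. The function name `join_all` is made up for the example, the sample strings come from the two examples in the first post, and the newline is matched with an explicit `\n`, as in that post's pattern.

```python
import re

# Paragraph break as it appears in the converted HTML: closing tag,
# newline, then the opening tag with the generated "calibre1" class.
BREAK = re.compile(r'</p>\n<p class="calibre1">')


def join_all(html: str) -> str:
    """Replace every matched paragraph break with a single space."""
    return BREAK.sub(" ", html)


# Sample strings taken from the first post.
example_1 = 'paying their own</p>\n<p class="calibre1">money'
example_2 = 'wrong.</p>\n<p class="calibre1">“Who did this....'

print(join_all(example_1))  # paying their own money
print(join_all(example_2))  # wrong. “Who did this....
```

Note that this joins both examples, including the second one, where the full stop suggests a genuine paragraph break.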
01-11-2017, 03:49 AM   #3
tankervin
The thing is, I only want to replace the lines that don't end with a full stop '.'.
01-12-2017, 04:23 PM   #4
nabsltd
Quote:
Originally Posted by tankervin
The thing is, I only want to replace the lines that don't end with a full stop '.'.
Try:
Code:
([^.])</p>$^<p class="calibre1">
Then replace it with "\1 " (the backreference followed by a space). The "\1" puts the captured last character back into the replacement, so it is not lost.
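
The capture-group idea can be sketched in Python's `re` module as follows. This is only an illustration outside calibre, and the function names `unwrap` and `unwrap_lookbehind` are made up for the example. It matches the line break with an explicit `\n`, as in the first post, because in plain Python `re` the anchors `$` and `^` are zero-width and do not consume the newline between the two tags. Whether calibre's search-and-replace box behaves identically is an assumption that has not been checked here. The second pattern shows a lookbehind variant, which tests the preceding character without making it part of the match.

```python
import re

# Capture the last character before </p>, but only if it is not a full stop.
UNWRAP = re.compile(r'([^.])</p>\n<p class="calibre1">')

# Lookbehind variant: assert that the preceding character is not a full
# stop, without including that character in the match.
UNWRAP_LOOKBEHIND = re.compile(r'(?<!\.)</p>\n<p class="calibre1">')


def unwrap(html: str) -> str:
    """Join broken paragraphs, restoring the captured character via \\1."""
    return UNWRAP.sub(r"\1 ", html)


def unwrap_lookbehind(html: str) -> str:
    """Join broken paragraphs; nothing needs restoring in the replacement."""
    return UNWRAP_LOOKBEHIND.sub(" ", html)


# Sample strings taken from the first post.
example_1 = 'paying their own</p>\n<p class="calibre1">money'
example_2 = 'wrong.</p>\n<p class="calibre1">“Who did this....'

print(unwrap(example_1))            # paying their own money
print(unwrap_lookbehind(example_1)) # paying their own money
print(unwrap(example_2) == example_2)  # True: the break after "wrong." is kept
```

With the lookbehind form, the match starts at `</p>`, so the "n" from "own" is never part of what gets replaced.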