Old 01-12-2010, 08:03 AM   #1
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Problem with regular expressions

I'm having some trouble writing a regular expression to delete page headers in the conversion options. The page header I'm trying to delete basically looks like
Code:
<p class="calibre1">
Title</p><p class="calibre1">
Page 42 of 230</p>
so I figured the regexp needed should look like
Code:
Title</p><p class="calibre1">\nPage [0-9]* of [0-9]*
to match the part from "Title" to the total page number, which is what I want to remove. This works fine if I use only the part before "\n" or only the part after it, which match the first and the second line I want removed, respectively. But as soon as I try to cobble the two lines together, I don't get any match. I've tried every variation of \n, \s and so forth that I could think of, including slapping some * and ? behind it and fooling around with groups, but nothing seems to work.
Seeing as I've never used regular expressions before and just skimmed the Calibre user manual to piece this together, I'm sure there's something I'm missing, but I can't figure out what it is. What I can figure out is that I somehow don't get how to match a newline. Could anyone help?

Last edited by Manichean; 01-12-2010 at 08:07 AM.
Old 01-12-2010, 11:31 AM   #2
kovidgoyal
creator of calibre
Posts: 44,564
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try

Title</p><p class="calibre1">[^<]+</p>
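A note on why a negated class is a reasonable thing to try here: `[^<]` matches any character except `<`, and that includes the newline, so the line break needs no special handling. The sketch below checks this with Python's `re` module, on the assumption that Calibre's engine follows Python regex syntax; the sample string and variable names are only for illustration.

```python
import re

# Sample header copied from the first post; the line breaks are plain "\n".
SAMPLE = (
    '<p class="calibre1">\n'
    'Title</p><p class="calibre1">\n'
    'Page 42 of 230</p>\n'
)

# [^<]+ consumes everything up to the next "<", newline included.
pattern = re.compile(r'Title</p><p class="calibre1">[^<]+</p>')

match = pattern.search(SAMPLE)    # spans both header lines
cleaned = pattern.sub("", SAMPLE)  # header removed, opening <p> left behind
```

Note that the opening `<p class="calibre1">` before "Title" is not part of the match, so it stays in the output.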
Old 01-26-2010, 09:55 AM   #3
Manichean
Unfortunately, that doesn't work. It's the same problem: it just gets confused about the line break.
I thought about passing a flag to indicate that the string to match spans multiple lines, but I don't know how to do this, and I'm currently too busy to figure it out. I'll post again once I find a solution.
Old 02-02-2011, 11:21 PM   #4
Archon
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
I messed with this a little. I don't know exactly what you are looking for, but here is what I have. This should only match a number followed by a </p> at the end of a line.
Search:
of \d+</p>$
Replace:
</p>

So this will find the last of your three lines, the one with the page number followed by the </p> at the end of the line, and replace the matched text with just the </p>. It looks like this before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sides, and above the jacket collar behind, uncombed. Both beards were short and scant.
<p class="calibre1">
Title</p><p class="calibre1">
Page 42 of 230</p>

The man from the east wore a standard straight sword, the plastic
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

And now after:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<p class="calibre1">
Title</p><p class="calibre1">
Page 42 </p>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<

This just removes the "of XXX" page numbering part.
Is that what you were after?

Archon
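The search-and-replace above can be tried outside Calibre as a rough check. This is a minimal sketch with Python's `re` module; the `re.MULTILINE` flag is an assumption about how the engine is configured, since in plain Python `$` otherwise matches only at the very end of the text.

```python
import re

# Header plus the start of the following paragraph, as in the "before" excerpt.
SAMPLE = (
    'Title</p><p class="calibre1">\n'
    'Page 42 of 230</p>\n'
    '\n'
    'The man from the east wore a standard straight sword, the plastic\n'
)

# With re.MULTILINE, "$" matches at the end of every line.
multiline = re.compile(r'of \d+</p>$', re.MULTILINE)
result = multiline.sub('</p>', SAMPLE)

# Without the flag, "$" only matches at the end of the whole string,
# so nothing is replaced in this sample.
unchanged = re.sub(r'of \d+</p>$', '</p>', SAMPLE)
```

With the flag set, `result` contains `Page 42 </p>`; without it, `unchanged` is identical to `SAMPLE`.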
Old 02-03-2011, 12:39 AM   #5
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Did you try this:
Code:
Title</p><p class="calibre1">\s*Page\s*[0-9]+\s*of\s*[0-9]+
Is it showing up correctly as matching in the regex wizard, but not actually being removed during conversion? Usually when this happens it's one of two things: either there are non-breaking spaces hiding amongst the real spaces, or there is a bug/limitation where Calibre shows you HTML in the wizard that's not exactly the same as the HTML provided to the Search and Replace feature during conversion.

Edit:
Note that if non-breaking spaces are your problem, you can create a character class that includes them. Instead of \s*, use this: [\s\u00a0]*
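To illustrate the non-breaking-space case, here is a small sketch in Python. One caveat: in current Python 3, `\s` on text patterns already matches U+00A0, so the widened class only matters when `\s` is limited to ASCII whitespace. The sketch forces that with `re.ASCII` as a stand-in; whether a given Calibre version behaves this way is an assumption, and the sample text is made up.

```python
import re

NBSP = "\u00a0"  # non-breaking space

# Hypothetical header whose gaps are non-breaking spaces, not plain spaces.
text = f'Title</p><p class="calibre1">\nPage{NBSP}42{NBSP}of{NBSP}230</p>'

HEAD = r'Title</p><p class="calibre1">'
WS = r'[\s\u00a0]*'  # whitespace class widened to include U+00A0

# re.ASCII restricts \s to space, tab, newline and similar characters
# (standing in for an engine whose \s does not cover U+00A0).
plain = re.compile(HEAD + r'\s*Page\s*[0-9]+\s*of\s*[0-9]+', re.ASCII)
explicit = re.compile(
    HEAD + WS + 'Page' + WS + '[0-9]+' + WS + 'of' + WS + '[0-9]+', re.ASCII
)

plain_hit = plain.search(text)        # None: \s* cannot cross the U+00A0
explicit_hit = explicit.search(text)  # matches through "230"

# With Python 3's default Unicode matching, plain \s already covers U+00A0.
unicode_hit = re.search(HEAD + r'\s*Page\s*[0-9]+\s*of\s*[0-9]+', text)
```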

Last edited by ldolse; 02-03-2011 at 12:43 AM.
Old 02-03-2011, 04:40 AM   #6
Perkin
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
How about a nice simple
Quote:
Title</p>.*?(?=</p>)
Which should match 'Title</p>' and everything up to but not including the next '</p>'
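Two details decide what a pattern of this shape actually removes: whether `.` is allowed to cross a line break, and whether the quantifier is greedy (`.*`) or lazy (`.*?`). The sketch below compares the variants in Python's `re`; the `re.DOTALL` flag is an assumption about the engine settings, and the third sample line is invented to show the difference.

```python
import re

# A second paragraph follows the header, so more than one "</p>" comes after "Title</p>".
SAMPLE = (
    'Title</p><p class="calibre1">\n'
    'Page 42 of 230</p>\n'
    '<p class="calibre1">The man from the east</p>\n'
)

# Greedy: runs to the LAST "</p>" it can reach, swallowing the next paragraph.
greedy = re.search(r'Title</p>.*(?=</p>)', SAMPLE, re.DOTALL).group(0)

# Lazy: stops before the NEXT "</p>", which is the page-number line.
lazy = re.search(r'Title</p>.*?(?=</p>)', SAMPLE, re.DOTALL).group(0)

# Without DOTALL, "." cannot cross the newline after the opening tag,
# so the lookahead never finds "</p>" and there is no match at all.
no_dotall = re.search(r'Title</p>.*?(?=</p>)', SAMPLE)
```

Because the closing `</p>` sits in a lookahead, it is left in place after the replacement in either variant.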
Old 02-03-2011, 04:57 AM   #7
Manichean
You people do realize that this thread is about a year old? I solved that issue quite some time ago. (The solution was for me to stop being stupid, by the way.)
Old 02-03-2011, 06:31 AM   #8
Perkin
I thought it was odd that you, who wrote the regex FAQ/guide, couldn't manage it.
I did look at the date, but thought the original post was from December.
Old 02-03-2011, 06:34 AM   #9
Archon
Quote:
You people do realize that this thread is about a year old? I solved that issue quite some time ago. (The solution was for me to stop being stupid, by the way.)
It's never too late to help a brother out. :-)

BTW what was your solution (besides stopping being stupid as you say)?

Maybe we could all learn from your experience.

Archon
Old 02-03-2011, 06:42 AM   #10
Manichean
The problem, basically, was that I didn't use the regex wizard to test it. I tried to use Notepad++ instead, which doesn't allow for multiline regex matching. (I only found that out while writing the guide, actually.) I did that because I felt Notepad++ would be faster than Calibre, and I didn't fully understand the wizard. Also, had I known about character classes, especially \s, I might have found a solution sooner.
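In the spirit of testing against an engine that handles line breaks, the two approaches from this thread can be compared in a few lines of Python. This is only a sketch, assuming Python's `re` is close enough to what Calibre runs; the CRLF sample is a hypothetical variation added to show why `\s` is the more forgiving choice.

```python
import re

# The header as posted, with a plain "\n" line break.
SAMPLE_LF = 'Title</p><p class="calibre1">\nPage 42 of 230</p>'
# Hypothetical variant: trailing space and Windows-style "\r\n" line break.
SAMPLE_CRLF = 'Title</p><p class="calibre1"> \r\nPage 42 of 230</p>'

candidates = {
    # The pattern from the first post, with a literal newline.
    "literal_newline": re.compile(
        r'Title</p><p class="calibre1">\nPage [0-9]* of [0-9]*'
    ),
    # The same idea using the \s character class for all whitespace.
    "whitespace_class": re.compile(
        r'Title</p><p class="calibre1">\s*Page\s+[0-9]+\s+of\s+[0-9]+'
    ),
}

# For each candidate, record whether it matches each sample.
results = {
    name: [bool(rx.search(sample)) for sample in (SAMPLE_LF, SAMPLE_CRLF)]
    for name, rx in candidates.items()
}
```

Here the literal `\n` pattern matches the first sample but not the CRLF variant, while the `\s` version matches both.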
Old 02-03-2011, 03:27 PM   #11
Archon
Thanks for your wisdom.

I will pass that along to my PeeCee-using mates.

Archon
All times are GMT -4.