01-12-2010, 08:03 AM | #1 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Problem with regular expressions
I'm having some trouble writing a regular expression to delete page headers in the conversion options. The page header I'm trying to delete basically looks like
Code:
<p class="calibre1"> Title</p><p class="calibre1"> Page 42 of 230</p>
Code:
Title</p><p class="calibre1">\nPage [0-9]* of [0-9]*
Seeing as I've never used regular expressions before and just skimmed over the Calibre user manual to piece it together, I'm sure there's something I'm missing, but I can't figure out what it is. What I can figure out is that I somehow don't get how to match a newline. Could anyone help?
Last edited by Manichean; 01-12-2010 at 08:07 AM. |
01-12-2010, 11:31 AM | #2 |
creator of calibre
Posts: 44,564
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try
Title</p><p class="calibre1">[^<]+</p> |
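A rough way to check this suggestion outside of calibre is a few lines of Python. This is only a sketch: the sample string, and in particular the newline after each opening tag, is an assumption based on the description in the first post. It shows that the negated class [^<]+ also matches newline characters, so the pattern can span the line break.

```python
import re

# Hypothetical sample; the exact whitespace in the real book is unknown.
html = (
    'short and scant.\n'
    '<p class="calibre1">\nTitle</p><p class="calibre1">\nPage 42 of 230</p>\n'
    'The man from the east'
)

# [^<]+ means "one or more characters that are not '<'", which
# includes newlines, so no special multi-line flag is needed.
pattern = re.compile(r'Title</p><p class="calibre1">[^<]+</p>')

cleaned = pattern.sub('', html)
print(repr(cleaned))
```

Because the pattern starts at "Title", the opening tag in front of the title is left behind in this sample; the expression would have to be extended to remove it as well.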
01-26-2010, 09:55 AM | #3 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Unfortunately, that doesn't work. Same problem, it just gets confused about the linebreak.
I thought about maybe passing a flag that the string it should match is on multiple lines, but I don't know how to do this and currently, I'm too busy to figure it out. I'll post again once I find a solution. |
02-02-2011, 11:21 PM | #4 |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
I messed with this a little. I don't know exactly what you are looking for but here is what I have. This should only match on a number followed by a </p> followed by an end of the line.
Search: of \d+</p>$
Replace: </p>
So this will find the last line of your three lines, the one with the page number followed by the </p> at the end of a line, and then replace the match with only the </p>.
It looks like this before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sides, and above the jacket collar behind, uncombed. Both beards were short and scant.
<p class="calibre1"> Title</p><p class="calibre1"> Page 42 of 230</p>
The man from the east wore a standard straight sword, the plastic
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
And now after:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<p class="calibre1"> Title</p><p class="calibre1"> Page 42 </p>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<
This just removes the "of XXX" page numbering part. Is that what you were after?
Archon |
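One thing worth checking with a search like this is how the tool treats $. As a sketch in plain Python (not calibre itself; whether a given tool switches on multi-line mode is an assumption to verify), $ only matches at the very end of the text unless re.MULTILINE is set. The sample text is made up from the excerpt above.

```python
import re

# Hypothetical two-line sample based on the excerpt in this post.
text = (
    '<p class="calibre1"> Title</p><p class="calibre1"> Page 42 of 230</p>\n'
    'The man from the east wore a standard straight sword'
)

pattern = r'of \d+</p>$'

# Without re.MULTILINE, $ only matches at the end of the whole string,
# so nothing is replaced here.
without_flag = re.sub(pattern, '</p>', text)

# With re.MULTILINE, $ also matches just before each newline.
with_flag = re.sub(pattern, '</p>', text, flags=re.MULTILINE)

print(without_flag == text)
print(with_flag.splitlines()[0])
```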
02-03-2011, 12:39 AM | #5 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Did you try this:
Code:
Title</p><p class="calibre1">\s*Page\s*[0-9]+\s*of\s*[0-9]+
Edit: Note if non-breaking spaces are your problem you can create a character class to include them. Instead of \s*, use this:
[\s\u00a0]*
Last edited by ldolse; 02-03-2011 at 12:43 AM. |
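Here is a small Python sketch of the same idea, in case anyone wants to try it outside the wizard. The sample strings are invented, and the name WS is just a helper for this example. Note that in Python 3 a plain \s on text patterns already matches the non-breaking space U+00A0, so the explicit class is redundant there, though harmless; other regex engines may differ.

```python
import re

# Whitespace class that explicitly includes the non-breaking space.
WS = r'[\s\u00a0]*'

header_re = re.compile(
    r'Title</p><p class="calibre1">' + WS + 'Page' + WS
    + '[0-9]+' + WS + 'of' + WS + '[0-9]+'
)

# Hypothetical variations of the header whitespace.
samples = [
    'Title</p><p class="calibre1">\nPage 42 of 230</p>',          # newline
    'Title</p><p class="calibre1">\u00a0Page\u00a042 of 230</p>', # nbsp
    'Title</p><p class="calibre1">Page 42 of 230</p>',            # no gap
    'Title</p><p class="calibre1">Chapter 42</p>',                # not a header
]

matches = [bool(header_re.search(s)) for s in samples]
print(matches)
```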
02-03-2011, 04:40 AM | #6 | |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
How about a nice simple
Quote:
|
|
02-03-2011, 04:57 AM | #7 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
You people do realize that this thread is about a year old? I solved that issue quite some time ago. (The solution was that I stopped being stupid, by the way.)
|
02-03-2011, 06:31 AM | #8 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
I thought it was odd that you, who wrote the regex FAQ/guide, couldn't manage it.
I did look at the date, and thought the original post was from December. |
02-03-2011, 06:34 AM | #9 | |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
Quote:
BTW what was your solution (besides stopping being stupid as you say)? Maybe we could all learn from your experience. Archon |
|
02-03-2011, 06:42 AM | #10 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
The problem was that I didn't use the regex wizard to test it, basically. I tried to use Notepad++, which doesn't allow for multiline regex matching. (I only found that out while writing the guide, actually.) The reason I did that was that I felt Notepad++ would be faster than Calibre, and I didn't fully understand the wizard. Also, had I known about character classes, especially \s, I might have found a solution sooner.
|
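To illustrate why a search that works one line at a time cannot find this header, here is a minimal Python sketch. The per-line loop is only a stand-in for how a line-based editor search might behave, which is an assumption, and the sample text is made up; the whole-text search is closer to what a multi-line-capable tester does.

```python
import re

# Hypothetical header split across two lines.
text = 'Title</p><p class="calibre1">\nPage 42 of 230</p>\n'

pattern = re.compile(
    r'Title</p><p class="calibre1">\s*Page\s*[0-9]+\s*of\s*[0-9]+'
)

# Line at a time: each line is searched in isolation, so a pattern
# that needs to cross the line break never gets a chance to match.
per_line_hits = [bool(pattern.search(line)) for line in text.splitlines()]

# Whole text at once: \s* can consume the newline between the tags.
whole_text_hit = bool(pattern.search(text))

print(per_line_hits, whole_text_hit)
```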
02-03-2011, 03:27 PM | #11 |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
Thanks for your wisdom.
I will pass that along to my PeeCee using mates. Archon |
|