10-14-2009, 03:44 AM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Oct 2009
Device: prs 505
|
Remove Footer
Hi,
I have been tinkering with this for two days now and couldn't find a clue. My problem is that I want to convert a purchased PDF to EPUB (so sorry, no sample, but I'll try to find one). The page numbers of the PDF show up in the middle of the text, so I want them gone. In the regex window (the one that comes up after clicking the magic wand next to "remove footer") the page numbers show up as
Code:
45 </p><p>
Code:
\d+ *</p><p>
Code:
[0-9]+ *</p><p>
Code:
\d+ *\<\/p\>\<p\>
Just as a test I tried
Code:
\d+
Also I tried
Code:
ebook-convert book.pdf .epub --debug-pipeline
So does anybody have an idea what I could try? Thanks. By the way, I'm using calibre 0.6.17 on OS X Snow Leopard. |
10-14-2009, 05:59 AM | #2 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
The header removal does appear to be broken. Can you open a ticket so I don't forget to fix it?
Also, --debug-pipeline has been renamed to --debug. |
10-14-2009, 06:47 AM | #3 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
I take that back, it is working. Make sure you have the remove footer checkbox checked above the regex.
|
10-14-2009, 10:08 AM | #4 |
Junior Member
Posts: 9
Karma: 10
Join Date: Oct 2009
Device: prs 505
|
Ah that --debug helps a lot.
If I look at the directories it produces, I think the regex window shows the contents of the "parsed" subdirectory, but the regexp acts on the "input". Also, I noticed that in the regexp I need to replace every " " with "\xA0". |
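For readers following along: `\xA0` is the non-breaking space (U+00A0), which looks like an ordinary space on screen but is a different character. Below is a minimal Python sketch of why a pattern written with a plain space can miss it. The sample string is made up for illustration, and Python's `re` module is only a stand-in here; it is an assumption that calibre's matching behaves the same way.

```python
import re

# Hypothetical sample: a page number followed by a non-breaking space (U+00A0).
sample = "45\xa0</p><p>"

plain_space = re.compile(r"\d+ *</p><p>")    # pattern from the first post
nbsp_aware = re.compile(r"\d+\xA0*</p><p>")  # same pattern with \xA0 instead of " "

# " *" only consumes ordinary spaces, so "</p><p>" would have to follow the
# digits directly; the \xa0 in between prevents a match.
plain_match = plain_space.search(sample)
nbsp_match = nbsp_aware.search(sample)

print(plain_match)                 # no match object
print(repr(nbsp_match.group(0)))   # the full page-number fragment
```

Copying the text out of the regex window and pasting it somewhere that shows code points is one way to check which kind of space is actually present.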
10-14-2009, 10:48 AM | #5 |
Junior Member
Posts: 9
Karma: 10
Join Date: Oct 2009
Device: prs 505
|
Ok now I'm happy. I found it in the "input" produced by debug.
The regexp that worked for me:
Code:
\d+\xA0*<br>
|
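A small sketch of what this pattern removes, using Python's `re` module as a stand-in (an assumption; the thread does not show how calibre applies the pattern internally). The function name `strip_footers` and the sample markup are hypothetical.

```python
import re

FOOTER = re.compile(r"\d+\xA0*<br>")  # the regexp from the post above

def strip_footers(html: str) -> str:
    """Delete every match of the footer pattern from the markup."""
    return FOOTER.sub("", html)

# Hypothetical input: a page number with a non-breaking space between two lines.
sample = "last line of a page<br>45\xa0<br>first line of the next page"
cleaned = strip_footers(sample)
print(cleaned)

# The pattern is not anchored, so digits that merely end a line go too.
side_effect = strip_footers("born in 1984<br>")
print(repr(side_effect))
```

Because nothing ties the digits to a line of their own, it is worth checking the converted book for sentences that ended in a number.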
10-14-2009, 05:18 PM | #6 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
|
10-14-2009, 05:33 PM | #7 |
Junior Member
Posts: 9
Karma: 10
Join Date: Oct 2009
Device: prs 505
|
That would be great.
Thanks for the quick pointers also. |
11-19-2009, 03:21 PM | #8 |
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
|
Hello,
I found this topic while looking for something similar. I tried it yesterday at work with an older version (I think it was 0.6.22) and it worked without any problems. Today, on another PC, I tried to use the same regexp, but it won't work (even with the same PDF file). In the input section of the debug output it shows up like this:
Code:
11 <br>
Code:
(\d+\xA0*<br>)
|
11-19-2009, 04:28 PM | #9 | |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
|
|
11-19-2009, 04:38 PM | #10 |
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
|
So now the regexp is applied to the "parsed" output in the debug folder?
In my case, the new regexp should be
Code:
(\d+\s*</p><p>)
Code:
11 </p><p>
|
11-19-2009, 04:48 PM | #11 |
creator of calibre
Posts: 44,276
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
EDIT: Never mind
|
11-20-2009, 03:31 AM | #12 |
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
|
I still can't get it to work...
In the debug input folder it shows me the following string:
Code:
11 <br>
Code:
11 </p><p>
Code:
\d+\xA0*<br>
I tried \d+\s+, which works and seems to be applied somewhere between the input and the parsed stage, but it isn't quite what I wanted, since it removes every run of digits in the file that is followed by whitespace. When exactly is the regex applied to the file? Directly on the input file, or are there steps in the pipeline that change the markup before the regex is applied? |
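To illustrate the over-matching described above, here is a minimal Python sketch with a made-up sample string. Python's `re` is used as a stand-in, so treat the exact behaviour inside calibre as an assumption.

```python
import re

BROAD = re.compile(r"\d+\s+")  # the fallback pattern mentioned above

# Hypothetical text: three numbers in the body plus a page number on its own line.
sample = "Chapter 3 begins in 1984 with 2 people.\n11 \nNext page"

removed = BROAD.findall(sample)   # everything the pattern would delete
cleaned = BROAD.sub("", sample)   # the text that is left afterwards

print(removed)
print(cleaned)
```

Only the last entry in `removed` is the page number; the other three are ordinary numbers from the body text, which is why the pattern needs something after the whitespace to anchor it.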
11-22-2009, 01:29 PM | #13 |
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
|
Isn't anyone else having trouble removing page numbers from their PDFs?
Before the regex is applied to the input, what happens to the "<br>" tag? It seems that it is no longer there when the regex is applied. How do I have to change my previous regex (used in 0.6.16) to get it to work with the newer versions of calibre (which is a great program in my eyes)?
Last edited by matthias; 11-23-2009 at 05:55 AM. |
11-23-2009, 06:15 AM | #14 |
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
|
I have been able to resolve my problem using:
Code:
(\d+\s+<p>)
To me it looks like the regex is applied somewhere between converting the input and producing the parsed output, but certainly before the tags are closed. In my eyes this isn't very user-friendly, or at least it is confusing, since no one seems to know about it and that intermediate state is shown in neither the input nor the parsed folder. Also, I think the preconfigured regex (the default right after installation) should be adjusted to the new behaviour, because I don't think the standard regex will remove any footers or headers anymore. I hope one of the developers will take care of this problem, or at least explain the thinking behind it, so that we can better understand when exactly the regex is applied. |
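When it is unclear which stage a pattern runs against, one option is to try every candidate against a piece of markup and count the hits. The sketch below does that in Python for the three patterns quoted in this thread. The helper name `count_matches` and the sample markup are hypothetical: the real intermediate state is not shown in either debug folder, so the sample is only a guess at unclosed `<p>` tags. Note also that in Python 3 `\s` matches U+00A0 in text patterns, which may not hold for the engine calibre used at the time.

```python
import re

# Candidate patterns exactly as posted in this thread.
CANDIDATES = [r"\d+\xA0*<br>", r"(\d+\s*</p><p>)", r"(\d+\s+<p>)"]

def count_matches(text: str, patterns=CANDIDATES) -> dict:
    """Return {pattern: number of non-overlapping matches in text}."""
    return {p: len(re.findall(p, text)) for p in patterns}

# Hypothetical intermediate markup: page numbers followed by a non-breaking
# space and a newline, with <p> tags opened but not yet closed.
intermediate = "some text\n11\xa0\n<p>more text\n12\xa0\n<p>and more"

counts = count_matches(intermediate)
for pattern, hits in counts.items():
    print(f"{hits:2d}  {pattern}")
```

Running the same helper over the files in the "input" and "parsed" debug folders would show which patterns match there, even though neither folder necessarily reflects what the footer regex actually sees.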
11-23-2009, 03:02 PM | #15 | |
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2009
Device: iPod touch
|
Quote:
I've tried "<br>" and "</p><p>" in my expression and neither worked. Using only "<p>" works. |
|