10-03-2010, 02:43 PM | #1 |
Connoisseur
Posts: 94
Karma: 192
Join Date: Jul 2010
Location: Toronto
Device: Kindle 3, Kobo Mini, Kobo Touch
|
[Old Thread] Removing page numbers.
I have an epub book that has page numbers hardcoded into it. Is it possible to have Calibre remove them automatically when I convert it to mobi for Kindle use?
It's a one- or two-digit number in its own paragraph, and it also breaks the existing paragraph flow. I guess the file was converted from PDF at some point?
Last edited by ChaoZ; 10-03-2010 at 03:45 PM. |
10-03-2010, 04:16 PM | #2 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
You could try converting it back to a PDF and using a program to crop the pages.
Google "crop pdf" for utilities that do this. It works sometimes and sometimes not.
Helen |
10-03-2010, 04:29 PM | #3 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Open up the epub in Sigil and search (in Regex mode) for
Code:
<p>(\d+)</p>
and replace it with an empty string. If you're sure there are only page numbers in such paragraphs, you can do a Replace All, but to be safe, step through and replace each one individually; then if there is something odd, you'll pick it up (such as a date or year; a number that is out of sequence with the page order is a giveaway). |
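The "step through and watch for out-of-order numbers" advice can be automated in a rough way. The Python sketch below applies the same `<p>(\d+)</p>` pattern but only deletes a number when it continues the running page order (at most `max_gap` above the last accepted one); anything else is left in place and reported for manual review. The sequence check, the `max_gap` parameter and the function name are my own assumptions, not a Sigil or Calibre feature, and the heuristic assumes the first match in the text is a genuine page number.

```python
import re

# The Sigil search pattern from the post: a <p> whose only content is digits.
PAGE_NUM = re.compile(r"<p>(\d+)</p>")


def strip_page_numbers(html: str, max_gap: int = 2) -> tuple[str, list[int]]:
    """Remove <p>N</p> paragraphs that follow the running page order.

    A number counts as a page number if it is greater than the last
    accepted one by at most max_gap (allowing for a skipped page).
    Anything else (a lone year, a stray digit) stays in the text and is
    returned so it can be checked by hand.
    """
    last = None   # last number accepted as a page number
    kept = []     # out-of-sequence numbers left in place for review

    def replace(match: re.Match) -> str:
        nonlocal last
        n = int(match.group(1))
        if last is None or last < n <= last + max_gap:
            last = n
            return ""              # in sequence: delete the paragraph
        kept.append(n)
        return match.group(0)      # out of sequence: leave it alone

    return PAGE_NUM.sub(replace, html), kept


cleaned, kept = strip_page_numbers(
    "<p>Intro</p><p>1</p><p>More text</p><p>2</p><p>1850</p><p>3</p>"
)
print(cleaned)  # the 1850 paragraph survives, pages 1-3 are gone
print(kept)
```

Leaving suspicious matches in place rather than deleting them mirrors the "replace each one individually" advice: the script does the obvious cases and hands you the odd ones.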
10-03-2010, 05:00 PM | #4 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Calibre has an option to remove headers and/or footers during conversion. See the Structure Detection section of the conversion settings.
|
10-03-2010, 06:58 PM | #5 |
Connoisseur
Posts: 94
Karma: 192
Join Date: Jul 2010
Location: Toronto
Device: Kindle 3, Kobo Mini, Kobo Touch
|
I don't think it's formatted as a header or a footer though.
I broke open the epub using the Tweak option and saw it was actually a paragraph tag. I also noticed what seems like a bad OCR job. Looks like the file is just bad. |
10-03-2010, 07:10 PM | #6 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
It doesn't matter how it is formatted: if it can be described by a regexp, Calibre can remove it. But be careful; you could easily remove something you don't want to remove.
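To make the "be careful" point concrete, here is a small Python comparison (an illustration, not Calibre's own code) of a careless pattern versus one anchored to a whole paragraph. The sample HTML is made up, and the 1 to 3 digit limit is an assumption based on the numbers described in this thread.

```python
import re

html = (
    "<p>In 1850 there were 3 ships in the harbour.</p>\n"
    "<p>42</p>\n"
    "<p>The next page continues here.</p>"
)

# Too broad: every run of digits is deleted, including the ones in the prose,
# and the page-number paragraph is left behind as an empty <p></p>.
too_broad = re.sub(r"\d+", "", html)

# Anchored: only a paragraph holding nothing but a 1-3 digit number
# (plus optional whitespace) is removed, together with its line break.
anchored = re.sub(r"<p>\s*\d{1,3}\s*</p>\n?", "", html)

print(too_broad)
print(anchored)
```

The anchored version keeps "1850" and "3" in the sentence because they are not alone inside a `<p>` element.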
|
05-28-2013, 04:21 PM | #7 |
Wizard
Posts: 1,035
Karma: 11227259
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
Is there a regex expert who could help?
I have some ebooks where page numbers appear in the running text (probably referring to the original printed version), sometimes even with hyperlinks pointing back to the original TOC. I would like to delete them, but I have no clue about regex. In one particular book, the numbers appear in square brackets, such as [Pg 4], and have up to three digits. The tags look like this: <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>. Is there a way to remove them with a single regex in Sigil or Calibre? Thanks in advance! And please remember, I'm completely ignorant in this field. I hope someone reads this, the thread being quite old. |
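One way to express that markup as a single pattern is sketched below in Python. This is a hypothetical pattern written for illustration; it assumes the spans look exactly like the example (no extra attributes or whitespace) and matches the class and id values loosely in case they differ between books. The same pattern, joined into one line, should also work in Sigil's Regex search mode with an empty replacement, but test it with Find before using Replace All.

```python
import re

# Hypothetical pattern for markup such as:
# <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>
# The class and id values are matched loosely ([^"]*) in case they vary.
PAGENUM_SPAN = re.compile(
    r'<span class="pagenum">'
    r'<a class="[^"]*" id="[^"]*">'
    r'\[Pg \d{1,3}\]'   # literal brackets around "Pg" and 1-3 digits
    r'</a></span>'
)


def remove_pagenum_spans(html: str) -> str:
    """Delete every page-number span, leaving the surrounding text intact."""
    return PAGENUM_SPAN.sub("", html)


sample = ('end of a sentence<span class="pagenum"><a class="pcalibre pcalibre1" '
          'id="Page_4">[Pg 4]</a></span> and the next')
print(remove_pagenum_spans(sample))
```

Note that removing the span also removes the anchor `id`, so any TOC links that point to `#Page_4` will no longer have a target.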
10-14-2014, 09:52 PM | #8 | |
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2014
Device: none
|
I know that this thread is really old, and Leonatus's request is also really old, but here's a regex for you and a breakdown:
Quote:
\w* matches zero or more word characters (alphanumerics and _).
[A-Za-z_ \d]* matches zero or more characters that are letters A-Z or a-z, an underscore, a digit, or a space.
Everything else is pretty much an exact match. The backslashes are escape characters (so the following character isn't interpreted as regex syntax), e.g. \/ escapes the / character and \[ escapes the [ character.
http://www.regexr.com/ is a great website for quickly and easily building regexes, and it has a reference sidebar so you can look up syntax.
Last edited by WhiteAbeLincoln; 10-14-2014 at 09:54 PM. |
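To see what each piece of that breakdown matches on its own, the Python snippet below tests the fragments against strings taken from the markup in post #7. The test strings are my own choice; the point it illustrates is that a single \w* cannot cover the class value "pcalibre pcalibre1", because a space is not a word character, while the bracketed class that explicitly includes a space can cover "Pg 4".

```python
import re

# Each fragment from the breakdown, a sample string from the post #7 markup,
# and whether the fragment is expected to match the whole sample.
CASES = [
    (r"\w*", "pcalibre1", True),            # letters, digits, underscore
    (r"\w*", "pcalibre pcalibre1", False),  # a space is not a word character
    (r"[A-Za-z_ \d]*", "Pg 4", True),       # this class explicitly allows space
    (r"\[", "[", True),                     # escaped: a literal bracket
    (r"\/", "/", True),                     # escaped slash (optional in Python)
]


def check(cases):
    """Return (pattern, sample, matched) for every case."""
    return [
        (pattern, sample, re.fullmatch(pattern, sample) is not None)
        for pattern, sample, _expected in cases
    ]


for pattern, sample, matched in check(CASES):
    print(f"{pattern:<16} {sample!r:<22} {'full match' if matched else 'no match'}")
```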
|
10-20-2014, 04:02 PM | #9 |
Wizard
Posts: 1,035
Karma: 11227259
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
|
WhiteAbeLincoln, thanks heartily for your help, even though some time has passed. By chance, the problem I described above came up again just last week, and I removed the items manually. Next time, however, I'll be pleased to test your proposal. Luckily, I'm no longer as ignorant in this business as I was when I wrote a year ago, but I'm still far from being an expert.
|