04-24-2015, 08:04 AM | #916 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
I would infer from the class name (mbppagebreak) that this is obsolete code, as all readers now page break anyway at the start of the next HTML file.
(Googling the term seems to confirm that it's more likely to have been inserted by conversion tools than by original publishers; someone more familiar with the kindlegen process as applied to author uploads would know whether Amazon themselves add it.) So having this trailing at the end of each chapter is redundant, and it can also generate spurious blank pages, depending on how the reader apps deal with an empty div. I'd class it as redundant, like you do unneeded spans, and strip it because it is clutter that has no useful effect on how the EPUB is rendered. I am pretty sure I have also seen it in some EPUBs that have not been generated/processed by calibre.
04-24-2015, 08:29 AM | #917 |
Grand Sorcerer
Posts: 6,224
Karma: 16536676
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
I think the pagebreak constructs that look like
Code:
<div class="mbppagebreak" id="calibre_pb_16"></div>
Code:
<mbp:pagebreak/>
Last edited by jackie_w; 04-24-2015 at 08:34 AM.
|
04-24-2015, 08:40 AM | #918 | |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
For instance, I've got several Kindle books that pack multiple chapters into one HTML file. I wouldn't want to remove those internal page-breaks, because then the chapters wouldn't get separated, and that starts interfering with the intended presentation. ("But you can split chapters by having calibre look for headers!" "Not if the author uses paragraphs with big fonts instead of header tags." "Why would anyone do that?" "Got me... but it happens. Go figure.")

Then there's the question of how reliable the class name is. Maybe it's a native EPUB where the author manually placed a page-break element at the bottom of each document, only he called his class "pbr" because it's nice and short. Same effect, different code.

The plugin's simply not built to handle "if the document starts or ends in an empty element whose only function is to generate a page break, remove that element." It's not smart enough for that; that takes human intervention, just as "remove any blank-space paragraphs at the end of a document" does. (Yeah, that happens, too. I've even seen one publisher whose chapters gain an extra containing DIV as the book progresses. Chapter One might have two, Chapter Two three, up until Chapter Ninety-Seven having 98 of the damned things. Ebook code is easy to screw up.)

However, if I can verify that the class name is stable, I could possibly remove such elements at the top or bottom of the BODY element. Maybe. It depends, on those and other factors, and my natural inclination is to not use automation to interfere unless I can be confident in the results. I hate changing that stuff by hand, too, but I hate it much less than trying to figure out where things used to be.

So: While I may be able to do something, I'm not committing to it without having the chance to investigate. Could happen, could not, too soon to tell. Shake the 8-ball, ask again later.
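The rule described above ("if the document starts or ends in an empty element whose only function is to generate a page break, remove that element") could be sketched with a pair of anchored regular expressions. This is a minimal Python sketch, not the plugin's code; it assumes the page-break element is an empty `<div>` whose `class` attribute is exactly one of a configurable set of names. `PAGEBREAK_CLASSES` and `strip_edge_pagebreaks` are hypothetical names, and `pbr` stands in for the author-chosen class from the example.

```python
import re

# Hypothetical configuration: class names treated as page-break markers.
# "pbr" stands in for an author-chosen name like the one in the example.
PAGEBREAK_CLASSES = ("mbppagebreak", "pbr")


def strip_edge_pagebreaks(html, classes=PAGEBREAK_CLASSES):
    """Remove empty page-break <div>s directly after <body ...> or directly
    before </body>; any that sit between real content are left alone."""
    names = "|".join(re.escape(c) for c in classes)
    # An empty div whose class attribute is exactly one of the names, written
    # either self-closing (<div .../>) or as an empty pair (<div ...></div>).
    # The (?<!/) guard stops a self-closing div pairing with a parent's </div>.
    empty_div = (
        r'<div\b[^>]*\bclass="(?:' + names + r')"[^>]*?(?:/>|(?<!/)>\s*</div>)'
    )
    leading = re.compile(r"(<body\b[^>]*>)(?:\s*" + empty_div + r")+")
    trailing = re.compile(r"(?:" + empty_div + r"\s*)+(</body>)")
    html = leading.sub(r"\1", html)
    return trailing.sub(r"\1", html)
```

Because both patterns are anchored to a BODY tag, a break that separates two chapters inside one HTML file is never touched; the catch, as noted above, is that the class list has to be trusted.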
|
04-24-2015, 10:37 AM | #919 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
I think jackie_w has nailed it:
the class name, its location and its original purpose are stable, and it would only exist where a book has been sold in legacy mobi format (not azw) and has then been converted. So arguably it is a calibre conversion artifact and can be detected as such? I.e. I'd settle for removal only of matches for <div class="mbppagebreak" id="calibre_pb_\d+"></div>. That's probably how I zap them in Sigil; I had a quick look at my recent Sigil find/replace, but any like that have dropped off my recently used list.
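For reference, the Sigil-style find/replace described above might look like this in Python (a minimal sketch; `zap_all` is a hypothetical name). Like a global find/replace, it removes every match, including any in the middle of a file.

```python
import re

# The pattern from the post: an empty mbppagebreak div carrying a
# calibre-generated id such as calibre_pb_16.
PATTERN = re.compile(r'<div class="mbppagebreak" id="calibre_pb_\d+"></div>')


def zap_all(html):
    """Remove every match anywhere in the file, like a global find/replace.
    Returns (new_text, number_of_removals)."""
    return PATTERN.subn("", html)
```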
04-24-2015, 10:48 AM | #920 | |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
Now, if it's confined to right after <body *> or right before </body>, that's a different story - but what you describe is not limited in that way, and therefore I will not do it. I may or may not elect to build any sort of mbppagebreak processing into the plugin, but I have already decided that much.
|
|
04-24-2015, 11:24 AM | #921 | |
Well trained by Cats
Posts: 30,454
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
|
|
04-24-2015, 11:28 AM | #922 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
OK, all I can say is that I've been doing that regex removal manually for 2-3 years, over 100 books I am sure, and I have never seen that construction anywhere except at the end of a file. It may be that calibre always splits the file after one of those, so it is logically impossible for a calibre conversion to leave one in the middle of an HTML file (on default structure detection settings, anyway).
But OK, tweak to: find <div class="mbppagebreak" id="calibre_pb_\d+"></div> </body> and replace with </body>.
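The tweaked find/replace could be sketched the same way. In this Python sketch (`zap_trailing` is a hypothetical name), `\s*` stands in for the literal space in the post, so a line break between the div and `</body>` also matches; a div followed by anything other than whitespace and `</body>` is left alone.

```python
import re

# Only an mbppagebreak div followed by nothing but whitespace and then
# </body> is matched; mid-file occurrences are not.
TRAILING = re.compile(
    r'<div class="mbppagebreak" id="calibre_pb_\d+"></div>\s*</body>'
)


def zap_trailing(html):
    return TRAILING.sub("</body>", html)
```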
04-24-2015, 12:07 PM | #923 | |
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
|
|
04-24-2015, 03:39 PM | #924 |
a toy panda
Posts: 2,567
Karma: 26020474
Join Date: Mar 2014
Location: Onboard the Queen Anne's Revenge
Device: Various Android devices
|
Tested a PD book from: https://www.mobileread.com/forums/sho...d.php?t=259583 and got the <div class="mbp_pagebreak" ...> inserted into the EPUB after converting the mobi to EPUB. Yet when unpacking the azw3 file, the resulting EPUB does not have this inserted.
So my guess is that it's formatting inserted by calibre during the conversion, and can safely be removed.
04-24-2015, 04:06 PM | #925 |
Grand Sorcerer
Posts: 28,044
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
The problem with assuming that it's calibre-added stuff that can be safely removed is that an ebook could have been edited by someone AFTER the calibre conversion that added the mbppagebreak div stuff. Files may have been split and merged in any number of unforeseen ways (or code copied and pasted somewhere the pagebreak IS performing a wanted function in the middle of a file).
I agree that only the ones immediately following the <body> tag, or immediately preceding the </body> tag, can be safely removed wholesale.
Last edited by DiapDealer; 04-24-2015 at 04:30 PM.
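One way to apply that rule without silently losing a deliberate mid-file break is to classify each occurrence first and only act on the ones touching the BODY tags. This is a hypothetical Python helper (not part of any plugin); it assumes the break is an empty `<div>` with class `mbppagebreak` or `mbp_pagebreak` (both spellings appear in this thread) and treats a run of consecutive breaks at either edge as leading or trailing.

```python
import re

# Empty page-break div in either spelling, self-closing or as an empty pair.
# (?<!/) keeps a self-closing div from being paired with a parent's </div>.
BREAK = r'<div\b[^>]*\bclass="mbp_?pagebreak"[^>]*?(?:/>|(?<!/)>\s*</div>)'


def classify_breaks(html):
    """Return (offset, position) for each break, where position is
    'leading', 'trailing' or 'interior' relative to the <body> content."""
    body_open = re.search(r"<body\b[^>]*>", html)
    start = body_open.end() if body_open else 0
    end = html.rfind("</body>")
    if end == -1:
        end = len(html)
    results = []
    for m in re.finditer(BREAK, html):
        # Ignore other breaks when deciding whether real content is adjacent.
        before = re.sub(BREAK, "", html[start:m.start()])
        after = re.sub(BREAK, "", html[m.end():end])
        if not before.strip():
            where = "leading"
        elif not after.strip():
            where = "trailing"
        else:
            where = "interior"
        results.append((m.start(), where))
    return results
```

Only the `leading` and `trailing` entries would be candidates for automatic removal; `interior` ones would be reported for a human to check.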
04-24-2015, 04:31 PM | #926 |
Well trained by Cats
Posts: 30,454
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I would include any empty (no text or image) tag pairs in those locations.
IMHO, margins should be used to supply top or bottom whitespace. I see no purpose in an end-of-file anchor either.
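Extending the edge-only rule to any empty tag pair might look like the following minimal Python sketch (hypothetical names; regex-based, so it assumes reasonably clean markup). An element counts as empty only if it contains nothing but whitespace, so an element wrapping an `<img>` is never matched, while an end-of-file anchor such as `<a id="end"></a>` is. Nested empties (an empty `<div>` holding an empty `<p>`) are not handled.

```python
import re

# An element pair containing only whitespace: no text and no child tags, so
# anything wrapping an <img> is never matched. (?<!/) keeps a self-closing
# tag from being paired with a following closing tag of the same name.
EMPTY_PAIR = r"<(?P<tag>[a-zA-Z][a-zA-Z0-9]*)\b[^>]*(?<!/)>\s*</(?P=tag)\s*>"

LEADING = re.compile(r"(?P<body><body\b[^>]*>)(?:\s*" + EMPTY_PAIR + r")+")
TRAILING = re.compile(r"(?:" + EMPTY_PAIR + r"\s*)+(?P<end></body>)")


def strip_edge_empties(html):
    """Drop empty tag pairs directly after <body ...> or directly before </body>."""
    html = LEADING.sub(r"\g<body>", html)
    return TRAILING.sub(r"\g<end>", html)
```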
04-24-2015, 04:41 PM | #927 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
It would be a non-issue, display-wise, if all renderers worked to the same rules. But some add a blank line when they hit an empty tag pair, and some don't. I am not sure if ANY actually perform a page break!
|
04-24-2015, 05:00 PM | #928 |
Grand Sorcerer
Posts: 28,044
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
04-24-2015, 05:34 PM | #929 |
....
Posts: 1,547
Karma: 18068960
Join Date: May 2012
Device: ....
|
That is the approach I would go along with, for the reasons given, and it would do the complete job in most cases.
|
04-24-2015, 05:48 PM | #930 | ||
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
|
Quote:
Quote:
The second book I opened, copyright 2014 and converted in January 2015, repeatedly uses <div class="mbp_pagebreak"/> in the middle of its two text documents to separate chapters. (The first document is frontmatter and a serial story, and the second is an unrelated story with backmatter.) There is an additional instance at the top of the second document.

I am strongly tempted to label this a Calibre issue, an artifact of the conversion process that should be handled by adjusting that feature. That doesn't do anything about any existing conversions, though, so I haven't completely (ahem) closed the book on it yet.

If I do include processing for this, it'll definitely be tied to "is a BODY tag adjacent?" and will handle cases, such as this one, where there's no "calibre_pb_\d+" ID attribute present. That does make things more complicated, though, and further feedback is welcome.

Meanwhile, I've received a copy of the page-count plugin and information on the checkbox tweaks, so I can look into that. If they're as minor as they sound, I have no qualms about porting them over. Optional feature, doesn't break anything: sounds like a win.
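A BODY-adjacency check that does not depend on the `calibre_pb_\d+` id could also be done with an XML parser instead of regular expressions. This is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, not the plugin's implementation; it assumes well-formed XHTML in the XHTML namespace with no named HTML entities such as `&nbsp;` (which the XML parser would reject), and `BREAK_CLASSES` and the function names are hypothetical. `ElementTree` also does not write back the XML declaration or DOCTYPE.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical set covering both spellings seen in this thread.
BREAK_CLASSES = {"mbppagebreak", "mbp_pagebreak"}


def is_empty_break(el):
    """A <div> with a page-break class, no child elements and no text.
    An id attribute is allowed but not required."""
    return (
        el.tag == "{%s}div" % XHTML_NS
        and el.get("class") in BREAK_CLASSES
        and len(el) == 0
        and not (el.text or "").strip()
    )


def strip_body_edge_breaks(xhtml):
    """Remove empty page-break divs that are the first or last children of
    <body>, as long as no loose text separates them from the BODY tag."""
    ET.register_namespace("", XHTML_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xhtml)
    body = root.find("{%s}body" % XHTML_NS)
    if body is None:
        return xhtml
    # Trailing breaks: removing an element also drops its tail text,
    # so only remove it when that tail is whitespace.
    while len(body) and is_empty_break(body[-1]) and not (body[-1].tail or "").strip():
        body.remove(body[-1])
    # Leading breaks: no text may sit before them (body.text), and their
    # tail must be whitespace for the same reason as above.
    while (
        len(body)
        and is_empty_break(body[0])
        and not (body.text or "").strip()
        and not (body[0].tail or "").strip()
    ):
        body.remove(body[0])
    return ET.tostring(root, encoding="unicode")
```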