Old 04-24-2015, 08:04 AM   #916
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
I would infer from the class name, mbppagebreak, that this is obsolete code, as all readers now page-break anyway at the start of the next HTML file.

(Googling the term seems to confirm that it's more likely to have been inserted by conversion tools than by original publishers; someone more familiar with the kindlegen process as applied to author uploads would know whether Amazon themselves add it.)

So having this trailing at the end of each chapter is redundant, and it can also generate spurious blank pages, depending on how the reader apps deal with an empty div.

I'd class it as redundant, like you do un-needed spans, and strip it because it is clutter that contributes nothing to how the EPUB is rendered.

I am pretty sure I have also seen it in some EPUBs that have not been generated/processed by calibre.
Old 04-24-2015, 08:29 AM   #917
jackie_w
Grand Sorcerer
Posts: 6,224
Karma: 16536676
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
I think the pagebreak constructs that look like
Code:
<div class="mbppagebreak" id="calibre_pb_16"></div>
are what you get when you convert an old-style MOBI using calibre. I used the 'Kindle Unpack' plugin to unpack a few of my old Amazon original MOBIs at random and the source code looked like this in all of them
Code:
<mbp:pagebreak/>
ETA: As the unpacked markup is all in a single file, I'm assuming calibre uses these tags to decide where to split it into multiple files.

Last edited by jackie_w; 04-24-2015 at 08:34 AM.
Old 04-24-2015, 08:40 AM   #918
Rev. Bob
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
Quote:
Originally Posted by cybmole View Post
So having this trailing at the end of each chapter is redundant, and it can also generate spurious blank pages, depending on how the reader apps deal with an empty div.

I'd class it as redundant, like you do un-needed spans, and strip it because it is clutter that contributes nothing to how the EPUB is rendered.
Oh, I quite understand the reasoning, and even agree with the principle. The trouble is that it's dangerous to make assumptions without thorough investigation.

For instance, I've got several Kindle books that pack multiple chapters into one HTML file. I wouldn't want to remove those internal page-breaks, because then the chapters wouldn't get separated, and that starts interfering with the intended presentation.

("But you can split chapters by having calibre look for headers!"
"Not if the author uses paragraphs with big fonts instead of header tags."
"Why would anyone do that?"
"Got me... but it happens. Go figure.")

Then there's the question of how reliable the class name is. Maybe it's a native EPUB where the author manually placed a page-break element at the bottom of each document, only he called his class "pbr" because it's nice and short. Same effect, different code.

The plugin's simply not built to handle "if the document starts or ends in an empty element whose only function is to generate a page break, remove that element." It's not smart enough for that; that takes human intervention, just as "remove any blank-space paragraphs at the end of a document" does. (Yeah, that happens, too. I've even seen one publisher whose chapters gain an extra containing DIV as the book progresses. Chapter One might have two, Chapter Two three, up until Chapter Ninety-Seven having 98 of the damned things. Ebook code is easy to screw up.)

However, if I can verify that the class name is stable, I could possibly remove such elements at the top or bottom of the BODY element. Maybe. It depends, on those and other factors, and my natural inclination is to not use automation to interfere unless I can be confident in the results. I hate changing that stuff by hand, too, but I hate it much less than trying to figure out where things used to be.

So: While I may be able to do something, I'm not committing to it without having the chance to investigate. Could happen, could not, too soon to tell. Shake the 8-ball, ask again later.
Old 04-24-2015, 10:37 AM   #919
cybmole
I think jackie_w has nailed it:
the class name, its location, and its original purpose are stable, and it would only exist where a book was sold in legacy MOBI format, not in AZW, and has then been converted.

So arguably it is a calibre conversion artifact and can be detected as such?
I.e., I'd settle for removal only of matches for

<div class="mbppagebreak" id="calibre_pb_\d+"></div>

That's probably how I zap them in Sigil: I had a quick look at my recent Sigil find/replace entries, but any like that have dropped off my recently used list.
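
For anyone who prefers a script to Sigil, here is a minimal Python sketch of that removal. It assumes the chapter's markup is already in a string; the function name is hypothetical, and tolerating whitespace inside the empty div is an added assumption rather than part of the pattern above. Note that it removes every match, wherever in the file it sits.

```python
import re

# The pattern from the post, with optional whitespace allowed between
# the opening and closing tags (an assumption, not in the original regex).
CALIBRE_PB = re.compile(r'<div class="mbppagebreak" id="calibre_pb_\d+">\s*</div>')


def strip_calibre_pagebreaks(html):
    """Return (new_html, number_removed); removes every match, wherever it occurs."""
    return CALIBRE_PB.subn("", html)


sample = '<body><p>End of chapter.</p><div class="mbppagebreak" id="calibre_pb_16"></div></body>'
cleaned, removed = strip_calibre_pagebreaks(sample)
```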
Old 04-24-2015, 10:48 AM   #920
Rev. Bob
Quote:
Originally Posted by cybmole View Post
I'd settle for removal only of matches for

<div class="mbppagebreak" id="calibre_pb_\d+"></div>
I wouldn't, for the reason I gave above: that would catch matches in the middle of a document, instead of only at the beginning and/or end. I've already explained why that's a terrible idea. You can do so manually if you wish, but I will not.

Now, if it's confined to right after <body *> or right before </body>, that's a different story - but what you describe is not limited in that way, and therefore I will not do it. I may or may not elect to build any sort of mbppagebreak processing into the plugin, but I have already decided that much.
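
The narrower rule described here (only right after the opening body tag or right before the closing one) could be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code: the function name is hypothetical, the div pattern is the calibre-generated one quoted above, and "right after/before" is taken to mean separated by whitespace only.

```python
import re

# Calibre-style empty page-break div, as quoted in the thread.
PB_DIV = r'<div class="mbppagebreak" id="calibre_pb_\d+">\s*</div>'

# Directly after the opening <body ...> tag (only whitespace in between).
LEADING = re.compile(r'(<body\b[^>]*>\s*)' + PB_DIV, re.IGNORECASE)
# Directly before the closing </body> tag (only whitespace in between).
TRAILING = re.compile(PB_DIV + r'(\s*</body>)', re.IGNORECASE)


def strip_edge_pagebreaks(html):
    """Remove the div only when it touches a body tag; mid-file matches are kept."""
    html = LEADING.sub(r'\1', html, count=1)
    html = TRAILING.sub(r'\1', html, count=1)
    return html
```

A div sitting between two chapters in the same file is not adjacent to either body tag, so it survives.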
Old 04-24-2015, 11:24 AM   #921
theducks
Well trained by Cats
Posts: 30,454
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by cybmole View Post
I would infer from the class name, mbppagebreak, that this is obsolete code, as all readers now page-break anyway at the start of the next HTML file.

(Googling the term seems to confirm that it's more likely to have been inserted by conversion tools than by original publishers; someone more familiar with the kindlegen process as applied to author uploads would know whether Amazon themselves add it.)

So having this trailing at the end of each chapter is redundant, and it can also generate spurious blank pages, depending on how the reader apps deal with an empty div.

I'd class it as redundant, like you do un-needed spans, and strip it because it is clutter that contributes nothing to how the EPUB is rendered.

I am pretty sure I have also seen it in some EPUBs that have not been generated/processed by calibre.
CAUTION: That code can also exist mid-file (probably from the "I want only one file" crowd).
Old 04-24-2015, 11:28 AM   #922
cybmole
OK, all I can say is that I've been doing that regex removal manually for 2-3 years, over 100 books I am sure, and I have never seen that construction anywhere except at the end of a file. It may be that calibre always breaks after one of those, so that it is logically impossible for a calibre conversion to leave one in the middle of an HTML file (on default structure detection settings anyway).
But OK, tweak to:
find
<div class="mbppagebreak" id="calibre_pb_\d+"></div>
</body>

replace
</body>
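
Whether the div really only ever appears at the end of a file can be checked per file before running a find/replace. A rough Python sketch, with hypothetical names and assuming one body element per file, labels each match by its position:

```python
import re

PB_DIV = re.compile(r'<div class="mbppagebreak" id="calibre_pb_\d+">\s*</div>')
BODY_OPEN = re.compile(r'<body\b[^>]*>', re.IGNORECASE)
BODY_CLOSE = re.compile(r'</body>', re.IGNORECASE)


def classify_pagebreaks(html):
    """Label each match 'start', 'end', or 'middle' relative to the body tags."""
    open_m = BODY_OPEN.search(html)
    close_m = BODY_CLOSE.search(html)
    if not open_m or not close_m:
        return []
    labels = []
    # Only look at matches that lie inside the body element.
    for m in PB_DIV.finditer(html, open_m.end(), close_m.start()):
        before = html[open_m.end():m.start()]
        after = html[m.end():close_m.start()]
        if not before.strip():
            labels.append('start')   # only whitespace since <body>
        elif not after.strip():
            labels.append('end')     # only whitespace until </body>
        else:
            labels.append('middle')  # real content on both sides
    return labels
```

Any file that reports 'middle' is one where a blanket removal would merge chapters.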
Old 04-24-2015, 12:07 PM   #923
Rev. Bob
Quote:
Originally Posted by cybmole View Post
But OK, tweak to:
find
<div class="mbppagebreak" id="calibre_pb_\d+"></div>
</body>

replace
</body>
Or, in other words:

Quote:
Originally Posted by Rev. Bob View Post
Now, if it's confined to right after <body *> or right before </body>, that's a different story - but what you describe is not limited in that way, and therefore I will not do it.
Old 04-24-2015, 03:39 PM   #924
PandathePanda
a toy panda
Posts: 2,567
Karma: 26020474
Join Date: Mar 2014
Location: Onboard the Queen Anne's Revenge
Device: Various Android devices
Tested a PD book from https://www.mobileread.com/forums/sho...d.php?t=259583 and got the <div class="mbp_pagebreak" ...> inserted into the EPUB after converting the MOBI to EPUB. Yet when unpacking the AZW3 file, the resulting EPUB does not have this inserted.

So my guess is that it's formatting inserted during the conversion by calibre, and can safely be removed.
Old 04-24-2015, 04:06 PM   #925
DiapDealer
Grand Sorcerer
Posts: 28,044
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
The problem with assuming that it's calibre-added stuff that can be safely removed is that an ebook could have been edited by someone AFTER the calibre conversion that added the mbppagebreak div stuff. Files could have been split and merged in any number of unforeseen ways (or code copied and pasted to somewhere where the pagebreak IS performing a wanted function in the middle of a file).

I agree that only the ones immediately following the <body> tag, or immediately preceding the </body> tag can be safely removed wholesale.

Last edited by DiapDealer; 04-24-2015 at 04:30 PM.
Old 04-24-2015, 04:31 PM   #926
theducks
I would include any empty (non-text/image) tag pairs in those locations.
IMHO, margins should be used to supply top or bottom whitespace.

I see no purpose in an end-of-file anchor either.
Old 04-24-2015, 04:41 PM   #927
cybmole
It would be a non-issue, display-wise, if all renderers worked to the same rules. But some add a blank line when they hit an empty tag pair, and some don't. I am not sure if ANY actually perform a page break!
Old 04-24-2015, 05:00 PM   #928
DiapDealer
Quote:
Originally Posted by cybmole View Post
I am not sure if ANY actually perform a page break!
What do you mean? Almost all renderers will perform a page break if page-break-(before|after): always is assigned to the class. Or did you mean something else?
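
To check whether a given class really carries a forced break in a book's stylesheet, a rough Python sketch could scan the CSS text. The function name and the sample stylesheet are hypothetical, and the regex-based rule matching is a simplification (it ignores comments and does not resolve the cascade).

```python
import re

# A very loose "selector { declarations }" matcher; good enough for flat rules.
RULE = re.compile(r'([^{}]+)\{([^{}]*)\}')
FORCED_BREAK = re.compile(r'page-break-(?:before|after)\s*:\s*always', re.IGNORECASE)


def class_forces_page_break(css, class_name):
    """True if any rule whose selector mentions .class_name forces a page break."""
    selector_re = re.compile(r'\.' + re.escape(class_name) + r'(?![\w-])')
    for selector, declarations in RULE.findall(css):
        if selector_re.search(selector) and FORCED_BREAK.search(declarations):
            return True
    return False


# Hypothetical stylesheet, for illustration only.
sample_css = """
.mbppagebreak { page-break-after: always; margin: 0 }
.calibre1 { display: block }
"""
```

If the function returns False for the class in a particular book, the empty div is just an empty div there, which would fit the blank-line behaviour some renderers show.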
Old 04-24-2015, 05:34 PM   #929
AnotherCat
....
Posts: 1,547
Karma: 18068960
Join Date: May 2012
Device: ....
Quote:
Originally Posted by DiapDealer View Post
...I agree that only the ones immediately following the <body> tag, or immediately preceding the </body> tag can be safely removed wholesale.
That is the approach that I would go along with, for the reasons that were given, and it would do the complete job in most cases.
Old 04-24-2015, 05:48 PM   #930
Rev. Bob
Quote:
Originally Posted by PandathePanda View Post
Tested a PD book from https://www.mobileread.com/forums/sho...d.php?t=259583 and got the <div class="mbp_pagebreak" ...> inserted into the EPUB after converting the MOBI to EPUB. Yet when unpacking the AZW3 file, the resulting EPUB does not have this inserted.

So my guess is that it's formatting inserted during the conversion by calibre, and can safely be removed.
Quote:
Originally Posted by DiapDealer View Post
The problem with assuming that it's calibre-added stuff that can be safely removed is that an ebook could have been edited by someone AFTER the calibre conversion that added the mbppagebreak div stuff. Files could have been split and merged in any number of unforeseen ways (or code copied and pasted to somewhere where the pagebreak IS performing a wanted function in the middle of a file).
Okay, now that I've had a chance to look through some of my Kindle -> EPUB conversions...

The second book I opened, copyright 2014 and converted in January 2015, repeatedly uses <div class="mbp_pagebreak"/> in the middle of its two text documents to separate chapters. (The first document is frontmatter and a serial story, and the second is an unrelated story with backmatter.) There is an additional instance at the top of the second document.

I am strongly tempted to label this a Calibre issue, an artifact of the conversion process that should be handled by adjusting that feature. That doesn't do anything about any existing conversions, though, so I haven't completely (ahem) closed the book on it yet.

If I do include processing for this, it'll definitely be tied to "is a BODY tag adjacent?" and will handle cases - such as this one - where there's no "calibre_pb_\d+" ID attribute present. That does make things more complicated, though, and further feedback is welcome.
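
As a sketch of what such an "is a BODY tag adjacent?" test might look like, here is a Python version built on the standard library's strict XML parser. It is not the plugin's implementation: the function names are hypothetical, treating the two class names seen in this thread as the full set is an assumption, and real files containing named entities such as &nbsp; would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Class names seen in this thread; treating them as the full set is an assumption.
PAGEBREAK_CLASSES = {"mbppagebreak", "mbp_pagebreak"}


def local_name(el):
    """Tag name without any '{namespace}' prefix."""
    return el.tag.rsplit('}', 1)[-1]


def is_empty_pagebreak(el):
    """An empty div with a page-break class; the id attribute is not required."""
    classes = set(el.get("class", "").split())
    return (
        local_name(el) == "div"
        and bool(classes & PAGEBREAK_CLASSES)
        and len(el) == 0
        and not (el.text or "").strip()
    )


def strip_body_edge_pagebreaks(root):
    """Remove page-break divs that are first or last inside <body>; return the count."""
    body = next((el for el in root.iter() if local_name(el) == "body"), None)
    if body is None:
        return 0
    removed = 0
    # Leading: nothing but whitespace between <body> and the div.
    while len(body) and not (body.text or "").strip() and is_empty_pagebreak(body[0]):
        first = body[0]
        body.text = first.tail  # keep whatever followed the removed div
        body.remove(first)
        removed += 1
    # Trailing: nothing but whitespace between the div and </body>.
    while len(body) and not (body[-1].tail or "").strip() and is_empty_pagebreak(body[-1]):
        body.remove(body[-1])
        removed += 1
    return removed
```

Working on the parsed tree rather than on raw text means the self-closing form and the form with an ID attribute are handled by the same check, and mid-document dividers are never touched.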

Meanwhile, I've received a copy of the page-count plugin and information on the checkbox tweaks, so I can look into that. If they're as minor as they sound, I have no qualms about porting them over. Optional feature, doesn't break anything - sounds like a win.