01-20-2009, 09:55 PM | #1 |
Member
Posts: 22
Karma: 10
Join Date: Dec 2008
Device: Sony PRS-700
|
Any way to force page breaks when converting HTML to EPUB
I am new to this and thank you in advance for any patient explanations.
Reading the forums, I know that there's a raging debate about whether we need the page anymore with ebooks. Some celebrate that we can liberate text from the page and need to maintain only those formatting elements necessary to understand how words, sections, and headers relate to each other. In essence, the book becomes an electronic scroll. However, a few of us believe that the innovation of the page-based codex, which began replacing the scroll, makes finding information within the text more efficient. Specifically, the codex makes communication about specific content with other readers easier, and I've seen several posts by academics here saying they need to refer back to page numbers when communicating with non-ebook readers. I'm in this second camp.
I'm scanning pages primarily of text (and a few tables and pictures) to Abbyy FineReader and saving its OCR output as HTML. The HTML output looks great on my Sony PRS-700 when I use Calibre to convert it to ePUB. However, it would make me so happy if there was a way to force the reader to paginate according to breaks in the HTML rather than...arbitrarily. I have no idea how the reader manages pagination of the text. I know that it's possible to insert a page break in an RTF and the Reader will break the page accordingly for a Calibre conversion to ePub.
Is there any way to use Calibre to tell the Reader to break pages at <hr>, and nowhere else? As it is, the Reader averages turning 10 pages of epub -- each ending with an <hr>, noting an intended page break -- into 11 or 12 pages. If there are other ideas of how to edit my HTML to make the Reader understand my pagination desires, I'm all ears. The only alternative I know to maintain pagination is to use PDF reflow, but the results are much less attractive than HTML/epub. Thanks. |
01-20-2009, 10:10 PM | #2 |
creator of calibre
Posts: 44,381
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can force page breaks at (almost) any point in the HTML, but you cannot prevent page breaks from happening if the content between two of your forced page breaks is longer than the screen length. If you think about it for a minute, you'll understand why.
To force page breaks at <hr> tags, use
Code:
<style type="text/css"> hr {page-break-after:always;} </style> |
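For anyone with a batch of OCR'd HTML files, adding that rule by hand gets tedious. Below is a minimal Python sketch of one way to patch a file before handing it to Calibre. The helper name `add_hr_page_breaks` and the sample markup are hypothetical, and it assumes the rule belongs just before the closing `</head>` tag; it is not part of Calibre.

```python
import re

# The rule from the post above, wrapped in the same <style> element.
HR_BREAK_STYLE = '<style type="text/css"> hr {page-break-after:always;} </style>'


def add_hr_page_breaks(html: str) -> str:
    """Return html with the <hr> page-break rule added (hypothetical helper)."""
    if "page-break-after" in html:
        # A page-break rule is already present; leave the document alone.
        return html
    # Insert the rule just before the first closing </head> tag (case-insensitive).
    patched, count = re.subn(
        r"(?i)</head>", lambda m: HR_BREAK_STYLE + m.group(0), html, count=1
    )
    if count == 0:
        # Fallback for OCR output with no <head>: prepend the style block.
        patched = HR_BREAK_STYLE + "\n" + html
    return patched


# Made-up sample input standing in for FineReader's HTML output.
sample = (
    "<html><head><title>Scan</title></head>"
    "<body><p>Page 1</p><hr><p>Page 2</p></body></html>"
)
patched = add_hr_page_breaks(sample)
```

Running the helper twice on the same text leaves it unchanged, so it should be safe to re-run over a folder that has already been processed.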
01-20-2009, 10:16 PM | #3 |
Member
Posts: 22
Karma: 10
Join Date: Dec 2008
Device: Sony PRS-700
|
Thanks for the quick reply. And yes, that does make some sense. Can you give me a basic understanding of what the "screen length" is?
Also, are there any ways -- pre-calibre conversion, or within calibre -- to force a longer screen length or reformat text to fit within this prescribed screen length? All of my ePubs end up at about 115% of the page count of the originals I feed in. It's driving me nuts. |
01-20-2009, 10:46 PM | #4 |
creator of calibre
Posts: 44,381
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
In a reflowable format, there is a logical page, i.e. the contents of the document between two hard (forced) page breaks. This logical page will be split into any number of physical pages, depending on the size of the screen of the device used to view the file as well as the font size being used.
lines per screen = (number of pixels in the physical screen in the vertical direction) / (number of pixels per line in the vertical direction)
You can't change the numerator. You can change the denominator by changing the font size, but since the reader can also change font sizes, your setting will only make sense at one size. |
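To make the formula concrete, here is a small Python sketch of the arithmetic. The 800-pixel screen height and 24-pixel line height are assumed, illustrative values, not measurements from any particular device, and the function names are made up for this example.

```python
import math


def lines_per_screen(screen_height_px: int, line_height_px: int) -> int:
    """Whole lines that fit on one physical screen (partial lines are dropped)."""
    return screen_height_px // line_height_px


def physical_pages(lines_in_logical_page: int, per_screen: int) -> int:
    """Physical pages needed to show one logical page of the given length."""
    return math.ceil(lines_in_logical_page / per_screen)


# Assumed values for illustration only.
per_screen = lines_per_screen(800, 24)

# Three logical pages of 30, 33 and 38 lines each.
pages = [physical_pages(n, per_screen) for n in (30, 33, 38)]
```

With these numbers 33 lines fit per screen, so the 30- and 33-line logical pages each take one physical page while the 38-line one spills onto a second. A handful of slightly long source pages is enough to turn 10 logical pages into 11 or 12 physical ones.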
01-20-2009, 11:01 PM | #5 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
If so, check out Adobe's "EPUB Best Practices Guide," most recent version available in EPUB format from their Digital Publishing Technology website. It's the one place I've seen that discusses Adobe's EPUB-extension "page map" facility, which lets you provide an explicit mapping of where AdobeDE places numbered page boundaries. |
|
01-20-2009, 11:22 PM | #6 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Holy hell. I just realized -- and tested to confirm -- that this facility can actually be used to completely remove the marginal page numbers. The specifics are a bit trickier than one might desire, but it's doable.
|
01-20-2009, 11:26 PM | #7 |
creator of calibre
Posts: 44,381
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Wow, break out the champagne. I'm guessing you just map the entire book to a single page?
|
01-20-2009, 11:39 PM | #8 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Well, that's the tricky part... It seems that if a page-map is present, ADE won't display any flows which don't have any pages associated with them. So there has to be at least one page per file, but they can all have name="" with no problems. But then doing that means that you only have as many pages as you have flows, both for the purposes of the "x of y" status bar and using the number buttons for page-wise navigation. Kind of weird. So maybe the solution is to duplicate the default "1024 bytes == 1 page" manually, but with all the page names set to blank?
|
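The "one blank-named page per flow" idea from the post above could be sketched in Python roughly as follows. Only `name=""` comes from this thread. The `page-map` and `page` element names, the `href` attribute, and the namespace URI are recalled from Adobe's page-map format and should be treated as assumptions to check against Adobe's guide; the file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the page-map document; verify against Adobe's guide.
OPF_NS = "http://www.idpf.org/2007/opf"


def build_blank_page_map(flow_hrefs):
    """Build a page map with one unnamed page per flow (content file)."""
    root = ET.Element("page-map", {"xmlns": OPF_NS})
    for href in flow_hrefs:
        # name="" is the blank page label described in the post above.
        ET.SubElement(root, "page", {"name": "", "href": href})
    return ET.tostring(root, encoding="unicode")


# Hypothetical flows; a real book would list every content file in the spine.
page_map_xml = build_blank_page_map(["chapter1.xhtml", "chapter2.xhtml"])
```

As the post notes, a map like this leaves the book with only as many pages as it has flows, so it trades the marginal numbers for coarse page-wise navigation.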
01-20-2009, 11:49 PM | #9 | |
creator of calibre
Posts: 44,381
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
01-21-2009, 03:25 AM | #10 |
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
|
Well, I HATE those ADE page numbers forcing you to make at least a 15 px margin on the right side that looks WAY too big in WebKit-based readers. I would personally kiss llasram's feet if he had found the solution to get rid of those, and I do not say such things lightly.
|
01-21-2009, 05:01 AM | #11 | |
creator of calibre
Posts: 44,381
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
01-21-2009, 07:26 AM | #12 | ||
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
|
Quote:
Quote:
I'd be kind of leery of that myself... The OPS spec says that reader systems "should not" execute <script/>s, so using scripting to get essentially default behavior seems like a bad idea in the long run. |
||
01-21-2009, 12:00 PM | #13 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2009
Device: none
|
scanning pages primarily
I'm scanning pages primarily of text (and a few tables and pictures) to Abbyy FineReader and saving its OCR output as HTML. The HTML output looks great on my Sony PRS-700 when I use Calibre to convert it to ePUB. However, it would make me so happy if there was a way to force the reader to paginate according to breaks in the HTML rather than...arbitrarily. I have no idea how the reader manages pagination of the text. I know that it's possible to insert a page break in an RTF and the Reader will break the page accordingly for a Calibre conversion to ePub.
|
01-21-2009, 12:36 PM | #14 |
creator of calibre
Posts: 44,381
Karma: 23766374
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I suspect that's one part of the OPS spec that's going to change. It's rather ridiculous not to support JavaScript, and in a few years, when portable devices are powerful enough to handle JavaScript, it will make absolutely no sense.
|
01-21-2009, 12:59 PM | #15 | |
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
|
Quote:
You can assign a page break to headers or paragraphs by creating an embedded or external CSS stylesheet with a rule for the tag in question, which spares you from having to type it every time. Is that what you wanted to know? |
|
|