
MobileRead Forums > E-Book Formats > ePub

03-17-2024, 12:33 PM   #1
electrotype
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2009
Device: none
Single page vs. multiple pages (epub creation)

I'm trying to create an epub. The original source of the text is a PDF, but I downloaded an "epub" version generated by archive.org.

Unfortunately, the generated epub is far from perfect! The content is not split where it should be, etc. So I'm trying to clean everything up and create the final epub myself.

I have already read about the epub format and have successfully created one, but I still have one question.

In the process of collecting all the original text (I am a developer, so I wrote some code for this), I ended up with a single large piece of text. I could split the text into multiple pages, but I'm wondering whether this is really useful.

Can't I just have one big "content.html" file, with "id" attributes on some "<p>" elements so that the "toc.ncx" can point to them? Are there advantages to actually splitting the content into several separate files instead of a big one?
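On the toc.ncx side, the two layouts differ only in the `src` of each entry: a fragment such as `content.html#ch2` for one big file, or a file name such as `ch02.xhtml` for split files. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming the (label, src) pairs come from your own extraction code. It builds only the `<navMap>` part; a complete toc.ncx also needs the ncx namespace, `<head>` and `<docTitle>`. The file names and ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_nav_map(entries):
    """Build an NCX <navMap> element from (label, src) pairs.

    src can point into one big file ("content.html#ch2") or at a
    separate file per chapter ("ch02.xhtml"); the structure is the same.
    """
    nav_map = ET.Element("navMap")
    for order, (label, src) in enumerate(entries, start=1):
        # playOrder numbers the entries in reading order, starting at 1.
        point = ET.SubElement(nav_map, "navPoint",
                              id=f"navPoint-{order}", playOrder=str(order))
        nav_label = ET.SubElement(point, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(point, "content", src=src)
    return nav_map


# Hypothetical entries for the two layouts discussed in the thread.
single_file = [("Chapter 1", "content.html#ch1"), ("Chapter 2", "content.html#ch2")]
split_files = [("Chapter 1", "ch01.xhtml"), ("Chapter 2", "ch02.xhtml")]

print(ET.tostring(build_nav_map(single_file), encoding="unicode"))
print(ET.tostring(build_nav_map(split_files), encoding="unicode"))
```

Either way, each fragment id (`ch1`, `ch2`) or file name must actually exist in the book, or the reader's table of contents will have dead links.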
03-17-2024, 01:18 PM   #2
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 72,551
Karma: 309960766
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
The advantage of having multiple files is that more ereaders will handle the ePub successfully. Ereaders tend to be more limited in memory than most computers.
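If you do split, it is easy to check that no single file is still very large. Here is a small Python sketch, assuming the book's XHTML is held as a dict of file name to text. The 250,000-byte budget is a hypothetical figure for illustration, not a documented ereader limit, so pick one for your own target devices.

```python
def oversized(files, max_bytes):
    """Return the sorted names of files whose UTF-8 size exceeds max_bytes."""
    return sorted(
        name for name, text in files.items()
        if len(text.encode("utf-8")) > max_bytes
    )


MAX_BYTES = 250_000  # hypothetical budget, not a documented device limit

# Hypothetical book: one huge single file and one small chapter file.
book = {
    "content.html": "<p>x</p>" * 50_000,  # 400,000 bytes
    "ch01.xhtml": "<p>short</p>",
}
print(oversized(book, MAX_BYTES))  # ['content.html']
```

To check an unpacked epub instead, the dict could be built with `pathlib`, e.g. `{p.name: p.read_text(encoding="utf-8") for p in Path("OEBPS").glob("*.xhtml")}`, where the `OEBPS` folder name is an assumption about your layout.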
03-17-2024, 01:25 PM   #3
electrotype
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2009
Device: none
I see! Thanks for the information.
03-17-2024, 01:44 PM   #4
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 72,551
Karma: 309960766
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
There are disadvantages too. Most ereaders enforce a page break between files.
03-17-2024, 01:47 PM   #5
electrotype
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2009
Device: none
Yes, I guessed that. You have to split the files at the right positions. Thanks again.
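One way to find the right positions programmatically is to split at chapter headings, so each chapter starts a new file (and therefore a new page). Here is a Python sketch, assuming chapter headings sit alone on a line as `CHAPTER I`, `Chapter 12` and so on. The regex, the `ch01.xhtml` naming and the EPUB 3-style XHTML wrapper are all assumptions to adapt, and OCR text usually needs a manual check of where the splits land.

```python
import re
from html import escape

# Assumed heading format: a line holding only "Chapter" plus one word,
# e.g. "CHAPTER I" or "Chapter 12". Adjust for your book.
CHAPTER_RE = re.compile(r"^chapter[ \t]+\w+[ \t]*$", re.IGNORECASE | re.MULTILINE)

# Minimal EPUB 3-style XHTML wrapper (an assumption; EPUB 2 uses another DOCTYPE).
XHTML_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>
{body}
</body>
</html>
"""


def split_chapters(text):
    """Return (title, body) pairs, one per chapter heading.

    Any text before the first heading becomes an untitled section.
    """
    matches = list(CHAPTER_RE.finditer(text))
    sections = []
    front = text[:matches[0].start()] if matches else text
    if front.strip():
        sections.append(("", front.strip()))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group().strip(), text[m.end():end].strip()))
    return sections


def to_xhtml(title, body):
    """Wrap one section in XHTML, one <p> per blank-line-separated paragraph."""
    paras = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    markup = "\n".join(f"<p>{escape(p)}</p>" for p in paras)
    if title:
        markup = f"<h2>{escape(title)}</h2>\n" + markup
    return XHTML_TEMPLATE.format(title=escape(title or "Untitled"), body=markup)


def build_chapter_files(text, prefix="ch"):
    """Map hypothetical file names (ch01.xhtml, ...) to XHTML strings."""
    return {
        f"{prefix}{n:02d}.xhtml": to_xhtml(title, body)
        for n, (title, body) in enumerate(split_chapters(text), start=1)
    }


sample = ("Preface text.\n\nCHAPTER I\nIt was a dark night.\n\nRain fell.\n"
          "Chapter II\nMorning came.")
files = build_chapter_files(sample)
print(sorted(files))  # ['ch01.xhtml', 'ch02.xhtml', 'ch03.xhtml']
```

Text before the first heading (a preface, say) becomes its own untitled file rather than being dropped. Each returned string would still need to be written to disk and listed in the OPF manifest and spine.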
03-18-2024, 08:35 AM   #6
Quoth
the rook, bossing Never.
Posts: 12,386
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
My experience is that the Internet Archive "ebooks" are worthless. They are generated automatically from unproofed OCR text that is only marginally good for searching.

So I deleted them all, and now I only download PDFs (after checking they are really public domain) and read them on a tablet.

You are better off doing your own OCR of the PDF and proofing it. Do put page breaks at chapters, sections or other natural breaks in your word processor. Later those will start new files in the epub. A new file is the only reliable page break, and it works for epub converted to mobi, azw3/KF8, dual mobi and KFX.
03-18-2024, 11:38 AM   #7
electrotype
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2009
Device: none
Quote:
Originally Posted by Quoth
You are better doing your own OCR of the PDF and proofing it.
Good to know. The text of the one I downloaded isn't too bad; it's just not formatted properly. I do find some OCR errors, but not many.

Anyway, my opinion is that if you really want to do a good job, you need to read the final epub from A to Z once it's created, fix mistakes, and then create a final version. Some books really deserve that work.

I plan to create epubs for other books too, so I might try what you suggest! I'll test some OCR applications to see how they work. Thanks.
03-18-2024, 02:46 PM   #8
slm
Fool
Posts: 424
Karma: 3585252
Join Date: Feb 2003
Device: Kindle: Voyage,PW1,KOA, Kobo: Clara Colour, Nook GLP, Pocketbook verse
Quote:
Originally Posted by Quoth
My experience is that the Internet Archive "ebooks" are worthless. They are generated automatically from un-proofed OCR text only marginally good for searching.

So I deleted them all and only download PDFs (after checking they are really PD) and read them on a tablet.

You are better doing your own OCR of the PDF and proofing it. Do put page breaks at chapters, sections or other natural breaks in your wordprocessor. Later those will start new files in the epub. A new file is the only reliable page break and works for epub converted to mobi, azw3/KF8, dual mobi and KFX.
Just for the record--and to offset the quoted view--my experience with reading many Internet Archive "epubs" OCR'd from pdfs over many years is that about two-thirds of them are perfectly OK and about 10% are completely unusable. Note that this is just for casual reading.
03-18-2024, 05:08 PM   #9
Quoth
the rook, bossing Never.
Posts: 12,386
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
About 100% of the ones I downloaded were poor, and more than half were unusable.

Perhaps it depends on the source and content and when the scanned book was printed. Also check they aren't "pirated".

Many other people have remarked that you are better off downloading the PDF.
03-18-2024, 05:18 PM   #10
Karellen
Wizard
Posts: 1,367
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Originally Posted by electrotype
I'll test some OCR applications to see how they work. Thanks.
Might be of use...
https://www.mobileread.com/forums/sh...93#post4341993
03-18-2024, 07:36 PM   #11
electrotype
Enthusiast
Posts: 46
Karma: 10
Join Date: Dec 2009
Device: none
Quote:
Originally Posted by Karellen
Thanks, bookmarked.

Tags
content, creation, epub, pages



Similar Threads
Thread Thread Starter Forum Replies Last Post
Texts on the right-most page were chewed off [multiple pages in paged mode] llinfeng Viewer 4 05-21-2021 01:42 PM
How to convert multiple epub chapters into a single azw3 book cliffsloane555 Conversion 2 11-08-2020 08:53 PM
Multiple JPG images in SVG on single epub page dbb1480 Sigil 7 05-20-2016 10:57 AM
Page feeder does single-sided scans only. How to integrate pages... u238110 Workshop 8 07-14-2014 04:25 PM
Prev Page button skips back multiple pages Yetchtoo Amazon Kindle 9 03-08-2010 01:48 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.