08-02-2019, 04:29 PM | #1 |
Member
Can't convert from zip/html to epub
I have a .zip that contains a bunch of .html files. When I add it as a book in Calibre and convert it to .epub, it converts only index.html and ignores the rest. When I go to cmd and enter
Code:
ebook-convert index.html book.epub
I get:
Code:
IgnoreFile(u'blah.html is a binary file',)
The .zip is from Runeberg; just go here and download the one with "All HTML files".
08-02-2019, 05:27 PM | #2 |
Grand Sorcerer
I'm guessing you added the downloaded .zip file to the calibre GUI as a book format.
Instead, unzip the .zip file on your PC, then add just the index.html file as a book in the calibre GUI. calibre will use index.html to pull in all the other .html files and create its own ZIP book format. Once you've done that, do a calibre ZIP-to-whatever conversion. I just did a ZIP-to-EPUB conversion and it converted OK for me, even if the EPUB does look a bit primitive due to the lack of styling.
Last edited by jackie_w; 08-02-2019 at 05:32 PM.
08-02-2019, 09:46 PM | #3 |
Well trained by Cats
If the zip is not converting properly, I suspect the contents were placed in different paths than index.html specifies, so calibre can't find things where it expects them to be. Either:
1) fix the paths in index.html, or
2) add each file to an editor session, setting the order in the file list (if needed), and re-link images as needed.
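To check whether option 1 applies, a small Python sketch like the following could list the local links in index.html whose target files are missing from the folder the archive was unpacked into. It assumes links are relative paths resolved against that folder; `LinkCollector` and `missing_targets` are hypothetical helper names, not part of calibre.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Gather every href/src attribute value, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def missing_targets(index_html: str, base_dir: str) -> list:
    """Return local link paths from index_html that don't exist under base_dir."""
    parser = LinkCollector()
    parser.feed(index_html)
    missing = []
    for link in parser.links:
        parts = urlsplit(link)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # external URL, or an anchor within the same page
        target = os.path.join(base_dir, unquote(parts.path))
        if not os.path.exists(target) and parts.path not in missing:
            missing.append(parts.path)
    return missing
```

Read index.html with whatever encoding the files actually use and pass its text plus the unpack folder; an empty list means every local link resolves.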
08-03-2019, 01:04 AM | #4 |
Bibliophagist
I downloaded the file, converted it to EPUB and opened it in Sigil, which was not very happy with it since quite a few files were missing from content.opf: its spine section listed only 4 files. I used the Modify Epub plugin to add the unmanifested files to the manifest, and that cleared up those errors.
The easier answer was to unpack the .zip file into a temp directory and then add the index.html file to calibre. calibre will then parse that file and drag in the other files it references. The result opened with a couple of minor errors from k9.html, where a chunk of text is wrapped in <blockquote></blockquote> tags without any block tag inside. Simply wrapping that text in <p> and </p> corrected the issue.
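If more files have the same blockquote problem, the manual fix could be scripted. A rough Python sketch, assuming the blockquote content is plain inline text (as in k9.html) and blockquotes are not nested: it wraps the content of any `<blockquote>` that contains no block-level tag in `<p>...</p>` and leaves everything else alone. `wrap_bare_blockquotes` is a hypothetical name, and a regex is not a real HTML parser, so check the result in Sigil afterwards.

```python
import re

# Non-greedy match of one <blockquote>...</blockquote> pair (no nesting support).
BLOCKQUOTE_RE = re.compile(r"(<blockquote\b[^>]*>)(.*?)(</blockquote>)", re.S | re.I)
# Tags that already give the content block structure.
BLOCK_TAG_RE = re.compile(r"<(p|div|h[1-6]|ul|ol|dl|table|pre|blockquote)\b", re.I)


def wrap_bare_blockquotes(html: str) -> str:
    """Wrap a blockquote's content in <p>...</p> if it has no block-level tag."""

    def fix(match):
        open_tag, body, close_tag = match.groups()
        if not body.strip() or BLOCK_TAG_RE.search(body):
            return match.group(0)  # empty, or already structured: leave untouched
        return f"{open_tag}<p>{body}</p>{close_tag}"

    return BLOCKQUOTE_RE.sub(fix, html)
```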
08-03-2019, 07:35 AM | #5 |
Member
Thanks for the help, guys. The files were put in the ePub in a seemingly random order; this was easy (though tedious) to fix by editing, but can I get the conversion to sort them properly? The files appear in the correct order when sorted by name in Windows (e.g. k9 comes before k10), and they also appear in the correct order in index.html, but Calibre decided the most logical order was k54 -> k53 -> k52c -> k0a -> k0b -> k1a -> ... -> k7 -> k33c -> k43b -> k34 -> ...
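For reference, the Windows-style name order described above (k9 before k10) is a number-aware sort: runs of digits are compared as numbers instead of character by character. A minimal Python sketch of such a sort key, assuming file names shaped like the ones in this thread; `natural_key` is a hypothetical helper, and index.html's link order remains the authoritative reading order.

```python
import re


def natural_key(name: str):
    """Split 'k10b.html' into ['k', 10, 'b.html'] so digit runs compare as numbers."""
    # re.split with a capturing group alternates text / digits, so the int and
    # str parts of two keys always line up and never get compared to each other.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


names = ["k10.html", "k9.html", "k1a.html", "k0b.html", "k0a.html", "k52c.html"]
print(sorted(names))
# ['k0a.html', 'k0b.html', 'k10.html', 'k1a.html', 'k52c.html', 'k9.html']
print(sorted(names, key=natural_key))
# ['k0a.html', 'k0b.html', 'k1a.html', 'k9.html', 'k10.html', 'k52c.html']
```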
08-03-2019, 12:16 PM | #6 |
creator of calibre
You will want to change the setting for adding the files from depth-first order to breadth-first order; see the note at https://manual.calibre-ebook.com/faq...specific-order
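To illustrate what that setting changes, here is a toy Python sketch of the two traversal orders over a hypothetical link graph (index.html links to a.html and b.html, each of which links to one more page). It is not calibre's code, only an illustration of why the two settings can give different file orders from the same index.html.

```python
from collections import deque

# Hypothetical link graph: each file maps to the files it links to, in document order.
LINKS = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["a1.html"],
    "b.html": ["b1.html"],
    "a1.html": [],
    "b1.html": [],
}


def breadth_first(start, links):
    """Visit every page at one link depth before going deeper."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def depth_first(start, links, seen=None):
    """Follow each link all the way down before moving to the next sibling."""
    seen = set() if seen is None else seen
    seen.add(start)
    order = [start]
    for target in links.get(start, []):
        if target not in seen:
            order += depth_first(target, links, seen)
    return order


print(breadth_first("index.html", LINKS))
# ['index.html', 'a.html', 'b.html', 'a1.html', 'b1.html']
print(depth_first("index.html", LINKS))
# ['index.html', 'a.html', 'a1.html', 'b.html', 'b1.html']
```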
08-03-2019, 04:39 PM | #7 |
Member
Didn't work, and when I said "seemingly random order" I guess I shouldn't have hedged my statement. I imported index.html into Calibre twice with depth-first, then twice with breadth-first, and the resulting ePub had a different order every time. I then converted the last entry again to confirm that the problem lies not with adding the .html but with converting to ePub.
The resulting five ePubs started (after index.html) with k33c, k53, k33c, k54, k53... and when I converted once again I got k52c, so there's definitely a pattern here. I think one I deleted started with k34. Also, the two that started with k33c had different orders after that.
08-03-2019, 11:42 PM | #8 |
creator of calibre
I find it extremely hard to believe that converting to EPUB would randomize file order; if that were the case, there would literally be millions of bug reports about it. But feel free to attach a file that shows this behavior on conversion and I will take a look.
08-04-2019, 06:08 AM | #9 |
Member
Alright, I unzipped the first zip (nilsholg-html.zip) and added index.html to Calibre; the result was the second zip, which I converted to ePub twice with different results (and a weirdly large size difference). This was with the breadth-first setting.
08-04-2019, 07:49 AM | #10 |
creator of calibre
This is because these are not HTML files but HTML fragments. For example, k0a.html contains:
Code:
<h1>Den kristna dagvisan</h1>
<p>Den signade dag, som vi nu här se
<br>av himmelen till oss nedkomma,
<br>han blive oss säll, han låte sig te
<br>oss allom till glädje och fromma.
<br>Ja, Herren, den högste, oss alla i dag
<br>för synder och sorger bevare.
<p>Men såsom en fågel mot himmelens höjd
<br>sig lyfter på lediga vingar,
<br>han lovar sin gud, är glad och förnöjd,
<br>när han över jorden sig svingar,
<br>så lyfter sig själen i hjärtelig fröjd
<br>till himlen med lovsång och böner.
<p>Ack, låtom oss lova och bedja vår Gud,
<br>när stunderna växla och skrida,
<br>så skola vi stärkas att hålla hans bud
<br>och vaka och tåligen lida.
<br>Ja, låtom oss verka med allvar och flit,
<br>så länge oss dagen förunnas.
<p>sv. Ps. 424: 1, 5, 6
So you need to either fix the files to be proper HTML yourself (just adding an opening <html> tag to the start of each file should be enough), or edit the OPF file inside the calibre-produced zip file, add all the extra HTML files to the <spine> section, and then convert.
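For the first option, a minimal Python sketch that prepends `<html>` to every .html file in the unpacked folder that does not already start with `<html`, a doctype, or an XML declaration. It works on raw bytes, so each file's original encoding is left untouched. The helper names are hypothetical, and the sketch edits files in place, so run it on a copy of the folder.

```python
import pathlib

UTF8_BOM = b"\xef\xbb\xbf"


def ensure_html_root(data: bytes) -> bytes:
    """Prepend <html> unless the file already opens with <html, a doctype or <?xml."""
    bom = UTF8_BOM if data.startswith(UTF8_BOM) else b""
    body = data[len(bom):]
    head = body.lstrip()[:9].lower()
    if head.startswith((b"<html", b"<!doctype", b"<?xml")):
        return data
    # Keep any byte-order mark in front; everything else stays byte-for-byte identical.
    return bom + b"<html>\n" + body


def fix_fragments(folder: str) -> list:
    """Patch every *.html file in folder in place; return the names that changed."""
    changed = []
    for path in sorted(pathlib.Path(folder).glob("*.html")):
        data = path.read_bytes()
        fixed = ensure_html_root(data)
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(path.name)
    return changed
```

After `fix_fragments("path/to/unpacked/folder")`, index.html can be added to calibre again as described earlier in the thread.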
08-06-2019, 11:19 AM | #11 |
Member
Alright, thanks for the help. I made a quick and dirty Java program for editing content.opf in the zip file. I'll drop it here just in case someone with the same problem happens to find this thread.
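For anyone who finds this thread later, here is a separate, rough Python sketch of the same idea (not the Java program mentioned above). It reads the calibre-produced zip, collects the .html links from index.html in document order, appends any of those files missing from the OPF `<spine>` (adding an `<item>` to the `<manifest>` first if needed), and writes a new zip. Assumptions: the zip holds exactly one .opf file, index.html sits at `index_name` in the same folder as the OPF, attributes are double-quoted, and `text/html` is an acceptable media type for the added items. The `extraN` ids and all function names are hypothetical.

```python
import re
import zipfile
from html.parser import HTMLParser


class _HtmlLinks(HTMLParser):
    """Collect local .html/.htm targets of <a href>, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href and "://" not in href:
            path = href.split("#", 1)[0]  # drop any #fragment
            if path.lower().endswith((".html", ".htm")) and path not in self.hrefs:
                self.hrefs.append(path)


def linked_html(index_html: str) -> list:
    parser = _HtmlLinks()
    parser.feed(index_html)
    return parser.hrefs


ATTR_RE = re.compile(r'([\w:-]+)\s*=\s*"([^"]*)"')  # assumes double-quoted attributes


def add_to_spine(opf: str, hrefs: list) -> str:
    """Append <itemref>s (and <item>s when missing) for hrefs not yet in the spine."""
    href_to_id, ids = {}, set()
    for tag in re.findall(r"<item\b[^>]*>", opf):  # \b keeps <itemref> out
        attrs = dict(ATTR_RE.findall(tag))
        if "id" in attrs:
            ids.add(attrs["id"])
            if "href" in attrs:
                href_to_id[attrs["href"]] = attrs["id"]
    in_spine = set(re.findall(r'<itemref\b[^>]*\bidref="([^"]*)"', opf))

    new_items, new_refs = [], []
    for href in hrefs:
        item_id = href_to_id.get(href)
        if item_id is None:  # not in the manifest yet: pick an unused id
            n = len(ids)
            while f"extra{n}" in ids:
                n += 1
            item_id = f"extra{n}"
            ids.add(item_id)
            href_to_id[href] = item_id
            new_items.append(f'<item id="{item_id}" href="{href}" media-type="text/html"/>')
        if item_id not in in_spine:
            in_spine.add(item_id)
            new_refs.append(f'<itemref idref="{item_id}"/>')

    if new_items:
        opf = opf.replace("</manifest>", "\n".join(new_items) + "\n</manifest>", 1)
    if new_refs:
        opf = opf.replace("</spine>", "\n".join(new_refs) + "\n</spine>", 1)
    return opf


def patch_zip(src: str, dst: str, index_name: str = "index.html") -> None:
    """Copy src to dst, rewriting the .opf so every file linked from index.html is in the spine."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        opf_name = next(n for n in zin.namelist() if n.endswith(".opf"))
        hrefs = linked_html(zin.read(index_name).decode("utf-8", errors="replace"))
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename == opf_name:
                data = add_to_spine(data.decode("utf-8"), hrefs).encode("utf-8")
            zout.writestr(info, data)
```

Because the new `<itemref>`s follow index.html's link order, the spine should come out in reading order; convert the patched zip in calibre as before.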