Old 08-02-2019, 04:29 PM   #1
Blrp
Member
Blrp began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Jul 2014
Device: none
Can't convert from zip/html to epub

I have a .zip that contains a bunch of .html files. When I add it as a book in Calibre and convert to .epub, it just converts index.html to epub and ignores the rest. When I go to cmd and enter
Code:
ebook-convert index.html book.epub
it just says
Code:
IgnoreFile(u'blah.html is a binary file',)
a bunch of times and gives me the same result as before.

The .zip is from Runeberg, just go here and download the one with "All HTML files".
Old 08-02-2019, 05:27 PM   #2
jackie_w
Grand Sorcerer
jackie_w ought to be getting tired of karma fortunes by now.
 
Posts: 6,224
Karma: 16536676
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
I'm guessing you added the downloaded .zip file to the calibre GUI as a book format.

Instead, unzip the .zip file on your PC then add just the index.html file as a book to calibre GUI. calibre will use index.html to pull in all the other .html files and create its own zip book format. Once you've done that do a calibre zip-to-whatever conversion.

I just did a zip-to-epub conversion and it converted OK for me - even if the epub does look a bit primitive due to no styling.

Last edited by jackie_w; 08-02-2019 at 05:32 PM.
Old 08-02-2019, 09:46 PM   #3
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 30,454
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
If the zip is not converting properly, I suspect the contents were added from different paths than index.html specified, so it can't find stuff where it thinks it is supposed to be.

1) Fix the paths in index.html
or
2) Add each file to an editor session, setting the order (if needed) in the file list, and re-link images as needed.
Old 08-03-2019, 01:04 AM   #4
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
 
Posts: 40,617
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
I downloaded the file, converted it to epub and opened it in Sigil, which was not very happy with it since quite a few files were not in the manifest's spine section. Looking at that segment in content.opf showed 4 files. I used the Modify Epub plugin to add unmanifested files to the manifest and that cleared up those errors.

The easier answer was to unpack the .zip file into a temp directory and then add the index.html file to calibre. Calibre will then parse that file and drag in the other files which are referenced. This file opened with a couple of minor errors from k9.html where there is a chunk of text wrapped in <blockquote></blockquote> tags without a block tag. Simply adding a <p> and </p> corrected that issue.
Old 08-03-2019, 07:35 AM   #5
Blrp
Member
Blrp began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Jul 2014
Device: none
Thanks for the help, guys. The files were put in the ePub in a seemingly random order; this was easy (though tedious) to fix through editing, but can I get the conversion to sort them properly? The files appear in the correct order when sorted by name in Windows (e.g. k9 comes before k10), and they also appear in the correct order in index.html, but Calibre decided the most logical order was k54 -> k53 -> k52c -> k0a -> k0b -> k1a -> ... -> k7 -> k33c -> k43b -> k34 -> ...
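
For anyone who wants to reproduce the "k9 before k10" ordering in code, here is a minimal sketch of a natural-order comparator in Java. The class name NaturalOrder and the sample file names are only illustrative, and this is not guaranteed to match Windows Explorer's sorting in every edge case; it simply compares runs of digits by numeric value and everything else as text.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaturalOrder {

    // Alternating runs of ASCII digits and non-digits, e.g. "k10b.html" -> "k", "10", "b.html".
    private static final Pattern CHUNK = Pattern.compile("[0-9]+|[^0-9]+");

    static List<String> chunks(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        while (m.find()) out.add(m.group());
        return out;
    }

    private static boolean isNumber(String chunk) {
        char c = chunk.charAt(0);
        return c >= '0' && c <= '9';
    }

    // Digit runs are compared by numeric value, all other runs as case-insensitive text.
    static final Comparator<String> NATURAL = (a, b) -> {
        List<String> ca = chunks(a), cb = chunks(b);
        for (int i = 0; i < Math.min(ca.size(), cb.size()); i++) {
            String x = ca.get(i), y = cb.get(i);
            int c = isNumber(x) && isNumber(y)
                    ? new BigInteger(x).compareTo(new BigInteger(y))
                    : x.compareToIgnoreCase(y);
            if (c != 0) return c;
        }
        // All shared chunks are equal: the name with fewer chunks sorts first.
        return Integer.compare(ca.size(), cb.size());
    };

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(
                Arrays.asList("k10.html", "k9.html", "k1a.html", "k0b.html", "k0a.html"));
        names.sort(NATURAL);
        System.out.println(names); // [k0a.html, k0b.html, k1a.html, k9.html, k10.html]
    }
}
```

A plain String sort would put k10.html before k9.html, which is why a comparator like this is needed if the spine order is rebuilt from file names rather than from the link order in index.html.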
Old 08-03-2019, 12:16 PM   #6
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You will want to change the setting for adding files from depth-first order to breadth-first order; see the note at https://manual.calibre-ebook.com/faq...specific-order
Old 08-03-2019, 04:39 PM   #7
Blrp
Member
Blrp began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Jul 2014
Device: none
Didn't work, and when I said "seemingly random order" I guess I shouldn't have hedged my statement. I imported index.html into Calibre twice with depth-first, then twice with breadth-first, and the resulting ePub had a different order every time. I then converted the last entry again to confirm that the problem lies not with adding the .html but with converting to ePub.

The resulting five ePubs started (after index) with k33c, k53, k33c, k54, k53... and I converted once again and got k52c, so there's definitely a pattern here. I think one I deleted started with k34. Also, the two that started with k33c had different orders after that.
Old 08-03-2019, 11:42 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I find it extremely hard to believe that converting to EPUB would randomize file order; if that were the case there would literally be millions of bug reports about it. But feel free to attach a file that shows this behavior on conversion and I will take a look.
Old 08-04-2019, 06:08 AM   #9
Blrp
Member
Blrp began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Jul 2014
Device: none
Alright, I unzipped the first zip (nilsholg-html.zip) and added index.html to Calibre; the result was the second zip which I converted to ePub twice with different results (and with a weirdly large size difference). This was with the breadth-first setting.
Attached Files
File Type: zip nilsholg-html.zip (492.8 KB, 252 views)
File Type: zip index - Unknown.zip (506.3 KB, 259 views)
File Type: epub index - Unknown.epub (638.1 KB, 220 views)
File Type: epub index - Unknown2.epub (530.6 KB, 232 views)
Old 08-04-2019, 07:49 AM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 44,566
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This is because these are not HTML files but HTML fragments. For example, k0a.html contains:

Code:
<h1>Den kristna dagvisan</h1>

<p>Den signade dag, som vi nu här se
<br>av himmelen till oss nedkomma,
<br>han blive oss säll, han låte sig te
<br>oss allom till glädje och fromma.
<br>Ja, Herren, den högste, oss alla i dag
<br>för synder och sorger bevare.

<p>Men såsom en fågel mot himmelens höjd
<br>sig lyfter på lediga vingar,
<br>han lovar sin gud, är glad och förnöjd,
<br>när han över jorden sig svingar,
<br>så lyfter sig själen i hjärtelig fröjd
<br>till himlen med lovsång och böner.

<p>Ack, låtom oss lova och bedja vår Gud,
<br>när stunderna växla och skrida,
<br>så skola vi stärkas att hålla hans bud
<br>och vaka och tåligen lida.
<br>Ja, låtom oss verka med allvar och flit,
<br>så länge oss dagen förunnas.

<p>sv. Ps. 424: 1, 5, 6
calibre will look for an <html> tag and, if it does not find one, will assume the file is not HTML and not add it to the spine. However, since index.html is added to the spine by virtue of being the top-level file, the EPUB conversion will follow the links in it and auto-fix the HTML fragments, making them proper files. However, this happens in random order.

So you need to either fix the files to be proper html yourself, just adding an opening <html> tag to the start of the file should be enough, or edit the opf file inside the calibre produced zip file and add all the extra html files to the <spine> section and then convert.
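
The first option (adding an opening <html> tag to each fragment) could be scripted roughly as follows. This is only a sketch resting on the statement above that an opening tag is enough; it has not been checked against calibre itself. The class name AddHtmlTag and its methods are made up for illustration. Files are read and written as ISO-8859-1 so that every byte round-trips unchanged, whatever the real (ASCII-compatible) encoding is.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class AddHtmlTag {

    // Matches an opening <html> tag in any case, with or without attributes.
    private static final Pattern HTML_TAG = Pattern.compile("<html[\\s>]", Pattern.CASE_INSENSITIVE);

    // Prepends an opening <html> tag to fragments; text that already has one is returned unchanged.
    static String fix(String text) {
        return HTML_TAG.matcher(text).find() ? text : "<html>\n" + text;
    }

    // Rewrites every .html file directly inside dir and returns how many files were changed.
    static int fixDirectory(Path dir, Charset charset) throws IOException {
        int changed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.html")) {
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), charset);
                String fixed = fix(text);
                if (!fixed.equals(text)) {
                    Files.write(file, fixed.getBytes(charset));
                    changed++;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // First argument: the directory holding the unzipped .html files (defaults to the current one).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(fixDirectory(dir, StandardCharsets.ISO_8859_1) + " file(s) changed");
    }
}
```

Run it on the unzipped directory before adding index.html to calibre, and work on a copy of the files, since they are overwritten in place.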
Old 08-06-2019, 11:19 AM   #11
Blrp
Member
Blrp began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Jul 2014
Device: none
Alright, thanks for the help. I made a quick and dirty Java program for editing content.opf in the zip file. I'll drop it here in case someone with the same problem happens to find this thread.

Spoiler:
Code:
import java.io.BufferedWriter;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EditContentOPF {
	
	static String readFileInZip(String pathToZip, String pathInZip, String charsetName) throws IOException {
		try (ZipFile zip = new ZipFile(pathToZip)) {
			ZipEntry entry = zip.getEntry(pathInZip);
			if (entry == null) throw new RuntimeException(pathInZip + " not found");
			try (InputStream is = zip.getInputStream(entry)) {
				try (@SuppressWarnings("resource") Scanner s = new Scanner(is, charsetName).useDelimiter("\\A")) { // eclipse bug? try-with should not have resource leak
					return s.hasNext() ? s.next() : "";
				}
			}
		}
	}
	
	static String editContentFileText(String text) throws ParserConfigurationException, SAXException, IOException, TransformerException {
		DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
		builder.setErrorHandler(null);
		Document document = builder.parse(new InputSource(new StringReader(text)));
		Element packageNode = document.getDocumentElement();
		
		// get list of items to reference in spine
		Node manifest = packageNode.getElementsByTagName("manifest").item(0);
		NodeList manifestItems = ((Element)manifest).getElementsByTagName("item");
		List<Element> manifestHtmlItems = new ArrayList<>();
		for (int i = 0; i < manifestItems.getLength(); i++) {
			Element item = (Element)manifestItems.item(i);
			String href = item.getAttribute("href");
			if (href.endsWith(".html")) // endsWith also copes with an href that has no '.', where substring would throw
				manifestHtmlItems.add(item);
		}
		
		// we expect index.html to be first in the manifest
		Element indexItem = (Element)manifestHtmlItems.get(0);
		if (!indexItem.getAttribute("href").equals("index.html"))
			throw new RuntimeException("index.html is not first");
		String indexId = indexItem.getAttribute("id");
		manifestHtmlItems.remove(0);
		
		// we expect index.html to be the only item referenced in the spine
		Node spine = packageNode.getElementsByTagName("spine").item(0);
		NodeList spineChildren = ((Element)spine).getChildNodes();
		List<Element> spineNonTextChildren = new ArrayList<>();
		for (int i = 0; i < spineChildren.getLength(); i++) {
			if (spineChildren.item(i).getNodeType() != Node.TEXT_NODE)
				spineNonTextChildren.add((Element)spineChildren.item(i));
		}
		if (spineNonTextChildren.size() != 1)
			throw new RuntimeException("unexpected number of nodes in spine");
		if (!spineNonTextChildren.get(0).getAttribute("idref").equals(indexId))
			throw new RuntimeException("index.html is not referenced in spine");
		
		// add references in spine
		for (Element item : manifestHtmlItems) {
			String id = item.getAttribute("id");
			if (id.isEmpty())
				throw new RuntimeException("item has no id, href=" + item.getAttribute("href"));
			Element itemref = document.createElement("itemref");
			itemref.setAttribute("idref", id);
			spine.appendChild(itemref);
		}
		
		// turn document back into string
		StringWriter writer = new StringWriter();
		TransformerFactory.newInstance().newTransformer().transform(new DOMSource(document), new StreamResult(writer));
		return writer.toString();
	}
	
	static void writeToFileInZip(String pathToZip, String pathInZip, String text, String charsetName) throws IOException {
		String[] lines = text.split("\\R");
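		// the cast keeps this call unambiguous on Java 13+, which adds a newFileSystem(Path, Map) overload
		try (FileSystem fs = FileSystems.newFileSystem(Paths.get(pathToZip), (ClassLoader) null)) {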
			Path fullPath = fs.getPath(pathInZip);
			Files.delete(fullPath);
			try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(Files.newOutputStream(fullPath), charsetName))) {
				for (String line : lines) {
					bw.write(line);
					bw.newLine();
				}
			}
		}
	}
	
	public static String input() {
		try (Scanner sc = new Scanner( // this yields a scanner of System.in that, when closed, does not close System.in
				new FilterInputStream(System.in) {
					@Override
					public void close() throws IOException {}
				})) {
			return sc.nextLine();
		}
	}
	
	public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, TransformerException {
		System.out.print("enter zip path: ");
		String pathToZip = input();
		String charsetName = "utf-8";
		String pathInZip = "content.opf";
		
		String contentFileText = readFileInZip(pathToZip, pathInZip, charsetName);
		
		String editedContentFileText = editContentFileText(contentFileText);
				
		writeToFileInZip(pathToZip, pathInZip, editedContentFileText, charsetName);
	}
}