MobileRead Forums > E-Book Software > Calibre

12-24-2010, 01:29 PM   #1
KevinH
Sigil Developer
Posts: 7,878
Karma: 5449552
Join Date: Nov 2009
Device: many
Issue importing html zip archives and metadata parsing

Hi,

I am using the latest calibre (downloaded and installed today), and I am having trouble importing a zip archive with the following contents:

book.html
style.css
img/*.jpeg


When I import it as a .zip archive, no metadata at all is read from the HTML file.

When I rename the .zip to .htmlz, again no metadata at all is read from the HTML file.

If I unzip it manually and then import book.html, everything works just fine (the metadata is recognized).

I am designing a file conversion import plugin. I was trying to pass the plugin's output to calibre as a zip archive and wanted to test manually what happens when I do that.

Is there a particular layout or set of special file names I need to use when creating the zip archive so that, on import, the HTML file is parsed properly for metadata?

Thanks,

KevinH
12-24-2010, 01:31 PM   #2
kovidgoyal
creator of calibre
Posts: 44,160
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Add an OPF file containing the metadata to the zip.
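Following that suggestion, an import plugin could generate the OPF next to the HTML before zipping. The sketch below is a minimal Python illustration, assuming the OPF is named metadata.opf and sits at the root of the archive (as in the follow-up post); the helper names build_metadata_opf and build_html_zip are hypothetical, and whether a particular calibre version reads metadata from such a zip is not guaranteed.

```python
import io
import uuid
import zipfile
from xml.sax.saxutils import escape


def build_metadata_opf(title, authors, language="en", identifier=None):
    """Return a minimal OPF 2.0 document that carries only Dublin Core metadata."""
    identifier = identifier or f"urn:uuid:{uuid.uuid4()}"
    # One dc:creator element per author, each marked with the "aut" (author) role.
    creators = "".join(
        f'\n    <dc:creator opf:role="aut">{escape(a)}</dc:creator>' for a in authors
    )
    return f"""<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="uid">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>{creators}
    <dc:language>{escape(language)}</dc:language>
  </metadata>
</package>
"""


def build_html_zip(files, opf_text):
    """Zip the book files plus metadata.opf into a flat archive and return its bytes.

    files maps archive names (e.g. "book.html", "img/img0000.jpg") to raw bytes.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        zf.writestr("metadata.opf", opf_text.encode("utf-8"))
    return buf.getvalue()
```

Writing the returned bytes to a file such as book.zip gives the layout described in the next post. Each author gets its own dc:creator element, which is how OPF 2.0 represents multiple creators, rather than packing several names into one element.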
12-24-2010, 02:55 PM   #3
KevinH
Sigil Developer
Hi,

Okay, I added the following metadata.opf, and all of my metadata was parsed properly except for the cover.

Is there something I am doing wrong in my metadata.opf when it comes to setting a cover image on import?

The contents of the zip archive are:

book.html
style.css
cover.jpg
metadata.opf
img/*.jpg

Here is my generated metadata.opf file:

Code:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="guid_id">
   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
      <dc:identifier opf:scheme="GUID" id="guid_id">4f3807e13649d56d9cfa5e91beca6765</dc:identifier>
      <dc:identifier opf:scheme="ASIN">B001U3YDJK</dc:identifier>
      <dc:identifier opf:scheme="oASIN">0253342112</dc:identifier>
      <dc:title>Tank Driver: With the 11th Armored from the Battle of the Bulge to VE Day</dc:title>
      <dc:creator opf:role="aut">J. Ted Hartman;Ted J. Hartman</dc:creator>
      <dc:language>en</dc:language>
      <dc:date>20090126T20:24</dc:date>
   </metadata>
   <guide>
      <reference href="cover.jpg" type="cover" title="Cover"/>
   </guide>
</package>
The book.html file does not reference cover.jpg directly (it references img/img0000.jpg instead), but I also tried using href="img/img0000.jpg" in the guide element, to no avail.

Sorry to be so thick here, but I am stumped.

Thanks,

KevinH
12-24-2010, 05:07 PM   #4
kovidgoyal
creator of calibre
I don't think covers are ever read from zip files, unless they are identified as comics.
12-24-2010, 10:00 PM   #5
KevinH
Sigil Developer
Hi,

Is there any way to change this? I would very much like to take the book in HTML format and have it imported properly.

Would it be possible to use a special extension such as .htmlz or .bookz that would tell calibre to look for a cover by parsing the OPF?

If I extend metadata.opf to include a full manifest listing cover.jpg, would that help? It just seems a shame to leave the cover unidentified on import when the file conversion process knows exactly what it is.

Also, if I try to convert the book to PDF immediately after import, I get an error about a missing spine whenever metadata.opf is in the zip archive. If I remove it (and lose all of the metadata), the conversion proceeds without issue.

Thanks again for answering my questions.

Kevin
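The spine error is consistent with the OPF above containing only a <metadata> section (plus a <guide>): an OPF 2.0 package also expects a <manifest> listing every file and a <spine> giving the reading order. A quick structural check like this Python sketch can catch that before the archive is handed to calibre; the function name spine_problems is hypothetical, and it does not reproduce calibre's own validation or error messages.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def spine_problems(opf_bytes):
    """List structural problems with an OPF's <manifest>/<spine>; empty means OK."""
    root = ET.fromstring(opf_bytes)
    problems = []
    manifest = root.find(f"{OPF_NS}manifest")
    spine = root.find(f"{OPF_NS}spine")
    if manifest is None:
        problems.append("no <manifest> element")
    if spine is None:
        problems.append("no <spine> element")
        return problems
    # Every <itemref idref="..."> in the spine must name an <item id="..."> in the manifest.
    known_ids = set()
    if manifest is not None:
        known_ids = {item.get("id") for item in manifest.findall(f"{OPF_NS}item")}
    itemrefs = spine.findall(f"{OPF_NS}itemref")
    if not itemrefs:
        problems.append("<spine> has no <itemref> entries")
    for ref in itemrefs:
        idref = ref.get("idref")
        if idref not in known_ids:
            problems.append(f"spine itemref {idref!r} not found in manifest")
    return problems
```

Run against the metadata-only OPF posted earlier in the thread, this reports both a missing manifest and a missing spine.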
12-24-2010, 10:16 PM   #6
kovidgoyal
creator of calibre
Use .epub; you're almost there already. All you need to do is add <manifest> and <spine> to the OPF.
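A minimal Python sketch of that route, assuming a flat layout with the OPF saved as content.opf at the archive root. The file names, the itemN ids, and the helpers build_opf, build_epub and media_type are illustrative, not calibre APIs. The cover is advertised with <meta name="cover" content="..."/>, the common EPUB 2 convention; an NCX table of contents, which strict EPUB 2 validators also require, is left out here.

```python
import io
import posixpath
import uuid
import zipfile
from xml.sax.saxutils import escape, quoteattr

# Illustrative extension -> media-type table; extend it for other resources.
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}

CONTAINER_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def media_type(name):
    """Guess a manifest media-type from the file extension."""
    ext = posixpath.splitext(name)[1].lower()
    return MEDIA_TYPES.get(ext, "application/octet-stream")


def build_opf(title, authors, names, spine, cover=None, language="en", identifier=None):
    """Return an OPF 2.0 package with <metadata>, <manifest> and <spine>.

    names: every file that goes into the EPUB (archive paths).
    spine: the subset of names, in reading order.
    cover: optional name of the cover image (must also appear in names).
    """
    identifier = identifier or f"urn:uuid:{uuid.uuid4()}"
    ids = {name: f"item{i}" for i, name in enumerate(names)}
    creators = "".join(
        f'\n    <dc:creator opf:role="aut">{escape(a)}</dc:creator>' for a in authors)
    cover_meta = f'\n    <meta name="cover" content="{ids[cover]}"/>' if cover else ""
    items = "".join(
        f'\n    <item id="{ids[n]}" href={quoteattr(n)} media-type="{media_type(n)}"/>'
        for n in names)
    itemrefs = "".join(f'\n    <itemref idref="{ids[n]}"/>' for n in spine)
    return f"""<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="uid">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>{creators}
    <dc:language>{escape(language)}</dc:language>{cover_meta}
  </metadata>
  <manifest>{items}
  </manifest>
  <spine>{itemrefs}
  </spine>
</package>
"""


def build_epub(files, opf_text):
    """Pack files (archive name -> bytes) and the OPF into EPUB bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # OCF: "mimetype" must be the first entry and must be stored uncompressed.
        zf.writestr("mimetype", b"application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("content.opf", opf_text.encode("utf-8"))
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

The mimetype entry is written first and uncompressed because the OCF container format requires it; every other entry can be deflated.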
12-24-2010, 10:35 PM   #7
KevinH
Sigil Developer
Hi,

Okay, I can go with EPUB, but I typically have a single, huge HTML file, and I would rather not rewrite all of the code for detecting and splitting chapters, updating links, etc.

I just don't want anyone to take the .epub I hand to calibre, save it to disk, load it on a Sony eReader, and end up with one big "Page Error".

That is why I was hoping an .htmlz with an OPF would act like a poor man's EPUB that forced people to convert it via calibre before loading it on their device.

Thanks,

Kevin
12-24-2010, 11:12 PM   #8
kovidgoyal
creator of calibre
You don't need to write all that code; you can convert EPUB to EPUB in calibre.
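Besides the GUI, calibre ships a command-line converter, ebook-convert, which takes an input file and an output file. Below is a hedged Python sketch of scripting that EPUB-to-EPUB pass. The wrapper names are hypothetical, and which extra options to pass (calibre's EPUB output has settings that control how large HTML files are split) should be checked against ebook-convert --help for the installed version.

```python
import shutil
import subprocess


def epub_to_epub_cmd(src, dst, extra_args=()):
    """Build the ebook-convert command line for an EPUB -> EPUB pass."""
    # ebook-convert infers the input and output formats from the file extensions.
    return ["ebook-convert", str(src), str(dst), *extra_args]


def convert_epub(src, dst, extra_args=()):
    """Run calibre's ebook-convert; raises if it is not installed or the conversion fails."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("calibre's ebook-convert was not found on PATH")
    subprocess.run(epub_to_epub_cmd(src, dst, extra_args), check=True)
```

Using a different output path, e.g. convert_epub("book.epub", "book_converted.epub"), keeps the original file intact if the conversion fails.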
12-25-2010, 07:42 AM   #9
KevinH
Sigil Developer
Hi,

So then how does a calibre plugin trigger a calibre epub to epub conversion after the run() method has completed? Is there a post-run callback of some sort?

Thanks,

Kevin
12-25-2010, 10:53 AM   #10
kovidgoyal
creator of calibre
I'm confused: why does your plugin need to do an EPUB-to-EPUB conversion? You can just do that conversion as normal in calibre after the import has completed.
12-25-2010, 02:29 PM   #11
KevinH
Sigil Developer
Quote (originally posted by kovidgoyal):
I'm confused, why does your plugin need to do an epub to epub conversion? You can just do that conversion as normal in calibre after the import has completed.
Hi,

This is a plugin for a common Kindle book format that cannot be deciphered today. I assume the plugin will be used by many people, not all of whom will remember that they have to do an EPUB-to-EPUB conversion before exporting the book or syncing it with their reader of choice. If they sync it to their Sony reader as is, they will end up with a single giant HTML file and a "Page Error". I was hoping for a seamless file type plugin that would use all of the information available in the original book format and pass it cleanly to calibre.

I assumed an .htmlz or .zip archive with the proper files would be the best way to pass things through from the plugin to calibre itself.

I will play around and see if I can figure something else out.

Thanks,

Kevin
12-25-2010, 02:35 PM   #12
kovidgoyal
creator of calibre
Do you mean the Topaz format? In that case, why not just implement a conversion plugin for de-DRMed Topaz? That can be made part of calibre.
12-25-2010, 04:25 PM   #13
KevinH
Sigil Developer
Hi,

Although non-DRMed Topaz files could exist, they do not seem to exist in the wild. The current "tools" do not take that approach, because a non-DRM Topaz file originally could not easily be converted: internally it is a binary-encoded data file that is really a poor man's version of an image-only PDF, with some OCR info added to make searching possible.

So a non-DRM Topaz file was really only good for sharing/piracy, since having it changes nothing for the owner: they could read it only on Kindles before and could read it only on Kindles after. All the DRM removal accomplished was to let owners post or share the file with others (something the tools' authors did not want to support).

The binary-encoded data file first needs to be converted into an incompletely reverse-engineered XML using a dictionary lookup procedure. The custom XML then needs to be parsed, and the information describing the page image needs to be combined with the internal OCR info to create something HTML-based but unfortunately imperfect (the internal OCR can be horrible, and all italics and most bolding are lost). The same binary data files can also be converted to a set of SVG images of the pages (perfect and scalable but not reflowable, unless you have an algorithm to reflow individual glyphs, which need not map to any specific letter on the screen).

So are you saying that if the "tools" were reverted to do nothing other than generate non-DRM Topaz files, we could move all of the reverse-engineered Python code that was added later, which converts the file to HTML and a set of SVG images, right into calibre itself?

I was not sure you would allow Calibre to host code that was reverse engineered. If so, we could certainly take that approach.

My original idea was to create a file type plugin that handled the "non-DRM part" and the detailed conversion behind the scenes, and then handed calibre the results of the conversion as one package, say a .tpzZ (for zip) file. That way nothing internal to calibre would need to change except adding support for a pseudo file type (.tpzZ), which I was going to write and contribute to calibre, so no reverse-engineered code would need to be included.

If you really would like to host the internal conversion code, I would be happy to contribute it and the authors of the standalone "tools" could revert to just creating non-drm Topaz files.

It is really your choice. If you are interested, I can take the latest versions, strip out the DRM removal pieces, and send you just a working converter program to play around with (pure Python). I just thought that "all in a plugin" would be the safest approach.

Take care,

KevinH
12-25-2010, 05:01 PM   #14
kovidgoyal
creator of calibre
I am perfectly fine with adding code to convert non-DRMed Topaz. calibre's MOBI conversion code is also reverse engineered. I just don't want any DRM removal code in calibre, as that would violate the DMCA.

So if you write an InputPlugin to convert non-DRMed Topaz files, I will be happy to merge it into the calibre code base.
12-25-2010, 05:29 PM   #15
KevinH
Sigil Developer
Hi,

Okay, I will grab the calibre source and look into doing just that.

Thanks,

Kevin

All times are GMT -4.