02-23-2011, 03:10 PM   #1
duckpuppy
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: Android, various
Beneath Ceaseless Skies recipe for direct epub downloading

BCS is a free monthly digital publication for fantasy stories. I'm trying to massage the Now Toronto recipe into a BCS recipe. Beneath Ceaseless Skies has a feed for new issue announcements, and they provide PDF, epub, and mobi files for each issue. The problem is that the feed doesn't have a link to the published files, nor does it have a link to the issue page (the only link in each feed item is to the discussion forum thread for that issue).

I have a semi-working recipe (at the bottom of this post). It can grab the most recent issue from the feed by parsing the issue number and constructing a direct download link to the epub, downloading it, and then unzipping it and returning the content.opf file as the index.
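The link construction described above can be tried on its own, without calibre. This is only a minimal sketch in plain Python: the function name `issue_epub_url` and the sample title are made up for illustration, and it assumes the feed item title ends in "#" followed by the issue number.

```python
import re

# URL pattern taken from the recipe; the issue number is zero-padded to 3 digits.
EPUB_URL = ('http://www.beneath-ceaseless-skies.com/ebooks/'
            'BeneathCeaselessSkies_Issue{0:03}.epub')


def issue_epub_url(feed_title):
    """Return (issue number, epub URL) for a feed item title (hypothetical helper)."""
    # Assumption: the title ends with '#<digits>', e.g. '... #62'.
    match = re.search(r'#(\d+)\s*$', feed_title)
    if match is None:
        raise ValueError('no issue number in title: %r' % feed_title)
    issue = int(match.group(1))
    return issue, EPUB_URL.format(issue)


# Sample title is an assumption about what the feed returns.
print(issue_epub_url('_Beneath Ceaseless Skies_ #62'))
```

Unlike `rfind("#")` followed by `int()`, the regular expression fails with a clear error when the title carries no issue number, which may be easier to debug if the feed format changes.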

I have the following problems:
  • The epub has a cover image, but it doesn't get used in the resulting re-zipped epub. The cover is blank.
  • There are usually multiple authors in each issue. The author metadata is just fine in the resulting epub, but the author sort is always the last,first of the last author listed in the author metadata. I can manually edit the metadata and click the button to automatically set the author sort from the author, and it's fine, but I'd like to make sure it's set during the conversion.
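For the author sort problem, one possible approach is to build a single sort string that covers every author instead of only the last one. The sketch below is plain Python and does not use calibre's own helpers; the function name `author_sort` is hypothetical, the "Last, First" split is deliberately naive (no handling of suffixes or multi-word surnames), and joining with " & " is an assumption about the format wanted in the author sort field.

```python
def author_sort(authors):
    """Build one sort string from a list of author names (hypothetical helper)."""
    parts = []
    for name in authors:
        name = name.strip()
        if ',' in name or ' ' not in name:
            # Already in "Last, First" form, or a single-word name: keep as is.
            parts.append(name)
        else:
            # Naive split: everything before the last space is the first name.
            first, last = name.rsplit(' ', 1)
            parts.append('%s, %s' % (last, first))
    # Assumption: multiple authors are joined with ' & '.
    return ' & '.join(parts)


print(author_sort(['Kris Dikeman', 'Jesse Bullington']))
```

The resulting string could then be written to the sort field of the metadata during the recipe run, rather than fixed by hand afterwards.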

It will probably become apparent from the code below that I'm not a Python expert, though I am a software developer by day, so I've been able to muddle through getting this to work a little. Anybody out there able to help me with the problems I've listed?

As an added bonus, I'd love to set the series to "Beneath Ceaseless Skies" and set the series number to the issue number in the epub metadata... is that possible?

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#Based on Starson17's NowToronto recipe, which in turn was based on Lars Jacob's Taz Digiabo recipe

__license__ = 'GPL v3'
__copyright__ = '2011, DuckPuppy'

import os, urllib2, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class BeneathCeaselessSkies(BasicNewsRecipe):
	title = u'Beneath Ceaseless Skies'
	description = u'Beneath Ceaseless Skies'
	__author__ = 'DuckPuppy'

	def build_index(self):
		# The feed only announces new issues; it has no link to the ebook files.
		epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
		soup = self.index_to_soup(epub_feed)
		# Take the first <item> and read the issue number after the last "#" in its title.
		item = soup.find(name='item')
		title = item.find(name='title').string
		print 'Title: ' + title
		issueloc = title.rfind("#")
		issue = title[issueloc+1:].encode('utf-8')
		print issue
		# Build the direct download link from the zero-padded issue number.
		url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
		print url
		self.report_progress(0,_('downloading epub'))
		f = urllib2.urlopen(url)
		tmp = PersistentTemporaryFile(suffix='.epub')
		tmp.write(f.read())
		tmp.close()
		zfile = zipfile.ZipFile(tmp.name, 'r')
		self.report_progress(0,_('extracting epub'))
		zfile.extractall(self.output_dir)
		# Close the zip archive (the original closed tmp a second time here).
		zfile.close()
		# Return the extracted OPF as the index of the download.
		index = os.path.join(self.output_dir, 'content.opf')
		self.report_progress(1,_('epub downloaded and extracted'))
		return index
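
The recipe above assumes the OPF is named content.opf and sits at the top level of the epub. An epub records the actual location of its OPF in META-INF/container.xml, so a slightly more defensive variant could look it up there. This is a sketch in plain Python, not part of the recipe; `find_opf_path` and `make_sample_epub` are hypothetical names, and the sample archive is a stand-in for a real download.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the epub container format.
CONTAINER_NS = {'c': 'urn:oasis:names:tc:opendocument:xmlns:container'}


def find_opf_path(epub_file):
    """Return the OPF path named by META-INF/container.xml (hypothetical helper)."""
    with zipfile.ZipFile(epub_file) as zf:
        root = ET.fromstring(zf.read('META-INF/container.xml'))
    rootfile = root.find('c:rootfiles/c:rootfile', CONTAINER_NS)
    if rootfile is None:
        raise ValueError('container.xml lists no rootfile')
    return rootfile.get('full-path')


def make_sample_epub():
    """Build a tiny in-memory archive for the demo; not a complete epub."""
    container = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('META-INF/container.xml', container)
        zf.writestr('content.opf', '<package/>')
    buf.seek(0)
    return buf


print(find_opf_path(make_sample_epub()))
```

In the recipe, the returned path would be joined onto the extraction directory in place of the hard-coded 'content.opf'.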

Last edited by duckpuppy; 02-23-2011 at 03:13 PM.
02-23-2011, 03:37 PM   #2
kovidgoyal
creator of calibre
Posts: 44,175
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use the OPF class from calibre.ebooks.metadata.opf2 to manipulate the metadata in content.opf to your heart's content.
02-23-2011, 05:52 PM   #3
duckpuppy
Quote:
Originally Posted by kovidgoyal View Post
Use the OPF class from calibre.ebooks.metadata.opf2 to manipulate the metadata in content.opf to your heart's content.
Thanks. I've cobbled together another version (at the end of this post), and the cover, series, and series index mostly work when I run it from the command line.

I've found what might be a strange bug with fetching news, though. I'm setting the series_index in my recipe, but when downloaded via "Fetch News" in the GUI, it's always showing up as book 1. If I use the command line "ebook-convert bcs.recipe bcs.epub" and manually add the resulting epub to calibre, it shows up with the correct book number (62).

It seems to be the metadata.opf that calibre creates in the library folder of the book - when I use "Fetch News" to generate the epub, it contains a series_index of 1 (even though content.opf of the epub has it set to 62), but when I add the exact same epub manually (by manually copying it from my library folder, deleting the book from the library via the GUI, and dragging the copy back into calibre), it contains the right values.

Code:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:publisher>_Beneath Ceaseless Skies_ Online Magazine</dc:publisher>
    <meta name="calibre:series_index" content="62"/>
    <dc:description>Issue #62 of &lt;I&gt;Beneath Ceaseless Skies&lt;/I&gt; online magazine, featuring stories by Kris Dikeman and Jesse Bullington.</dc:description>
    <dc:language>en</dc:language>
    <dc:creator opf:file-as="Bullington, Jesse" opf:role="aut">Dikeman, Kris</dc:creator>
    <dc:creator opf:file-as="Bullington, Jesse" opf:role="aut">Bullington, Jesse</dc:creator>
    <meta name="calibre:series" content="Beneath Ceaseless Skies"/>
    <dc:title>Beneath Ceaseless Skies #62</dc:title>
    <meta name="cover" content="id11"/>
    <dc:date>2011-02-10T07:00:00+00:00</dc:date>
    <meta name="calibre:timestamp" content="2011-02-12T01:44:30.555961+00:00"/>
    <dc:contributor opf:role="bkp">calibre (0.7.43) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier id="uuid_id" opf:scheme="uuid">92d85b12-a487-4f67-9198-f6d226ea27e2</dc:identifier>
  </metadata>
  <manifest>
    <item href="Beneath_Ceaseless_Skies_62_split_000.html" id="id16" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_001.html" id="id15" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_002.html" id="id14" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_003.html" id="id13" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_004.html" id="id12" media-type="application/xhtml+xml"/>
    <item href="images/00001.jpg" id="id3" media-type="image/jpeg"/>
    <item href="images/00002.jpg" id="id4" media-type="image/jpeg"/>
    <item href="images/00003.jpg" id="id5" media-type="image/jpeg"/>
    <item href="images/00004.jpg" id="id6" media-type="image/jpeg"/>
    <item href="images/00005.jpg" id="id7" media-type="image/jpeg"/>
    <item href="images/00006.jpg" id="id8" media-type="image/jpeg"/>
    <item href="images/calibre_cover.jpg" id="id11" media-type="image/jpeg"/>
    <item href="stylesheet1.css" id="css1" media-type="text/css"/>
    <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="titlepage1.xhtml" id="titlepage1" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" media-type="application/x-dtbncx+xml" id="ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage1"/>
    <itemref idref="titlepage"/>
    <itemref idref="id16"/>
    <itemref idref="id15"/>
    <itemref idref="id14"/>
    <itemref idref="id13"/>
    <itemref idref="id12"/>
  </spine>
  <guide>
    <reference href="titlepage1.xhtml" type="cover" title="Cover"/>
  </guide>
</package>
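
To compare what the epub's content.opf and the library's metadata.opf actually contain, the calibre-specific entries can be pulled out of an OPF like the one above with the standard library. This is only a sketch: `calibre_meta` is a hypothetical helper, and the sample is a trimmed copy of the metadata shown above rather than a full file.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <package> element of the OPF.
OPF_NS = {'opf': 'http://www.idpf.org/2007/opf'}


def calibre_meta(opf_xml):
    """Return {name: content} for each <meta name="calibre:..."> (hypothetical helper)."""
    root = ET.fromstring(opf_xml)
    found = {}
    for meta in root.iterfind('opf:metadata/opf:meta', OPF_NS):
        name = meta.get('name', '')
        if name.startswith('calibre:'):
            found[name] = meta.get('content')
    return found


# Trimmed sample based on the OPF in this post.
SAMPLE_OPF = '''<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="calibre:series" content="Beneath Ceaseless Skies"/>
    <meta name="calibre:series_index" content="62"/>
    <meta name="cover" content="id11"/>
  </metadata>
</package>'''

meta = calibre_meta(SAMPLE_OPF)
print(meta['calibre:series'], meta['calibre:series_index'])
```

Running the same function over both files would show directly whether the series index differs between them.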
Code:
import os, urllib2, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.metadata.opf2 import OPF, OPFCreator
from calibre.ptempfile import PersistentTemporaryFile

class BeneathCeaselessSkies(BasicNewsRecipe):
	title = u'Beneath Ceaseless Skies'
	description = u'Beneath Ceaseless Skies'
	__author__ = 'DuckPuppy'

	def build_index(self):
		epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
		soup = self.index_to_soup(epub_feed)
		item = soup.find(name='item')
		title = item.find(name='title').string
		issueloc = title.rfind("#")
		issue = title[issueloc+1:].encode('utf-8')
		url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
		f = urllib2.urlopen(url)
		tmp = PersistentTemporaryFile(suffix='.epub')
		self.report_progress(0,_('downloading epub'))
		tmp.write(f.read())
		tmp.close()
		zfile = zipfile.ZipFile(tmp.name, 'r')
		self.report_progress(0,_('extracting epub'))
		zfile.extractall(self.output_dir)
		# Close the zip archive (the original closed tmp a second time here).
		zfile.close()
		index = os.path.join(self.output_dir, 'content.opf')
		# Parse the extracted OPF through the same handle that is closed below.
		indexfile = open(index, "r")
		opf = OPF(indexfile, self.output_dir)
		indexfile.close()

		# Everything before the "#" is the series name, minus surrounding underscores.
		opf.series = title[:issueloc-1].strip("_")
		print 'issue#: ' + issue
		opf.series_index = issue
		opf.cover = os.path.join(self.output_dir, 'images/calibre_cover.jpg')

		# Write the updated metadata back over content.opf.
		indexfile = open(index, "w")
		opfc = OPFCreator(self.output_dir, opf)
		opfc.render(indexfile)
		indexfile.close()
		self.report_progress(1,_('epub downloaded and extracted'))
		return index
02-23-2011, 08:13 PM   #4
kovidgoyal
When adding news via the GUI, metadata isn't read from the epub file, so that's not surprising.
02-23-2011, 09:59 PM   #5
duckpuppy
Quote:
Originally Posted by kovidgoyal View Post
When adding news via the GUI, metadata isn't read from the epub file, so that's not surprising.
Can that be added? Should I make a feature request? It seems like it's reading or responding to the series metadata from the epub, just not the series_index. Both bits of information are added by my recipe and are not in the original epub download, and neither showed up in calibre at all until I added the code in the recipe to set them.
02-23-2011, 10:12 PM   #6
kovidgoyal
Feel free to add a feature request.
All times are GMT -4.