Beneath Ceaseless Skies recipe for direct epub downloading

duckpuppy · 02-23-2011, 03:10 PM

BCS is a free monthly digital publication for fantasy stories. I'm trying to massage the Now Toronto recipe into a BCS recipe. Beneath Ceaseless Skies has a feed for new issue announcements, and they provide PDF, epub, and mobi files for each issue. The problem is that the feed doesn't have a link to the published files, nor does it have a link to the issue page (the only link in each feed item is to the discussion forum thread for that issue).

I have a semi-working recipe (at the bottom of this post). It can grab the most recent issue from the feed by parsing the issue number and constructing a direct download link to the epub, downloading it, and then unzipping it and returning the content.opf file as the index.
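The parsing step described above can be tried on its own, outside calibre. This is only a minimal sketch: it assumes feed item titles end in "#" followed by the issue number, and it reuses the download URL pattern from the recipe in this post. The helper names `issue_number` and `epub_url` are made up for illustration.

```python
import re

# URL pattern taken from the recipe in this post; {0:03} zero-pads the issue number to three digits.
EPUB_URL = 'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'


def issue_number(title):
    """Return the integer that follows the final '#' at the end of a feed item title."""
    match = re.search(r'#\s*(\d+)\s*$', title)
    if match is None:
        # Fail loudly instead of building a bogus URL from an unexpected title.
        raise ValueError('no issue number in title: %r' % title)
    return int(match.group(1))


def epub_url(title):
    """Build the direct epub link for the issue named in the title."""
    return EPUB_URL.format(issue_number(title))
```

Raising on an unexpected title means a change in the feed's title format shows up as a clear error rather than a failed download of a wrong URL.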

I have the following problems:

The epub has a cover image, but it doesn't get used in the resulting re-zipped epub. The cover is blank.

There are usually multiple authors in each issue. The author metadata is just fine in the resulting epub, but the author sort is always the "Last, First" form of the last author listed in the author metadata. I can manually edit the metadata and click the button to automatically set the author sort from the author, and it's fine, but I'd like to make sure it's set during the conversion.
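For reference, the author sort value wanted here can be sketched in plain Python. This is a rough illustration, not calibre's own author-sort logic: it assumes simple "First Last" names, ignores prefixes and suffixes such as "van" or "Jr.", and assumes multiple authors are joined with " & ". Both function names are hypothetical.

```python
def author_to_sort(name):
    """Turn 'First Last' into 'Last, First'; names already containing a comma are left alone."""
    name = name.strip()
    if ',' in name or ' ' not in name:
        return name
    first, last = name.rsplit(' ', 1)
    return '%s, %s' % (last, first)


def authors_to_sort(names):
    """Build one sort string covering every author, not just the last one listed."""
    # Assumption: authors are joined with ' & ' in the sort string.
    return ' & '.join(author_to_sort(n) for n in names)
```

With the two authors of issue #62, `authors_to_sort(['Kris Dikeman', 'Jesse Bullington'])` gives `'Dikeman, Kris & Bullington, Jesse'`.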

It will probably become apparent from the code below that I'm not a Python expert, though I am a software developer by day, so I've been able to muddle through getting this to work a little. Anybody out there able to help me with the problems I've listed?

As an added bonus, I'd love to set the series to "Beneath Ceaseless Skies" and set the series number to the issue number in the epub metadata... is that possible?

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#Based on Starson17's NowToronto recipe, which in turn was based on Lars Jacob's Taz Digiabo recipe

__license__ = 'GPL v3'
__copyright__ = '2011, DuckPuppy'

import os, urllib2, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class BeneathCeaselessSkies(BasicNewsRecipe):
	title = u'Beneath Ceaseless Skies'
	description = u'Beneath Ceaseless Skies'
	__author__ = 'DuckPuppy'

	def build_index(self):
		# The feed only announces new issues; it has no link to the epub itself.
		epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
		soup = self.index_to_soup(epub_feed)
		# Take the first <item> in the feed; the issue number follows the last "#" in its title.
		item = soup.find(name='item')
		title = item.find(name='title').string
		print 'Title: ' + title
		issueloc = title.rfind("#")
		issue = title[issueloc+1:].encode('utf-8')
		print issue
		# Build the direct download link from the zero-padded issue number.
		url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
		print url
		f = urllib2.urlopen(url)
		tmp = PersistentTemporaryFile(suffix='.epub')
		self.report_progress(0,_('downloading epub'))
		tmp.write(f.read())
		tmp.close()
		# Unpack the epub into the output directory and return its OPF as the index.
		zfile = zipfile.ZipFile(tmp.name, 'r')
		self.report_progress(0,_('extracting epub'))
		zfile.extractall(self.output_dir)
		zfile.close()
		index = os.path.join(self.output_dir, 'content.opf')
		self.report_progress(1,_('epub downloaded and extracted'))
		return index
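One caveat on the extraction step: the recipe assumes content.opf sits at the root of the archive. In general an epub names its package file in META-INF/container.xml, so the path can be looked up instead of hard-coded. A minimal standard-library sketch of that lookup follows; `find_opf_path` is a hypothetical helper, not part of calibre.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the epub container format.
CONTAINER_NS = {'c': 'urn:oasis:names:tc:opendocument:xmlns:container'}


def find_opf_path(epub_file):
    """Return the archive path of the package (OPF) file named in META-INF/container.xml."""
    with zipfile.ZipFile(epub_file, 'r') as zf:
        container = ET.fromstring(zf.read('META-INF/container.xml'))
    rootfile = container.find('c:rootfiles/c:rootfile', CONTAINER_NS)
    if rootfile is None:
        raise ValueError('container.xml lists no rootfile')
    return rootfile.get('full-path')
```

The returned path is relative to the archive root, so after extracting it could be joined onto the output directory in place of the literal 'content.opf'.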

kovidgoyal · 02-23-2011, 03:37 PM

Use the OPF class from calibre.ebooks.metadata.opf2 to manipulate the metadata in content.opf to your heart's content.

duckpuppy · 02-23-2011, 05:52 PM

Quote:

Originally Posted by kovidgoyal

Use the OPF class from calibre.ebooks.metadata.opf2 to manipulate the metadata in content.opf to your heart's content.

Thanks - I've cobbled another version (inserted below), and it mostly works with the cover, series, and series index when using the command line.

I've found what might be a strange bug with fetching news, though. I'm setting the series_index in my recipe, but when downloaded via "Fetch News" in the GUI, it's always showing up as book 1. If I use the command line "ebook-convert bcs.recipe bcs.epub" and manually add the resulting epub to calibre, it shows up with the correct book number (62).

The culprit seems to be the metadata.opf that calibre creates in the book's library folder. When I use "Fetch News" to generate the epub, that file contains a series_index of 1, even though the content.opf inside the epub has it set to 62. When I add the exact same epub manually (by copying it out of my library folder, deleting the book from the library via the GUI, and dragging the copy back into calibre), it contains the right values.

Code:

<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:publisher>_Beneath Ceaseless Skies_ Online Magazine</dc:publisher>
    <meta name="calibre:series_index" content="62"/>
    <dc:description>Issue #62 of &lt;I&gt;Beneath Ceaseless Skies&lt;/I&gt; online magazine, featuring stories by Kris Dikeman and Jesse Bullington.</dc:description>
    <dc:language>en</dc:language>
    <dc:creator opf:file-as="Bullington, Jesse" opf:role="aut">Dikeman, Kris</dc:creator>
    <dc:creator opf:file-as="Bullington, Jesse" opf:role="aut">Bullington, Jesse</dc:creator>
    <meta name="calibre:series" content="Beneath Ceaseless Skies"/>
    <dc:title>Beneath Ceaseless Skies #62</dc:title>
    <meta name="cover" content="id11"/>
    <dc:date>2011-02-10T07:00:00+00:00</dc:date>
    <meta name="calibre:timestamp" content="2011-02-12T01:44:30.555961+00:00"/>
    <dc:contributor opf:role="bkp">calibre (0.7.43) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier id="uuid_id" opf:scheme="uuid">92d85b12-a487-4f67-9198-f6d226ea27e2</dc:identifier>
  </metadata>
  <manifest>
    <item href="Beneath_Ceaseless_Skies_62_split_000.html" id="id16" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_001.html" id="id15" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_002.html" id="id14" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_003.html" id="id13" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_004.html" id="id12" media-type="application/xhtml+xml"/>
    <item href="images/00001.jpg" id="id3" media-type="image/jpeg"/>
    <item href="images/00002.jpg" id="id4" media-type="image/jpeg"/>
    <item href="images/00003.jpg" id="id5" media-type="image/jpeg"/>
    <item href="images/00004.jpg" id="id6" media-type="image/jpeg"/>
    <item href="images/00005.jpg" id="id7" media-type="image/jpeg"/>
    <item href="images/00006.jpg" id="id8" media-type="image/jpeg"/>
    <item href="images/calibre_cover.jpg" id="id11" media-type="image/jpeg"/>
    <item href="stylesheet1.css" id="css1" media-type="text/css"/>
    <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="titlepage1.xhtml" id="titlepage1" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" media-type="application/x-dtbncx+xml" id="ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage1"/>
    <itemref idref="titlepage"/>
    <itemref idref="id16"/>
    <itemref idref="id15"/>
    <itemref idref="id14"/>
    <itemref idref="id13"/>
    <itemref idref="id12"/>
  </spine>
  <guide>
    <reference href="titlepage1.xhtml" type="cover" title="Cover"/>
  </guide>
</package>
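To check what a given OPF file actually carries for the series and series index, the calibre: meta elements can be read with the standard library alone. This is a small sketch for inspection only, not how calibre itself reads metadata; `calibre_meta` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# OPF elements live in this namespace, as declared on <package> in the listing above.
OPF_NS = '{http://www.idpf.org/2007/opf}'


def calibre_meta(opf_xml):
    """Return {name: content} for every <meta name="calibre:..."> element in an OPF document."""
    root = ET.fromstring(opf_xml)
    found = {}
    for meta in root.iter(OPF_NS + 'meta'):
        name = meta.get('name', '')
        if name.startswith('calibre:'):
            found[name] = meta.get('content')
    return found
```

Run against the bytes of the file listed above, the result should include 'calibre:series' and 'calibre:series_index', which makes it easy to compare the epub's content.opf with the metadata.opf in the library folder.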

Code:

import os, urllib2, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.metadata.opf2 import OPF, OPFCreator
from calibre.ptempfile import PersistentTemporaryFile

class BeneathCeaselessSkies(BasicNewsRecipe):
	title = u'Beneath Ceaseless Skies'
	description = u'Beneath Ceaseless Skies'
	__author__ = 'DuckPuppy'

	def build_index(self):
		epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
		soup = self.index_to_soup(epub_feed)
		# Take the first <item> in the feed; the issue number follows the last "#" in its title.
		item = soup.find(name='item')
		title = item.find(name='title').string
		issueloc = title.rfind("#")
		issue = title[issueloc+1:].encode('utf-8')
		url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
		f = urllib2.urlopen(url)
		tmp = PersistentTemporaryFile(suffix='.epub')
		self.report_progress(0,_('downloading epub'))
		tmp.write(f.read())
		tmp.close()
		# Unpack the epub into the output directory.
		zfile = zipfile.ZipFile(tmp.name, 'r')
		self.report_progress(0,_('extracting epub'))
		zfile.extractall(self.output_dir)
		zfile.close()
		index = os.path.join(self.output_dir, 'content.opf')
		# Parse the extracted OPF once, then close the file before rewriting it.
		indexfile = open(index, "rb")
		opf = OPF(indexfile, self.output_dir)
		indexfile.close()

		# Series name is the title text before the "#"; series index is the issue number.
		opf.series = title[:issueloc-1].strip("_")
		print 'issue#: ' + issue
		opf.series_index = issue
		# Point the cover at the image the epub already ships with.
		opf.cover = os.path.join(self.output_dir, 'images/calibre_cover.jpg')

		# Write the modified metadata back over content.opf.
		indexfile = open(index, "wb")
		opfc = OPFCreator(self.output_dir, opf)
		opfc.render(indexfile)
		indexfile.close()
		self.report_progress(1,_('epub downloaded and extracted'))
		return index

kovidgoyal · 02-23-2011, 08:13 PM

When adding news via the GUI, metadata isn't read from the epub file, so that's not surprising.

duckpuppy · 02-23-2011, 09:59 PM

Quote:

Originally Posted by kovidgoyal

When adding news via the GUI, metadata isn't read from the epub file, so that's not surprising.

Can that be added? Should I make a feature request? It seems like it's reading or responding to the series metadata from the epub, just not the series_index. Both bits of information are added by my recipe and are not in the original epub download, and neither showed up in calibre at all until I added the code in the recipe to set them.

kovidgoyal · 02-23-2011, 10:12 PM

Feel free to add a feature request.



