#1
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: Android, various
Beneath Ceaseless Skies recipe for direct epub downloading
BCS is a free digital fantasy magazine published every two weeks. I'm trying to massage the Now Toronto recipe into a BCS recipe. Beneath Ceaseless Skies has a feed for new-issue announcements, and they provide PDF, epub, and mobi files for each issue. The problem is that the feed doesn't link to the published files, nor to the issue page: the only link in each feed item points to the discussion forum thread for that issue.
I have a semi-working recipe (at the bottom of this post). It grabs the most recent issue from the feed by parsing the issue number out of the item title, constructs a direct download link to the epub, downloads it, unzips it, and returns the content.opf file as the index. I have the following problems:
It will probably be apparent from the code below that I'm not a Python expert, though I am a software developer by day, so I've managed to muddle through and get this partly working. Can anybody out there help with the problems I've listed? As an added bonus, I'd love to set the series to "Beneath Ceaseless Skies" and the series number to the issue number in the epub metadata. Is that possible?
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Based on Starson17's NowToronto recipe, which in turn was based on Lars Jacob's Taz Digiabo recipe

__license__ = 'GPL v3'
__copyright__ = '2011, DuckPuppy'

import os, urllib2, zipfile

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class BeneathCeaselessSkies(BasicNewsRecipe):
    title = u'Beneath Ceaseless Skies'
    description = u'Beneath Ceaseless Skies'
    __author__ = 'DuckPuppy'

    def build_index(self):
        # RSS2 feed of the new-issue announcement forum
        epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
        soup = self.index_to_soup(epub_feed)
        # The first <item> is the most recent announcement
        item = soup.find(name='item')
        title = item.find(name='title').string
        print('Title: ' + title)
        # The issue number follows the last '#' in the title
        issueloc = title.rfind("#")
        issue = title[issueloc+1:].encode('utf-8')
        print(issue)
        # Direct download link, with the issue number zero-padded to three digits
        url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
        print(url)
        f = urllib2.urlopen(url)
        tmp = PersistentTemporaryFile(suffix='.epub')
        self.report_progress(0, _('downloading epub'))
        tmp.write(f.read())
        tmp.close()
        zfile = zipfile.ZipFile(tmp.name, 'r')
        self.report_progress(0, _('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()  # release the archive once it has been extracted
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(1, _('epub downloaded and extracted'))
        return index

Last edited by duckpuppy; 02-23-2011 at 03:13 PM.
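The issue-number trick described above boils down to a few lines of string handling. Here is a calibre-free sketch, assuming (as the recipe does) that the newest announcement is the first item in the feed and that its title ends in "#<issue number>". Only the URL pattern comes from the recipe; the helper name latest_issue_url and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# URL pattern from the recipe; {0:03} zero-pads the issue number to 3 digits.
EPUB_URL = 'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'


def latest_issue_url(rss_text):
    """Return (issue_number, epub_url) for the first <item> of an RSS2 feed.

    Assumes the item title ends with '#<number>', e.g. 'Beneath Ceaseless Skies #62'.
    """
    root = ET.fromstring(rss_text)
    item = root.find('channel/item')  # first item = newest announcement (assumption)
    if item is None:
        raise ValueError('feed has no <item> elements')
    title = item.findtext('title', default='')
    hash_pos = title.rfind('#')
    if hash_pos == -1:
        raise ValueError('no issue number in title: %r' % title)
    issue = int(title[hash_pos + 1:].strip())
    return issue, EPUB_URL.format(issue)


# Hypothetical sample feed shaped like a forum RSS2 export.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>BCS announcements</title>
  <item><title>Beneath Ceaseless Skies #62</title><link>http://example.invalid/thread62</link></item>
  <item><title>Beneath Ceaseless Skies #61</title><link>http://example.invalid/thread61</link></item>
</channel></rss>"""

if __name__ == '__main__':
    print(latest_issue_url(SAMPLE_FEED))
```

In the recipe itself, index_to_soup already does the fetching and parsing, so this only mirrors the title-to-URL step.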
#2
creator of calibre
Posts: 44,175
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use the OPF class from calibre.ebooks.metadata.opf2 to manipulate the metadata in content.opf to your heart's content.
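As a calibre-independent illustration of what that edit amounts to, here is a sketch that uses Python's standard xml.dom.minidom instead of calibre's OPF class (a swapped-in technique, not calibre's API). It sets the calibre:series and calibre:series_index meta entries, the same names that appear in the content.opf quoted later in this thread. The function name set_calibre_series and the sample OPF are hypothetical.

```python
from xml.dom import minidom

OPF_NS = 'http://www.idpf.org/2007/opf'


def set_calibre_series(opf_xml, series, index):
    """Return opf_xml with calibre:series and calibre:series_index <meta> tags set.

    Existing tags are updated in place; missing ones are appended to <metadata>.
    """
    doc = minidom.parseString(opf_xml)
    metadata = doc.getElementsByTagNameNS(OPF_NS, 'metadata')[0]
    wanted = {'calibre:series': series, 'calibre:series_index': str(index)}
    found = set()
    # Update any calibre:* <meta> entries that are already present.
    for meta in metadata.getElementsByTagNameNS(OPF_NS, 'meta'):
        name = meta.getAttribute('name')
        if name in wanted:
            meta.setAttribute('content', wanted[name])
            found.add(name)
    # Append the missing ones. This assumes <package> declares the OPF namespace
    # as its default, so these unprefixed elements land in that namespace.
    for name in ('calibre:series', 'calibre:series_index'):
        if name not in found:
            meta = doc.createElementNS(OPF_NS, 'meta')
            meta.setAttribute('name', name)
            meta.setAttribute('content', wanted[name])
            metadata.appendChild(meta)
    return doc.toxml()


# Hypothetical, trimmed-down OPF: it has a series_index of 1 but no series.
SAMPLE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Beneath Ceaseless Skies #62</dc:title>
    <meta name="calibre:series_index" content="1"/>
  </metadata>
</package>"""

if __name__ == '__main__':
    print(set_calibre_series(SAMPLE_OPF, 'Beneath Ceaseless Skies', 62))
```

Inside a recipe, the OPF class is the more natural route, since it already knows calibre's metadata fields; minidom is used here only because it keeps the original namespace prefixes intact on output.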
#3
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: Android, various
Quote:
I've found what might be a strange bug with fetching news, though. I'm setting the series_index in my recipe, but when the issue is downloaded via "Fetch News" in the GUI, it always shows up as book 1. If I run "ebook-convert bcs.recipe bcs.epub" from the command line and manually add the resulting epub to calibre, it shows up with the correct book number (62). The culprit seems to be the metadata.opf that calibre creates in the book's library folder: when I use "Fetch News" to generate the epub, that file contains a series_index of 1 (even though the epub's content.opf has it set to 62), but when I add the exact same epub manually (by copying it out of my library folder, deleting the book from the library via the GUI, and dragging the copy back into calibre), it contains the right values.
Code:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id">
  <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:publisher>_Beneath Ceaseless Skies_ Online Magazine</dc:publisher>
    <meta name="calibre:series_index" content="62"/>
    <dc:description>Issue #62 of <I>Beneath Ceaseless Skies</I> online magazine, featuring stories by Kris Dikeman and Jesse Bullington.</dc:description>
    <dc:language>en</dc:language>
    <dc:creator opf:file-as="Bullington, Jesse" opf:role="aut">Dikeman, Kris</dc:creator>
    <dc:creator opf:file-as="Bullington, Jesse" opf:role="aut">Bullington, Jesse</dc:creator>
    <meta name="calibre:series" content="Beneath Ceaseless Skies"/>
    <dc:title>Beneath Ceaseless Skies #62</dc:title>
    <meta name="cover" content="id11"/>
    <dc:date>2011-02-10T07:00:00+00:00</dc:date>
    <meta name="calibre:timestamp" content="2011-02-12T01:44:30.555961+00:00"/>
    <dc:contributor opf:role="bkp">calibre (0.7.43) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier id="uuid_id" opf:scheme="uuid">92d85b12-a487-4f67-9198-f6d226ea27e2</dc:identifier>
  </metadata>
  <manifest>
    <item href="Beneath_Ceaseless_Skies_62_split_000.html" id="id16" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_001.html" id="id15" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_002.html" id="id14" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_003.html" id="id13" media-type="application/xhtml+xml"/>
    <item href="Beneath_Ceaseless_Skies_62_split_004.html" id="id12" media-type="application/xhtml+xml"/>
    <item href="images/00001.jpg" id="id3" media-type="image/jpeg"/>
    <item href="images/00002.jpg" id="id4" media-type="image/jpeg"/>
    <item href="images/00003.jpg" id="id5" media-type="image/jpeg"/>
    <item href="images/00004.jpg" id="id6" media-type="image/jpeg"/>
    <item href="images/00005.jpg" id="id7" media-type="image/jpeg"/>
    <item href="images/00006.jpg" id="id8" media-type="image/jpeg"/>
    <item href="images/calibre_cover.jpg" id="id11" media-type="image/jpeg"/>
    <item href="stylesheet1.css" id="css1" media-type="text/css"/>
    <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
    <item href="titlepage1.xhtml" id="titlepage1" media-type="application/xhtml+xml"/>
    <item href="toc.ncx" media-type="application/x-dtbncx+xml" id="ncx"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="titlepage1"/>
    <itemref idref="titlepage"/>
    <itemref idref="id16"/>
    <itemref idref="id15"/>
    <itemref idref="id14"/>
    <itemref idref="id13"/>
    <itemref idref="id12"/>
  </spine>
  <guide>
    <reference href="titlepage1.xhtml" type="cover" title="Cover"/>
  </guide>
</package>
Code:
import os, urllib2, zipfile

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.metadata.opf2 import OPF, OPFCreator
from calibre.ptempfile import PersistentTemporaryFile


class BeneathCeaselessSkies(BasicNewsRecipe):
    title = u'Beneath Ceaseless Skies'
    description = u'Beneath Ceaseless Skies'
    __author__ = 'DuckPuppy'

    def build_index(self):
        epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
        soup = self.index_to_soup(epub_feed)
        item = soup.find(name='item')
        title = item.find(name='title').string
        issueloc = title.rfind("#")
        issue = title[issueloc+1:].encode('utf-8')
        url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
        f = urllib2.urlopen(url)
        tmp = PersistentTemporaryFile(suffix='.epub')
        self.report_progress(0, _('downloading epub'))
        tmp.write(f.read())
        tmp.close()
        zfile = zipfile.ZipFile(tmp.name, 'r')
        self.report_progress(0, _('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        # Parse the extracted OPF (the file is opened once and closed afterwards)
        with open(index, 'rb') as indexfile:
            opf = OPF(indexfile, self.output_dir)
        # Series name = title text before the " #" that precedes the issue number, minus underscores
        opf.series = title[:issueloc-1].strip("_")
        print('issue#: ' + issue)
        opf.series_index = int(issue)
        opf.cover = os.path.join(self.output_dir, 'images/calibre_cover.jpg')
        # Write the updated metadata back over content.opf
        with open(index, 'wb') as indexfile:
            opfc = OPFCreator(self.output_dir, opf)
            opfc.render(indexfile)
        self.report_progress(1, _('epub downloaded and extracted'))
        return index
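When comparing what the recipe wrote against what calibre stored, it can help to read the series fields straight out of a downloaded epub. A minimal stdlib sketch, assuming the calibre:series and calibre:series_index meta names seen in the OPF above. Unlike the recipe, which assumes content.opf sits at the archive root, it follows META-INF/container.xml, where the EPUB container format declares the OPF's location. The helper names and the in-memory sample epub are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = '{urn:oasis:names:tc:opendocument:xmlns:container}'
OPF_NS = '{http://www.idpf.org/2007/opf}'


def opf_path(epub):
    """Return the OPF's path inside an open ZipFile, as declared in META-INF/container.xml."""
    container = ET.fromstring(epub.read('META-INF/container.xml'))
    rootfile = container.find('.//' + CONTAINER_NS + 'rootfile')
    if rootfile is None:
        raise ValueError('container.xml declares no rootfile')
    return rootfile.get('full-path')


def read_series(epub_file):
    """Return (series, series_index) from the OPF's calibre:* <meta> tags (None if absent)."""
    with zipfile.ZipFile(epub_file) as epub:
        root = ET.fromstring(epub.read(opf_path(epub)))
    values = {}
    for meta in root.iter(OPF_NS + 'meta'):
        values[meta.get('name')] = meta.get('content')
    return values.get('calibre:series'), values.get('calibre:series_index')


# Hypothetical sample epub, built in memory, with its OPF in a subfolder.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF_XML = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="calibre:series" content="Beneath Ceaseless Skies"/>
    <meta name="calibre:series_index" content="62"/>
  </metadata>
</package>"""


def build_sample_epub():
    buf = io.BytesIO()
    # ZipFile's default compression is ZIP_STORED, so 'mimetype' is stored uncompressed.
    with zipfile.ZipFile(buf, 'w') as z:
        z.writestr('mimetype', 'application/epub+zip')
        z.writestr('META-INF/container.xml', CONTAINER_XML)
        z.writestr('OEBPS/content.opf', OPF_XML)
    buf.seek(0)
    return buf


if __name__ == '__main__':
    print(read_series(build_sample_epub()))
```

calibre's metadata.opf in the library folder is a plain file rather than a zip member, so for that side of the comparison the same loop over the meta elements can run on ET.parse(path).getroot().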
#4
creator of calibre
Posts: 44,175
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
When news is added via the GUI, metadata isn't read from the epub file, so that's not surprising.
#5
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: Android, various
Can that be added? Should I make a feature request? It seems to be picking up the series metadata from the epub, just not the series_index. Both bits of information are added by my recipe and aren't in the original epub download, and neither showed up in calibre at all until I added the code to set them in the recipe.
#6
creator of calibre
Posts: 44,175
Karma: 22670164
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Feel free to add a feature request.
Thread | Thread Starter | Forum | Replies | Last Post |
Downloading several years of blogposts via custom recipe | flyash | Calibre | 4 | 01-01-2011 02:02 AM |
Direct Downloads of DRMed Epub? | luqmaninbmore | PocketBook | 5 | 05-11-2010 04:20 AM |
Google Editions to offer direct ePub upload | wdejager | News | 1 | 12-24-2009 12:22 AM |
Beneath Ceaseless Skies--now in PRC (Mobi) | BearMountainBooks | Deals and Resources (No Self-Promotion or Affiliate Links) | 5 | 11-18-2009 06:31 PM |
downloading google epub | europas_ice | Sony Reader | 3 | 03-22-2009 10:27 AM |