MobileRead Forums > E-Book Software > Calibre > Recipes
01-18-2010, 01:14 PM   #1
BrianG
Recipes - Re-usable code

This thread can be used to post recipe code that might be re-usable or of general use to other coders.
Would a moderator kindly consider making this a STICKY?
Moderator Notice
This thread has been made a sticky, and unlike most other sticky threads, this one is open to all who have a useful bit of code to post or a useful recipe tip. Do not use this thread to ask any questions. Start a new thread. Posts that don't belong here will be deleted, but you are encouraged to post if you have something to share.
Please add a descriptive title to each post and explain how to use your code. Post the code in code tags and if it's longer than 4-5 lines, put spoiler tags around it to collapse it and make this thread more readable.

Last edited by Starson17; 05-16-2011 at 11:55 AM.
01-18-2010, 01:25 PM   #2
BrianG
Create a "virtual feed"

The parse_feeds method below creates a "virtual feed" from an RSS feed. In this case, it overrides parse_feeds to:

  • Get all articles associated with the feed via the BasicNewsRecipe methods
  • Scan through the articles and find titles that contain the word "recipe"
  • Delete these articles from their existing feed
  • Create a new feed section called "Recipes" to hold these articles


As-is looks like:
Code:
Main Feed
    Article a
    Recipe a
    Recipe b
    Article b
New looks like:
Code:
Main Feed
    Article a
    Article b
Recipes
    Recipe a
    Recipe b
This can be used to break up large RSS feeds or to create sub-topics, etc.
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

    def parse_feeds(self):

        # Do the "official" parse_feeds first
        feeds = BasicNewsRecipe.parse_feeds(self)

        # Loop through the articles in all feeds to find titles containing "recipe"
        recipeArticles = []
        for curfeed in feeds:
            delList = []
            for curarticle in curfeed.articles:
                if 'RECIPE' in curarticle.title.upper():
                    recipeArticles.append(curarticle)
                    delList.append(curarticle)
            # Remove the matched articles from their original feed
            for d in delList:
                curfeed.articles.remove(d)

        # If any recipes were found, create a new Feed object and append it
        if len(recipeArticles) > 0:
            pfeed = Feed()
            pfeed.title = 'Recipes'
            pfeed.description = 'Recipe Feed (Virtual)'
            pfeed.image_url = None
            pfeed.oldest_article = 30
            pfeed.id_counter = len(recipeArticles)
            # Add the recipe articles and append the new feed
            # to the "official" list of feeds
            pfeed.articles = recipeArticles[:]
            feeds.append(pfeed)

        return feeds
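To try the splitting logic outside calibre, here is a minimal sketch using hypothetical stand-in Article and Feed classes (not calibre's real ones) that carry only the attributes the method above touches. split_virtual_feed is an illustrative name; it builds a "keep" list instead of deleting in place, which gives the same result.

```python
from collections import namedtuple

# Hypothetical stand-ins for calibre's Article and Feed classes
Article = namedtuple('Article', ['title', 'url'])


class Feed(object):
    def __init__(self, title, articles=None):
        self.title = title
        self.articles = list(articles or [])


def split_virtual_feed(feeds, keyword, new_title):
    """Move articles whose title contains keyword into a new, appended Feed."""
    matched = []
    for feed in feeds:
        keep = []
        for article in feed.articles:
            if keyword.upper() in article.title.upper():
                matched.append(article)
            else:
                keep.append(article)
        feed.articles = keep
    # Only add the virtual feed if something matched
    if matched:
        feeds.append(Feed(new_title, matched))
    return feeds


main = Feed('Main Feed', [Article('Article a', 'u1'), Article('Recipe a', 'u2'),
                          Article('Recipe b', 'u3'), Article('Article b', 'u4')])
result = split_virtual_feed([main], 'recipe', 'Recipes')
```

This reproduces the "Main Feed / Recipes" layout shown earlier in this post.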

Last edited by Starson17; 02-25-2011 at 10:06 PM.
01-19-2010, 02:05 PM   #3
BrianG
Add FEED objects during PARSE_INDEX

There is a difference between what parse_feeds returns and what parse_index is expected to return: parse_feeds returns a list of Feed objects, while parse_index returns a list of (feed title, list of articles) tuples, where each article is a dict with keys such as 'title', 'url', 'date' and 'description'.

The method below (feed_to_index_append) accepts a list of Feed objects and an index list. Each Feed object is traversed, and its articles are converted and appended to the index.

This can be useful for mixing an index and RSS feeds in a single recipe. It could be called in a way similar to:

Code:
        # Do the "official" parse_feeds first; it returns a list of Feed objects
        rssFeeds = BasicNewsRecipe.parse_feeds(self)

        # Append the RSS feeds to the feeds list (of (title, articles) tuples)
        # that parse_index will return
        self.feed_to_index_append(rssFeeds[:], feeds)


Here is the conversion method:

Code:
    def feed_to_index_append(self, feedObject, masterFeed):

        # Loop through the Feed objects and build the article-dict list
        # format that parse_index expects
        for feed in feedObject:
            newArticles = []
            for article in feed.articles:
                newArt = {
                    'title'       : article.title,
                    'url'         : article.url,
                    'date'        : article.date,
                    'description' : article.text_summary
                }
                newArticles.append(newArt)

            # Append the newly built (title, articles) tuple to the index
            # list passed in as masterFeed
            masterFeed.append((feed.title, newArticles))
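The conversion can be checked without calibre. This sketch uses hypothetical namedtuple stand-ins for Feed and Article (exposing only title, url, date and text_summary) and shows an RSS feed being appended after an existing index section.

```python
from collections import namedtuple

# Hypothetical stand-ins carrying only the attributes the conversion reads
Article = namedtuple('Article', ['title', 'url', 'date', 'text_summary'])
Feed = namedtuple('Feed', ['title', 'articles'])


def feed_to_index_append(feed_objects, master_feed):
    """Convert Feed-like objects into parse_index-style (title, [dict]) tuples."""
    for feed in feed_objects:
        new_articles = [
            {
                'title': a.title,
                'url': a.url,
                'date': a.date,
                'description': a.text_summary,
            }
            for a in feed.articles
        ]
        master_feed.append((feed.title, new_articles))
    return master_feed


# An index section built by hand, followed by one converted RSS feed
index = [('Front Page', [{'title': 'Lead story', 'url': 'http://example.com/1',
                          'date': '', 'description': ''}])]
rss = [Feed('Blog', [Article('Post', 'http://example.com/p', 'Mon', 'Summary')])]
feed_to_index_append(rss, index)
```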

Last edited by Starson17; 02-25-2011 at 10:07 PM.
01-24-2010, 11:13 AM   #4
BrianG
Add dates to Feed title

Here's a method that adds the earliest/latest article dates to the title of a feed (and therefore to its Table of Contents entry as well). This is useful when feeds aren't updated on a regular basis.

The get_feed_dates method accepts a single Feed object (not a list of feeds, although a for loop could handle a list). It also accepts a date mask, i.e. a strftime format string used to format the begin/end dates.

Here's an example of calling the method:

Code:
        # Append the earliest/latest dates of the feed to the feed title
        startDate, endDate = self.get_feed_dates(feed, '%d-%b')
        newFeedTitle = feed.title + '  (' + startDate + ' thru ' + endDate + ')'


Here's the method itself:

Code:
    def get_feed_dates(self, feedObject, dateMask):
        # Assumes the feed has at least one article and that articles are
        # ordered newest first, as they usually are in an RSS feed
        startDate = feedObject.articles[-1].localtime.strftime(dateMask)
        endDate   = feedObject.articles[0].localtime.strftime(dateMask)

        return startDate, endDate
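If a feed is not sorted newest first, the first/last article trick gives the wrong range. A sketch that uses min/max instead, and tolerates an empty feed, is below; the Article namedtuple is a hypothetical stand-in exposing a datetime as localtime, as calibre's articles do.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in: only the .localtime datetime is used
Article = namedtuple('Article', ['title', 'localtime'])


def get_feed_dates(articles, date_mask):
    """Return (earliest, latest) article dates formatted with date_mask.

    min/max make the result independent of the feed's ordering, and an
    empty feed yields (None, None) instead of an IndexError.
    """
    if not articles:
        return None, None
    times = [a.localtime for a in articles]
    return min(times).strftime(date_mask), max(times).strftime(date_mask)


articles = [Article('b', datetime(2010, 1, 20)),
            Article('a', datetime(2010, 1, 5)),
            Article('c', datetime(2010, 1, 12))]
start, end = get_feed_dates(articles, '%d-%b')
title = 'My Feed' + '  (' + start + ' thru ' + end + ')'
```

Note that %b is locale-dependent; in the default C locale it yields English month abbreviations.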

Last edited by Starson17; 02-25-2011 at 10:07 PM.
04-11-2010, 06:24 AM   #5
hansd
Default cover with custom image

Instead of fully replacing the cover via cover_url, this replaces only the stock image in the default cover with a given image.

Add a cover_img_url field containing the image URL to the class definition, and add the following code (imports and functions). There is no need to call any of these functions directly; everything is handled by the redefined default_cover function.

The code is based on the code in BasicNewsRecipe itself.

Code:
import os
from contextlib import nested, closing
from calibre import strftime, __appname__, __version__
from calibre.constants import preferred_encoding
import calibre.utils.PythonMagickWand as pw
from ctypes import byref
from calibre import fit_image


    def get_cover_img_url(self):
        return getattr(self, 'cover_img_url', None)

    def _download_cover_img(self):
        # hack to reuse download_cover
        old_cu = None
        try:
            old_cu = self.get_cover_url()
        except:
            pass
        new_cu = self.get_cover_img_url()
        self.cover_url = new_cu     
        self._download_cover()

        outfile = os.path.join(self.output_dir, 'cover_img.jpg')
        self.prepare_masthead_image(self.cover_path, outfile) 
        
        self.cover_url = old_cu
        self.cover_img_path = outfile

    def download_cover_img(self):
        try:
            self._download_cover_img()
            self.report_progress(1, _('Downloaded cover to %s') % self.cover_img_path)
        except:
            self.log.exception('Failed to download cover img')
            self.cover_img_path = None
    
    def prepare_cover_image(self, path_to_image, out_path):
        with pw.ImageMagick():
            img = pw.NewMagickWand()
            if img < 0:
                raise RuntimeError('Out of memory')
            if not pw.MagickReadImage(img, path_to_image):
                severity = pw.ExceptionType(0)
                msg = pw.MagickGetException(img, byref(severity))
                raise IOError('Failed to read image from: %s: %s'
                        %(path_to_image, msg))
            if not pw.MagickWriteImage(img, out_path):
                raise RuntimeError('Failed to save image to %s'%out_path)
            pw.DestroyMagickWand(img)


    def default_cover(self, cover_file):
        '''
        Create a generic cover for recipes that have a special cover img
        '''
        try:
            try:
                from PIL import Image, ImageDraw, ImageFont
                Image, ImageDraw, ImageFont
            except ImportError:
                import Image, ImageDraw, ImageFont
            font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
            title = self.title if isinstance(self.title, unicode) else \
                    self.title.decode(preferred_encoding, 'replace')
            date = strftime(self.timefmt)
            app = '['+__appname__ +' '+__version__+']'

            COVER_WIDTH, COVER_HEIGHT = 590, 750
            img = Image.new('RGB', (COVER_WIDTH, COVER_HEIGHT), 'white')
            draw = ImageDraw.Draw(img)
            # Title
            font = ImageFont.truetype(font_path, 44)
            width, height = draw.textsize(title, font=font)
            left = max(int((COVER_WIDTH - width)/2.), 0)
            top = 15
            draw.text((left, top), title, fill=(0,0,0), font=font)
            bottom = top + height
            # Date
            font = ImageFont.truetype(font_path, 32)
            width, height = draw.textsize(date, font=font)
            left = max(int((COVER_WIDTH - width)/2.), 0)
            draw.text((left, bottom+15), date, fill=(0,0,0), font=font)
            # Vanity
            font = ImageFont.truetype(font_path, 28)
            width, height = draw.textsize(app, font=font)
            left = max(int((COVER_WIDTH - width)/2.), 0)
            top = COVER_HEIGHT - height - 15
            draw.text((left, top), app, fill=(0,0,0), font=font)

            # Logo
            logo_file = I('library.png')
            self.download_cover_img()
            if getattr(self, 'cover_img_path', None) is not None:
                logo_file = self.cover_img_path
            self.report_progress(1, _('using cover img from %s') % logo_file)
            logo = Image.open(logo_file, 'r')
            width, height = logo.size
            left = max(int((COVER_WIDTH - width)/2.), 0)
            top = max(int((COVER_HEIGHT - height)/2.), 0)
            img.paste(logo, (left, top))
            img = img.convert('RGB').convert('P', palette=Image.ADAPTIVE)
            img.convert('RGB').save(cover_file, 'JPEG')
            cover_file.flush()
        except Exception, e:
            self.log.exception('Failed to generate default cover ', e)
            return False
        return True
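default_cover repeatedly uses the expression max(int((COVER_WIDTH - width)/2.), 0) to center the logo (and, horizontally, each line of text). Isolated as a plain function (centered_origin is a hypothetical helper name, not part of calibre), the arithmetic looks like this:

```python
def centered_origin(canvas_size, item_size):
    """Return the (left, top) offset that centers item_size on canvas_size.

    An item larger than the canvas is pinned at 0 rather than given a
    negative offset, matching the max(..., 0) used in default_cover.
    """
    canvas_w, canvas_h = canvas_size
    item_w, item_h = item_size
    left = max(int((canvas_w - item_w) / 2.), 0)
    top = max(int((canvas_h - item_h) / 2.), 0)
    return left, top


# A 200x100 logo on the 590x750 cover used above
logo_pos = centered_origin((590, 750), (200, 100))  # (195, 325)
```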

Last edited by Starson17; 02-25-2011 at 10:09 PM. Reason: refactored _download_cover_img
10-16-2010, 12:36 PM   #6
Starson17
Remove articles from feed

This code will easily remove articles from feeds based upon a character string in the title of the article or part of the URL. This is specifically for The New Yorker recipe to remove video links (word "video" in the title of the article) and Goings On About Town (GOAT in the url), but it can be easily adapted for any recipe:
Code:
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy ([:]) so removing items doesn't skip any
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'VIDEO' in article.title.upper() or 'GOAT' in article.url:
                    feed.articles.remove(article)
        return feeds


In addition to matching on the url and/or the title of the article, one can match on the article date or summary with article.date or article.text_summary.
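One way to keep the matching rules in one place is to pass a predicate. This is a sketch with hypothetical stand-in Feed/Article classes; remove_articles and is_unwanted are illustrative names, and the "sponsored" summary check is an invented extra rule showing a match on text_summary.

```python
from collections import namedtuple

# Hypothetical stand-ins exposing the attributes mentioned above
Article = namedtuple('Article', ['title', 'url', 'text_summary'])


class Feed(object):
    def __init__(self, title, articles):
        self.title = title
        self.articles = list(articles)


def remove_articles(feeds, should_remove):
    """Drop every article for which should_remove(article) is true."""
    for feed in feeds:
        # Iterate over a copy so list.remove() doesn't skip elements
        for article in feed.articles[:]:
            if should_remove(article):
                feed.articles.remove(article)
    return feeds


def is_unwanted(article):
    # Title/URL tests from the New Yorker example, plus a hypothetical summary test
    return ('VIDEO' in article.title.upper()
            or 'GOAT' in article.url
            or 'sponsored' in (article.text_summary or '').lower())


feeds = [Feed('News', [
    Article('Weekly Video', 'http://example.com/1', ''),
    Article('Essay', 'http://example.com/GOAT/2', ''),
    Article('Story', 'http://example.com/3', 'A story'),
    Article('Deal', 'http://example.com/4', 'Sponsored content'),
])]
remove_articles(feeds, is_unwanted)
```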

Last edited by Starson17; 02-17-2011 at 04:38 PM.
10-16-2010, 08:07 PM   #7
TonytheBookworm
One of several ways to do a print-friendly version

If the RSS feed links to pages that have a link to a clean, printer-friendly version, this method works well in most cases. You will need to know a little about regular expressions, or you can do what I do and cheat: use Firebug to find the link (for instance /services/printVersion.asp), then let http://www.txt2re.com/index.php3 convert it to a Python regular expression.

Then you can use code like this.
Put this import at the top of the recipe so temporary files can be used:
Code:
from calibre.ptempfile import PersistentTemporaryFile
Then add this code block to your recipe and change only the url_regex= argument:
Code:
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)

        # Use a try/except block: try to follow the printer-friendly link
        # (here /content/printVersion); if that fails, catch the error
        # and fetch the original URL instead of crashing.
        try:
            response = br.follow_link(url_regex='.*?(\\/content\\/printVersion)', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()

        # Save the HTML to a temporary file and return its path,
        # which is what get_obfuscated_article must return
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name
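To test a url_regex before putting it in a recipe, the "first matching link, else the original URL" idea can be reproduced with the standard library. This is only a sketch: print_link_or_original and _LinkCollector are hypothetical names, and it is not how mechanize implements follow_link. Since the pattern is used as a search, r'/content/printVersion' behaves like the longer '.*?(\\/content\\/printVersion)' (escaping the slashes is unnecessary). Returned hrefs may be relative; the browser resolves them against the page URL when following the link.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def print_link_or_original(html, page_url, url_regex):
    """Return the first href matching url_regex, or page_url if none does."""
    parser = _LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        if re.search(url_regex, href):
            return href
    return page_url


page = '<p><a href="/about">About</a> <a href="/content/printVersion?id=7">Print</a></p>'
link = print_link_or_original(page, 'http://example.com/story', r'/content/printVersion')
```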
11-15-2010, 10:11 AM   #8
Starson17
Recipe to download an EPUB from feed

So that this doesn't get lost, I'm going to repost it here. It's a recipe that grabs a single link to an EPUB.

The recipe overrides build_index, which is the method that gets the masthead image and cover, parses the feed for articles, retrieves the articles, removes tags from articles, etc. All of those steps ultimately produce a local directory structure that looks like an unzipped EPUB.

The recipe grabs the link to one EPUB (the first in the RSS feed), saves the EPUB locally, extracts it, and passes the result back into the recipe system as though all the other steps had been completed normally.

To use the recipe, just modify these lines:

epub_feed = "http://feeds.feedburner.com/NowEpubEditions"
soup = self.index_to_soup(epub_feed)
url = soup.find(name = 'feedburnerriglink').string

so that "url" points to an EPUB as in: "http://some.place.com/epubfile.epub"
The sample below grabs the first EPUB in an RSS feed, but you can just supply a single URL directly or grab it from the front page of a newspaper. I've posted a complete recipe to emphasize that the normal recipe attributes and methods, like "feeds", "remove_tags", etc., should all be omitted.

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#Based on Lars Jacob's Taz Digiabo recipe

__license__ = 'GPL v3'
__copyright__ = '2010, Starson17'

import os, urllib2, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class NowToronto(BasicNewsRecipe):
    title = u'Now Toronto'
    description = u'Now Toronto'
    __author__ = 'Starson17'
    conversion_options = {
        'no_default_epub_cover' : True
    }

    def build_index(self):
        # Find the EPUB link in the feed
        epub_feed = "http://feeds.feedburner.com/NowEpubEditions"
        soup = self.index_to_soup(epub_feed)
        url = soup.find(name = 'feedburnerriglink').string
        # Download the EPUB to a temporary file
        f = urllib2.urlopen(url)
        tmp = PersistentTemporaryFile(suffix='.epub')
        self.report_progress(0,_('downloading epub'))
        tmp.write(f.read())
        tmp.close()
        # Unzip it into the recipe's output directory
        zfile = zipfile.ZipFile(tmp.name, 'r')
        self.report_progress(0,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        # Return the OPF file; this assumes the EPUB keeps it at the
        # top level under the name content.opf
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(1,_('epub downloaded and extracted'))
        return index
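The recipe assumes the OPF file is called content.opf and sits at the top of the EPUB. Many EPUBs keep it elsewhere (e.g. OEBPS/content.opf). The EPUB container format records the OPF location in META-INF/container.xml, so a more general approach is sketched below with only the standard library; extract_epub is a hypothetical helper name, and the sample EPUB is built in a temporary directory just for demonstration.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/container.xml in the EPUB OCF container format
CONTAINER_NS = '{urn:oasis:names:tc:opendocument:xmlns:container}'


def extract_epub(epub_path, output_dir):
    """Unzip an EPUB into output_dir and return the path of its OPF file."""
    with zipfile.ZipFile(epub_path, 'r') as zf:
        zf.extractall(output_dir)
    container = ET.parse(os.path.join(output_dir, 'META-INF', 'container.xml'))
    rootfile = container.getroot().find(
        CONTAINER_NS + 'rootfiles/' + CONTAINER_NS + 'rootfile')
    # full-path is relative to the container root and always uses '/'
    return os.path.join(output_dir, *rootfile.get('full-path').split('/'))


# Build a tiny sample EPUB whose OPF lives in a subdirectory
work = tempfile.mkdtemp()
epub = os.path.join(work, 'sample.epub')
with zipfile.ZipFile(epub, 'w') as zf:
    zf.writestr('mimetype', 'application/epub+zip')
    zf.writestr('META-INF/container.xml',
                '<?xml version="1.0"?>'
                '<container version="1.0" '
                'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
                '<rootfiles><rootfile full-path="OEBPS/content.opf" '
                'media-type="application/oebps-package+xml"/></rootfiles>'
                '</container>')
    zf.writestr('OEBPS/content.opf', '<package/>')

opf = extract_epub(epub, os.path.join(work, 'out'))
```

In a build_index override, the returned path would take the place of the hard-coded content.opf path.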
11-23-2010, 02:54 PM   #9
kiklop74
Small piece of code to convert all links to plain text:

Code:
    def preprocess_html(self, soup):
        for alink in soup.findAll('a'):
            # Only links whose content is a single string are replaced;
            # links wrapping other tags are left untouched
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

Last edited by Starson17; 02-25-2011 at 10:10 PM.
12-27-2010, 01:54 AM   #10
Pahan
Recipe to keep track of feed items already downloaded and only download new items

Here is a recipe template that keeps track of already downloaded feed items and only downloads items that it hasn't seen before or whose description, content, or URL have changed. It does so by overriding the parse_feeds method.
Some caveats:
  • I recommend setting max_articles_per_feed and oldest_article to very high values. The first time, the recipe will download every item in every feed, but after that, it will "remember" not to do it again and will grab all new articles no matter how much time has elapsed since it was last run and how many entries have been added. In particular, if you set max_articles_per_feed to a small value and the feed is one that lists all articles in a particular order, you might never see new articles.
  • The list of items downloaded for each feed will be stored in "Calibre configuration directory/recipes/recipe_storage/Recipe title/Feed title". This is probably suboptimal, and there ought to be a persistent storage API for recipes, but it's the best I could come up with.
  • The list of items downloaded is written to disk before the items are actually downloaded. Thus, if an item fails to download for some reason, the recipe won't know, and will not try to download it again. This could probably be fixed by writing the new item lists to temporary files and overriding some method later in the sequence to "commit" by overwriting the downloaded item lists with the new lists. (Thus, if the recipe fails before that, it will never get to that point, so the old lists will remain intact and will redownload next time the recipe is run.)
  • If there are no new items to download and remove_empty_feeds is set to True, the recipe will return an empty list of feeds, which will cause Calibre to raise an error. As far as I can tell, there is nothing that the recipe can do about that without a lot more coding.
  • I've tried to make this code portable, but I've only tested it on Linux systems, so let me know if it doesn't work on the other platforms. I am particularly unsure about newline handling.
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5

class OnlyLatestRecipe(BasicNewsRecipe):
    title          = u'Unknown News Source'
    oldest_article = 10000
    max_articles_per_feed = 10000
    feeds          = [ ]

    def parse_feeds(self):
        recipe_dir = os.path.join(config_dir,'recipes')
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
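        # NOTE: ':' is not a legal file-name character on Windows, so a
        # recipe title containing '/' may fail there
        feed_dir = os.path.join(hash_dir,self.title.encode('utf-8').replace('/',':'))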
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)

        feeds = BasicNewsRecipe.parse_feeds(self)

        for feed in feeds:
            feed_hash = urllib.quote(feed.title.encode('utf-8'),safe='')
            feed_fn = os.path.join(feed_dir,feed_hash)

            past_items = set()
            if os.path.exists(feed_fn):
                with file(feed_fn) as f:
                    for h in f:
                        past_items.add(h.strip())

            cur_items = set()
            for article in feed.articles[:]:
                item_hash = md5()
                if article.content: item_hash.update(article.content.encode('utf-8'))
                if article.summary: item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)
            with file(feed_fn,'w') as f:
                for h in cur_items:
                    f.write(h+'\n')

        remove = [f for f in feeds if len(f) == 0 and
                self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)

        return feeds
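The bookkeeping can be exercised on its own. This sketch mirrors the recipe's key format (url + ':' + md5 of content and summary) and its "remember the current set" file, but uses plain (url, content, summary) tuples instead of calibre articles; item_key and filter_new are hypothetical names, and the storage file lives in a temporary directory.

```python
import hashlib
import os
import tempfile


def item_key(url, content, summary):
    """Build a key like the recipe's: url + ':' + md5(content + summary)."""
    h = hashlib.md5()
    for part in (content, summary):
        if part:
            h.update(part.encode('utf-8'))
    digest = h.hexdigest()
    return url + ':' + digest if url else digest


def filter_new(items, store_path):
    """Return the items not recorded in store_path, then record the current set."""
    past = set()
    if os.path.exists(store_path):
        with open(store_path) as f:
            past = set(line.strip() for line in f)
    current, fresh = set(), []
    for item in items:
        key = item_key(*item)
        current.add(key)
        if key not in past:
            fresh.append(item)
    # Like the recipe, only the current feed contents are remembered
    with open(store_path, 'w') as f:
        for key in sorted(current):
            f.write(key + '\n')
    return fresh


store = os.path.join(tempfile.mkdtemp(), 'Feed%20title')
items = [('http://example.com/1', 'body', 'sum'), ('http://example.com/2', '', None)]
first = filter_new(items, store)    # both items are new
second = filter_new(items, store)   # nothing new
third = filter_new([('http://example.com/1', 'body v2', 'sum')] + items[1:], store)
```

The third call shows that a changed body produces a new key, so the edited item is downloaded again.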

Last edited by Starson17; 02-25-2011 at 10:10 PM. Reason: Typos and grammar.
01-18-2011, 11:40 AM   #11
Starson17
Rotate images

This code will rotate images so that the long dimension is vertical. It has been updated from code found here, since the Python MagickWand bindings were replaced by calibre.utils.magick and calibre.utils.magick.draw:

The code would typically be used for image-based recipes (comics) read on devices that don't have a g-sensor based autorotation capability or where you prefer to lock your reader software into portrait orientation. This code makes the long dimension of the image appear in the long dimension of the screen on a handheld device that can be rotated for better viewing. Don't use it for recipes being read on a non-rotatable PC monitor (unless you want to view images with your head tilted 90 degrees).

You may also want to set extra_css to include something like:
img {max-height:95%; min-height:95%;}
This leaves some room above and below the image for the links typically displayed in recipes.
Code:
#Add these imports
from calibre.utils.magick import Image, PixelWand

    def postprocess_html(self, soup, first):
        # Process all the images; this assumes the new HTML has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            width, height = img.size
            print 'img is: ', iurl, 'width is: ', width, 'height is: ', height
            pw = PixelWand()
            # Rotate landscape images so the long side is vertical
            if width > height:
                print 'Rotate image'
                img.rotate(pw, -90)
                img.save(iurl)
        return soup

Last edited by Starson17; 05-16-2011 at 11:57 AM.
05-16-2011, 10:26 AM   #12
Starson17
Print_version and string handling.

The question of how to modify a URL in a recipe comes up often, particularly when using print_version. It's basic string handling in Python, but I thought I'd post some tips here:

The three most common ways this job (URL modification) is accomplished in recipes are: 1) replace, 2) partition/rpartition, and 3) split/join. You can read about them here:

Use replace whenever you are replacing some part of the URL string (usually in the middle) with some other string and 1) the part being replaced never changes (this lets you find it) and 2) you don't want to change the two parts on either side of the part being replaced. You can keep the part you are replacing (by inserting it back in) and you can add additional text in the middle, but you can't change the first and last parts.

Use partition/rpartition when you want to split the string into three parts and the part in the middle never changes (so you can find it), but you want to change one or more of the three parts.

If there isn't any unchanging part in the middle to find, then the solution most commonly used is split/join with splitting being done on the slash. It splits the URL into each part between slashes and you can do whatever you want to each piece, then put them together with join (which adds back the original slashes). You find the part to change by counting the cut up pieces (between slashes) and changing the pieces you need to change and inserting anything needed between them.
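For comparison, here are sketches of the first two approaches, written as plain functions rather than recipe methods. The URL layouts are hypothetical: one site where "/news/" is the unchanging marker, and one where the article ID follows "/articles/" and the print page is /print.php?id=<ID>.

```python
def print_version_replace(url):
    # Hypothetical layout: keep both sides, insert 'print/' after '/news/'
    return url.replace('/news/', '/news/print/')


def print_version_rpartition(url):
    # Hypothetical layout: rebuild the URL from the host part and the ID
    head, sep, tail = url.rpartition('/articles/')
    article_id = tail.partition('.')[0]
    return head + '/print.php?id=' + article_id


a = print_version_replace('http://www.example.com/news/story.html')
# http://www.example.com/news/print/story.html
b = print_version_rpartition('http://www.example.com/articles/12345.html')
# http://www.example.com/print.php?id=12345
```

If the separator is missing, rpartition returns ('', '', url), so a real recipe may want to check sep before rebuilding the URL.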

This example inserts the string "/v-print/" after the 6th slash. It changes
http://www.03.com/04/05/06/07/08.html
to
http://www.03.com/04/05/06/v-print/07/08.html
Code:
    def print_version(self,url):
        segments = url.split('/')
        printURL = '/'.join(segments[0:6]) + '/v-print/' + '/'.join(segments[6:])
        return printURL
If you don't understand the [0:6] or [6:] usage, you need to read up on lists and string slices in Python here.
Also see TonytheBookworm's method described above.

Last edited by Starson17; 11-04-2011 at 09:50 AM.
07-08-2011, 05:34 PM   #13
tylau0
Set a recipe that generates a typical e-book rather than a periodical

Sometimes, one may wish to use a recipe to generate a typical e-book rather than a periodical. You may make that possible by adding the following piece of code inside your recipe class.

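Code:
# Imports this method needs (at the top of the recipe file)
import os, re
from calibre import __appname__, strftime
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.ebooks.metadata import MetaInformation
from calibre.ebooks.metadata.opf2 import OPFCreator
from calibre.ebooks.metadata.toc import TOC
from calibre.utils.date import now as nowf

    def create_opf(self, feeds, dir=None):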
        if dir is None:
            dir = self.output_dir
        title = self.short_title()
        if self.output_profile.periodical_date_in_title:
            title += strftime(self.timefmt)
        mi = MetaInformation(title, [__appname__])
        mi.publisher = __appname__
        mi.author_sort = __appname__
        mi.publication_type = self.publication_type+':'+self.short_title()
        mi.timestamp = nowf()
        mi.comments = self.description
        if not isinstance(mi.comments, unicode):
            mi.comments = mi.comments.decode('utf-8', 'replace')
        mi.pubdate = nowf()
        opf_path = os.path.join(dir, 'index.opf')
        ncx_path = os.path.join(dir, 'index.ncx')

        opf = OPFCreator(dir, mi)
        # Add mastheadImage entry to <guide> section
        mp = getattr(self, 'masthead_path', None)
        if mp is not None and os.access(mp, os.R_OK):
            from calibre.ebooks.metadata.opf2 import Guide
            ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
            ref.type = 'masthead'
            ref.title = 'Masthead Image'
            opf.guide.append(ref)

        manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
        manifest.append(os.path.join(dir, 'index.html'))
        manifest.append(os.path.join(dir, 'index.ncx'))

        # Get cover
        cpath = getattr(self, 'cover_path', None)
        if cpath is None:
            pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
            if self.default_cover(pf):
                cpath =  pf.name
        if cpath is not None and os.access(cpath, os.R_OK):
            opf.cover = cpath
            manifest.append(cpath)

        # Get masthead
        mpath = getattr(self, 'masthead_path', None)
        if mpath is not None and os.access(mpath, os.R_OK):
            manifest.append(mpath)

        opf.create_manifest_from_files_in(manifest)
        for mani in opf.manifest:
            if mani.path.endswith('.ncx'):
                mani.id = 'ncx'
            if mani.path.endswith('mastheadImage.jpg'):
                mani.id = 'masthead-image'

        entries = ['index.html']
        toc = TOC(base_path=dir)
        self.play_order_counter = 0
        self.play_order_map = {}

        def feed_index(num, parent):
            f = feeds[num]
            for j, a in enumerate(f):
                if getattr(a, 'downloaded', False):
                    adir = 'feed_%d/article_%d/'%(num, j)
                    auth = a.author
                    if not auth:
                        auth = None
                    desc = a.text_summary
                    if not desc:
                        desc = None
                    else:
                        desc = self.description_limiter(desc)
                    entries.append('%sindex.html'%adir)
                    po = self.play_order_map.get(entries[-1], None)
                    if po is None:
                        self.play_order_counter += 1
                        po = self.play_order_counter
                    parent.add_item('%sindex.html'%adir, None, a.title if a.title else _('Untitled Article'),
                                    play_order=po, author=auth, description=desc)
                    last = os.path.join(self.output_dir, ('%sindex.html'%adir).replace('/', os.sep))
                    for sp in a.sub_pages:
                        prefix = os.path.commonprefix([opf_path, sp])
                        relp = sp[len(prefix):]
                        entries.append(relp.replace(os.sep, '/'))
                        last = sp

                    if os.path.exists(last):
                        with open(last, 'rb') as fi:
                            src = fi.read().decode('utf-8')
                        soup = BeautifulSoup(src)
                        body = soup.find('body')
                        if body is not None:
                            prefix = '/'.join('..' for i in range(2*len(re.findall(r'link\d+', last))))
                            templ = self.navbar.generate(True, num, j, len(f),
                                            not self.has_single_feed,
                                            a.orig_url, __appname__, prefix=prefix,
                                            center=self.center_navbar)
                            elem = BeautifulSoup(templ.render(doctype='xhtml').decode('utf-8')).find('div')
                            body.insert(len(body.contents), elem)
                            with open(last, 'wb') as fi:
                                fi.write(unicode(soup).encode('utf-8'))
        if len(feeds) == 0:
            raise Exception('All feeds are empty, aborting.')

        if len(feeds) > 1:
            for i, f in enumerate(feeds):
                entries.append('feed_%d/index.html'%i)
                po = self.play_order_map.get(entries[-1], None)
                if po is None:
                    self.play_order_counter += 1
                    po = self.play_order_counter
                auth = getattr(f, 'author', None)
                if not auth:
                    auth = None
                desc = getattr(f, 'description', None)
                if not desc:
                    desc = None
                feed_index(i, toc.add_item('feed_%d/index.html'%i, None,
                    f.title, play_order=po, description=desc, author=auth))

        else:
            entries.append('feed_%d/index.html'%0)
            feed_index(0, toc)

        for i, p in enumerate(entries):
            entries[i] = os.path.join(dir, p.replace('/', os.sep))
        opf.create_spine(entries)
        opf.set_toc(toc)

        with nested(open(opf_path, 'wb'), open(ncx_path, 'wb')) as (opf_file, ncx_file):
            opf.render(opf_file, ncx_file)


The code above changes just one line of the following method in calibre.web.feeds.news:
Code:
def create_opf(self, feeds, dir=None):
It changes the line
Code:
mi.publication_type = 'periodical:'+self.publication_type+':'+self.short_title()
to
Code:
mi.publication_type = self.publication_type+':'+self.short_title()
Calibre searches mi.publication_type for the keyword "periodical" to decide whether to generate a periodical or a regular e-book, so dropping the keyword produces a regular e-book.
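A small Python 3 sketch of why that one-line change matters. The example values and the `looks_like_periodical` helper are hypothetical: the helper only mimics the keyword search described above and is not Calibre's actual check.

```python
# Hypothetical example values standing in for the recipe attributes.
publication_type = 'newspaper'  # stands in for self.publication_type
short_title = 'Ming Pao'        # stands in for self.short_title()

# Value built by the original line in create_opf
original = 'periodical:' + publication_type + ':' + short_title
# Value built by the modified line
modified = publication_type + ':' + short_title


def looks_like_periodical(pub_type):
    """Hypothetical check: does the string contain the "periodical" keyword?"""
    return pub_type is not None and 'periodical' in pub_type


print(original, looks_like_periodical(original))  # periodical:newspaper:Ming Pao True
print(modified, looks_like_periodical(modified))  # newspaper:Ming Pao False
```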

For an example recipe, see the "Ming Pao - Hong Kong" recipe bundled with your copy of Calibre. It also shows how to append the publication date to the e-book title, which helps tell apart e-books generated on different dates.
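A minimal, Calibre-independent sketch of the date-in-title idea. The helper name `dated_title`, the bracket style and the date format are assumptions for illustration, not what the Ming Pao recipe actually does; how that recipe wires the date into its metadata is not shown here.

```python
import datetime


def dated_title(title, when=None, fmt='%Y-%m-%d'):
    """Hypothetical helper: append a publication date to an e-book title."""
    # Default to today's date when no explicit date is given
    when = when or datetime.date.today()
    return '%s [%s]' % (title, when.strftime(fmt))


print(dated_title('Ming Pao', datetime.date(2011, 7, 9)))  # Ming Pao [2011-07-09]
```

Putting the date in the title means two issues downloaded on different days no longer share the same title on the reader.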

Moderator Notice: The difference between a typical ebook and a "periodical" is relevant for a Kindle, but may not be relevant for other ebook readers.

Last edited by tylau0; 07-09-2011 at 10:32 AM.
Old 08-21-2011, 09:22 PM   #14
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Handling redirections

Suppose you have a feed whose links all point to pages that redirect elsewhere. By default Calibre does not resolve these redirects, so the safest way to handle this is to override print_version and return the final URL:

Code:
    def print_version(self, url):
        return self.browser.open_novisit(url).geturl()
Of course, the same thing can be done with urllib2, but using the recipe's internal browser automatically adds support for sites that require login.
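For comparison, a sketch of the urllib route in Python 3, where urllib2 became urllib.request. The function name `resolve_redirects` and the timeout value are assumptions. Unlike the recipe browser, this shares no cookies or login session with the recipe, so it only suits public sites.

```python
import urllib.request


def resolve_redirects(url, timeout=30):
    """Return the final URL after following HTTP redirects (hypothetical helper).

    urlopen() follows 3xx redirects automatically, and geturl() reports
    the URL of the page that was finally fetched.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

Inside a recipe, print_version would then simply return resolve_redirects(url). Note that this downloads each page once just to learn its address, exactly like the open_novisit one-liner.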

Last edited by kiklop74; 08-21-2011 at 09:38 PM.
Old 11-02-2011, 04:41 PM   #15
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Convert Images to Grayscale

For those who have grayscale-only readers, this code converts images to grayscale. It saves roughly 8% to 20% of space: my tests on individual raw images showed 8%, while others reported reductions of up to 20% in the final ebook. YMMV.

Spoiler:
Code:
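from calibre.utils.magick import Image

    # Put the import above at the top of the recipe file and this method
    # inside your BasicNewsRecipe subclass.
    def postprocess_html(self, soup, first):
        # Process all the images that have a src attribute
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            # After download, src points to the local image file
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            if img < 0:
                raise RuntimeError('Out of memory')
            # Convert to grayscale and overwrite the file in place
            img.type = "GrayscaleType"
            img.save(iurl)
        return soup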
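The recipe code hands the actual conversion to ImageMagick. To show what a grayscale conversion does to each pixel, here is a pure Python 3 sketch that assumes packed 8-bit RGB input and uses the ITU-R BT.601 luma weights; ImageMagick's GrayscaleType may use a different formula. The function name `rgb_to_gray` is hypothetical.

```python
def rgb_to_gray(rgb):
    """Convert packed 8-bit RGB bytes to packed 8-bit grayscale bytes."""
    if len(rgb) % 3:
        raise ValueError('expected packed RGB triplets')
    out = bytearray()
    for i in range(0, len(rgb), 3):
        r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
        # BT.601 luma: weighted sum of the channels, rounded to an integer
        out.append(round(0.299 * r + 0.587 * g + 0.114 * b))
    return bytes(out)


pixels = bytes([255, 0, 0,  0, 0, 0,  255, 255, 255])  # red, black, white
print(list(rgb_to_gray(pixels)))  # [76, 0, 255]
```

Raw pixel data shrinks to a third because three channels become one, but compressed formats such as JPEG save much less than that ratio suggests, which is why the real-world savings above are modest.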

Last edited by Starson17; 11-04-2011 at 09:51 AM.



Similar Threads
Thread Thread Starter Forum Replies Last Post
DR800 The working (usable) screen resolution PaulS iRex 7 04-23-2010 01:27 PM
Let's create a source code repository for DR 800 related code? jraf iRex 3 03-11-2010 01:26 PM
any usable epub reader? janw iRex 10 09-04-2009 01:25 PM
FICTIONWISE, still usable? jcbeam Amazon Kindle 4 03-19-2009 02:17 PM
iLiad usable for scientists? doctorow iRex 5 08-14-2006 06:00 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.