Recipes - Re-usable code

BrianG · 01-18-2010, 01:14 PM

This thread can be used to post recipe code that might be re-usable or of general use to other coders.
Would a moderator kindly consider making this a STICKY?

Moderator Notice
This thread has been made a sticky, and unlike most other sticky threads, this one is open to all who have a useful bit of code to post or a useful recipe tip. Do not use this thread to ask any questions. Start a new thread. Posts that don't belong here will be deleted, but you are encouraged to post if you have something to share.
Please add a descriptive title to each post and explain how to use your code. Post the code in code tags and if it's longer than 4-5 lines, put spoiler tags around it to collapse it and make this thread more readable.

BrianG · 01-18-2010, 01:25 PM

The PARSE_FEEDS method below is used to create a virtual feed from an RSS feed. In this case, it is used to:

Get all articles associated with the feed via the BasicNewsRecipe methods
Scan thru the articles and find titles that contain the word "recipe"
Delete these articles from their existing feed
Create a new feed section called "recipes" to hold these articles

As-is looks like:

Code:

Main Feed
    Article a
    Recipe a
    Recipe b
    Article b

New looks like:

Code:

Main Feed
    Article a
    Article b
Recipes
    Recipe a
    Recipe b

This can be used to break up large RSS feeds or to create sub-topics, etc.
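A minimal sketch of the steps listed above, written for Python 3 so it can be run outside calibre. The Feed and Article classes and the split_out helper are hypothetical stand-ins, not calibre's own objects; in a recipe the same loop would sit inside parse_feeds and work on the list returned by BasicNewsRecipe.parse_feeds(self).

```python
# Stand-in classes (assumption): calibre's real feed and article objects
# carry more attributes; only title and articles are modelled here.
class Article:
    def __init__(self, title):
        self.title = title


class Feed:
    def __init__(self, title, articles=None):
        self.title = title
        self.articles = list(articles or [])


def split_out(feeds, keyword, new_title):
    """Move every article whose title contains keyword into a new feed."""
    moved = []
    for feed in feeds:
        # Iterate over a copy so removing from feed.articles is safe.
        for article in feed.articles[:]:
            if keyword.lower() in article.title.lower():
                feed.articles.remove(article)
                moved.append(article)
    if moved:
        feeds.append(Feed(new_title, moved))
    return feeds


main = Feed('Main Feed', [Article('Article a'), Article('Recipe a'),
                          Article('Recipe b'), Article('Article b')])
result = split_out([main], 'recipe', 'Recipes')
layout = [(f.title, [a.title for a in f.articles]) for f in result]
```

Here layout ends up mirroring the "New looks like" outline: the main feed keeps the two plain articles and a second feed holds the two recipes.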

Spoiler:

BrianG · 01-19-2010, 02:05 PM

There is a slight difference between the Feeds object that is returned from PARSE_FEEDS and the object returned when using PARSE_INDEX.

The module below (FEED_TO_INDEX_APPEND) accepts a Feed object and an Index object. The Feed object is traversed and appended to the Index.

This can be useful in mixing the use of an index as well as RSS feeds in a single recipe. It could be called in a way similar to:

Spoiler:

Here is the conversion module:

Spoiler:
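A rough sketch of what such a conversion could look like, written for Python 3 so it runs outside calibre. It assumes the index is the usual parse_index structure, a list of (feed title, list of article dicts) tuples, and that each article exposes title, url, date and text_summary. The Feed and Article classes and the function body are illustrative stand-ins, not the original module.

```python
# Stand-in classes (assumption): only the attributes the sketch reads.
class Article:
    def __init__(self, title, url, date='', text_summary=''):
        self.title = title
        self.url = url
        self.date = date
        self.text_summary = text_summary


class Feed:
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def feed_to_index_append(feeds, index):
    """Append each feed to a parse_index-style list of (title, articles)."""
    for feed in feeds:
        entries = [{'title': a.title,
                    'url': a.url,
                    'date': a.date,
                    'description': a.text_summary}
                   for a in feed.articles]
        index.append((feed.title, entries))
    return index


index = [('Front page', [{'title': 'Lead story',
                          'url': 'http://example.com/lead',
                          'date': '', 'description': ''}])]
feeds = [Feed('Blog', [Article('Post one', 'http://example.com/post-one')])]
combined = feed_to_index_append(feeds, index)
```

After the call, combined holds the hand-built "Front page" section followed by a "Blog" section built from the feed.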

BrianG · 01-24-2010, 11:13 AM

Here's a module that will add the earliest/latest feed dates to the title of the feed (and therefore the Table of Contents entry as well). Useful when feeds aren't updated on a regular basis ...

The GET_FEED_DATES procedure accepts a single feed object (not a list of feeds, but a FOR loop could deal with a list). It also accepts a date mask of the begin/end dates for use with STRFTIME.

Here's an example of calling the procedure:

Spoiler:

Here's the procedure itself:

Spoiler:
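The FOR loop mentioned above could look roughly like this. It is a sketch for Python 3 using stand-in Feed and Article classes (hypothetical, with only the title, articles and localtime attributes), and it takes the earliest and latest dates with min/max instead of assuming the articles are sorted newest first. The add_dates_to_titles name is made up for the example.

```python
from datetime import datetime


# Stand-in classes (assumption): only the attributes the sketch reads.
class Article:
    def __init__(self, localtime):
        self.localtime = localtime


class Feed:
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def get_feed_dates(feed, date_mask):
    """Return (earliest, latest) article dates formatted with date_mask."""
    times = [a.localtime for a in feed.articles]
    return min(times).strftime(date_mask), max(times).strftime(date_mask)


def add_dates_to_titles(feeds, date_mask='%d-%b'):
    """Loop over a list of feeds and append the date range to each title."""
    for feed in feeds:
        if not feed.articles:
            continue  # nothing to date, leave the title alone
        start, end = get_feed_dates(feed, date_mask)
        feed.title = '%s (%s thru %s)' % (feed.title, start, end)
    return feeds


news = Feed('News', [Article(datetime(2010, 1, 20)),
                     Article(datetime(2010, 1, 18))])
empty = Feed('Quiet feed', [])
add_dates_to_titles([news, empty], '%Y-%m-%d')
```

Empty feeds are skipped so that the date lookup never runs on an empty article list.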

hansd · 04-11-2010, 06:24 AM

Instead of fully replacing the cover using the cover_url, just replace the stock image of the default cover with a given image.

Add a cover_img_url field with the URL to the class definition, and add the following code (imports and functions). There is no need to call any of these functions directly; everything is handled via the redefined default_cover function.

The code is based on the code from BasicNewsRecipe itself.

Spoiler:

Code:

import os
from contextlib import nested, closing
from calibre import strftime, __appname__, __version__
import calibre.utils.PythonMagickWand as pw
from ctypes import byref
from calibre import fit_image
from calibre import preferred_encoding  # used by default_cover below


    def get_cover_img_url(self):
        return getattr(self, 'cover_img_url', None)

    def _download_cover_img(self):
        # hack to reuse download_cover
        old_cu = None
        try:
            old_cu = self.get_cover_url()
        except:
            pass
        new_cu = self.get_cover_img_url()
        self.cover_url = new_cu     
        self._download_cover()

        outfile = os.path.join(self.output_dir, 'cover_img.jpg')
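        # Use the prepare_cover_image helper defined below, which converts the
        # image without resizing it to masthead dimensions.
        self.prepare_cover_image(self.cover_path, outfile)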
        
        self.cover_url = old_cu
        self.cover_img_path = outfile

    def download_cover_img(self):
        try:
            self._download_cover_img()
            self.report_progress(1, _('Downloaded cover to %s') % self.cover_img_path)
        except:
            self.log.exception('Failed to download cover img')
            self.cover_img_path = None
    
    def prepare_cover_image(self, path_to_image, out_path):
        with pw.ImageMagick():
            img = pw.NewMagickWand()
            if img < 0:
                raise RuntimeError('Out of memory')
            if not pw.MagickReadImage(img, path_to_image):
                severity = pw.ExceptionType(0)
                msg = pw.MagickGetException(img, byref(severity))
                raise IOError('Failed to read image from: %s: %s'
                        %(path_to_image, msg))
            if not pw.MagickWriteImage(img, out_path):
                raise RuntimeError('Failed to save image to %s'%out_path)
            pw.DestroyMagickWand(img)


    def default_cover(self, cover_file):
        '''
        Create a generic cover for recipes that have a special cover img
        '''
        try:
            try:
                from PIL import Image, ImageDraw, ImageFont
                Image, ImageDraw, ImageFont
            except ImportError:
                import Image, ImageDraw, ImageFont
            font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
            title = self.title if isinstance(self.title, unicode) else \
                    self.title.decode(preferred_encoding, 'replace')
            date = strftime(self.timefmt)
            app = '['+__appname__ +' '+__version__+']'

            COVER_WIDTH, COVER_HEIGHT = 590, 750
            img = Image.new('RGB', (COVER_WIDTH, COVER_HEIGHT), 'white')
            draw = ImageDraw.Draw(img)
            # Title
            font = ImageFont.truetype(font_path, 44)
            width, height = draw.textsize(title, font=font)
            left = max(int((COVER_WIDTH - width)/2.), 0)
            top = 15
            draw.text((left, top), title, fill=(0,0,0), font=font)
            bottom = top + height
            # Date
            font = ImageFont.truetype(font_path, 32)
            width, height = draw.textsize(date, font=font)
            left = max(int((COVER_WIDTH - width)/2.), 0)
            draw.text((left, bottom+15), date, fill=(0,0,0), font=font)
            # Vanity
            font = ImageFont.truetype(font_path, 28)
            width, height = draw.textsize(app, font=font)
            left = max(int((COVER_WIDTH - width)/2.), 0)
            top = COVER_HEIGHT - height - 15
            draw.text((left, top), app, fill=(0,0,0), font=font)

            # Logo
            logo_file = I('library.png')
            self.download_cover_img()
            if getattr(self, 'cover_img_path', None) is not None:
                logo_file = self.cover_img_path
            self.report_progress(1, _('using cover img from %s') % logo_file)
            logo = Image.open(logo_file, 'r')
            width, height = logo.size
            left = max(int((COVER_WIDTH - width)/2.), 0)
            top = max(int((COVER_HEIGHT - height)/2.), 0)
            img.paste(logo, (left, top))
            img = img.convert('RGB').convert('P', palette=Image.ADAPTIVE)
            img.convert('RGB').save(cover_file, 'JPEG')
            cover_file.flush()
        except Exception, e:
            self.log.exception('Failed to generate default cover ', e)
            return False
        return True

Starson17 · 10-16-2010, 12:36 PM

This code will easily remove articles from feeds based upon a character string in the title of the article or part of the URL. This is specifically for The New Yorker recipe to remove video links (word "video" in the title of the article) and Goings On About Town (GOAT in the url), but it can be easily adapted for any recipe:

Spoiler:

In addition to matching on the url and/or the title of the article, one can match on the article date or summary with article.date or article.text_summary.

TonytheBookworm · 10-16-2010, 08:07 PM

If the RSS feed leads to pages that have a clean link to a printer-friendly version, this method can work really well in most cases. You will need to know a little about regular expressions, or you can do what I do and cheat: use Firebug to get the link (say, /services/printVersion.asp), then let http://www.txt2re.com/index.php3 convert it to a Python regular expression.

Then you could use code like this. Put this import at the top of the recipe so the temporary-file class is available:

Code:

from calibre.ptempfile import PersistentTemporaryFile

Then add this code block to your recipe and change only the url_regex= value:

Spoiler:
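Before putting a regular expression into the recipe, it can help to try it against a saved copy of the page. The sketch below (Python 3, standard library only) is not part of the recipe: the HTML fragment, the first_matching_link helper and the pattern are made-up examples built around the /services/printVersion.asp link mentioned above.

```python
import re

# Hypothetical page fragment; a real test would use the saved article page.
html = '''
<a href="/news/story.asp?id=42">Story</a>
<a href="/services/printVersion.asp?id=42">Print this page</a>
'''

# Candidate pattern for the printer-friendly link.
url_regex = r'/services/printVersion\.asp'


def first_matching_link(page, pattern):
    """Return the first href on the page that the pattern matches, or None."""
    for href in re.findall(r'href="([^"]+)"', page):
        if re.search(pattern, href):
            return href
    return None


print_link = first_matching_link(html, url_regex)
missing = first_matching_link('<a href="/about.asp">About</a>', url_regex)
```

If the helper returns None for a page that clearly has a print link, the pattern needs adjusting before it goes into the recipe.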

Starson17 · 11-15-2010, 10:11 AM

So that this doesn't get lost, I'm going to repost it here. It's a recipe that grabs a single link to an EPUB.

The recipe modifies build_index, which is the method that gets the masthead image and cover, parses the feed for articles, retrieves the articles, removes tags from articles, etc. All of those steps ultimately produce a local directory structure that looks like an unzipped EPUB.

The recipe grabs the link to one EPUB (the first in the RSS feed), saves the EPUB locally, extracts it, and passes the result back into the recipe system as though all the other steps had been completed normally.

To use the recipe, just modify these lines:

epub_feed = "http://feeds.feedburner.com/NowEpubEditions"
soup = self.index_to_soup(epub_feed)
url = soup.find(name = 'feedburner:origlink').string

so that "url" points to an EPUB as in: "http://some.place.com/epubfile.epub"
The sample below grabs the first EPUB in an RSS feed, but you can just supply a single URL directly or grab it from the front page of a newspaper. I've posted a complete recipe to emphasize that the normal recipe methods, like "feeds", "remove_tags", etc. should all be omitted.

Spoiler:

kiklop74 · 11-23-2010, 02:54 PM

Small piece of code to convert all links to text:

Spoiler:

Pahan · 12-27-2010, 01:54 AM

Here is a recipe template that keeps track of already downloaded feed items and only downloads items that it hasn't seen before or whose description, content, or URL have changed. It does so by overriding the parse_feeds method.
Some caveats:

I recommend setting max_articles_per_feed and oldest_article to very high values. The first time, the recipe will download every item in every feed, but after that, it will "remember" not to do it again and will grab all new articles no matter how much time had elapsed since the last time it had been run and how many entries had been added. In particular, if you set max_articles_per_feed to a small value and the feed is one that lists all articles in a particular order, you might never see new articles.
The list of items downloaded for each feed will be stored in "Calibre configuration directory/recipes/recipe_storage/Recipe title/Feed title". This is probably suboptimal, and there ought to be a persistent storage API for recipes, but it's the best I could come up with.
The list of items downloaded is written to disk before the items are actually downloaded. Thus, if an item fails to download for some reason, the recipe won't know, and will not try to download it again. This could probably be fixed by writing the new item lists to temporary files and overriding some method later in the sequence to "commit" by overwriting the downloaded item lists with the new lists. (Thus, if the recipe fails before that, it will never get to that point, so the old lists will remain intact and will redownload next time the recipe is run.)
If there are no new items to download and remove_empty_feeds is set to True, the recipe will return an empty list of feeds, which will cause Calibre to raise an error. As far as I can tell, there is nothing that the recipe can do about that without a lot more coding.
I've tried to make this code portable, but I've only tested it on Linux systems, so let me know if it doesn't work on the other platforms. I am particularly unsure about newline handling.
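The "commit" idea from the third caveat could be sketched as follows. This is plain Python 3 using only the standard library, not tested inside calibre; the function names are invented for the example, and which recipe method should call commit_items after a successful download is left open, as in the caveat itself.

```python
import os
import tempfile


def load_items(path):
    """Read the stored item hashes, or an empty set if nothing is stored."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def stage_items(path, items):
    """Write the new list to a temp file beside path; return its name."""
    folder = os.path.dirname(path) or '.'
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix='.new')
    with os.fdopen(fd, 'w') as f:
        for item in sorted(items):
            f.write(item + '\n')
    return tmp_path


def commit_items(tmp_path, path):
    """Overwrite the stored list; call only after the download succeeded."""
    os.replace(tmp_path, path)


workdir = tempfile.mkdtemp()
store = os.path.join(workdir, 'feed_items')
staged = stage_items(store, {'http://example.com/a:abc', 'http://example.com/b:def'})
before = load_items(store)   # nothing committed yet, so still empty
commit_items(staged, store)
after = load_items(store)
```

If the run fails between stage_items and commit_items, the old list is untouched and the same items are offered again next time.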

Spoiler:

Starson17 · 01-18-2011, 11:40 AM

This code will rotate images so that the long dimension is vertical. It's been updated and revised from code found here after python magick wand was replaced by calibre.utils.magick and calibre.utils.magick.draw:

The code would typically be used for image-based recipes (comics) read on devices that don't have a g-sensor based autorotation capability or where you prefer to lock your reader software into portrait orientation. This code makes the long dimension of the image appear in the long dimension of the screen on a handheld device that can be rotated for better viewing. Don't use it for recipes being read on a non-rotatable PC monitor (unless you want to view images with your head tilted 90 degrees).

You may also want to set extra_css to include something like:
img {max-height:95%; min-height:95%;}
This leaves some room above and below the image for the links typically displayed in recipes.

Spoiler:

Starson17 · 05-16-2011, 10:26 AM

The question of how to modify a URL in a recipe comes up often, particularly when using print_version. It's basic string handling in Python, but I thought I'd post some tips here:

The three most common ways to modify a URL in recipes are: 1) replace, 2) partition/rpartition, and 3) split/join. All three are standard Python string methods.

Use replace whenever you are replacing some part of the URL string (usually in the middle) with some other string and 1) the part being replaced never changes (this lets you find it) and 2) you don't want to change the two parts on either side of the part being replaced. You can keep the part you are replacing (by inserting itself back in) and you can add additional stuff in the middle, but you can't change the first and last parts.

Use partition/rpartition when you want to split the string into three parts and the part in the middle never changes (so you can find it), but you want to change one or more of the three parts.
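A short sketch of the first two approaches on a made-up URL (example.com and the path pieces are placeholders, not from any real recipe):

```python
url = 'http://www.example.com/news/story/12345.html'

# replace: the fixed piece '/story/' is found and swapped for a longer one;
# everything on either side of it is left untouched.
with_replace = url.replace('/story/', '/story/print/')

# partition: split once around the fixed piece '/news/', then rebuild the
# URL with a different first part.
head, sep, tail = url.partition('/news/')
with_partition = 'http://print.example.com' + sep + tail

# rpartition: same idea, but splitting around the LAST occurrence, here the
# final dot, to add a suffix before the extension.
base, dot, ext = url.rpartition('.')
with_rpartition = base + '_print' + dot + ext
```

In a recipe, any of these lines would form the body of print_version(self, url), returning the rebuilt string.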

If there isn't any unchanging part in the middle to find, then the solution most commonly used is split/join with splitting being done on the slash. It splits the URL into each part between slashes and you can do whatever you want to each piece, then put them together with join (which adds back the original slashes). You find the part to change by counting the cut up pieces (between slashes) and changing the pieces you need to change and inserting anything needed between them.

This shows a URL having the string "/v-print/" being inserted after the 6th slash to change
http://www.03.com/04/05/06/07/08.html
to
http://www.03.com/04/05/06/v-print/07/08.html:

Code:

    def print_version(self,url):
        segments = url.split('/')
        printURL = '/'.join(segments[0:6]) + '/v-print/' + '/'.join(segments[6:])
        return printURL

If you don't understand the [0:6] or [6:] usage, you need to read up on lists and string slices in Python here.
Also see TonytheBookworm's method described above.

tylau0 · 07-08-2011, 05:34 PM

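Sometimes one may wish to use a recipe to generate a typical e-book rather than a periodical. You can make that possible by adding the following code inside your recipe class. It uses names that calibre.web.feeds.news imports for itself (MetaInformation, OPFCreator, TOC, nowf, BeautifulSoup, nested, strftime, __appname__, os and re), so your recipe must import them too.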

Spoiler:

Code:

    def create_opf(self, feeds, dir=None):
        if dir is None:
            dir = self.output_dir
        title = self.short_title()
        if self.output_profile.periodical_date_in_title:
            title += strftime(self.timefmt)
        mi = MetaInformation(title, [__appname__])
        mi.publisher = __appname__
        mi.author_sort = __appname__
        mi.publication_type = self.publication_type+':'+self.short_title()
        mi.timestamp = nowf()
        mi.comments = self.description
        if not isinstance(mi.comments, unicode):
            mi.comments = mi.comments.decode('utf-8', 'replace')
        mi.pubdate = nowf()
        opf_path = os.path.join(dir, 'index.opf')
        ncx_path = os.path.join(dir, 'index.ncx')

        opf = OPFCreator(dir, mi)
        # Add mastheadImage entry to <guide> section
        mp = getattr(self, 'masthead_path', None)
        if mp is not None and os.access(mp, os.R_OK):
            from calibre.ebooks.metadata.opf2 import Guide
            ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
            ref.type = 'masthead'
            ref.title = 'Masthead Image'
            opf.guide.append(ref)

        manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
        manifest.append(os.path.join(dir, 'index.html'))
        manifest.append(os.path.join(dir, 'index.ncx'))

        # Get cover
        cpath = getattr(self, 'cover_path', None)
        if cpath is None:
            pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
            if self.default_cover(pf):
                cpath =  pf.name
        if cpath is not None and os.access(cpath, os.R_OK):
            opf.cover = cpath
            manifest.append(cpath)

        # Get masthead
        mpath = getattr(self, 'masthead_path', None)
        if mpath is not None and os.access(mpath, os.R_OK):
            manifest.append(mpath)

        opf.create_manifest_from_files_in(manifest)
        for mani in opf.manifest:
            if mani.path.endswith('.ncx'):
                mani.id = 'ncx'
            if mani.path.endswith('mastheadImage.jpg'):
                mani.id = 'masthead-image'

        entries = ['index.html']
        toc = TOC(base_path=dir)
        self.play_order_counter = 0
        self.play_order_map = {}

        def feed_index(num, parent):
            f = feeds[num]
            for j, a in enumerate(f):
                if getattr(a, 'downloaded', False):
                    adir = 'feed_%d/article_%d/'%(num, j)
                    auth = a.author
                    if not auth:
                        auth = None
                    desc = a.text_summary
                    if not desc:
                        desc = None
                    else:
                        desc = self.description_limiter(desc)
                    entries.append('%sindex.html'%adir)
                    po = self.play_order_map.get(entries[-1], None)
                    if po is None:
                        self.play_order_counter += 1
                        po = self.play_order_counter
                    parent.add_item('%sindex.html'%adir, None, a.title if a.title else _('Untitled Article'),
                                    play_order=po, author=auth, description=desc)
                    last = os.path.join(self.output_dir, ('%sindex.html'%adir).replace('/', os.sep))
                    for sp in a.sub_pages:
                        prefix = os.path.commonprefix([opf_path, sp])
                        relp = sp[len(prefix):]
                        entries.append(relp.replace(os.sep, '/'))
                        last = sp

                    if os.path.exists(last):
                        with open(last, 'rb') as fi:
                            src = fi.read().decode('utf-8')
                        soup = BeautifulSoup(src)
                        body = soup.find('body')
                        if body is not None:
                            prefix = '/'.join('..'for i in range(2*len(re.findall(r'link\d+', last))))
                            templ = self.navbar.generate(True, num, j, len(f),
                                            not self.has_single_feed,
                                            a.orig_url, __appname__, prefix=prefix,
                                            center=self.center_navbar)
                            elem = BeautifulSoup(templ.render(doctype='xhtml').decode('utf-8')).find('div')
                            body.insert(len(body.contents), elem)
                            with open(last, 'wb') as fi:
                                fi.write(unicode(soup).encode('utf-8'))
        if len(feeds) == 0:
            raise Exception('All feeds are empty, aborting.')

        if len(feeds) > 1:
            for i, f in enumerate(feeds):
                entries.append('feed_%d/index.html'%i)
                po = self.play_order_map.get(entries[-1], None)
                if po is None:
                    self.play_order_counter += 1
                    po = self.play_order_counter
                auth = getattr(f, 'author', None)
                if not auth:
                    auth = None
                desc = getattr(f, 'description', None)
                if not desc:
                    desc = None
                feed_index(i, toc.add_item('feed_%d/index.html'%i, None,
                    f.title, play_order=po, description=desc, author=auth))

        else:
            entries.append('feed_%d/index.html'%0)
            feed_index(0, toc)

        for i, p in enumerate(entries):
            entries[i] = os.path.join(dir, p.replace('/', os.sep))
        opf.create_spine(entries)
        opf.set_toc(toc)

        with nested(open(opf_path, 'wb'), open(ncx_path, 'wb')) as (opf_file, ncx_file):
            opf.render(opf_file, ncx_file)

The above changes just one line in the implementation of the following function in calibre.web.feeds.news:

Code:

def create_opf(self, feeds, dir=None):

Turn the line

Code:

mi.publication_type = 'periodical:'+self.publication_type+':'+self.short_title()

to

Code:

mi.publication_type = self.publication_type+':'+self.short_title()

Calibre currently looks for the keyword "periodical" in mi.publication_type to decide whether to generate an e-book or a periodical, so dropping that prefix makes the output an ordinary e-book.

For an example recipe, check the "Ming Pao - Hong Kong" recipe in your Calibre copy or here. You may also get insights on how to append the publication date in the e-book title, which could be useful for differentiating e-books generated on different dates.

Moderator Notice: The difference between a typical ebook and a "periodical" is relevant for a Kindle, but may not be relevant for other ebook readers.

kiklop74 · 08-21-2011, 09:22 PM

Let us assume that you have a feed whose links all point to redirected pages. By default Calibre does not handle this case, so the safest approach is to open each link with the recipe's browser and return the URL it ends up at:

Code:

    def print_version(self, url):
        return self.browser.open_novisit(url).geturl()

Of course, a similar thing can be done with urllib2, but using the internal browser automatically adds support for sites that require login.

Starson17 · 11-02-2011, 04:41 PM

For those who have grayscale only readers, this will convert images to grayscale. It saves between 8% and 20% of space. My tests on raw individual images showed 8%, but tests reported by others showed up to 20% reductions for the final ebook. YMMV

Spoiler:


BrianG · 01-24-2010, 11:13 AM · #4 · Add dates to Feed title

Spoiler contents for this post. Here's an example of calling the procedure:

Code:

    # Append the earliest/latest dates of the feed to feed title
    startDate, endDate = self.get_feed_dates(feed, '%d-%b')
    newFeedTitle = feed.title + ' (' + startDate + ' thru ' + endDate + ')'

Here's the procedure itself:

Code:

    def get_feed_dates(self, feedObject, dateMask):
        startDate = feedObject.articles[len(feedObject.articles)-1].localtime.strftime(dateMask)
        endDate = feedObject.articles[0].localtime.strftime(dateMask)
        return startDate, endDate

Last edited by Starson17; 02-25-2011 at 10:07 PM.

Starson17 · 10-16-2010, 12:36 PM · #6 · Remove articles from feed

Spoiler contents for this post:

Code:

    def parse_feeds (self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'VIDEO' in article.title.upper() or 'GOAT' in article.url:
                    feed.articles.remove(article)
        return feeds

Last edited by Starson17; 02-17-2011 at 04:38 PM.

TonytheBookworm · 10-16-2010, 08:07 PM · #7 · One of several ways to Do a print Friendly Version

Spoiler contents for this post:

Code:

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)
        # Use a try/except block: try to follow the /content/printVersion
        # link, and if that fails, fall back to the original calling URL
        # instead of crashing.
        try:
            response = br.follow_link(url_regex='.*?(\\/content\\/printVersion)', nr = 0)
            html = response.read()
        except:
            response = br.open(url)
            html = response.read()

        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

kiklop74 · 11-23-2010, 02:54 PM · #9

Spoiler contents for this post:

Code:

    def preprocess_html(self, soup):
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

Last edited by Starson17; 02-25-2011 at 10:10 PM.

Pahan · 12-27-2010, 01:54 AM · #10 · Recipe to keep track of feed items already downloaded and only download new items

Spoiler contents for this post:

Code:

from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5

class OnlyLatestRecipe(BasicNewsRecipe):
    title = u'Unknown News Source'
    oldest_article = 10000
    max_articles_per_feed = 10000
    feeds = [ ]

    def parse_feeds(self):
        recipe_dir = os.path.join(config_dir,'recipes')
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        feed_dir = os.path.join(hash_dir,self.title.encode('utf-8').replace('/',':'))
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)

        feeds = BasicNewsRecipe.parse_feeds(self)

        for feed in feeds:
            feed_hash = urllib.quote(feed.title.encode('utf-8'),safe='')
            feed_fn = os.path.join(feed_dir,feed_hash)

            past_items = set()
            if os.path.exists(feed_fn):
                with file(feed_fn) as f:
                    for h in f:
                        past_items.add(h.strip())

            cur_items = set()
            for article in feed.articles[:]:
                item_hash = md5()
                if article.content:
                    item_hash.update(article.content.encode('utf-8'))
                if article.summary:
                    item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)
            with file(feed_fn,'w') as f:
                for h in cur_items:
                    f.write(h+'\n')

        remove = [f for f in feeds if len(f) == 0 and self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)

        return feeds

Last edited by Starson17; 02-25-2011 at 10:10 PM. Reason: Typos and grammar.

Starson17 · 01-18-2011, 11:40 AM · #11 · Rotate images

Spoiler contents for this post:

Code:

#Add these imports
from calibre.utils.magick import Image, PixelWand

    def postprocess_html(self, soup, first):
        #process all the images. assumes that the new html has the correct path
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            width, height = img.size
            print 'img is: ', iurl, 'width is: ', width, 'height is: ', height
            if img < 0:
                raise RuntimeError('Out of memory')
            pw = PixelWand()
            if( width > height ) :
                print 'Rotate image'
                img.rotate(pw, -90)
                img.save(iurl)
        return soup

Last edited by Starson17; 05-16-2011 at 11:57 AM.

Starson17 · 05-16-2011, 10:26 AM

Print_version and string handling.

The question of how to modify a URL in a recipe comes up often, particularly when using print_version. It's basic string handling in Python, but I thought I'd post some tips here.

The three most common ways to modify a URL in recipes are: 1) replace, 2) partition/rpartition, and 3) split/join. They are all standard Python string methods, covered in the Python documentation.

Use replace whenever you are replacing some part of the URL string (usually in the middle) with some other string and 1) the part being replaced never changes (this lets you find it) and 2) you don't want to change the two parts on either side of the part being replaced. You can keep the part you are replacing (by inserting it back in) and you can add additional stuff in the middle, but you can't change the first and last parts.

Use partition/rpartition when you want to split the string into three parts and the part in the middle never changes (so you can find it), but you want to change one or more of the three parts.

If there isn't any unchanging part in the middle to find, then the solution most commonly used is split/join, with the splitting done on the slash. It splits the URL into the pieces between slashes and you can do whatever you want to each piece, then put them together with join (which adds back the original slashes). You find the part to change by counting the pieces (between slashes), changing the pieces you need to change and inserting anything needed between them.
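The first two approaches, replace and partition, can be sketched as below. This is only an illustration in plain Python: the example.com URLs, the '/article/' and '/story/' markers, and the function names are all made up for the example and do not come from any real recipe. In a recipe the same logic would go inside print_version(self, url).

```python
def print_version_replace(url):
    # replace: the fixed piece '/article/' is found and swapped for a longer
    # piece; the text before and after it is left untouched.
    return url.replace('/article/', '/print/article/')


def print_version_partition(url):
    # partition: split into (before, marker, after) around the first '/story/'.
    head, sep, tail = url.partition('/story/')
    if not sep:
        # Marker not found: partition returns (url, '', ''), so leave the URL alone.
        return url
    # Any of the three parts can now be changed; here only the tail is.
    return head + sep + 'print-' + tail
```

rpartition works the same way as partition but searches from the right-hand end of the string, which is handy when the marker can occur more than once and you want the last occurrence.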
This split/join example inserts the string "/v-print/" after the 6th slash, changing http://www.03.com/04/05/06/07/08.html to http://www.03.com/04/05/06/v-print/07/08.html:

Code:

def print_version(self,url):
    segments = url.split('/')
    printURL = '/'.join(segments[0:6]) + '/v-print/' + '/'.join(segments[6:])
    return printURL

If you don't understand the [0:6] or [6:] usage, you need to read up on lists and string slices in Python. Also see TonytheBookworm's method described above.

Last edited by Starson17; 11-04-2011 at 09:50 AM.
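To make the [0:6] and [6:] slices in the split/join example concrete, here is the same URL taken apart step by step in plain Python. Nothing here is calibre-specific, and the variable names are only for illustration.

```python
url = 'http://www.03.com/04/05/06/07/08.html'

# Splitting on '/' gives one list item per piece between slashes.
# The '//' after 'http:' produces an empty string as the second item.
segments = url.split('/')
# segments == ['http:', '', 'www.03.com', '04', '05', '06', '07', '08.html']

first_six = segments[0:6]   # items 0..5: everything up to and including '06'
rest = segments[6:]         # items 6 onward: '07' and '08.html'

# join puts the slashes back between the pieces of each half.
rebuilt = '/'.join(first_six) + '/v-print/' + '/'.join(rest)
```

Because the empty string between 'http:' and 'www.03.com' counts as a list item, the host name is item 2, not item 1, which is worth remembering when counting pieces.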

kiklop74 · 08-21-2011, 09:22 PM

Handling redirections

Suppose you have a feed whose links all point to redirected pages. By default Calibre does not handle this case, so the safest way of handling it can be summarized like this:

Code:

def print_version(self, url):
    return self.browser.open_novisit(url).geturl()

Of course, a similar thing can be done with urllib2, but using the internal browser automatically adds support for sites that require login.

Last edited by kiklop74; 08-21-2011 at 09:38 PM.

Starson17 · 11-02-2011, 04:41 PM

Convert Images to Grayscale

For those who have grayscale-only readers, this will convert images to grayscale. It saves between 8% and 20% of space. My tests on raw individual images showed 8%, but tests reported by others showed up to 20% reductions for the final ebook. YMMV.

Spoiler:

Code:

from calibre.utils.magick import Image

def postprocess_html(self, soup, first):
    #process all the images
    for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
        iurl = tag['src']
        img = Image()
        img.open(iurl)
        if img < 0:
            raise RuntimeError('Out of memory')
        img.type = "GrayscaleType"
        img.save(iurl)
    return soup

Last edited by Starson17; 11-04-2011 at 09:51 AM.
