Old 01-16-2011, 11:33 PM   #1
Alexis
Device: Kindle
Nature news - updated recipe

Hello,

Here is my first effort at improving a pre-existing recipe. I have mainly tidied up the markup and added some CSS formatting etc.

If any experienced recipe developers would care to take a look, I'd be interested in feedback, e.g. could I have done things in a more straightforward way? Any useful tips?

Also, is this the proper way to submit recipes for inclusion in the calibre distribution?

Thanks

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag
import re

class NatureNews(BasicNewsRecipe):
    title          = u'Nature News'
    language       = 'en'
    __author__     = 'Krittika Goyal, Starson17'
    oldest_article = 31 #days
    remove_empty_feeds    = True
    max_articles_per_feed = 50

    no_stylesheets = True
    keep_only_tags = [dict(name='div', attrs={'id':'content'})]
#    remove_tags_before = dict(name='h1', attrs={'class':'heading entry-title'})
#    remove_tags_after  = dict(name='h2', attrs={'id':'comments'})
    remove_tags = [
       dict(name='h2', attrs={'id':'comments'}),
       dict(attrs={'alt':'Advertisement'}),
       dict(name='div', attrs={'class':'ad'}),
       dict(attrs={'class':'Z3988'}),
       dict(attrs={'class':['formatpublished','type-of-article','cleardiv','disclaimer','buttons','comments xoxo']}),
       dict(name='a', attrs={'href':'#comments'}),
       dict(name='h2',attrs={'class':'subheading plusicon icon-add-comment'})
    ] 

    preprocess_regexps = [
        (re.compile(r'<p>ADVERTISEMENT</p>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        ]
    
    extra_css             = '''
                            .author { text-align: right; font-size: small; line-height:1em; margin-top:0px; margin-left:0; margin-right:0; margin-bottom: 0; }
                            .imagedescription { font-size: small; font-style:italic; line-height:1em; margin-top:5px; margin-left:0; margin-right:0; margin-bottom: 0; }
                            .imagecredit { font-size: x-small; font-style: normal; font-weight: bold}
                            '''

    feeds = [('Nature News', 'http://feeds.nature.com/news/rss/most_recent')]
    
    def preprocess_html(self,soup):
        # The author name is slightly buried - dig it up
        author = soup.find('p', {'class':'byline'})
        if author:
            # Find out the author's name; it is either inside a link or bare text
            authornamediv = author.find('span',{'class':'author fn'})
            # Skip the rewrite if the byline has no author span (avoids an AttributeError)
            if authornamediv and authornamediv.contents:
                authornamelink = authornamediv.find('a')
                if authornamelink and authornamelink.contents:
                    authorname = authornamelink.contents[0]
                else:
                    authorname = authornamediv.contents[0]
                # Replace the byline with a div holding only the author's name
                tag = Tag(soup,'div')
                tag['class'] = 'author'
                tag.insert(0,authorname.strip())
                author.replaceWith(tag)
        
        # Change the intro from a p to a div
        intro = soup.find('p',{'class':'intro'})
        if intro:
            tag = Tag(soup,'div')
            tag['class'] = 'intro'
            tag.insert(0,intro.contents[0])
            intro.replaceWith(tag)
            
        # Change span class=imagedescription to div
        descr = soup.find('span',{'class':'imagedescription'})
        if descr:
            tag = Tag(soup,'div')
            tag['class'] = 'imagedescription'
            tag.insert(0,descr.renderContents())
            descr.replaceWith(tag)
        
        # The references are in a list, let's make them simpler
        reflistcont =  soup.find('ul',{'id':'article-refrences'})
        if reflistcont:
            reflist = reflistcont.li.renderContents()
            tag = Tag(soup,'div')
            tag['class'] = 'article-references'
            tag.insert(0,reflist)
            reflistcont.replaceWith(tag)
        
        # Within the id=content div, we need to remove all the stuff after the end of the class=entry-content
        entrycontent = soup.find('div',{'class':'entry-content'})
        # Guard against pages that have no entry-content div
        if entrycontent:
            for nextSibling in entrycontent.findNextSiblings():
                nextSibling.extract()
                
        return soup
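
For anyone wondering what the preprocess_regexps entry in the recipe does: each item is a (compiled pattern, callable) pair that is run on the downloaded HTML, and the callable's return value replaces the match. Below is a minimal sketch of that behaviour using only the standard re module. The names apply_regexps and sample are hypothetical and exist only for illustration; they are not part of calibre, and calibre does this step internally.

```python
import re

# Same shape as the recipe's preprocess_regexps entry: (compiled pattern, callable).
preprocess_regexps = [
    (re.compile(r'<p>ADVERTISEMENT</p>', re.DOTALL | re.IGNORECASE), lambda match: ''),
]


def apply_regexps(raw_html, rules):
    """Hypothetical helper: run each (pattern, replacement) pair over the raw HTML in order."""
    for pattern, replacement in rules:
        raw_html = pattern.sub(replacement, raw_html)
    return raw_html


# Made-up sample markup, not taken from nature.com.
sample = '<div id="content"><p>Advertisement</p><p>Story text.</p></div>'
cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

Because of re.IGNORECASE the pattern also catches "Advertisement" in mixed case, so the sample comes out with only the story paragraph left inside the content div.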
Old 01-16-2011, 11:37 PM   #2
kovidgoyal
creator of calibre
Device: Various
Thanks, updated. Submitting recipes in this forum is fine. Your code also looks fine.
Old 09-27-2012, 04:19 PM   #3
Kevin8or
Device: Kindle 3
Hello news chefs,

The Nature News recipe (for nature.com) is generating only headers, without the actual articles. Does anyone feel up to cooking?

TIA,
Kevin
Old 10-05-2012, 03:36 PM   #4
Kevin8or
Device: Kindle 3
Thank you to the person(s) who repaired this recipe.