Old 01-18-2014, 10:13 AM   #1
blackberry4
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2014
Device: Kindle2
Business Week Magazine

Hello

For a few months now I have noticed that the recipe for Business Week Magazine downloads only the headlines, with no articles, and the cover image is badly out of date.

Any help would be great.

Thanks very much!
Old 01-18-2014, 02:04 PM   #2
Divingduck
Wizard
Posts: 1,163
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I made a quick update to this recipe. I hope this works for you.

Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
from collections import OrderedDict

class BusinessWeekMagazine(BasicNewsRecipe):

    title       = 'Business Week Magazine'
    __author__  = 'Rick Shang, Armin Geller' # AGE Upd 2014-01-18

    description = 'A renowned business publication. Business news, trends and profiles of successful businesspeople.'
    language = 'en'
    category = 'news'
    encoding = 'UTF-8'
    keep_only_tags = [
            dict(name='div', attrs={'id':['content']}),         # AGE 2014-01-18
            ]
    remove_tags = [dict(name='hr'), 
                    dict(name='a', attrs={'class':'sub_sales'}),
                    dict(name='div', attrs={'class':'fieldset'}),
                    dict(name='div', attrs={'id':'taboola_wrapper'})] # AGE 2014-01-18
    no_javascript = True
    no_stylesheets = True

    cover_url             = 'http://images.businessweek.com/mz/covers/current_120x160.jpg'

    def parse_index(self):
        #Go to the issue
        soup = self.index_to_soup('http://www.businessweek.com/magazine/news/articles/business_news.htm')

        #Find date (skipped if the 'Magazine' heading is missing from the page)
        mag=soup.find('h2',text='Magazine')
        if mag is not None:
            dates=self.tag_to_string(mag.findNext('h3'))
            self.timefmt = u' [%s]'%dates

        #Go to the main body
        div0 = soup.find('div', attrs={'class':'column left'})
        section_title = ''
        feeds = OrderedDict()
        for div in div0.findAll('a', attrs={'class': None}):
            articles = []
            section_title = self.tag_to_string(div.findPrevious('h3')).strip()
            title=self.tag_to_string(div).strip()
            url=div['href']
            soup0 = self.index_to_soup(url)
            urlprint=soup0.find('a', attrs={'href':re.compile('.*printer.*')})
            if urlprint is not None:
                url=urlprint['href']
            articles.append({'title':title, 'url':url, 'description':'', 'date':''})

            if articles:
                if section_title not in feeds:
                    feeds[section_title] = []
                feeds[section_title] += articles
        div1 = soup.find('div', attrs={'class':'column center'})
        section_title = ''
        for div in div1.findAll('a'):
            articles = []
            desc=self.tag_to_string(div.findNext('p')).strip()
            section_title = self.tag_to_string(div.findPrevious('h3')).strip()
            title=self.tag_to_string(div).strip()
            url=div['href']
            soup0 = self.index_to_soup(url)
            urlprint=soup0.find('a', attrs={'href':re.compile('.*printer.*')})
            if urlprint is not None:
                url=urlprint['href']
            articles.append({'title':title, 'url':url, 'description':desc, 'date':''})
            if articles:
                if section_title not in feeds:
                    feeds[section_title] = []
                feeds[section_title] += articles

        # items() works on both Python 2 and 3; iteritems() exists only on Python 2
        ans = list(feeds.items())
        return ans
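
For anyone adapting the recipe: both loops in parse_index do the same thing at the end, which is to file each article under its section heading and hand calibre a list of (section title, list of article dicts) pairs. Here is a minimal sketch of just that grouping step, runnable without calibre. The helper name group_by_section and the sample entries are made up for illustration and are not part of the recipe.

```python
from collections import OrderedDict


def group_by_section(entries):
    """Group (section_title, article) pairs by section.

    Returns a list of (section_title, [article, ...]) tuples, with the
    sections in the order they were first seen.
    """
    feeds = OrderedDict()
    for section_title, article in entries:
        # setdefault creates the section's list the first time it appears
        feeds.setdefault(section_title, []).append(article)
    return list(feeds.items())


# Hypothetical sample data, shaped like the dicts the recipe builds
sample = [
    ('Features', {'title': 'A', 'url': 'http://example.com/a',
                  'description': '', 'date': ''}),
    ('Opening Remarks', {'title': 'B', 'url': 'http://example.com/b',
                         'description': '', 'date': ''}),
    ('Features', {'title': 'C', 'url': 'http://example.com/c',
                  'description': '', 'date': ''}),
]

index = group_by_section(sample)
for name, articles in index:
    print(name, [a['title'] for a in articles])
```

Using an ordered mapping keeps the sections in the same order as on the magazine's index page, which is presumably why the recipe uses OrderedDict and not a plain dict.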
Attached Files
File Type: zip BusinessWeekMagazine_AGE.zip (1.1 KB, 164 views)
Old 01-18-2014, 04:21 PM   #3
blackberry4
Thank you very much! It's working perfectly now.