Recipe works when mocked up as Python file, fails when converted to Recipe

ode · 11-05-2010, 01:29 AM

Code:

import urllib2
from BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe

class Counterpunch(BasicNewsRecipe):
    '''
    Parses counterpunch.com for articles
    '''  
    def parse_index(self):
        feeds = []
        title, url = 'Counterpunch', 'http://www.counterpunch.com'
        articles = self.parse_page(url)
        if articles:
            feeds.append((title, articles))
        return feeds

    def parse_page(self, url):
        fd = urllib2.urlopen(url)
        soup = BeautifulSoup(fd, fromEncoding='iso-8859-1') 
        articles = []
        current_date = ''
        # Gets all dates and entries in page order: a date, the articles for that date, the next date, and so on.
        # The first expression in the lambda matches entries, the second matches dates.
        dates_and_articles = soup.findAll(lambda tag: (tag.name == 'p' and
                                          tag.attrs == [(u'class', u'style2')] and
                                          len(tag) == 4 and
                                          'Website of the' not in tag.decode('utf-8')) or
                                          (tag.name == 'font' and
                                          tag.attrs == [(u'color', u'#990000'), (u'size', u'-1')]))
        for tag in dates_and_articles:
            #if 'Today\'s\n Stories' in tag.contents:
            if tag.name == 'p':
                # logic to deal with the different ways names are printed (a color difference, I believe)
                if tag.find('span', {'class': 'style1'}):
                    author = tag.contents[0].contents[0] + ': '
                    url = 'http://www.counterpunch.com/' + tag.contents[3].attrs[0][1]
                else:
                    author = tag.contents[0] + ': '
                    url = 'http://www.counterpunch.com/' + tag.contents[3].attrs[0][1]
                title = author + str(tag.contents[3].contents[0])
                articles.append({'title': title, 'url': url, 'description':'', 'date': current_date})
            #if new date, update current_date
            elif tag.name == 'font':
                current_date = tag.contents[0]
                #print('the date is {0}').format(current_date)
        # keep just one day's articles for clearer, quicker debugging
        articles = [a for a in articles if a['date'] == 'October 11, 2010']
        return articles
            
#for debugging on the cmd             
#c = Counterpunch()
#print c.parse_index()

This is the first recipe I have written.
It is for a site that has no RSS feed. The articles are listed in a table at the side of the page, separated by date headings.
I mocked it up as a .py file first and got it to a workable state where it prints a list of feeds on the command line.
I then made the few small changes needed to turn it into a recipe and tested it with 'ebook-convert counterpunch.recipe test --test -vv', but I get the traceback below:

Code:

1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Traceback (most recent call last):
  File "/tmp/init.py", line 48, in <module>
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/cli.py", line 254, in main
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/plumber.py", line 836, in run
  File "/home/kovid/build/calibre/src/calibre/customize/conversion.py", line 216, in __call__
  File "/home/kovid/build/calibre/src/calibre/web/feeds/input.py", line 105, in convert
  File "/home/kovid/build/calibre/src/calibre/web/feeds/news.py", line 712, in download
  File "/home/kovid/build/calibre/src/calibre/web/feeds/news.py", line 837, in build_index
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 15, in parse_index
    articles = self.parse_page(url)
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 28, in parse_page
    dates_and_articles = soup.findAll(lambda tag: (tag.name == 'p' and
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 768, in findAll
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 332, in _findAll
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 890, in search
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 849, in searchTag
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 907, in _matches
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 31, in <lambda>
    'Website of the' not in tag.decode('utf-8')) or
TypeError: 'NoneType' object is not callable

I assumed it has something to do with the decode method. I have played with this for hours, and some of my changes produce a different traceback, but the recipe still returns no feeds, whereas the same code called directly on the command line gives me the feeds I need with no problem.
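
One likely explanation (an assumption, not confirmed anywhere in this thread): the BeautifulSoup that calibre loads may be an older 3.0.x release whose Tag class has no decode method. In that series, an unknown attribute on a tag is treated as a child-tag lookup, so tag.decode would quietly return None instead of raising AttributeError, and calling it would give exactly this TypeError. The sketch below imitates that behaviour with a hypothetical FakeTag class; it is not the real BeautifulSoup code.

```python
class FakeTag(object):
    """Hypothetical stand-in for an old-style BeautifulSoup Tag (not the real class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first child tag with the given name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails: the attribute
        # name is treated as a child-tag name, so tag.a means tag.find('a').
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)


p = FakeTag('p', [FakeTag('a')])
print(p.a.name)   # the child <a> tag is found
print(p.decode)   # there is no <decode> child, so this is None

try:
    p.decode('utf-8')
    error = None
except TypeError as exc:
    error = str(exc)
print(error)      # 'NoneType' object is not callable
```

If this is the cause, converting the tag with unicode(tag) rather than tag.decode('utf-8') might avoid the error, though that is untested here.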

Can anyone get it to run to grab the feeds for calibre?

Thanks

Starson17 · 11-05-2010, 09:55 PM

Quote:

Originally Posted by ode

Can anyone get it to run to grab the feeds for calibre?

I tested briefly on another machine, and got your feed parsed correctly. The articles weren't pulling, and I didn't debug why, but you were parsing the articles and building the feed from your source page just fine.

The recipe didn't finish, and I'm not sure if all your articles were parsed correctly, but most were. I started to play with it: I added a postprocess_html for debugging, cleaned up some comments, and added some print statements. The recipe then finished (with empty articles), but that's as far as I went.

I know it's not much, but I thought you might want to know you weren't ignored.

bcaulf · 12-21-2010, 05:15 PM

Counterpunch is a good web publication and as a calibre user I would appreciate it if its recipe gets debugged and put into the software distribution.

aritza · 07-28-2011, 06:40 PM

It's been a year and a half since the original post. Does anyone know about any developments? I really would like to get a hold of a working recipe for CounterPunch. Thanks.

ode · 07-29-2011, 03:21 PM

I rewrote it and got it working.

I have contributed it to Calibre. It will be included from the version released today (0.8.12).

If you don't want to update you can use the file attached to this post.

Enjoy!

aritza · 07-31-2011, 07:01 PM

Thank you so much. So far so good! I love it!

aritza · 08-05-2011, 10:00 AM

There seems to be a limit of 10 entries per day. Some days have fewer than ten entries and some have more. So how does that work? Is there a way to make sure that no entries are repeated and that all entries eventually get pulled? I'm new to this, so I am not sure how it works. Thanks.

ode · 09-04-2011, 04:57 AM

Counterpunch have redesigned their site and now have an RSS feed, making things easier for the recipe.
I have rewritten and submitted it to Calibre. It will be in the next version, which should be released next Friday (9 Sept).
You can use the version I attached to this post if you want in the meantime.

@aritza The new recipe has a limit of 7 days/100 posts, but since it now works by RSS it is really limited by the number of posts in the feed (25 at this time).
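
As a rough illustration of how those limits interact, here is a simplified model, not calibre's actual code. The parameter names mirror calibre's oldest_article and max_articles_per_feed recipe settings; the select_articles function and the sample feed are hypothetical.

```python
from datetime import datetime, timedelta


def select_articles(entries, now, oldest_article=7, max_articles_per_feed=100):
    """Keep entries no older than `oldest_article` days, capped at the maximum."""
    cutoff = now - timedelta(days=oldest_article)
    recent = [e for e in entries if e['date'] >= cutoff]
    return recent[:max_articles_per_feed]


now = datetime(2011, 9, 4)
# Hypothetical feed: 25 entries, one every 6 hours, newest first.
feed = [{'title': 'Post %d' % i, 'date': now - timedelta(hours=6 * i)}
        for i in range(25)]

selected = select_articles(feed, now)
print(len(selected))  # 25: the feed size, not the 100-post cap, is the limit
```

With only 25 entries in the feed, neither the 7-day window nor the 100-post cap is reached, so whatever the feed currently holds is what gets downloaded.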


