Having trouble getting complete article for Reading Eagle

wilderf3353 · 02-13-2011, 09:06 PM

I apologize in advance if this has been discussed -- I couldn't find it.

Here is the RSS Feed: http://readingeagle.com/feeds/all/newsrss.xml

I only get the first few lines of each article.

Here is my recipe:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1297542834(BasicNewsRecipe):
    title                 = u'Reading Eagle'
    use_embedded_content  = True
    oldest_article        = 7
    max_articles_per_feed = 100
    remove_javascript     = True
    no_stylesheets        = True
    remove_empty_feeds    = True

    feeds = [
        (u'local news', u'http://readingeagle.com/feeds/all/newsrss.xml'),
    ]

Can someone either point me to an example that will help me solve this problem or help me fix the above recipe?

davidnye · 11-25-2011, 09:36 AM

Quote:

Originally Posted by wilderf3353

I apologize in advance if this has been discussed -- I couldn't find it.

Here is the RSS Feed: http://readingeagle.com/feeds/all/newsrss.xml

I only get the first few lines of each article...

I'm having the same problem with The Progressive magazine at http://feeds.feedburner.com/progressivefeed. The feed returns only the first few lines of each article, followed by a "Read More" link to the full article. I'm guessing this isn't an uncommon problem. I've tried setting use_embedded_content = True and using

def print_version(self, url):
    return self.browser.open_novisit(url).geturl()

to no avail. How can I get my recipe to follow that Read More URL? Is there a built-in recipe for another site with the same problem that I could crib from? Maddeningly, there is a print version that downloads fine, but its URL can't be derived from the regular article URL because it uses a number unrelated to the original article title.

kovidgoyal · 11-25-2011, 09:53 AM

You want

use_embedded_content = False

not True
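Some background on why this setting matters: in calibre, use_embedded_content = True makes the recipe build each article from the text embedded in the feed itself, which for feeds like these is only a teaser, while False makes it download the page at each item's link instead. The standalone sketch below (plain Python, not calibre code) shows the two choices on a made-up RSS item; the feed text, URL and the article_source helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item, shaped like the teaser feeds described in this thread.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
  <title>Example story</title>
  <link>http://example.com/news/example-story</link>
  <description>First few lines of the story... Read More</description>
</item>
</channel></rss>"""


def article_source(rss_text, use_embedded_content):
    """Return what each item would be built from (illustration only).

    True  -> the text embedded in the feed (here just a teaser).
    False -> the article URL, which would then be downloaded in full.
    """
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter('item'):
        if use_embedded_content:
            results.append(item.findtext('description'))
        else:
            results.append(item.findtext('link'))
    return results


print(article_source(SAMPLE_RSS, True))   # the teaser text only
print(article_source(SAMPLE_RSS, False))  # the URL to fetch
```

With True, the recipe never sees more than the teaser, which matches the symptom of getting only the first few lines of each article.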

davidnye · 11-25-2011, 08:36 PM

Quote:

Originally Posted by kovidgoyal

You want

use_embedded_content = False

not True

Thanks much for your reply. Unfortunately, that just makes the TOC disappear, so now all I get is 'Start' and no content. Here is my recipe:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1322154189(BasicNewsRecipe):
    title = u'the Progressive'
    masthead_url = 'http://progressive.org/sites/all/themes/progress/logo.png'
    oldest_article = 7

    feeds = [u'http://feeds.feedburner.com/progressivefeed']

    def get_cover_url(self):
        soup = self.index_to_soup('http://progressive.org')
        item = soup.find('div', attrs={'class': 'views-field-field-cover-fid'})
        if item:
            return item.img['src']
        return None

If I open the RSS URL in my browser, I get a list of articles with short teasers followed by 'read more', the same as the .epub my recipe produces. If I open the article URL (the same one listed in the epub under the article stub), I get a web page with the whole article. Any thoughts or assistance would be much appreciated.

David

davidnye · 11-25-2011, 09:12 PM

Quote:

Originally Posted by wilderf3353

I apologize in advance if this has been discussed -- I couldn't find it.

Here is the RSS Feed: http://readingeagle.com/feeds/all/newsrss.xml

I only get the first few lines of each article.

While I was trying to make some headway on my problem, I was able to fix yours using print_version. Try:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1297542834(BasicNewsRecipe):
    title                 = u'Reading Eagle'
    oldest_article        = 7
    max_articles_per_feed = 100
    remove_empty_feeds    = True
    auto_cleanup          = True

    feeds = [
        (u'local news', u'http://readingeagle.com/feeds/all/newsrss.xml'),
    ]

    def print_version(self, url):
        # Append an empty fragment to each article URL.
        return url + '#'
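A note on the trailing '#': it adds an empty URL fragment, and fragments are not sent to the web server, so the page downloaded is still the same article page. The thread doesn't say why this change gets calibre to fetch the full page instead of the feed teaser, so treat the stdlib sketch below as background on the URL itself only. The article URL in it is a made-up placeholder, and print_version here is a plain function copying the recipe's one-liner.

```python
from urllib.parse import urldefrag


def print_version(url):
    # Same transformation as the recipe above: append an empty fragment.
    return url + '#'


original = 'http://readingeagle.com/article.aspx?id=12345'  # hypothetical article URL
modified = print_version(original)

# urldefrag splits off everything after '#'. That part is handled client-side,
# so both strings identify the same resource on the server.
page_url, fragment = urldefrag(modified)
print(modified)              # http://readingeagle.com/article.aspx?id=12345#
print(page_url == original)  # True
print(repr(fragment))        # ''
```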

David

davidnye · 11-25-2011, 10:14 PM

Fixed my recipe for The Progressive using code from the recipe for Alternet! Here it is, for anyone else who wants it (it doesn't get you the whole magazine, just a few articles and some web-only content):

Code:

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class AdvancedUserRecipe1322154189(BasicNewsRecipe):
    title = u'the Progressive'
    masthead_url = 'http://progressive.org/sites/all/themes/progress/logo.png'
    oldest_article = 7
    # Ignore the teaser text in the feed and fetch every article ourselves
    # through get_obfuscated_article().
    articles_are_obfuscated = True
    use_embedded_content = False
    auto_cleanup = True

    temp_files = []

    feeds = [u'http://feeds.feedburner.com/progressivefeed']

    def get_article_url(self, article):
        # Use the item's <link> as the article URL.
        return article.get('link', None)

    def get_obfuscated_article(self, url):
        # Open the article page, then follow its first link to the print
        # version (URLs of the form /print/<number>).
        br = self.get_browser()
        br.open(url)
        response = br.follow_link(url_regex=r'/print/[0-9]+', nr=0)
        html = response.read()
        # Save the print page to a temp file; calibre processes the returned path.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

    def get_cover_url(self):
        # Use the cover image shown on the magazine's home page, if there is one.
        soup = self.index_to_soup('http://progressive.org')
        item = soup.find('div', attrs={'class': 'views-field-field-cover-fid'})
        if item:
            return item.img['src']
        return None
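For anyone adapting this to another "Read More" feed: get_obfuscated_article above opens the article page, follows its first /print/<number> link, and hands calibre a temp file holding the print page. The sketch below reproduces just the link-finding and temp-file steps with the standard library, so it can be tried without calibre or network access. It is an approximation, not calibre's code: html.parser plus urljoin stand in for mechanize's follow_link, and tempfile.NamedTemporaryFile(delete=False) stands in for calibre's PersistentTemporaryFile. The helper names and sample HTML are hypothetical.

```python
import re
import tempfile
from html.parser import HTMLParser
from urllib.parse import urljoin

PRINT_LINK = re.compile(r'/print/[0-9]+')  # same pattern as url_regex in the recipe


class FirstMatchingLink(HTMLParser):
    """Record the href of the first <a> whose href matches a regex."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != 'a' or self.href is not None:
            return
        href = dict(attrs).get('href')
        if href and self.pattern.search(href):
            self.href = href


def find_print_url(page_url, html):
    """Return the absolute URL of the first print link on the page, or None."""
    parser = FirstMatchingLink(PRINT_LINK)
    parser.feed(html)
    parser.close()
    return urljoin(page_url, parser.href) if parser.href else None


def save_to_temp_file(html_bytes, suffix='_fa.html'):
    """Write downloaded HTML to a temp file that survives being closed."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(html_bytes)
        return f.name


if __name__ == '__main__':
    sample = ('<p>First few lines...</p>'
              '<a href="/magazine/story">Read More</a> '
              '<a href="/print/4567">Print</a>')
    print(find_print_url('http://progressive.org/magazine/story', sample))
    # http://progressive.org/print/4567
```

The "Read More" link is skipped because its href doesn't match the pattern; only the print link does, which is why the recipe can rely on nr = 0 (the first match).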
