Old 02-13-2011, 09:06 PM   #1
wilderf3353
Junior Member
 
Posts: 4
Karma: 748
Join Date: Jan 2011
Device: Kindle 3
Having trouble getting complete article for Reading Eagle

I apologize in advance if this has been discussed -- I couldn't find it.

Here is the RSS Feed: http://readingeagle.com/feeds/all/newsrss.xml

I only get the first few lines of each article.

Here is my recipe:

Code:
class AdvancedUserRecipe1297542834(BasicNewsRecipe):
    title          = u'Reading Eagle'
    use_embedded_content = True
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript   = True
    no_stylesheets      = True
    remove_empty_feeds  = True

    feeds          = [
        (u'local news', u'http://readingeagle.com/feeds/all/newsrss.xml'),
    ]
Can someone either point me to an example that will help me solve this problem or help me fix the above recipe?
Old 11-25-2011, 09:36 AM   #2
davidnye
Member
 
Posts: 18
Karma: 10
Join Date: Aug 2011
Device: Nook
Re: Trouble returning whole article

Quote:
Originally Posted by wilderf3353 View Post
I apologize in advance if this has been discussed -- I couldn't find it.

Here is the RSS Feed: http://readingeagle.com/feeds/all/newsrss.xml

I only get the first few lines of each article...
I'm having the same problem getting The Progressive magazine at http://feeds.feedburner.com/progressivefeed. The feed returns only the first few lines of each article, followed by "Read More" with the url for the whole article. I'm guessing this is not an uncommon problem. I've tried setting use_embedded_content = True and using

def print_version(self, url):
    return self.browser.open_novisit(url).geturl()

to no avail. How can I get my recipe to follow that Read More url? Is there a built-in recipe for another site with the same problem that I could crib from? Maddeningly, there is a print version which downloads fine, but its url cannot be derived from the one for the non-print version because it uses a number unrelated to the original article title.
Old 11-25-2011, 09:53 AM   #3
kovidgoyal
creator of calibre
Posts: 43,966
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You want

use_embedded_content = False

not True
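
As a rough illustration of what this setting controls (a toy model only, not calibre's actual code): with True the teaser text embedded in the feed is used as the article, with False the article link is downloaded, and when the setting is left unset calibre guesses from the length of the embedded text. The function name article_source and the 2000-character threshold below are made up for the sketch.

```python
def article_source(use_embedded_content, embedded_html, guess_threshold=2000):
    """Toy model: decide where an article body comes from.

    Returns 'feed' when the text embedded in the RSS entry is used as the
    article, or 'link' when the entry's URL is downloaded instead.
    guess_threshold is a hypothetical value chosen for this illustration.
    """
    if use_embedded_content is True:
        # Always trust the feed, even if it only carries a teaser.
        return 'feed'
    if use_embedded_content is False:
        # Always follow the link to the full page.
        return 'link'
    # Unset (None): guess from how much text the feed entry carries.
    return 'feed' if len(embedded_html) > guess_threshold else 'link'


teaser = '<p>First few lines... Read More</p>'
print(article_source(True, teaser))
print(article_source(False, teaser))
print(article_source(None, teaser))
```

With a teaser-only feed like the two in this thread, forcing True is what leaves you with just the first few lines.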
Old 11-25-2011, 08:36 PM   #4
davidnye
Member
 
Posts: 18
Karma: 10
Join Date: Aug 2011
Device: Nook
Quote:
Originally Posted by kovidgoyal View Post
You want

use_embedded_content = False

not True
Thanks much for your reply. Unfortunately, that just makes the TOC disappear, so now all I get is 'Start' and no content. Here is my recipe:

Code:
class AdvancedUserRecipe1322154189(BasicNewsRecipe):
    title = u'the Progressive'
    masthead_url = 'http://progressive.org/sites/all/themes/progress/logo.png'
    oldest_article = 7

    feeds = [u'http://feeds.feedburner.com/progressivefeed']

    def get_cover_url(self):
        soup = self.index_to_soup('http://progressive.org')
        item = soup.find('div',attrs={'class':'views-field-field-cover-fid'})
        if item:
            return item.img['src']
        return None
If I enter the rss url into my browser, I get a list of articles with short teasers followed by 'read more', same as the .epub my recipe produces. If I enter the article url (the same one listed in the epub by the article stub), I'm directed to a web page with the whole article. Any thoughts and assistance much appreciated.

David
Old 11-25-2011, 09:12 PM   #5
davidnye
Member
 
Posts: 18
Karma: 10
Join Date: Aug 2011
Device: Nook
Quote:
Originally Posted by wilderf3353 View Post
I apologize in advance if this has been discussed -- I couldn't find it.

Here is the RSS Feed: http://readingeagle.com/feeds/all/newsrss.xml

I only get the first few lines of each article.
While I was trying to make some headway on my problem I was able to fix yours using print_version. Try:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1297542834(BasicNewsRecipe):
    title          = u'Reading Eagle'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_empty_feeds  = True
    auto_cleanup = True

    feeds          = [
        (u'local news', u'http://readingeagle.com/feeds/all/newsrss.xml'),
    ]

    def print_version(self, url):
        return url + '#'
David
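
A side note on the url + '#' trick, sketched with the standard library only. The article URL below is hypothetical. An empty fragment is never sent to the server, so the rewritten URL should fetch the same page as the original. That suggests (an assumption, not something verified here) that dropping use_embedded_content = True and adding auto_cleanup may be doing most of the work.

```python
from urllib.parse import urldefrag


def print_version(url):
    # Same rewrite as in the recipe above: append an empty fragment.
    return url + '#'


# Hypothetical article URL, used only for illustration.
original = 'http://readingeagle.com/example-article'
rewritten = print_version(original)

# urldefrag splits a URL into the part sent to the server and the fragment.
base, fragment = urldefrag(rewritten)
print(rewritten)
print(base == original, repr(fragment))
```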
Old 11-25-2011, 10:14 PM   #6
davidnye
Member
 
Posts: 18
Karma: 10
Join Date: Aug 2011
Device: Nook
Fixed my recipe for The Progressive using code from the recipe for Alternet! Here it is, for anyone else who wants it (it doesn't get you the whole magazine, just a few articles and some web-only content):

Code:
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1322154189(BasicNewsRecipe):
    title = u'the Progressive'
    masthead_url = 'http://progressive.org/sites/all/themes/progress/logo.png'
    oldest_article = 7
    articles_are_obfuscated = True
    use_embedded_content = False
    auto_cleanup = True

    temp_files = []

    feeds = [u'http://feeds.feedburner.com/progressivefeed']

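    def get_article_url(self, article):
        # Return the entry's link field as given in the feed.
        return article.get('link', None)

    def get_obfuscated_article(self, url):
        # Open the article page, then follow its first link matching /print/<number>.
        br = self.get_browser()
        br.open(url)
        response = br.follow_link(url_regex=r'/print/[0-9]+', nr=0)
        html = response.read()
        # Save the print page to a temporary file and return that file's path.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name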

    def get_cover_url(self):
        soup = self.index_to_soup('http://progressive.org')
        item = soup.find('div',attrs={'class':'views-field-field-cover-fid'})
        if item:
            return item.img['src']
        return None
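
For anyone adapting this to another site, the print-link lookup that follow_link performs above can be tried outside calibre first. This is a minimal sketch using only the standard library; the helper name find_print_url and the sample HTML are made up for illustration, and it only mimics the "first link whose URL matches the regex" behaviour rather than reproducing the browser object the recipe uses.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Same pattern the recipe passes as url_regex.
PRINT_LINK = re.compile(r'/print/[0-9]+')


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def find_print_url(page_url, html):
    """Return the first absolute link matching PRINT_LINK, or None."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if PRINT_LINK.search(absolute):
            return absolute
    return None


# Hypothetical article page, for illustration only.
sample = ('<a href="/about">About</a>'
          '<a href="/print/12345">Printer-friendly version</a>'
          '<a href="/print/999">Another</a>')
print(find_print_url('http://progressive.org/some-article', sample))
```

If this returns None for a saved copy of an article page, the regex needs adjusting before the recipe will work.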
All times are GMT -4.