Old 09-20-2010, 07:52 AM   #2776
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Not an expert, but...

reddit.com's RSS is a list of links to all sorts of different content (articles, movies, music, and so on). I don't think much can be done for this.
I may be wrong (I wrote my first recipe a week ago, and I am still writing).
Old 09-20-2010, 10:22 AM   #2777
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
New recipe TheMarker to be built in

This is good enough to get into calibre.
And thanks to TonytheBookworm. I couldn't have done this without you.
Now it's time for my next recipe.

Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
import re

class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description   = 'TheMarker Financial News in Hebrew'
    __author__            = 'TonyTheBookworm, Marbs'
    cover_url      = 'http://static.ispot.co.il/wp-content/upload/2009/09/themarker.jpg'
    title          = u'TheMarker'
    language              = 'he'  # calibre expects an ISO 639 language code
    simultaneous_downloads = 5
    remove_javascript     = True
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 1
    remove_tags = [dict(name='tr', attrs={'bgcolor':['#738A94']})          ]
    max_articles_per_feed = 10
    extra_css='body{direction: rtl;} .article_description{direction: rtl; } a.article{direction: rtl; } .calibre_feed_description{direction: rtl; }'
    feeds          = [(u'Head Lines', u'http://www.themarker.com/tmc/content/xml/rss/hpfeed.xml'), 
                      (u'TA Market', u'http://www.themarker.com/tmc/content/xml/rss/sections/marketfeed.xml'),
                      (u'Real Estate', u'http://www.themarker.com/tmc/content/xml/rss/sections/realEstaterfeed.xml'),
                      (u'Wall Street & Global', u'http://www.themarker.com/tmc/content/xml/rss/sections/wallsfeed.xml'), 
                      (u'Law', u'http://www.themarker.com/tmc/content/xml/rss/sections/lawfeed.xml'), 
                      (u'Media', u'http://www.themarker.com/tmc/content/xml/rss/sections/mediafeed.xml'), 
                      (u'Consumer', u'http://www.themarker.com/tmc/content/xml/rss/sections/consumerfeed.xml'), 
                      (u'Career', u'http://www.themarker.com/tmc/content/xml/rss/sections/careerfeed.xml'), 
                      (u'Car', u'http://www.themarker.com/tmc/content/xml/rss/sections/carfeed.xml'), 
                      (u'High Tech', u'http://www.themarker.com/tmc/content/xml/rss/sections/hightechfeed.xml'), 
                      (u'Investor Guide', u'http://www.themarker.com/tmc/content/xml/rss/sections/investorGuidefeed.xml')]

    def print_version(self, url):
        # Articles hosted on it.themarker.com use a different print URL
        # scheme than the main site, so check the host name first.
        if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
            # keep everything after 'article/'
            article_id = url.split('article/')[1]
            print_url = 'http://it.themarker.com/tmit/PrintArticle/' + article_id
        else:
            # main-site links carry the story id after the '=' sign
            story_id = url.split('=')[1]
            print_url = ('http://www.themarker.com/ibo/misc/printFriendly.jhtml'
                         '?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F'
                         + story_id + '.xml')

        # test string to see which URL will be returned
        print('THIS URL WILL PRINT: ' + print_url)
        return print_url
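
For anyone who wants to check the URL mapping without running a full fetch, here is a minimal standalone sketch of the same idea. The function name themarker_print_url and both example URLs are made up for illustration; only the two print-URL prefixes come from the recipe above.

```python
import re

# Prefixes taken from the recipe; everything else here is illustrative.
MAIN_PRINT_PREFIX = (
    'http://www.themarker.com/ibo/misc/printFriendly.jhtml'
    '?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F'
)
IT_PRINT_PREFIX = 'http://it.themarker.com/tmit/PrintArticle/'


def themarker_print_url(url):
    """Map an article URL to its print-friendly URL (standalone sketch)."""
    if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
        # IT-section links: keep everything after 'article/'.
        return IT_PRINT_PREFIX + url.split('article/')[1]
    # Main-site links: the story id is whatever follows the first '='.
    return MAIN_PRINT_PREFIX + url.split('=')[1] + '.xml'


# Hypothetical example URLs, not taken from the real feeds.
print(themarker_print_url('http://it.themarker.com/tmit/article/12345'))
print(themarker_print_url(
    'http://www.themarker.com/tmc/article.jhtml?ElementId=abc123'))
```

Note that a main-site URL without an '=' would still raise an IndexError here, just as it would in the recipe.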

Last edited by marbs; 09-20-2010 at 11:33 AM.
Old 09-20-2010, 02:52 PM   #2778
Flexicat
Junior Member
Flexicat began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Aug 2010
Device: Kobo
Hello. Has something changed in the GoComics site so that the built-in recipe no longer works? I have not been able to get this recipe to work in a while. The error generated is this:
Spoiler:
Code:
ERROR: Conversion Error: <b>Failed</b>: Fetch news from GoComics

Fetch news from GoComics
Resolved conversion options
calibre version: 0.7.19
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'disable_font_rescaling': False,
 'dont_download_recipe': False,
 'dont_split_on_page_breaks': True,
 'extra_css': None,
 'extract_to': None,
 'flow_size': 260,
 'font_size_mapping': None,
 'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
 'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
 'html_unwrap_factor': 0.40000000000000002,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x66ad470>,
 'insert_blank_line': False,
 'insert_metadata': False,
 'isbn': None,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'max_toc_links': 50,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KoboReaderOutput object at 0x66ad790>,
 'page_breaks_before': None,
 'password': None,
 'prefer_metadata_cover': False,
 'preprocess_html': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_first_image': False,
 'remove_footer': False,
 'remove_header': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'tags': None,
 'test': False,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'use_auto_toc': False,
 'username': None,
 'verbose': 2}
Python function terminated unexpectedly: unsupported operand type(s) for +: 'NoneType' and 'str'
InputFormatPlugin: Recipe Input running
Traceback (most recent call last):
  File "/Users/ME/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
    return run_entry_point()
  File "/Users/ME/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 99, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 24, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 832, in run
  File "site-packages/calibre/customize/conversion.py", line 211, in __call__
  File "site-packages/calibre/web/feeds/input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 709, in download
  File "site-packages/calibre/web/feeds/news.py", line 834, in build_index
  File "/var/folders/LQ/LQJ79mVxFPGbnRkysSb0GZr3GGw/-Tmp-/calibre_0.7.19_tmp_8ZSKZQ/calibre_0.7.19_9loKvj_recipes/recipe0.py", line 121, in parse_index
    articles = self.make_links(url)
  File "/var/folders/LQ/LQJ79mVxFPGbnRkysSb0GZr3GGw/-Tmp-/calibre_0.7.19_tmp_8ZSKZQ/calibre_0.7.19_9loKvj_recipes/recipe0.py", line 141, in make_links
    title = strip_title + ' - ' + date_title
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

I run with the supplied preferences in Calibre, so I am hoping that I haven't caused this issue myself.
I have really enjoyed this recipe and how easily it works with Calibre.
Thanks.
Old 09-20-2010, 04:50 PM   #2779
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Flexicat View Post
Hello. Has something changed in the GoComics site that the built-in recipe no longer works? I have not been able to get this recipe to work in a while. The error generated is this; [SPOILER]I run with the supplied preferences in Calibre, so I am hoping that I haven't caused this issue myself.
I have really enjoyed this recipe and how easily it works with Calibre.
Thanks.
I'll take a look at it. IIRC, I spotted this error a while back and wrote some code to bypass it, but didn't see anyone complaining and never got around to uploading it. I'll hunt it up and post it. It's not you.

Edit: I checked and apparently I did upload the revised recipe. I tested the current built-in recipe and it works fine. Are you perhaps using an earlier version that I uploaded here? If you are, switch to the built-in recipe that is now supplied with Calibre. The error you are getting looks like the error from the earlier version.
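
For readers hitting the same TypeError: the traceback shows strip_title + ' - ' + date_title failing because strip_title was None. The sketch below shows one generic way to guard against that. It is not the code from the built-in recipe; build_title and collect_titles are hypothetical helpers, and skipping the entry is just one possible policy.

```python
def build_title(strip_title, date_title):
    """Return 'strip - date', or None when either part is missing."""
    if strip_title is None or date_title is None:
        return None
    return strip_title + ' - ' + date_title


def collect_titles(pairs):
    """Build titles for (strip_title, date_title) pairs, skipping bad ones."""
    titles = []
    for strip_title, date_title in pairs:
        title = build_title(strip_title, date_title)
        if title is None:
            continue  # skip entries where the page lookup returned nothing
        titles.append(title)
    return titles


# Made-up sample data: the second entry mimics a failed title lookup.
print(collect_titles([
    ('Calvin and Hobbes', 'September 20, 2010'),
    (None, 'September 20, 2010'),
]))
```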

Last edited by Starson17; 09-20-2010 at 07:03 PM.
Old 09-20-2010, 08:30 PM   #2780
bhandarisaurabh
Enthusiast
bhandarisaurabh began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
There is already a recipe for Foreign Policy, but it only covers the RSS feeds. Can anyone make a recipe for the print edition?
http://www.foreignpolicy.com/issues/current
Thanks in advance.
Old 09-20-2010, 08:32 PM   #2781
bhandarisaurabh
Enthusiast
bhandarisaurabh began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by TonytheBookworm View Post
something like this would be my first guess: if it doesn't work I'll have to test it later.
Spoiler:

Code:
def print_version(self, url):
        print 'ORG URL IS: ', url
        split1 = url.split("/")
        print 'THE SPLIT IS: ', split1 
        id = len(split1)
        # we want to find the size of the array split 
        # because we know the id will always be in the last index
        
        print_url = ‘http://www.business-standard.com/india/printpage.php?autono=’ + split1[id]+ ‘&tp=’
      return print_url
It is giving some type of indentation error.
Old 09-21-2010, 12:16 AM   #2782
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by bhandarisaurabh View Post
it is giving some type of indentation error
If you're actually trying to modify the built-in recipe, I do not see why. I tested it on my end and, after running it, did not see any articles that were not in print version. I also ran a test with print statements included, and nowhere did I see the original URL changing the way you described. It appears to follow the flow that the original author of the recipe expected and looked for. In other words, it's kinda hard to fix something that isn't broken. <shrug>

As for the indents, you have to make sure they are spaced out correctly.
Spoiler:

Code:
def print_version(self, url):
        print('ORG URL IS: ' + url)
        split1 = url.split("/")
        print('THE SPLIT IS: %s' % split1)
        # the id is always in the last element of the split, so index
        # from the end (split1[len(split1)] would be out of range)
        article_id = split1[-1]

        print_url = 'http://www.business-standard.com/india/printpage.php?autono=' + article_id + '&tp='
        return print_url

**** Notice that the return statement is aligned directly under the print_url statement.
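
A small standalone sketch of the indexing point, in case it helps. The last valid index of a list is len(list) - 1, so the final piece of a split is easiest to reach with [-1]. The function name and the example URL are invented for illustration; only the printpage.php prefix comes from the snippet above.

```python
def business_standard_print_url(url):
    """Build the print URL from the last path segment of an article URL."""
    parts = url.split('/')
    article_id = parts[-1]  # last element; parts[len(parts)] does not exist
    return ('http://www.business-standard.com/india/printpage.php?autono='
            + article_id + '&tp=')


# Hypothetical article URL with the numeric id as its last segment.
example = 'http://www.business-standard.com/india/news/example-story/408596'

parts = example.split('/')
try:
    parts[len(parts)]
except IndexError:
    print('parts[len(parts)] is out of range')

print(business_standard_print_url(example))
```

If the real article URLs end with a trailing slash, the last element would be an empty string, so the slash would need stripping first.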

Last edited by TonytheBookworm; 09-21-2010 at 12:18 AM.
Old 09-21-2010, 01:29 AM   #2783
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
I know calibre appears to get the title for the news feed from the title = line of the recipe.
I'm curious to know: is there a way to make the title include the date next to it?
For example, something like this:
Spoiler:

Code:
import datetime

current_time = datetime.datetime.now()

title = 'AJC Breaking news as of: ' + current_time.strftime("%Y-%m-%d %H:%M")
...........


thanks
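
The string-building part of this question can be tried outside calibre. The sketch below only shows how to format the timestamp into a title; whether calibre picks up a title computed this way is not confirmed anywhere in this thread. dated_title is a hypothetical helper name.

```python
import datetime


def dated_title(prefix, when=None):
    """Return prefix followed by a 'YYYY-MM-DD HH:MM' timestamp."""
    if when is None:
        when = datetime.datetime.now()
    return prefix + when.strftime('%Y-%m-%d %H:%M')


# Uses the current time, so the printed value changes on every run.
print(dated_title('AJC Breaking news as of: '))
```

Passing an explicit datetime as the second argument makes the result reproducible, which is handy when testing.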
Old 09-21-2010, 03:23 AM   #2784
noah
Junior Member
noah began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: Kindle
Custom recipe help/request: The Bay Citizen

I've been trying to get a custom recipe for The Bay Citizen to work. I've successfully created custom recipes for other sites, but this one is giving me problems.

I'm using this feed: http://www.baycitizen.org/feeds/stories/

I'm in "Basic Mode" and leaving all other settings at their defaults.

When I attempt to fetch this recipe, Calibre tells me "Failed: Fetch news from The Bay Citizen". The error output is too long to paste here, so I'm attaching it. I'm sorry, but I don't understand what's going on.
Attached Files
File Type: txt baycitizen error.txt (1,010.9 KB, 651 views)
Old 09-21-2010, 01:32 PM   #2785
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by noah View Post
I've been trying to get a custom recipe for The Bay Citizen to work. I've successfully created custom recipes for other sites, but this one is giving me problems.

I'm using this feed: http://www.baycitizen.org/feeds/stories/

I'm in "Basic Mode" and leaving all other settings at their defaults.

When I attempt to fetch this recipe, Calibre tells me "Failed: Fetch news from The Bay Citizen". The error output is too long to paste here, so I'm attaching it. I'm sorry, but I don't understand what's going on.
Post your recipe using spoiler tags and code tags and I will try to help.
Here you go... Please read the #'s (comments) in the code so you can understand what you needed to do.
Spoiler:

Code:
# this block is pretty much standard on all recipes
#----------------------------------------------------------------------------------------------------------
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'BayCitizen'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'BayCitizen'
    publisher = 'TonytheBookworm'
    category = 'news'
    oldest_article = 1 # USE THIS TO DETERMINE HOW FAR BACK YOU WANNA GO IN THE FEED DATE WISE
    max_articles_per_feed = 10 # USE TO DETERMINE HOW MANY ARTICLES YOU WISH TO READ PER FEED
    no_stylesheets = True # IGNORES THE SITE'S CSS STYLESHEETS
      
    masthead_url = 'http://media.baycitizen.org/images/layout/logo1.png' #PUTS NICE LOGO ON KINDLE
#---------------------------------------------------------------------------------------------------------    
    
    #here we tell the recipe what feed(s) we wish to obtain
    #-----------------------------------------------------------------------------------------
    feeds          = [
                      ('Main Feed', 'http://www.baycitizen.org/feeds/stories/'),
                      
                    ]
    #------------------------------------------------------------------------------------------

    
    #since our articles have a print version we want to use it to get everything nice and clean without
    #all the junk. So we look and notice
    #original url: is for example: http://www.baycitizen.org/transportation/story/bart-board-challengers-hope-change/
    #print    url: is for example: http://www.baycitizen.org/transportation/story/bart-board-challengers-hope-change/print
    #
    #so all we need to do in this case is add 'print' to the end of the url, dropping any
    #trailing slash first so we don't end up with '//print'

    def print_version(self, url):
        return url.rstrip('/') + '/print'
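
A quick standalone check of the URL rewrite. Plain concatenation would produce a double slash when the article URL already ends in '/', as the example URL in the comments does, so this sketch strips trailing slashes first. bay_citizen_print_url is a hypothetical name used only here.

```python
def bay_citizen_print_url(url):
    """Append 'print' to an article URL, avoiding a double slash."""
    return url.rstrip('/') + '/print'


story = ('http://www.baycitizen.org/transportation/story/'
         'bart-board-challengers-hope-change/')

# The result is the same with or without the trailing slash.
print(bay_citizen_print_url(story))
print(bay_citizen_print_url(story.rstrip('/')))
```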

Last edited by TonytheBookworm; 09-21-2010 at 01:52 PM. Reason: posted code
Old 09-21-2010, 02:11 PM   #2786
krunk
Member
krunk began at the beginning.
 
Posts: 19
Karma: 10
Join Date: Feb 2010
Location: Los Angeles, CA
Device: Kindle 3
I'm getting the following error in a custom recipe:

Code:
1% Fetching feed Skeptic Blog...
Failed feed: Skeptic Blog
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1274, in parse_feeds
  File "site-packages/mechanize/_mechanize.py", line 209, in open
  File "site-packages/mechanize/_mechanize.py", line 261, in _mech_open
httperror_seek_wrapper: HTTP Error 403: Bad Behavior
I overloaded the get_browser() method with my own.

Code:
def get_browser(self):
    # pass self explicitly when calling the base class method
    br = BasicNewsRecipe.get_browser(self)
    br.open(my_url)
    return br
From this I was able to determine it was throwing the 403 Bad Behavior error on mechanize.Browser().open().

This doesn't occur with a vanilla mechanize browser. Right now I'm doing:

Code:
def get_browser(self):
    br = mechanize.Browser()
    br.open(my_url)
    return br
This throws no errors and seems to parse well. My question is what in the BasicNewsRecipe's browser could be causing the above error? Is there a more correct or better way of resolving the problem?
Old 09-21-2010, 02:26 PM   #2787
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by krunk View Post
I'm getting the following error in a custom recipe:
HTTP Error 403: Bad Behavior
It could be several things, depending on how the Bad Behavior module is set. I was able to narrow it down to a single header and pass the test for one site by sending a simple Accept: header to avoid being seen as a spambot. You can try what I did:
https://www.mobileread.com/forums/sho...postcount=2399
Or you can use TamperData to track down exactly why Firefox passes the test and calibre does not.
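
To illustrate the "send an Accept: header" idea in isolation, here is a sketch that uses the standard library's urllib instead of calibre's mechanize-based browser, so it runs anywhere. The header value, the example.com URL and the build_request helper are all assumptions for illustration. The value that actually satisfies a given Bad Behavior setup may differ, and inside a recipe the header would have to be set on the browser object returned by get_browser instead.

```python
import urllib.request

# Assumed header value: a typical browser-style Accept string.
DEFAULT_ACCEPT = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'


def build_request(url, accept=DEFAULT_ACCEPT):
    """Create a request object that carries an explicit Accept header."""
    return urllib.request.Request(url, headers={'Accept': accept})


# Placeholder URL; nothing is fetched here, the request is only constructed.
req = build_request('http://www.example.com/feed/')
print(req.get_header('Accept'))
```

Comparing the headers a working browser sends with the ones the script sends, one header at a time, is usually the quickest way to find which one the filter objects to.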

Last edited by Starson17; 09-21-2010 at 02:28 PM.
Old 09-21-2010, 02:52 PM   #2788
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 44,302
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@krunk: The exact code that sets up the browser is in calibre/__init__.py. You can copy and paste it, then comment things out one by one to see what's causing the problem.
Old 09-21-2010, 04:18 PM   #2789
krunk
Member
krunk began at the beginning.
 
Posts: 19
Karma: 10
Join Date: Feb 2010
Location: Los Angeles, CA
Device: Kindle 3
@Starson17

Many thanks, that's a far more thorough recipe than I was working on.

@Kovid: Thank you, I'll poke around in there.

I'm curious why there's not a dedicated calibre recipes forum? It seems a worthy topic.
Old 09-21-2010, 04:34 PM   #2790
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 44,302
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I am not an MR moderator, so it's not really up to me. You should start a thread in the feedback forum requesting it.
All times are GMT -4.