Custom recipes (archive, read-only) - Page 186

marbs · 09-20-2010, 07:52 AM

reddit.com's RSS is a list of links to all sorts of different data (articles, movies, music and so on). I don't think much can be done for this.
I may be wrong (I wrote my first recipe a week ago, and I am still writing).

marbs · 09-20-2010, 10:22 AM

This is good enough to get into calibre.
Thanks to TonytheBookworm; I couldn't have done this without you.
Now it's time for my next recipe.

Spoiler:

Code:

import re

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description   = 'TheMarker Financial News in Hebrew'
    __author__            = 'TonyTheBookworm, Marbs'
    cover_url      = 'http://static.ispot.co.il/wp-content/upload/2009/09/themarker.jpg'
    title          = u'TheMarker'
    language              = _('Hebrew')
    simultaneous_downloads = 5
    remove_javascript     = True
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 1
    remove_tags = [dict(name='tr', attrs={'bgcolor':['#738A94']})          ]
    max_articles_per_feed = 10
    extra_css='body{direction: rtl;} .article_description{direction: rtl; } a.article{direction: rtl; } .calibre_feed_description{direction: rtl; }'
    feeds          = [(u'Head Lines', u'http://www.themarker.com/tmc/content/xml/rss/hpfeed.xml'), 
                      (u'TA Market', u'http://www.themarker.com/tmc/content/xml/rss/sections/marketfeed.xml'),
                      (u'Real Estate', u'http://www.themarker.com/tmc/content/xml/rss/sections/realEstaterfeed.xml'),
                      (u'Wall Street & Global', u'http://www.themarker.com/tmc/content/xml/rss/sections/wallsfeed.xml'), 
                      (u'Law', u'http://www.themarker.com/tmc/content/xml/rss/sections/lawfeed.xml'), 
                      (u'Media', u'http://www.themarker.com/tmc/content/xml/rss/sections/mediafeed.xml'), 
                      (u'Consumer', u'http://www.themarker.com/tmc/content/xml/rss/sections/consumerfeed.xml'), 
                      (u'Career', u'http://www.themarker.com/tmc/content/xml/rss/sections/careerfeed.xml'), 
                      (u'Car', u'http://www.themarker.com/tmc/content/xml/rss/sections/carfeed.xml'), 
                      (u'High Tech', u'http://www.themarker.com/tmc/content/xml/rss/sections/hightechfeed.xml'), 
                      (u'Investor Guide', u'http://www.themarker.com/tmc/content/xml/rss/sections/investorGuidefeed.xml')]

    def print_version(self, url):
        # Map an article URL from the feed to its printer-friendly page.
        self.log('Original URL:', url)
        if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
            # IT section articles: the article ID follows "article/".
            print_url = 'http://it.themarker.com/tmit/PrintArticle/' + url.split('article/')[1]
        else:
            # All other sections: the story ID follows the first "=" in the URL.
            print_url = ('http://www.themarker.com/ibo/misc/printFriendly.jhtml'
                         '?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F'
                         + url.split('=')[1] + '.xml')
        self.log('Print URL:', print_url)  # test output to see which URL is returned
        return print_url
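
For anyone who wants to check the URL mapping outside calibre, here is a rough standalone sketch of the same logic. The function name themarker_print_url and both sample URLs are made up for illustration; only the two print-page prefixes come from the recipe above.

```python
import re


def themarker_print_url(url):
    """Return the printer-friendly URL for a TheMarker article URL."""
    if re.search(r'it\.themarker\.com', url, re.IGNORECASE):
        # IT section: keep whatever follows "article/".
        return 'http://it.themarker.com/tmit/PrintArticle/' + url.split('article/')[1]
    # Other sections: keep whatever follows the first "=".
    return ('http://www.themarker.com/ibo/misc/printFriendly.jhtml'
            '?ElementId=%2Fibo%2Frepositories%2Fstories%2Fm1_2000%2F'
            + url.split('=')[1] + '.xml')


# Hypothetical sample URLs, not taken from the real feeds.
it_url = 'http://it.themarker.com/tmit/article/12345'
main_url = 'http://www.themarker.com/tmc/article.jhtml?ElementId=skira20100920_1'
print(themarker_print_url(it_url))
print(themarker_print_url(main_url))
```

A URL with neither "article/" nor "=" in it would raise an IndexError here, same as in the recipe, so a real feed might need an extra check.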

Flexicat · 09-20-2010, 02:52 PM

Hello. Has something changed in the GoComics site that the built-in recipe no longer works? I have not been able to get this recipe to work in a while. The error generated is this:

Spoiler:

Code:

ERROR: Conversion Error: <b>Failed</b>: Fetch news from GoComics

Fetch news from GoComics
Resolved conversion options
calibre version: 0.7.19
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'disable_font_rescaling': False,
 'dont_download_recipe': False,
 'dont_split_on_page_breaks': True,
 'extra_css': None,
 'extract_to': None,
 'flow_size': 260,
 'font_size_mapping': None,
 'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
 'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
 'html_unwrap_factor': 0.40000000000000002,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x66ad470>,
 'insert_blank_line': False,
 'insert_metadata': False,
 'isbn': None,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'max_toc_links': 50,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KoboReaderOutput object at 0x66ad790>,
 'page_breaks_before': None,
 'password': None,
 'prefer_metadata_cover': False,
 'preprocess_html': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_first_image': False,
 'remove_footer': False,
 'remove_header': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'tags': None,
 'test': False,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'use_auto_toc': False,
 'username': None,
 'verbose': 2}
Python function terminated unexpectedly: unsupported operand type(s) for +: 'NoneType' and 'str'
InputFormatPlugin: Recipe Input running
Traceback (most recent call last):
  File "/Users/ME/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
    return run_entry_point()
  File "/Users/ME/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 99, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 24, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 832, in run
  File "site-packages/calibre/customize/conversion.py", line 211, in __call__
  File "site-packages/calibre/web/feeds/input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 709, in download
  File "site-packages/calibre/web/feeds/news.py", line 834, in build_index
  File "/var/folders/LQ/LQJ79mVxFPGbnRkysSb0GZr3GGw/-Tmp-/calibre_0.7.19_tmp_8ZSKZQ/calibre_0.7.19_9loKvj_recipes/recipe0.py", line 121, in parse_index
    articles = self.make_links(url)
  File "/var/folders/LQ/LQJ79mVxFPGbnRkysSb0GZr3GGw/-Tmp-/calibre_0.7.19_tmp_8ZSKZQ/calibre_0.7.19_9loKvj_recipes/recipe0.py", line 141, in make_links
    title = strip_title + ' - ' + date_title
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

I run with the supplied preferences in Calibre, so I am hoping that I haven't caused this issue myself.
I have really enjoyed this recipe and how easily it works with Calibre.
Thanks.

Starson17 · 09-20-2010, 04:50 PM

Quote:

Originally Posted by Flexicat

Hello. Has something changed in the GoComics site that the built-in recipe no longer works? I have not been able to get this recipe to work in a while. The error generated is this:

Spoiler:

I run with the supplied preferences in Calibre, so I am hoping that I haven't caused this issue myself.
I have really enjoyed this recipe and how easily it works with Calibre.
Thanks.

I'll take a look at it. IIRC, I spotted this error a while back and wrote some code to bypass it, but didn't see anyone complaining and never got around to uploading it. I'll hunt it up and post it. It's not you.

Edit: I checked and apparently I did upload the revised recipe. I tested the current built-in recipe and it works fine. Are you perhaps using an earlier version that I uploaded here? If you are, switch to the built-in that is now supplied with Calibre. The error you are getting looks like the error from the earlier version.
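
For reference, the TypeError in the traceback means strip_title was None when the recipe tried to join it with ' - '. Below is a minimal sketch of one way to guard against that. safe_title and the 'Untitled' fallback are hypothetical names for illustration; this is not the code from the built-in recipe.

```python
def safe_title(strip_title, date_title, fallback='Untitled'):
    """Join a comic title and a date, tolerating a missing title."""
    # strip_title is None when no title was found on the page,
    # and None + ' - ' is what raises the TypeError.
    if strip_title is None:
        strip_title = fallback
    return strip_title + ' - ' + date_title


print(safe_title(None, 'Sep 20'))
print(safe_title('Some Strip', 'Sep 20'))
```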

bhandarisaurabh · 09-20-2010, 08:30 PM

There is already a recipe for Foreign Policy, but it covers the RSS feeds. Can anyone make a recipe for the print edition?
http://www.foreignpolicy.com/issues/current
thanks in advance

bhandarisaurabh · 09-20-2010, 08:32 PM

Quote:

Originally Posted by TonytheBookworm

something like this would be my first guess: if it doesn't work I'll have to test it later.

Spoiler:

it is giving some type of indentation error

TonytheBookworm · 09-21-2010, 12:16 AM

Quote:

Originally Posted by bhandarisaurabh

it is giving some type of indentation error

If you're actually trying to modify the built-in recipe, I do not see why. I tested it on my end, and after running it I do not see any articles that were not in print version. I also ran a test with print statements included, and nowhere do I see the original URL being changed the way you described. It appears to follow the flow that the original author of the recipe expected and looked for. In other words, it's kinda hard to fix something that isn't broken. <shrug>

As for the indents, you have to make sure they are spaced out correctly.

Spoiler:

****notice the return statement is directly under the print_url statement.
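
A minimal sketch of the indentation point, in case it helps. The '?print=1' suffix is a made-up pattern for illustration, not the one the real recipe uses.

```python
def print_version(url):
    # Every line of the function body sits at the same indentation level.
    print_url = url + '?print=1'  # hypothetical print-page pattern
    return print_url              # lines up with the print_url line above


print(print_version('http://example.com/story'))
```

If the return line is indented more or less than the line above it, Python raises an IndentationError before the recipe even runs. Mixing tabs and spaces can cause the same thing.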

TonytheBookworm · 09-21-2010, 01:29 AM

I know calibre appears to get the title for the news feed from the title = line of the recipe.
Is there a way to put the date next to the title?
For example, something like this:

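Spoiler:

Code:

import datetime
current_time = datetime.datetime.now()
title = 'AJC Breaking news as of: ' + current_time.strftime("%Y-%m-%d %H:%M")
...........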

thanks
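
A runnable sketch of the dated-title idea using plain datetime. dated_title is a hypothetical helper, and whether calibre picks up a title built this way is an assumption that would need testing in an actual recipe.

```python
from datetime import datetime


def dated_title(base, when=None, fmt='%Y-%m-%d %H:%M'):
    """Append a formatted timestamp to a base title."""
    # Use the current time unless a specific datetime is passed in.
    when = when or datetime.now()
    return base + when.strftime(fmt)


print(dated_title('AJC Breaking news as of: '))
```

Passing the time in as a parameter makes it easy to check the format with a fixed date.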

noah · 09-21-2010, 03:23 AM

I've been trying to get a custom recipe for The Bay Citizen to work. I've successfully created custom recipes for other sites, but this one is giving me problems.

I'm using this feed: http://www.baycitizen.org/feeds/stories/

I'm in "Basic Mode" and leaving all other settings at their defaults.

When I attempt to fetch this recipe, Calibre tells me "Failed: Fetch news from The Bay Citizen". The error output is too long to paste here, so I'm attaching it. I'm sorry, but I don't understand what's going on.

TonytheBookworm · 09-21-2010, 01:32 PM

Quote:

Originally Posted by noah

I've been trying to get a custom recipe for The Bay Citizen to work. I've successfully created custom recipes for other sites, but this one is giving me problems.

I'm using this feed: http://www.baycitizen.org/feeds/stories/

I'm in "Basic Mode" and leaving all other settings at their defaults.

When I attempt to fetch this recipe, Calibre tells me "Failed: Fetch news from The Bay Citizen". The error output is too long to paste here, so I'm attaching it. I'm sorry, but I don't understand what's going on.

Post your recipe using spoiler tags and code tags and I will try to help.
Here you go... Please read the #'s (comments) in the code so you can get an understanding of what you needed to do.

Spoiler:

krunk · 09-21-2010, 02:11 PM

I'm getting the following error in a custom recipe:

Code:

1% Fetching feed Skeptic Blog...
Failed feed: Skeptic Blog
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1274, in parse_feeds
  File "site-packages/mechanize/_mechanize.py", line 209, in open
  File "site-packages/mechanize/_mechanize.py", line 261, in _mech_open
httperror_seek_wrapper: HTTP Error 403: Bad Behavior

I overrode the get_browser() method with my own.

Code:

def get_browser(self):
    # Calling through the class requires passing self explicitly.
    br = BasicNewsRecipe.get_browser(self)
    br.open(my_url)
    return br

From this I was able to determine it was throwing the 403 Bad Behavior error on mechanize.Browser().open().

This doesn't occur with a vanilla mechanize browser. Right now I'm doing:

Code:

import mechanize

def get_browser(self):
    # Plain mechanize browser, bypassing calibre's own browser setup.
    br = mechanize.Browser()
    br.open(my_url)
    return br

This throws no errors and seems to parse well. My question is what in the BasicNewsRecipe's browser could be causing the above error? Is there a more correct or better way of resolving the problem?

Starson17 · 09-21-2010, 02:26 PM

Quote:

Originally Posted by krunk

I'm getting the following error in a custom recipe:
HTTP Error 403: Bad Behavior

Spoiler:

It could be several things, depending on how the Bad Behavior module is set. I was able to narrow it down to a single header and pass the test for one site by sending a simple Accept: header to avoid being seen as a spambot. You can try what I did:
https://www.mobileread.com/forums/sho...postcount=2399
Or you can use TamperData to track down exactly why Firefox passes the test and Calibre does not.
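
A rough sketch of the Accept: header idea. The helper with_accept_header and the header value are assumptions for illustration; the only library feature relied on is that a mechanize browser keeps its default headers in an addheaders list of (name, value) tuples.

```python
DEFAULT_ACCEPT = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'


def with_accept_header(headers, accept=DEFAULT_ACCEPT):
    """Return a copy of a mechanize-style header list with one Accept header."""
    # Drop any existing Accept entry so the header is not sent twice.
    kept = [(name, value) for (name, value) in headers if name.lower() != 'accept']
    return kept + [('Accept', accept)]


# Possible use inside a recipe (not tested here, it needs calibre):
#     def get_browser(self):
#         br = BasicNewsRecipe.get_browser(self)
#         br.addheaders = with_accept_header(br.addheaders)
#         return br

print(with_accept_header([('User-agent', 'example-agent')]))
```

Whether this particular header is the one a given site's Bad Behavior setup checks is not certain, so the header comparison with Firefox is still the surer way to find out.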

kovidgoyal · 09-21-2010, 02:52 PM

@krunk: the exact code setting up the browser is in calibre/__init__.py. You can copy and paste it, then comment out things one by one to see what's causing the problem.

krunk · 09-21-2010, 04:18 PM

@Starson17

Many thanks, that's a far more thorough recipe than I was working on.

@Kovid: Thank you, I'll poke around in there.

I'm curious why there's not a dedicated Calibre Recipes forum. It seems a worthy topic.

kovidgoyal · 09-21-2010, 04:34 PM

I am not an MR moderator, so it's not really up to me. You should start a thread in the feedback forum requesting it.
