Recipe not working

phkoech · 08-11-2009, 06:00 PM

Hello everybody,
I'm trying to grab French cooking recipes from this type of page: http://www.marmiton.org/Recettes/Rec...ses_45471.aspx, but the "comment" section of the page (from "les commentaires des internautes" to "Bel effet dans l'assiette et excellent.") doesn't appear in the book produced by Calibre (via the News system).
Do you have any idea how I can grab this section successfully?

My recipe:

Code:

import re
from calibre.web.feeds.news import BasicNewsRecipe

class Recettes(BasicNewsRecipe):
    title          = 'RecettesPrint'
    __author__ = 'Kek <kek.fr>'
    description = 'Recettes'
    oldest_article = 3
    language = _('French')
    max_articles_per_feed = 50
    no_stylesheets = True

    html2lrf_options = ['--base-font-size', '10']

    feeds =  [
             ('Recette Top', 'url from the uml feed'),
             ]
    
    def print_version(self, url):
        # Switch recipe pages to their printable version; pass other URLs through unchanged.
        if 'marmiton.org/Recettes/' in url:
            url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
        return url

phkoech · 08-12-2009, 03:24 AM

I've tried modifying my code (see below), but I still have the problem: the comments are not output by Calibre.
That's very strange because the HTML is quite simple. The only odd things I see are:
- font size = 1 (the rest of the page has font size = 2) => I suppose Calibre can handle that
- there is a bug in the HTML source: a </b> tag with no <b> before it => can I correct it with preprocess_html?

Code:

import re
from calibre.web.feeds.news import BasicNewsRecipe

class RecettesPrint(BasicNewsRecipe):
    title          = 'RecettesPrint'
    __author__ = 'Kek <kek.fr>'
    description = 'Recettes'
    oldest_article = 3
    language = _('French')
    max_articles_per_feed = 5000
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }'
    html2lrf_options = ['--ignore-tables']    
    html2epub_options = 'linearize_tables = True'

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(align=True):
            del item['align']
        for item in soup.findAll(valign=True):
            del item['valign']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
    
    def print_version(self, url):
        # Switch recipe pages to their printable version; pass other URLs through unchanged.
        if 'marmiton.org/Recettes/' in url:
            url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
        return url
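
The URL rewrite done by print_version can be checked outside Calibre with a few lines of plain Python. This is only a sketch: the function name to_print_url and the sample URLs are made up for illustration (the real recipe URL is truncated in the post above), and it assumes the printable pages differ from the normal ones only by the "Recette-Impression" path segment, as the recipe does.

```python
import re


def to_print_url(url):
    """Return the printable variant of a marmiton recipe URL (hypothetical helper)."""
    if 'marmiton.org/Recettes/' in url:
        # Same substitution as in the recipe's print_version method.
        url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
    # Fall-through return: URLs that do not match come back unchanged, never None.
    return url


# Hypothetical sample URLs, not taken from the site.
sample = 'http://www.marmiton.org/Recettes/Recette_exemple_12345.aspx'
other = 'http://www.example.org/some/page.html'

print(to_print_url(sample))
print(to_print_url(other))
```

Keeping the return statement outside the if block matters here: a print_version that returns nothing for a non-matching URL hands None back to the caller instead of a URL.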

kiklop74 · 08-12-2009, 08:50 AM

I can just point out that html2lrf_options and html2epub_options are no longer valid. You should use the new conversion_options attribute instead, like this:

Code:

conversion_options = {
    'tags': 'aa,bb',
    'publisher': 'pub',
    'comments': 'desc',
    'language': 'en',
    'linearize_tables': True,
}

phkoech · 08-13-2009, 05:41 PM

Thanks for reading my code.
I've tried this correction, but nothing changed. Strange.
I've found another solution: grabbing the normal pages instead of the printable ones. In that case I no longer have the problem, so it is probably caused by unclean HTML on the printable pages.
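
For anyone who wants to stay on the printable pages, one thing to try is dropping the stray </b> from the raw HTML before it is parsed. The following is a minimal sketch in plain Python, not a confirmed fix: nothing in this thread shows that the unmatched tag is what hides the comments, and the function name drop_unmatched_closing_b is made up. In a recipe it would have to run on the raw page source, for example from a hook such as preprocess_raw_html if your Calibre version provides it.

```python
import re

# Matches <b>, <b ...> and </b>, but not <br>, <body> and similar tags.
_B_TAG = re.compile(r'<(/?)b(\s[^>]*)?>', re.IGNORECASE)


def drop_unmatched_closing_b(html):
    """Remove every </b> that has no earlier unclosed <b> (hypothetical helper)."""
    depth = 0  # number of <b> tags currently open

    def fix(match):
        nonlocal depth
        if match.group(1):      # this is a closing </b>
            if depth == 0:
                return ''       # nothing to close: drop the stray tag
            depth -= 1
        else:                   # this is an opening <b>
            depth += 1
        return match.group(0)   # keep well-matched tags untouched

    return _B_TAG.sub(fix, html)


# Made-up fragment imitating the reported problem.
broken = '<font size="1">Bel effet</b> dans <b>l\'assiette</b><br></font>'
print(drop_unmatched_closing_b(broken))
```

The helper only looks at <b> tags and leaves every other tag alone, so it is easy to check its output against the page source before wiring it into a recipe.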
