08-11-2009, 07:00 PM | #1 |
Member
Posts: 10
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
|
Recipe not working
Hello everybody,
I'm trying to grab French recipes from pages like this one: http://www.marmiton.org/Recettes/Rec...ses_45471.aspx, but the comments section of the page (from "les commentaires des internautes" to "Bel effet dans l'assiette et excellent.") doesn't appear in the book produced by Calibre (via the News system). Do you have any idea how to grab this section successfully? My recipe:
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class Recettes(BasicNewsRecipe):
    title = 'RecettesPrint'
    __author__ = 'Kek <kek.fr>'
    description = 'Recettes'
    oldest_article = 3
    language = _('French')
    max_articles_per_feed = 50
    no_stylesheets = True
    html2lrf_options = ['--base-font-size', '10']

    feeds = [
        ('Recette Top', 'url from the uml feed'),
    ]

    def print_version(self, url):
        # Fetch the printable version of each recipe page.
        if 'marmiton.org/Recettes/' in url:
            url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
        return url
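To see what the print_version rewrite does outside of Calibre, here is a minimal standalone sketch of the same substitution. The sample URL is hypothetical (the real one is truncated above), and whether the site actually serves a page at the rewritten address is an assumption.

```python
import re

def print_version(url):
    # Same rewrite as in the recipe above, as a standalone function.
    if 'marmiton.org/Recettes/' in url:
        url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
    return url

# Hypothetical URL, for illustration only.
sample = 'http://www.marmiton.org/Recettes/Recette_exemple_12345.aspx'
printable = print_version(sample)
print(printable)

# A URL outside the Recettes section is returned unchanged.
other = print_version('http://www.marmiton.org/Magazine/exemple.aspx')
print(other)
```

Printing the rewritten URL and opening it in a browser is a quick way to check whether the comments are present in the printable page at all before looking for a problem in the conversion.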
08-12-2009, 04:24 AM | #2 |
Member
Posts: 10
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
|
I've tried modifying my code (see below), but I still have the problem: the comments are not output by Calibre.
That's very strange, because the HTML is quite simple. The only odd things I see are:
- fontsize = 1 (the rest of the page has fontsize = 2) => I suppose Calibre is able to handle it
- there is a bug in the HTML source: a </b> tag with no <b> before it => can I correct it with preprocess_html?
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class RecettesPrint(BasicNewsRecipe):
    title = 'RecettesPrint'
    __author__ = 'Kek <kek.fr>'
    description = 'Recettes'
    oldest_article = 3
    language = _('French')
    max_articles_per_feed = 5000
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'

    def preprocess_html(self, soup):
        # Strip presentational attributes from every tag that carries them.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(align=True):
            del item['align']
        for item in soup.findAll(valign=True):
            del item['valign']
        for item in soup.findAll(face=True):
            del item['face']
        return soup

    def print_version(self, url):
        # Fetch the printable version of each recipe page.
        if 'marmiton.org/Recettes/' in url:
            url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
        return url
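On the stray </b> question: preprocess_html receives a soup that has already been parsed, so an unbalanced closing tag would probably need to be handled on the raw markup before parsing. The sketch below is plain Python, not Calibre API; the function name drop_unmatched_b_closers and the sample HTML are made up for illustration, and how to hook it into a recipe is left open.

```python
import re

# Matches <b>, <b attr="...">, and </b>, but not <br> or <body>.
_B_TAG = re.compile(r'<(/?)b(?:\s[^>]*)?>', re.IGNORECASE)

def drop_unmatched_b_closers(html):
    """Remove every </b> that has no open <b> before it."""
    depth = 0  # number of <b> tags currently open

    def fix(match):
        nonlocal depth
        if match.group(1):      # this is a closing </b>
            if depth == 0:
                return ''       # nothing to close: drop the stray tag
            depth -= 1
        else:                   # this is an opening <b>
            depth += 1
        return match.group(0)   # keep balanced tags as they are

    return _B_TAG.sub(fix, html)

# Hypothetical fragment with one stray closer and one balanced pair.
broken = "<p>Bel effet</b> dans <b>l'assiette</b><br></p>"
cleaned = drop_unmatched_b_closers(broken)
print(cleaned)
```

This only removes closers that appear with nothing open; it does not try to repair an opening <b> that is never closed.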
08-12-2009, 09:50 AM | #3 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
I can only point out that html2lrf_options and html2epub_options are no longer valid. You should use the new conversion_options attribute instead, like this:
Code:
conversion_options = {
    'tags': 'aa,bb',
    'publisher': 'pub',
    'comments': 'desc',
    'language': 'en',
    'linearize_tables': True,
}
08-13-2009, 06:41 PM | #4 |
Member
Posts: 10
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
|
Thanks for reading my code.
I've tried this correction, but nothing changed. Strange. I've found another solution, though: grabbing the normal pages instead of the printable ones. In that case, I no longer have the problem. My problem is probably due to unclean HTML.