06-11-2024, 02:32 AM | #1 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
Reuters: Need help creating a recipe for Reuters using these RSS feeds
A few months ago the built-in recipe for Reuters stopped working because of human verification on the Reuters site. More info in this thread.
Reuters doesn't offer RSS feeds for reading articles from its website, so I found some RSS feeds from third-party sources, which seem to fetch articles from Reuters reliably but with very poor formatting:
1. Paragraphs seem to break after every hyperlink.
2. The in-line image of the share button is displayed as a full-size image.
I read this file on my Kindle Paperwhite and the formatting was terrible for reading.
Attachment: Reuters - calibre.mobi
I don't know how to code in Python, but I have all the required RSS feeds for Reuters. Can anyone who understands Python create a recipe from these RSS feeds with good formatting? Here are all the RSS feed links.
Attachment: Reuters RSS feeds.txt
Thank you in advance for your help. |
06-12-2024, 03:58 AM | #2 |
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
|
Share your recipe with all the feeds. I will make changes.
|
06-12-2024, 04:31 AM | #3 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
I didn't add any extra code. All I did was use the basic calibre interface to add the RSS feeds in the Fetch News section and name the result "Reuters".
The downloaded news periodical is attached to my previous message. The formatting is poor because I didn't add any Python code to make it download only the relevant text from the web page. As you can see in the .mobi file attached to my previous message, one image is displayed at full-page size when it should be displayed in-line with the text. It wouldn't matter if it were not displayed at all. I don't know Python well enough to make this work, which is why I posted all the RSS links here so that someone who understands Python can create the recipe. Since the last recipe was created by you, I believe you can do this better than anyone else. |
06-12-2024, 05:23 AM | #4 |
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
|
I still get HTTP Error 401: HTTP Forbidden.
Maybe it kind of worked for you that one time.
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2020, Kovid Goyal <kovid at kovidgoyal.net>

from calibre.web.feeds.news import BasicNewsRecipe


def prefixed_classes(classes):
    # Match any tag that has a class starting with one of the given prefixes
    q = frozenset(classes.split(' '))

    def matcher(x):
        if x:
            for candidate in frozenset(x.split()):
                for prefix in q:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}


class Reuters(BasicNewsRecipe):
    title = 'Reuters'
    description = 'News from all over'
    __author__ = 'Kovid Goyal'
    language = 'en'

    keep_only_tags = [
        prefixed_classes('article-body__container__ article-header__container__'),
    ]
    remove_tags = [
        prefixed_classes(
            'context-widget__tabs___ article-header__toolbar__ read-next-mobile__container__ toolbar__container__ button__link__'
            ' ArticleBody-read-time-and-social Slideshow-expand-button- TwoColumnsLayout-footer- RegistrationPrompt__container___'
            ' SocialEmbed__inner___ trust-badge author-bio__social__ with-spinner__spinner__ author-bio__author-image__'
        ),
        dict(name=['button', 'link', 'svg']),
    ]
    remove_attributes = ['style', 'height', 'width']

    extra_css = '''
    img { max-width: 100%; }
    [class^="article-header__tags__"],
    [class^="author-bio__author-card__"],
    [class^="article-header__author-date__"] {
        font-size:small;
    }
    [data-testid="primary-gallery"],
    [data-testid="primary-image"] {
        font-size:small;
        text-align:center;
    }
    '''

    feeds = [
        ('World', 'https://rsshub.app/reuters/world'),
        ('Business', 'https://rsshub.app/reuters/business'),
        ('Finance', 'https://rsshub.app/reuters/business/finance'),
        ('Markets', 'https://rsshub.app/reuters/markets'),
        ('Technology', 'https://rsshub.app/reuters/technology'),
        ('Sports', 'https://rsshub.app/reuters/sports'),
        ('Science', 'https://rsshub.app/reuters/science'),
        ('Lifestyle', 'https://rsshub.app/reuters/lifestyle')
    ]

    def preprocess_html(self, soup):
        # Expose images hidden inside <noscript> wrappers
        for noscript in soup.findAll('noscript'):
            if noscript.findAll('img'):
                noscript.name = 'div'
        # Use the first srcset entry as the image source
        for img in soup.findAll('img', attrs={'srcset': True}):
            img['src'] = img['srcset'].split()[0]
        return soup
|
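For readers who don't know Python, the `prefixed_classes` helper in the recipe above may be the least obvious part. Here is a minimal standalone sketch of the same idea that runs without calibre: it builds a filter that matches any HTML class whose name starts with one of the listed prefixes. The class suffix `3ypuX` and the class `sidebar__widget` are made-up examples, not values taken from the Reuters site.

```python
def prefixed_classes(classes):
    """Build an attrs filter that matches classes starting with any listed prefix."""
    prefixes = frozenset(classes.split())

    def matcher(class_attr):
        # class_attr is the raw value of a tag's class attribute, or None
        if class_attr:
            for candidate in class_attr.split():
                if any(candidate.startswith(p) for p in prefixes):
                    return True
        return False

    return {'attrs': {'class': matcher}}


spec = prefixed_classes('article-body__container__ article-header__container__')
matches = spec['attrs']['class']

# Hypothetical class names, used only to illustrate the matching rule
print(matches('article-body__container__3ypuX'))  # starts with a listed prefix
print(matches('sidebar__widget'))                 # no listed prefix
print(matches(None))                              # tag without a class attribute
```

Prefix matching is presumably used because the trailing part of such class names can change between site builds, while the leading part stays stable.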
06-12-2024, 09:34 AM | #5 | |
Enthusiast
Posts: 38
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
Quote:
Or have you written completely new code for the latest requirements? First try to download articles from the RSS feeds only, then try to correct the formatting errors. The most annoying error is the "share" button image, which appears after 2-3 lines and is displayed at full-page size.
Last edited by SpicyPoison; 06-12-2024 at 09:35 AM. Reason: Typo |
|
06-12-2024, 09:37 AM | #6 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
How can I exclude certain images from an article using Python?
How can I prevent paragraph breaks using Python? |
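One possible approach to both questions, sketched with the Python standard library only. It assumes the unwanted image can be recognised by the word "share" somewhere in its `<img>` tag, and that the stray paragraph breaks are `<br>` tags directly after a closing `</a>`. Both are guesses about the feed's HTML, not something confirmed in this thread, and the function names and sample markup are hypothetical.

```python
import re

IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
BREAK_AFTER_LINK = re.compile(r'(</a>)\s*<br\s*/?>', re.IGNORECASE)


def drop_share_images(html, marker='share'):
    """Remove <img> tags whose attributes mention `marker` (assumed heuristic)."""
    needle = marker.lower()

    def keep_or_drop(match):
        tag = match.group(0)
        return '' if needle in tag.lower() else tag

    return IMG_TAG.sub(keep_or_drop, html)


def join_breaks_after_links(html):
    """Replace a <br> that directly follows a closing </a> with a space."""
    return BREAK_AFTER_LINK.sub(r'\1 ', html)


def clean_article_html(html):
    return join_breaks_after_links(drop_share_images(html))


# Made-up sample markup for illustration
sample = (
    '<p>Reuters reported <a href="https://example.com/a">a story</a><br/> today.</p>'
    '<img src="https://example.com/share-button.png" alt="Share">'
    '<img src="https://example.com/photo.jpg">'
)
cleaned = clean_article_html(sample)
print(cleaned)
```

In a calibre recipe, a function like `clean_article_html` could be called from the recipe's `preprocess_raw_html(self, raw_html, url)` method, which receives each downloaded page's source as a string before it is parsed. Regular expressions are a blunt tool for HTML, so the recipe's `remove_tags` list is likely the cleaner option once the offending tag's class or attributes are known.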
06-12-2024, 11:43 AM | #7 |
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
|
|
06-13-2024, 11:25 PM | #8 | |
Enthusiast
Posts: 38
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
|
Quote:
|
|
07-09-2024, 06:12 PM | #9 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2024
Device: Apple
|
Explain Me
Bro, can you tell me how to use this script, step by step? I'm using Windows 10.
|
07-09-2024, 06:27 PM | #10 |
Grand Sorcerer
Posts: 12,760
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Just use the calibre Fetch News button and select the Reuters recipe!
The link above just shows the fix made to the recipe in the repository; calibre updates recipes automatically. |
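If you do want to try a recipe file by hand, such as the code posted earlier in this thread, calibre's `ebook-convert` command-line tool can run it. The sketch below assumes the code has been saved as `reuters.recipe` and that `ebook-convert` is on your PATH; both file names are placeholders. The `--test` flag limits the download to a couple of articles per feed and `-vv` prints verbose output.

```python
import shutil
import subprocess


def build_test_command(recipe_path, output_path):
    """Return the ebook-convert command for a quick test run of a recipe."""
    return ['ebook-convert', recipe_path, output_path, '--test', '-vv']


def run_recipe(recipe_path, output_path):
    """Run the recipe through calibre's ebook-convert, if it is installed."""
    if shutil.which('ebook-convert') is None:
        raise RuntimeError('ebook-convert not found; is calibre installed and on PATH?')
    subprocess.run(build_test_command(recipe_path, output_path), check=True)


if __name__ == '__main__':
    # Placeholder file names; prints the command instead of running it
    print(' '.join(build_test_command('reuters.recipe', 'reuters.epub')))
```

The same command can also be typed directly into a Windows command prompt without Python.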
07-20-2024, 12:01 PM | #11 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2024
Location: italy
Device: macnook
|
Using RSS feeds to create a recipe for Reuters is an excellent idea. It ensures you stay updated with the latest news and trends. By aggregating these feeds, you can curate a personalized news digest that provides comprehensive and timely information from a trusted source. Happy reading!
|