06-11-2024, 02:32 AM   #1
SpicyPoison
Enthusiast
Posts: 38
Karma: 10
Join Date: Dec 2023
Device: Amazon Kindle Paperwhite
Reuters: Need help creating a recipe for Reuters using these RSS feeds


A few months ago the built-in Reuters recipe stopped working because of the human verification on the Reuters site. More info in this thread.

Reuters doesn't offer RSS feeds for reading articles from its website, so I found some RSS feeds from third-party sources that seem to fetch Reuters articles efficiently, but with very poor formatting:
1. Paragraphs seem to break after every hyperlink.
2. The in-line image of the share button is displayed as a full-size image.
I read this file on my Kindle Paperwhite and the formatting was terrible for reading: Reuters - calibre.mobi

I don't know how to code in Python. I have all the required RSS feeds for Reuters. Could anyone who understands Python create a recipe from these RSS feeds that produces well-formatted output?

Here are all the RSS feed links: Reuters RSS feeds.txt

Thank you in advance for your help.
06-12-2024, 03:58 AM   #2
unkn0wn
Fanatic
Posts: 542
Karma: 82944
Join Date: May 2021
Device: kindle
Share your recipe with all the feeds. I will make changes.
06-12-2024, 04:31 AM   #3
SpicyPoison
Quote:
Originally Posted by unkn0wn View Post
Share your recipe with all the feeds. I will make changes.
I didn't add any extra code. All I did was use the basic calibre interface to add the RSS feeds in the Fetch News section, and I named it "Reuters".
The downloaded news periodical is attached to my previous message.
That's why the formatting was not good: I didn't add any extra Python code to make it download only the relevant text from the web page. As you can see in the .mobi file attached to my previous message, one image is displayed at full-page size when it should have been displayed in-line with the text. Even if it is not displayed at all, it doesn't matter. I don't know Python well enough to make this work. That's why I posted all the RSS links here, so that someone who understands Python can create the recipe.

Since the last recipe was created by you, I believe you can do this better than anyone else.
06-12-2024, 05:23 AM   #4
unkn0wn
I still get HTTP Error 401: HTTP Forbidden.
Maybe it kinda worked for you that one time.
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2020, Kovid Goyal <kovid at kovidgoyal.net>

from calibre.web.feeds.news import BasicNewsRecipe


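def prefixed_classes(classes):
    # Space-separated class-name prefixes to match against.
    prefixes = frozenset(classes.split(' '))

    def matcher(class_attr):
        # True if any class on the tag starts with one of the prefixes.
        if class_attr:
            for candidate in frozenset(class_attr.split()):
                for prefix in prefixes:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}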


class Reuters(BasicNewsRecipe):
    title = 'Reuters'
    description = 'News from all over'
    __author__ = 'Kovid Goyal'
    language = 'en'


    keep_only_tags = [
        prefixed_classes('article-body__container__ article-header__container__'),
    ]
    remove_tags = [
        prefixed_classes(
            'context-widget__tabs___ article-header__toolbar__ read-next-mobile__container__ toolbar__container__ button__link__'
            ' ArticleBody-read-time-and-social Slideshow-expand-button- TwoColumnsLayout-footer- RegistrationPrompt__container___'
            ' SocialEmbed__inner___ trust-badge author-bio__social__ with-spinner__spinner__ author-bio__author-image__'
        ),
        dict(name=['button', 'link', 'svg']),
    ]
    remove_attributes = ['style', 'height', 'width']

    extra_css = '''
        img { max-width: 100%; }
        [class^="article-header__tags__"],
        [class^="author-bio__author-card__"],
        [class^="article-header__author-date__"] {
            font-size:small;
        }
        [data-testid="primary-gallery"], [data-testid="primary-image"] { font-size:small; text-align:center; }
    '''

    feeds = [
        ('World', 'https://rsshub.app/reuters/world'),
        ('Business', 'https://rsshub.app/reuters/business'),
        ('Finance', 'https://rsshub.app/reuters/business/finance'),
        ('Markets', 'https://rsshub.app/reuters/markets'),
        ('Technology', 'https://rsshub.app/reuters/technology'),
        ('Sports', 'https://rsshub.app/reuters/sports'),
        ('Science', 'https://rsshub.app/reuters/science'),
        ('Lifestyle', 'https://rsshub.app/reuters/lifestyle')
    ]

    def preprocess_html(self, soup):
        for noscript in soup.findAll('noscript'):
            if noscript.findAll('img'):
                noscript.name = 'div'
        for img in soup.findAll('img', attrs={'srcset':True}):
            img['src'] = img['srcset'].split()[0]
        return soup
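
For anyone who wants to see what the prefixed_classes helper in the recipe above does without running calibre, here is a minimal standalone sketch. The function body mirrors the recipe; the class suffixes in the sample values (such as "288ht") are made up for illustration and are not taken from the Reuters site.

```python
def prefixed_classes(classes):
    # Space-separated class-name prefixes to match against.
    prefixes = frozenset(classes.split(' '))

    def matcher(class_attr):
        # True if any class on the tag starts with one of the prefixes.
        if class_attr:
            for candidate in frozenset(class_attr.split()):
                for prefix in prefixes:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}


# Pull the matcher out of the spec dict so it can be called directly.
spec = prefixed_classes('article-body__container__ article-header__container__')
matcher = spec['attrs']['class']

# Hypothetical class attribute values, for illustration only.
hit = matcher('article-body__container__288ht text-wide')
miss = matcher('sidebar__container__x1')
no_class = matcher(None)

print(hit, miss, no_class)
```

The point of matching on prefixes seems to be that the generated suffix at the end of each class name can change between site builds, while the prefix stays stable.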
06-12-2024, 09:34 AM   #5
SpicyPoison
Quote:
Originally Posted by unkn0wn View Post
I still get HTTP Error 401: HTTP Forbidden.
Maybe it kinda worked for you that one time.
Are you using the same Python code from the previous built-in Reuters recipe?
Or have you written completely new code for the latest requirements?

First try to download articles from the RSS feeds only. Then try to correct the formatting errors.

The most annoying error is the "share" button image, which appears after 2-3 lines and is displayed at full-page size.

Last edited by SpicyPoison; 06-12-2024 at 09:35 AM. Reason: Typo
06-12-2024, 09:37 AM   #6
SpicyPoison
How can I exclude a certain image from the article using Python?
How can I prevent paragraph breaks using Python?
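
One possible approach to both questions, as a rough sketch only: calibre recipes accept a preprocess_regexps attribute, a list of (compiled pattern, replacement function) pairs that are run on the downloaded HTML. The thread never shows the markup the feeds actually produce, so both patterns below are guesses (an img whose src contains "share", and a br directly after a closing link tag), and the sample HTML is invented. The apply_regexps helper is hypothetical and exists only so the rules can be tried outside calibre.

```python
import re

# ASSUMPTION: the share button is an <img> whose src contains "share".
SHARE_IMG = re.compile(r'<img\b[^>]*\bsrc="[^"]*share[^"]*"[^>]*>', re.IGNORECASE)
# ASSUMPTION: the unwanted breaks are <br> tags right after a closing </a>.
BREAK_AFTER_LINK = re.compile(r'(</a>)\s*(?:<br\s*/?>\s*)+', re.IGNORECASE)

# Same shape as a recipe's preprocess_regexps attribute:
# a list of (compiled pattern, function taking a match and returning a string).
preprocess_regexps = [
    (SHARE_IMG, lambda match: ''),                           # drop the image
    (BREAK_AFTER_LINK, lambda match: match.group(1) + ' '),  # keep </a>, drop the breaks
]


def apply_regexps(html, rules=preprocess_regexps):
    # Hypothetical helper: applies the rules in order, for testing only.
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Invented sample markup, not taken from the real feeds.
sample = ('<p>Shares rose, <a href="/markets">markets data</a><br/>'
          'showed.<img src="/icons/share.png" alt="share"></p>')
cleaned = apply_regexps(sample)
print(cleaned)
```

Inside a recipe, the preprocess_regexps list would be set as a class attribute and calibre would apply it; the helper is not needed there. If the share image sits inside an element with a recognisable class, adding that class to remove_tags may be a cleaner option than a regular expression.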
06-12-2024, 11:43 AM   #7
unkn0wn
https://github.com/kovidgoyal/calibr...023d84d11a8412
06-13-2024, 11:25 PM   #8
SpicyPoison
Quote:
Originally Posted by unkn0wn View Post
Just checked the new recipe. It's working fine. Thanks for your help.
07-09-2024, 06:12 PM   #9
Jhonybravo
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2024
Device: Apple
Explain Me

Bro, can you tell me how to use this script, step by step? I'm using Windows 10.
07-09-2024, 06:27 PM   #10
PeterT
Grand Sorcerer
Posts: 12,746
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Just use the calibre Fetch News button and select the Reuters recipe!

All the above link is showing is the fix made to the repository for the recipe, and recipes are automatically updated by calibre.
07-20-2024, 12:01 PM   #11
amanda64
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2024
Location: italy
Device: macnook
Using RSS feeds to create a recipe for Reuters is an excellent idea. It ensures you stay updated with the latest news and trends. By aggregating these feeds, you can curate a personalized news digest that provides comprehensive and timely information from a trusted source. Happy reading!
Tags
kindle, paperwhite, recipes, reuters, rss feeds

