Old 02-03-2023, 11:04 AM   #1
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 82
Karma: 10
Join Date: Aug 2022
Device: PC
Self-built RSS recipe crawl fails after calibre 6.12 update

After importing the OPML file, every RSS feed that comes from Google News fails: only the title is extracted, with no content. Please take a look, thank you very much!

I have tested all the RSS feeds imported from the OPML file, and every feed from news.google.com fails: only the headline, no content.

Attached screenshot: 屏幕截图 2023-02-04 000047.png (22.9 KB)

This problem did not occur before the 6.12 update; it appeared with today's update.

Other RSS feeds that are not from news.google.com/rss are extracted normally.

I reinstalled version 6.11 and still have this problem.

Last edited by fengli; 02-03-2023 at 09:48 PM.
Old 02-03-2023, 09:48 PM   #2
fengli
For example:




#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1675479003(BasicNewsRecipe):
    title = 'Google新闻-科技巨头Eng'
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('"ASML" - Google News', 'https://news.google.com/news/rss/search?q=ASML&hl=en'),
        ('"twitter" - Google News', 'https://news.google.com/news/rss/search?q=twitter&hl=en'),
        ('"intel" - Google News', 'https://news.google.com/news/rss/search?q=intel&hl=en'),
        ('tencent - Google News', 'http://news.google.com/news?hl=en&gl=us&q=tencent&um=1&ie=UTF-8&output=rss'),
        ('amazon - Google News', 'http://news.google.com/news?hl=en&gl=us&q=amazon&um=1&ie=UTF-8&output=rss'),
        ('twitter - Google News', 'https://news.google.com/news/rss/search/section/q/twitter/twitter?hl=en&gl=US'),
        ('Ubuntu - Google News', 'http://news.google.com/news?hl=en&gl=us&q=Ubuntu&um=1&ie=UTF-8&output=rss'),
        ('TSMC - Google News', 'https://news.google.com/news/rss/search/section/q/TSMC/TSMC?hl=en&gl=US'),
        ('Google - Google News', 'https://news.google.com/news/rss/search/section/q/Google/Google?hl=en&gl=US'),
        ('alibaba - Google News', 'https://news.google.com/news/rss/search/section/q/alibaba/alibaba?hl=en&gl=US'),
        ('Apple - Google News', 'https://news.google.com/news/rss/search/section/q/Apple/Apple?hl=en&gl=US'),
        ('"tiktok" - Google News', 'https://news.google.com/news/rss/search/section/q/tiktok/tiktok?hl=en&gl=US&ned=us'),
        ('huawei - Google News', 'https://news.google.com/news/rss/search/section/q/huawei/huawei?hl=en&gl=US'),
        ('Amazon - Google News', 'https://news.google.com/news/rss/search/section/q/Amazon/Amazon?hl=en&gl=US'),
        ('space x - Google News', 'http://news.google.com/news?hl=en&gl=us&q=space%20x&um=1&ie=UTF-8&output=rss'),
        ('"AMD" - Google News', 'https://news.google.com/news/rss/search?q=AMD&hl=en'),
        ('"Nvidia" - Google News', 'https://news.google.com/news/rss/search?q=Nvidia&hl=en'),
        ('"STMicroelectronics" - Google News', 'https://news.google.com/news/rss/search?q=STMicroelectronics&hl=en'),
        ('"Broadcom" - Google News', 'https://news.google.com/news/rss/search?q=Broadcom&hl=en'),
        ('qualcomm - Google News', 'https://news.google.com/news/rss/search/section/q/qualcomm/qualcomm?hl=en&gl=US'),
        ('"MediaTek" - Google News', 'https://news.google.com/news/rss/search?q=MediaTek&hl=en'),
        ('"ZTE" - Google News', 'https://news.google.com/news/rss/search?q=ZTE&hl=en'),
        ('"huawei" - Google News', 'https://news.google.com/news/rss/search?q=huawei&hl=en'),
        ('"TSMC" - Google News', 'https://news.google.com/news/rss/search?q=TSMC&hl=en'),
        ('"Samsung" - Google News', 'https://news.google.com/news/rss/search?q=Samsung&&hl=en-US&gl=US&ceid=US:en'),
        ('"meta" - Google News', 'https://news.google.com/news/rss/search?q=meta&hl=en'),
        ('google新闻', 'https://news.google.com/news/rss/headlines/section/topic/TECHNOLOGY?ned=us&hl=en&gl=US'),
        ('microsoft', 'https://news.google.com/news/rss/search/section/q/microsoft/microsoft?hl=en&gl=US&ned=us'),
        ('amazone', 'https://news.google.com/news/rss/search/section/q/amazone/amazone?hl=en&gl=US&ned=us'),
        ('Google', 'https://news.google.com/news/rss/search/section/q/Google/Google?hl=en&gl=US&ned=us'),
        ('facebook', 'https://news.google.com/news/rss/search/section/q/facebook/facebook?hl=en&gl=US&ned=us'),
        ('apple', 'https://news.google.com/news/rss/search/section/q/apple/apple?hl=en&gl=US&ned=us'),
    ]

Last edited by fengli; 02-03-2023 at 09:52 PM.
Old 02-04-2023, 01:46 AM   #3
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 490
Karma: 82764
Join Date: May 2021
Device: kindle
Code:
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name
Try adding this to the recipe.
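
A note on the `except` branch above: it assumes that whatever exception `br.open(url)` raises carries response headers in `e.hdrs`. An exception without that attribute (a timeout, for example) would fail there with an AttributeError instead. A minimal sketch of a more defensive lookup, using only the standard library; `redirect_target` and the example URLs are hypothetical names for illustration, not part of calibre:

```python
from email.message import Message
from urllib.error import HTTPError


def redirect_target(exc, fallback):
    """Return the Location header carried by exc, or fallback if there is none."""
    # Only HTTP-style errors carry headers; other exceptions have no `hdrs`.
    headers = getattr(exc, 'hdrs', None)
    if headers is None:
        return fallback
    # Header lookup on Message is case-insensitive; a missing header gives None.
    return headers.get('location') or fallback


if __name__ == '__main__':
    hdrs = Message()
    hdrs['Location'] = 'https://example.com/article'  # made-up redirect target
    err = HTTPError('https://example.com/start', 302, 'Found', hdrs, None)
    print(redirect_target(err, 'unchanged'))
    print(redirect_target(ValueError('no headers here'), 'unchanged'))
```

In the recipe this would mean replacing `url = e.hdrs.get('location')` with `url = redirect_target(e, url)`, so the original URL is kept when no redirect target is available.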

Last edited by unkn0wn; 02-04-2023 at 01:50 AM.
Old 02-04-2023, 04:59 AM   #4
fengli
Quote:
Originally Posted by unkn0wn
Code:
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name
try adding this to the recipe
I have tested many times and it still fails. The content is all empty and the links are gone. Please take another look, thank you very much.
Test recipe:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1675504328(BasicNewsRecipe):
    title = 'Google news-ceshi'
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True

    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name

    feeds = [
        ('"ASML" - Google News', 'https://news.google.com/news/rss/search?q=ASML&hl=en'),
        ('"intel" - Google News', 'https://news.google.com/news/rss/search?q=intel&hl=en'),
        ('amazon - Google News', 'http://news.google.com/news?hl=en&gl=us&q=amazon&um=1&ie=UTF-8&output=rss'),
        ('Ubuntu - Google News', 'http://news.google.com/news?hl=en&gl=us&q=Ubuntu&um=1&ie=UTF-8&output=rss'),
    ]

Google news-ceshi.recipe
Attached screenshot: 屏幕截图 2023-02-04 180352.png (16.9 KB)

Last edited by fengli; 02-04-2023 at 05:07 AM.
Old 02-04-2023, 10:15 AM   #5
unkn0wn
Add these two imports at the top of the recipe.
Code:
from calibre import browser
from calibre.ptempfile import PersistentTemporaryFile
I forgot to mention this. Try again.
Old 02-04-2023, 11:16 AM   #6
fengli
Quote:
Originally Posted by unkn0wn
add these 2 at the top.
Code:
from calibre import browser
from calibre.ptempfile import PersistentTemporaryFile
i forgot to tell you this. try again once.
Great, very impressive, it works now. Thank you very much, big brother! 👍👍👍
Old Today, 04:52 AM   #7
fengli
Crawling Google News RSS suddenly fails, please help, thank you very much

Quote:
Originally Posted by fengli
Great, very impressive, it has worked, thank you very much big brother!👍👍👍
Please help me take a look, thank you very much


N001-economic.recipe


Error message:
Using user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
Using proxies: {'http': '127.0.0.1:7890', 'https': '127.0.0.1:7890', 'ftp': 'http://127.0.0.1:7890'}
Failed to download article: Umstieg auf E-Autos: Autoindustrie: Mehr als jede zweite Firma plant Stellenabbau - Zeit Online from https://news.google.com/rss/articles...iYmF10gEA?oc=5
Traceback (most recent call last):
  File "calibre\utils\threadpool.py", line 100, in run
  File "calibre\web\feeds\news.py", line 1201, in fetch_obfuscated_article
  File "<string>", line 23, in get_obfuscated_article
TypeError: 'NoneType' object is not subscriptable
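
Reading the traceback: the only subscript operation in the `get_obfuscated_article` code posted earlier in this thread is `link['href']`, so, assuming N001-economic.recipe still uses that code, `soup.find('a', href=True)` most likely returned `None` because the fetched page contained no link. A minimal standard-library sketch of guarding that step, so the failure names the page instead of raising a bare TypeError; `first_link` and `article_url` are hypothetical helpers, not calibre APIs:

```python
from html.parser import HTMLParser


class _FirstLink(HTMLParser):
    """Remember the href of the first <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Keep only the first href found; later anchors are ignored.
        if tag == 'a' and self.href is None:
            self.href = dict(attrs).get('href')


def first_link(html):
    """Return the first anchor href in html, or None when there is none."""
    parser = _FirstLink()
    parser.feed(html)
    return parser.href


def article_url(html, source_url):
    """Return the first link, or raise a readable error naming source_url."""
    href = first_link(html)
    if href is None:
        raise ValueError('no link found on ' + source_url)
    return href
```

In the recipe itself the equivalent guard would be a check for `link is None` before `link['href']`. This only makes the failure easier to read; it does not recover the article text if the page no longer exposes a direct link.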