02-03-2023, 11:04 AM | #1 |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
Self-built RSS recipe fails to fetch article content after calibre 6.12 update
After importing the OPML file, all RSS feeds from Google News sources fail: only the titles are extracted, with no content. Please take a look, thank you very much!
I have tested all the RSS feeds imported from the OPML file. Every feed from news.google.com fails the same way (headline only, no content), while other RSS feeds that are not from news.google.com/rss are extracted normally. This problem did not occur before the 6.12 update; it appeared with today's update. I reinstalled version 6.11 and still have the problem.
Last edited by fengli; 02-03-2023 at 09:48 PM.
02-03-2023, 09:48 PM | #2 |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
For example:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1675479003(BasicNewsRecipe):
    title = 'Google新闻-科技巨头Eng'  # "Google News - Tech Giants (Eng)"
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('"ASML" - Google News', 'https://news.google.com/news/rss/search?q=ASML&hl=en'),
        ('"twitter" - Google News', 'https://news.google.com/news/rss/search?q=twitter&hl=en'),
        ('"intel" - Google News', 'https://news.google.com/news/rss/search?q=intel&hl=en'),
        ('tencent - Google News', 'http://news.google.com/news?hl=en&gl=us&q=tencent&um=1&ie=UTF-8&output=rss'),
        ('amazon - Google News', 'http://news.google.com/news?hl=en&gl=us&q=amazon&um=1&ie=UTF-8&output=rss'),
        ('twitter - Google News', 'https://news.google.com/news/rss/search/section/q/twitter/twitter?hl=en&gl=US'),
        ('Ubuntu - Google News', 'http://news.google.com/news?hl=en&gl=us&q=Ubuntu&um=1&ie=UTF-8&output=rss'),
        ('TSMC - Google News', 'https://news.google.com/news/rss/search/section/q/TSMC/TSMC?hl=en&gl=US'),
        ('Google - Google News', 'https://news.google.com/news/rss/search/section/q/Google/Google?hl=en&gl=US'),
        ('alibaba - Google News', 'https://news.google.com/news/rss/search/section/q/alibaba/alibaba?hl=en&gl=US'),
        ('Apple - Google News', 'https://news.google.com/news/rss/search/section/q/Apple/Apple?hl=en&gl=US'),
        ('"tiktok" - Google News', 'https://news.google.com/news/rss/search/section/q/tiktok/tiktok?hl=en&gl=US&ned=us'),
        ('huawei - Google News', 'https://news.google.com/news/rss/search/section/q/huawei/huawei?hl=en&gl=US'),
        ('Amazon - Google News', 'https://news.google.com/news/rss/search/section/q/Amazon/Amazon?hl=en&gl=US'),
        ('space x - Google News', 'http://news.google.com/news?hl=en&gl=us&q=space%20x&um=1&ie=UTF-8&output=rss'),
        ('"AMD" - Google News', 'https://news.google.com/news/rss/search?q=AMD&hl=en'),
        ('"Nvidia" - Google News', 'https://news.google.com/news/rss/search?q=Nvidia&hl=en'),
        ('"STMicroelectronics" - Google News', 'https://news.google.com/news/rss/search?q=STMicroelectronics&hl=en'),
        ('"Broadcom" - Google News', 'https://news.google.com/news/rss/search?q=Broadcom&hl=en'),
        ('qualcomm - Google News', 'https://news.google.com/news/rss/search/section/q/qualcomm/qualcomm?hl=en&gl=US'),
        ('"MediaTek" - Google News', 'https://news.google.com/news/rss/search?q=MediaTek&hl=en'),
        ('"ZTE" - Google News', 'https://news.google.com/news/rss/search?q=ZTE&hl=en'),
        ('"huawei" - Google News', 'https://news.google.com/news/rss/search?q=huawei&hl=en'),
        ('"TSMC" - Google News', 'https://news.google.com/news/rss/search?q=TSMC&hl=en'),
        ('"Samsung" - Google News', 'https://news.google.com/news/rss/search?q=Samsung&&hl=en-US&gl=US&ceid=US:en'),
        ('"meta" - Google News', 'https://news.google.com/news/rss/search?q=meta&hl=en'),
        ('google新闻', 'https://news.google.com/news/rss/headlines/section/topic/TECHNOLOGY?ned=us&hl=en&gl=US'),  # "Google News"
        ('microsoft', 'https://news.google.com/news/rss/search/section/q/microsoft/microsoft?hl=en&gl=US&ned=us'),
        ('amazone', 'https://news.google.com/news/rss/search/section/q/amazone/amazone?hl=en&gl=US&ned=us'),
        ('Google', 'https://news.google.com/news/rss/search/section/q/Google/Google?hl=en&gl=US&ned=us'),
        ('facebook', 'https://news.google.com/news/rss/search/section/q/facebook/facebook?hl=en&gl=US&ned=us'),
        ('apple', 'https://news.google.com/news/rss/search/section/q/apple/apple?hl=en&gl=US&ned=us'),
    ]
Last edited by fengli; 02-03-2023 at 09:52 PM.
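The feed list above mixes three Google News URL styles and repeats several topics (Amazon, TSMC, huawei, Google, twitter and Apple each appear more than once). A minimal sketch for generating the list from plain keywords instead, skipping case-insensitive duplicates. `google_news_feeds` is a hypothetical helper; the `hl`/`gl`/`ceid` parameters mirror the Samsung entry, but the shorter `/rss/search` path is an assumption. This only tidies the feed list and does not by itself fix the missing article content.

```python
from urllib.parse import urlencode

# Assumption: Google News RSS search endpoint; not taken from the recipe above.
GOOGLE_NEWS_SEARCH = 'https://news.google.com/rss/search'


def google_news_feeds(queries, hl='en-US', gl='US', ceid='US:en'):
    """Build (title, url) tuples for a recipe's `feeds` list, one per unique query."""
    seen = set()
    feeds = []
    for query in queries:
        key = query.casefold()
        if key in seen:
            continue  # skip repeats such as 'amazon' after 'Amazon'
        seen.add(key)
        # safe=':' keeps 'US:en' readable instead of encoding it as 'US%3Aen'
        params = urlencode({'q': query, 'hl': hl, 'gl': gl, 'ceid': ceid}, safe=':')
        feeds.append((f'{query} - Google News', f'{GOOGLE_NEWS_SEARCH}?{params}'))
    return feeds
```

In a recipe, the function would be defined at module level above the class, and the class body would then set something like `feeds = google_news_feeds(['ASML', 'intel', 'TSMC'])`.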
02-04-2023, 01:46 AM | #3 |
Fanatic
Posts: 541
Karma: 82944
Join Date: May 2021
Device: kindle
Code:
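articles_are_obfuscated = True

def get_obfuscated_article(self, url):
    br = self.get_browser()
    try:
        br.open(url)
    except Exception as e:
        # If opening the Google News link raises an HTTP error,
        # take the redirect target from its Location header instead.
        url = e.hdrs.get('location')
    soup = self.index_to_soup(url)
    # Follow the first link on the intermediate page to the real article.
    link = soup.find('a', href=True)
    html = br.open(link['href']).read()
    # Save the article HTML to a temporary file and return its path to calibre.
    pt = PersistentTemporaryFile('.html')
    pt.write(html)
    pt.close()
    return pt.name
Last edited by unkn0wn; 02-04-2023 at 01:50 AM.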
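The snippet assumes the intermediate page always contains an `<a href>`. BeautifulSoup's `soup.find('a', href=True)` returns None when there is no such tag, and `link['href']` then raises `TypeError: 'NoneType' object is not subscriptable`. A minimal standard-library sketch of the same "first link on the page" step that returns None instead of crashing; `first_link` is a hypothetical helper, not part of calibre.

```python
from html.parser import HTMLParser


class _FirstLinkParser(HTMLParser):
    """Record the href of the first <a> tag that has a non-empty href."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if self.href is None and tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.href = href


def first_link(html):
    """Return the first <a href> value in `html`, or None if there is none."""
    parser = _FirstLinkParser()
    parser.feed(html)
    parser.close()
    return parser.href
```

Inside the recipe itself the equivalent guard is simply checking `if link is None:` before using `link['href']`, so the recipe can skip or report such pages instead of failing with the bare TypeError.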
02-04-2023, 04:59 AM | #4 | |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
Quote:
Test recipe:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1675504328(BasicNewsRecipe):
    title = 'Google news-ceshi'  # "ceshi" is pinyin for "test"
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name

    feeds = [
        ('"ASML" - Google News', 'https://news.google.com/news/rss/search?q=ASML&hl=en'),
        ('"intel" - Google News', 'https://news.google.com/news/rss/search?q=intel&hl=en'),
        ('amazon - Google News', 'http://news.google.com/news?hl=en&gl=us&q=amazon&um=1&ie=UTF-8&output=rss'),
        ('Ubuntu - Google News', 'http://news.google.com/news?hl=en&gl=us&q=Ubuntu&um=1&ie=UTF-8&output=rss'),
    ]
Google news-ceshi.recipe
Last edited by fengli; 02-04-2023 at 05:07 AM.
02-04-2023, 10:15 AM | #5 |
Fanatic
Posts: 541
Karma: 82944
Join Date: May 2021
Device: kindle
Add these two imports at the top of the recipe:
Code:
from calibre import browser
from calibre.ptempfile import PersistentTemporaryFile
02-04-2023, 11:16 AM | #6 |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
07-18-2024, 04:52 AM | #7 | |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
Fetching Google News RSS feeds suddenly fails. Please help, thank you very much!
Quote:
N001-economic.recipe
Error message:
Using user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
Using proxies: {'http': '127.0.0.1:7890', 'https': '127.0.0.1:7890', 'ftp': 'http://127.0.0.1:7890'}
Failed to download article: Umstieg auf E-Autos: Autoindustrie: Mehr als jede zweite Firma plant Stellenabbau - Zeit Online from https://news.google.com/rss/articles...iYmF10gEA?oc=5
Traceback (most recent call last):
  File "calibre\utils\threadpool.py", line 100, in run
  File "calibre\web\feeds\news.py", line 1201, in fetch_obfuscated_article
  File "<string>", line 23, in get_obfuscated_article
TypeError: 'NoneType' object is not subscriptable
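The failing link is a `news.google.com/rss/articles/<id>` URL (truncated in this log). For the full ID captured in post #8 (the `source-path` value), the ID appears to be URL-safe Base64 of a small protobuf message whose field 4 holds the publisher URL; decoding it yields the same zeit.de URL that Google's response returns. Based on that single sample, a minimal sketch that tries to recover the URL from the ID without any network request. `decode_article_id` is a hypothetical name, and other (for example newer) IDs may be opaque, in which case it returns None.

```python
import base64


def _read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_article_id(article_id):
    """Return an http(s) URL embedded in a Google News article ID, or None.

    Assumption (from one sample): the ID is URL-safe Base64 of a protobuf
    message that stores the publisher URL in a length-delimited field.
    """
    try:
        padded = article_id + '=' * (-len(article_id) % 4)
        buf = base64.urlsafe_b64decode(padded)
        pos = 0
        while pos < len(buf):
            key, pos = _read_varint(buf, pos)
            wire_type = key & 0x07
            if wire_type == 0:  # varint field: read and discard its value
                _, pos = _read_varint(buf, pos)
            elif wire_type == 2:  # length-delimited field: check for a URL
                length, pos = _read_varint(buf, pos)
                chunk = buf[pos:pos + length]
                pos += length
                if chunk.startswith(b'http'):
                    return chunk.decode('utf-8')
            else:  # other wire types are not expected in this format
                return None
    except (IndexError, ValueError):
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        return None
    return None
```

Given a feed link, the ID is the last path segment, for example `urllib.parse.urlparse(link).path.rsplit('/', 1)[-1]`.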
07-19-2024, 02:35 AM | #8 |
Fanatic
Posts: 541
Karma: 82944
Join Date: May 2021
Device: kindle
Yeah, it looks like Google News feeds won't be working anymore.
They've made it harder: the page no longer has a plain link to the article, and the real article URL now comes back from a separate POST request like this one:
Code:
{
  "POST": {
    "scheme": "https",
    "host": "news.google.com",
    "filename": "/_/DotsSplashUi/data/batchexecute",
    "query": {
      "rpcids": "Fbv4je",
      "source-path": "/rss/articles/CBMiX2h0dHBzOi8vd3d3LnplaXQuZGUvbmV3cy8yMDI0LTA3LzE4L2F1dG9pbmR1c3RyaWUtbWVoci1hbHMtamVkZS16d2VpdGUtZmlybWEtcGxhbnQtc3RlbGxlbmFiYmF10gEA",
      "f.sid": "-5052485330158874245",
      "bl": "boq_dotssplashserver_20240715.12_p1",
      "hl": "en-IN",
      "gl": "IN",
      "soc-app": "140",
      "soc-platform": "1",
      "soc-device": "1",
      "_reqid": "123443",
      "rt": "c"
    },
    "remote": {
      "Address": ""
    }
  }
}
Response:
)]}'
221
[["wrb.fr","Fbv4je","[\"garturlres\",\"https://www.zeit.de/news/2024-07/18/autoindustrie-mehr-als-jede-zweite-firma-plant-stellenabbau\"]",null,null,null,"generic"],["di",13],["af.httprm",13,"-7658855237455742109",108]]
25
[["e",4,null,null,257]]
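In the captured response, the publisher URL sits in the `wrb.fr` entry for rpcid `Fbv4je`, whose third element is itself a JSON-encoded string `["garturlres", "<url>"]`. A minimal sketch that pulls that URL out of a response body shaped like this capture; `extract_garturlres` is a hypothetical name. It skips the `)]}'` prefix and the numeric length lines rather than trusting the lengths, and assumes each chunk is a single JSON line, as it is here. How to build the request itself (the POST body and any tokens it needs) is not shown in the capture and is not covered.

```python
import json


def extract_garturlres(body, rpcid='Fbv4je'):
    """Return the article URL from a batchexecute response body, or None."""
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]  # drop the non-JSON prefix line
    for line in body.splitlines():
        line = line.strip()
        if not line.startswith('['):
            continue  # blank lines and the numeric chunk-length lines
        try:
            envelope = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: each chunk is one complete JSON line
        for entry in envelope:
            # Expected shape: ["wrb.fr", rpcid, "<JSON-encoded payload>", ...]
            if not (isinstance(entry, list) and len(entry) > 2
                    and entry[0] == 'wrb.fr' and entry[1] == rpcid
                    and isinstance(entry[2], str)):
                continue
            try:
                payload = json.loads(entry[2])
            except json.JSONDecodeError:
                continue
            # Payload seen in the capture: ["garturlres", "<publisher URL>"]
            if isinstance(payload, list) and len(payload) > 1 and payload[0] == 'garturlres':
                return payload[1]
    return None
```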
07-19-2024, 03:34 AM | #9 | |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
Quote:
|
|
07-19-2024, 11:49 AM | #10 |
Resident Curmudgeon
Posts: 76,305
Karma: 136006010
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
@fengli is there any reason you cannot update to the latest version 7 of calibre?
07-19-2024, 07:23 PM | #11 |
Connoisseur
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Calibre: Globe and Mail Recipe now fails | xxxronjames | Recipes | 1 | 11-08-2018 03:25 AM |
Kindle voyage slowed to a crawl after update | cerem0ny | Amazon Kindle | 15 | 03-02-2016 01:41 PM |
Built in calibre recipe broken : Prospect Magazine | duluoz | Recipes | 1 | 05-24-2012 08:19 AM |
Calibre rss recipe -- <em> tag in article titles? | TonyDeWonderful | Recipes | 2 | 03-15-2011 12:23 PM |
NY Times Recipe in Calibre 6.36 Fails | keyrunner | Calibre | 1 | 01-28-2010 11:56 AM |