Old 02-03-2023, 12:04 PM   #1
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 89
Karma: 10
Join Date: Aug 2022
Device: PC
Self-built RSS recipe crawl fails after calibre 6.12 update

After importing my OPML file, every RSS feed that comes from Google News fails: only the title is extracted, with no content. Please take a look, thank you very much!

I have tested all the RSS feeds imported from the OPML file, and every feed from news.google.com fails in the same way: headline only, no content.

[Attached screenshot: 屏幕截图 2023-02-04 000047.png]

This problem did not occur before the 6.12 update; it first appeared with today's update.

Other RSS feeds that are not from news.google.com/rss are extracted normally.

I reinstalled version 6.11 and still have the problem.

Last edited by fengli; 02-03-2023 at 10:48 PM.
Old 02-03-2023, 10:48 PM   #2
fengli
For example:

#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1675479003(BasicNewsRecipe):
    title = 'Google新闻-科技巨头Eng'
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('"ASML" - Google News', 'https://news.google.com/news/rss/search?q=ASML&hl=en'),
        ('"twitter" - Google News', 'https://news.google.com/news/rss/search?q=twitter&hl=en'),
        ('"intel" - Google News', 'https://news.google.com/news/rss/search?q=intel&hl=en'),
        ('tencent - Google News', 'http://news.google.com/news?hl=en&gl=us&q=tencent&um=1&ie=UTF-8&output=rss'),
        ('amazon - Google News', 'http://news.google.com/news?hl=en&gl=us&q=amazon&um=1&ie=UTF-8&output=rss'),
        ('twitter - Google News', 'https://news.google.com/news/rss/search/section/q/twitter/twitter?hl=en&gl=US'),
        ('Ubuntu - Google News', 'http://news.google.com/news?hl=en&gl=us&q=Ubuntu&um=1&ie=UTF-8&output=rss'),
        ('TSMC - Google News', 'https://news.google.com/news/rss/search/section/q/TSMC/TSMC?hl=en&gl=US'),
        ('Google - Google News', 'https://news.google.com/news/rss/search/section/q/Google/Google?hl=en&gl=US'),
        ('alibaba - Google News', 'https://news.google.com/news/rss/search/section/q/alibaba/alibaba?hl=en&gl=US'),
        ('Apple - Google News', 'https://news.google.com/news/rss/search/section/q/Apple/Apple?hl=en&gl=US'),
        ('"tiktok" - Google News', 'https://news.google.com/news/rss/search/section/q/tiktok/tiktok?hl=en&gl=US&ned=us'),
        ('huawei - Google News', 'https://news.google.com/news/rss/search/section/q/huawei/huawei?hl=en&gl=US'),
        ('Amazon - Google News', 'https://news.google.com/news/rss/search/section/q/Amazon/Amazon?hl=en&gl=US'),
        ('space x - Google News', 'http://news.google.com/news?hl=en&gl=us&q=space%20x&um=1&ie=UTF-8&output=rss'),
        ('"AMD" - Google News', 'https://news.google.com/news/rss/search?q=AMD&hl=en'),
        ('"Nvidia" - Google News', 'https://news.google.com/news/rss/search?q=Nvidia&hl=en'),
        ('"STMicroelectronics" - Google News', 'https://news.google.com/news/rss/search?q=STMicroelectronics&hl=en'),
        ('"Broadcom" - Google News', 'https://news.google.com/news/rss/search?q=Broadcom&hl=en'),
        ('qualcomm - Google News', 'https://news.google.com/news/rss/search/section/q/qualcomm/qualcomm?hl=en&gl=US'),
        ('"MediaTek" - Google News', 'https://news.google.com/news/rss/search?q=MediaTek&hl=en'),
        ('"ZTE" - Google News', 'https://news.google.com/news/rss/search?q=ZTE&hl=en'),
        ('"huawei" - Google News', 'https://news.google.com/news/rss/search?q=huawei&hl=en'),
        ('"TSMC" - Google News', 'https://news.google.com/news/rss/search?q=TSMC&hl=en'),
        ('"Samsung" - Google News', 'https://news.google.com/news/rss/search?q=Samsung&&hl=en-US&gl=US&ceid=US:en'),
        ('"meta" - Google News', 'https://news.google.com/news/rss/search?q=meta&hl=en'),
        ('google新闻', 'https://news.google.com/news/rss/headlines/section/topic/TECHNOLOGY?ned=us&hl=en&gl=US'),
        ('microsoft', 'https://news.google.com/news/rss/search/section/q/microsoft/microsoft?hl=en&gl=US&ned=us'),
        ('amazone', 'https://news.google.com/news/rss/search/section/q/amazone/amazone?hl=en&gl=US&ned=us'),
        ('Google', 'https://news.google.com/news/rss/search/section/q/Google/Google?hl=en&gl=US&ned=us'),
        ('facebook', 'https://news.google.com/news/rss/search/section/q/facebook/facebook?hl=en&gl=US&ned=us'),
        ('apple', 'https://news.google.com/news/rss/search/section/q/apple/apple?hl=en&gl=US&ned=us'),
    ]
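
Many of the entries in the list above follow the same `search?q=<keyword>&hl=en` pattern, so the list could probably be generated from keywords instead of being typed out by hand. A minimal sketch, assuming that URL pattern keeps working; the helper name `google_news_feed` and the `KEYWORDS` list are hypothetical and not part of calibre:

```python
from urllib.parse import urlencode

# Base URL taken from the feed entries in the recipe above.
SEARCH_BASE = 'https://news.google.com/news/rss/search'


def google_news_feed(keyword, hl='en'):
    """Build one (title, url) feed tuple for a search keyword.

    Hypothetical helper: it only reproduces the URL pattern seen in the
    recipe, it does not check that Google still serves that feed.
    """
    query = urlencode({'q': keyword, 'hl': hl})  # also escapes spaces etc.
    title = '"%s" - Google News' % keyword
    return (title, '%s?%s' % (SEARCH_BASE, query))


# Example keyword list (an assumption; use your own).
KEYWORDS = ['ASML', 'twitter', 'intel', 'AMD', 'Nvidia']
feeds = [google_news_feed(k) for k in KEYWORDS]
```

Inside a recipe, the resulting list would be assigned to the `feeds` class attribute in the same way as the hand-written one.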

Last edited by fengli; 02-03-2023 at 10:52 PM.
Old 02-04-2023, 02:46 AM   #3
unkn0wn
Fanatic
unkn0wn can do the Funky Gibbon.
 
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
Code:
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name
Try adding this to the recipe.

Last edited by unkn0wn; 02-04-2023 at 02:50 AM.
Old 02-04-2023, 05:59 AM   #4
fengli
I have tested this many times and it still fails. The content is all empty and the links are gone. Please take another look, thank you very much.
Test recipe:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1675504328(BasicNewsRecipe):
    title = 'Google news-ceshi'
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True

    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name

    feeds = [
        ('"ASML" - Google News', 'https://news.google.com/news/rss/search?q=ASML&hl=en'),
        ('"intel" - Google News', 'https://news.google.com/news/rss/search?q=intel&hl=en'),
        ('amazon - Google News', 'http://news.google.com/news?hl=en&gl=us&q=amazon&um=1&ie=UTF-8&output=rss'),
        ('Ubuntu - Google News', 'http://news.google.com/news?hl=en&gl=us&q=Ubuntu&um=1&ie=UTF-8&output=rss'),
    ]

Google news-ceshi.recipe
[Attached screenshot: 屏幕截图 2023-02-04 180352.png]

Last edited by fengli; 02-04-2023 at 06:07 AM.
Old 02-04-2023, 11:15 AM   #5
unkn0wn
Add these two imports at the top of the recipe.
Code:
from calibre import browser
from calibre.ptempfile import PersistentTemporaryFile
I forgot to mention this. Please try again.
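
For context on why the second import matters: the `get_obfuscated_article` method posted earlier writes the downloaded HTML to a temporary file and returns that file's path, and `PersistentTemporaryFile` is the class it uses for that. The sketch below imitates the same step with the standard library's `tempfile` module so that it runs outside calibre. It is a stand-in under that assumption, not calibre's own implementation, and `save_article_html` is a hypothetical name:

```python
import os
import tempfile


def save_article_html(html_bytes):
    """Write raw HTML bytes to a temp file and return its path.

    Stand-in for the PersistentTemporaryFile('.html') step in the recipe:
    delete=False keeps the file on disk after it is closed, so whatever
    receives the path can still read it.
    """
    with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as f:
        f.write(html_bytes)
    return f.name


path = save_article_html(b'<html><body>article text</body></html>')
print(path.endswith('.html'), os.path.exists(path))
```

Without the import, the name `PersistentTemporaryFile` is undefined when the method runs, which would explain why every article came back empty in the earlier test.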
Old 02-04-2023, 12:16 PM   #6
fengli
Great, very impressive, it worked. Thank you very much, big brother! 👍👍👍
Old 07-18-2024, 05:52 AM   #7
fengli
Crawling Google News RSS has suddenly started failing. Please help, thank you very much

Please take a look, thank you very much.


N001-economic.recipe


Error message:
Using user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
Using proxies: {'http': '127.0.0.1:7890', 'https': '127.0.0.1:7890', 'ftp': 'http://127.0.0.1:7890'}
Failed to download article: Umstieg auf E-Autos: Autoindustrie: Mehr als jede zweite Firma plant Stellenabbau - Zeit Online from https://news.google.com/rss/articles...iYmF10gEA?oc=5
Traceback (most recent call last):
  File "calibre\utils\threadpool.py", line 100, in run
  File "calibre\web\feeds\news.py", line 1201, in fetch_obfuscated_article
  File "<string>", line 23, in get_obfuscated_article
TypeError: 'NoneType' object is not subscriptable
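
A `'NoneType' object is not subscriptable` error inside `get_obfuscated_article` is what you would expect if the lookup for the first link returned nothing and the code then indexed the result, as `link['href']` does in the method posted earlier in this thread. That reading is an assumption, since the attached recipe is not shown here. The sketch below reproduces the idea with the standard library only; `first_link_href` and `resolve_or_fail` are hypothetical names, and the recipe itself uses calibre's soup object rather than `html.parser`:

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Records the href of the first <a href=...> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.href is None:
            href = dict(attrs).get('href')
            if href:
                self.href = href


def first_link_href(html):
    """Return the first anchor href in html, or None when there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.href


def resolve_or_fail(html, url):
    """Check for a missing link before using it, with a readable error."""
    href = first_link_href(html)
    if href is None:
        raise ValueError('No link found in the page fetched from %s' % url)
    return href
```

A guard like this does not make the download succeed; it only turns the opaque `TypeError` into a message that says the page no longer contains the expected link.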
Old 07-19-2024, 03:35 AM   #8
unkn0wn
Yeah, it looks like the Google feeds won't be working anymore.

They've made it harder.

Code:
{
	"POST": {
		"scheme": "https",
		"host": "news.google.com",
		"filename": "/_/DotsSplashUi/data/batchexecute",
		"query": {
			"rpcids": "Fbv4je",
			"source-path": "/rss/articles/CBMiX2h0dHBzOi8vd3d3LnplaXQuZGUvbmV3cy8yMDI0LTA3LzE4L2F1dG9pbmR1c3RyaWUtbWVoci1hbHMtamVkZS16d2VpdGUtZmlybWEtcGxhbnQtc3RlbGxlbmFiYmF10gEA",
			"f.sid": "-5052485330158874245",
			"bl": "boq_dotssplashserver_20240715.12_p1",
			"hl": "en-IN",
			"gl": "IN",
			"soc-app": "140",
			"soc-platform": "1",
			"soc-device": "1",
			"_reqid": "123443",
			"rt": "c"
		},
		"remote": {
			"Address": ""
		}
	}
}


Response:
)]}'

221
[["wrb.fr","Fbv4je","[\"garturlres\",\"https://www.zeit.de/news/2024-07/18/autoindustrie-mehr-als-jede-zweite-firma-plant-stellenabbau\"]",null,null,null,"generic"],["di",13],["af.httprm",13,"-7658855237455742109",108]]
25
[["e",4,null,null,257]]
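
Reading the capture above: the POST to `batchexecute` carries the article path in `source-path`, and the response embeds the publisher URL as a JSON string inside the `wrb.fr` row. Below is a minimal sketch of pulling that URL out of such a response. It assumes every response has the same shape as this single sample (the format is undocumented and may change), it does not perform the POST itself, and `extract_article_url` is a hypothetical name:

```python
import json

# Response text copied from the capture above.
SAMPLE = r'''
)]}'

221
[["wrb.fr","Fbv4je","[\"garturlres\",\"https://www.zeit.de/news/2024-07/18/autoindustrie-mehr-als-jede-zweite-firma-plant-stellenabbau\"]",null,null,null,"generic"],["di",13],["af.httprm",13,"-7658855237455742109",108]]
25
[["e",4,null,null,257]]
'''


def extract_article_url(response_text):
    """Return the URL from the 'garturlres' payload, or None if absent."""
    for line in response_text.splitlines():
        line = line.strip()
        if not line.startswith('[['):
            continue  # skips the ")]}'" prefix and the numeric length lines
        try:
            rows = json.loads(line)
        except ValueError:
            continue
        for row in rows:
            if not (isinstance(row, list) and len(row) > 2 and row[0] == 'wrb.fr'):
                continue
            try:
                payload = json.loads(row[2])  # the third field is itself JSON
            except (TypeError, ValueError):
                continue
            if isinstance(payload, list) and len(payload) > 1 and payload[0] == 'garturlres':
                return payload[1]
    return None


print(extract_article_url(SAMPLE))
```

Whether a recipe can reproduce the POST reliably is a separate question, since the request includes values such as `f.sid` and `bl` whose origin is not shown in the capture.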
Old 07-19-2024, 04:34 AM   #9
fengli
Well, I very much hope that you, as the expert, can find the time to help with this. Thank you very much.
Old 07-19-2024, 12:49 PM   #10
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
 
Posts: 76,510
Karma: 136565488
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
@fengli, is there any reason you cannot update to the latest release of calibre 7?
Old 07-19-2024, 08:23 PM   #11
fengli
Hello, I am already using the latest version of calibre. This thread was started a while ago; I have just updated it with the new problem, haha!