Old 09-07-2024, 05:58 AM   #1
dmiming
Junior Member
dmiming began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Sep 2024
Device: kindle oasis 2
Fetch South China Morning Post Magazine - incomplete content

I have been using scmp.recipe to scrape the South China Morning Post for several months now, but some issues have recently started to arise. A brief summary follows:

Incomplete Content: The content of each article is not fully retrieved; some parts are missing. Checking the source feeds (e.g., https://www.scmp.com/rss/2/feed) suggests that, much like the situation with The Economist Espresso several months ago, the feed does not carry the full content. I'm not sure whether there is another way to resolve this.

Invalid Content: The scraped content often contains irrelevant entries such as "Advertisement." I wonder if there is a way to filter such content during the scraping process.
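
As a rough illustration of the kind of filtering being asked about, here is a minimal sketch that drops elements whose only text is "Advertisement" from raw HTML. The function name `strip_ad_placeholders` and the sample markup are made up for this example; SCMP's real markup may differ, and a regex like this only handles simple leaf elements.

```python
import re

# Matches a <p>, <div> or <span> whose entire text content is "Advertisement".
# Assumption: the placeholder appears as a simple leaf element like this.
AD_PLACEHOLDER_RE = re.compile(
    r'<(p|div|span)\b[^>]*>\s*Advertisement\s*</\1>',
    re.IGNORECASE,
)


def strip_ad_placeholders(html):
    """Return html with standalone "Advertisement" elements removed."""
    return AD_PLACEHOLDER_RE.sub('', html)


# Hypothetical sample markup, not taken from scmp.com.
sample = (
    '<p>Intro</p>'
    '<div class="ad">Advertisement</div>'
    '<p>Advertisement rates rose.</p>'
)
cleaned = strip_ad_placeholders(sample)
print(cleaned)
```

Sentences that merely contain the word are left alone, since the pattern requires the element to hold nothing else. In a calibre recipe, a helper like this could plausibly be called from `preprocess_raw_html`, though that is only one possible place to hook it in.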
Old 09-08-2024, 04:01 AM   #2
unkn0wn
Fanatic
unkn0wn can do the Funky Gibbon.
 
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
They've changed to __NEXT_DATA__, which is similar to The Economist. We can reuse part of the Economist recipe code here too.

I will fix it when I have time.
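
For readers unfamiliar with it: Next.js sites embed the page data as JSON inside a `<script id="__NEXT_DATA__">` tag, so the article can be read from that JSON instead of the rendered HTML. Below is a minimal sketch of pulling that JSON out of a page. The helper name `extract_next_data` and the keys in the sample payload are made up for illustration; the real SCMP payload layout is not shown in this thread, and this is not the code from the actual fix.

```python
import json
import re

# Captures the body of the <script id="__NEXT_DATA__" ...> tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical sample page; the "headline" key is an assumption.
sample_html = (
    '<html><body>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"headline": "Example"}}}'
    '</script>'
    '</body></html>'
)
data = extract_next_data(sample_html)
print(data["props"]["pageProps"]["headline"])
```

From there a recipe would walk the parsed dictionary to find the article body and rebuild HTML from it; which keys to follow depends on the site's payload.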
Old 09-08-2024, 10:44 AM   #3
unkn0wn
https://github.com/kovidgoyal/calibr...95637aa856352a
Old 09-08-2024, 11:49 PM   #4
dmiming
Thank you for your quick response.
However, the content is still incomplete.
For example: https://www.scmp.com/news/hong-kong/...ource=rss_feed
When I browse it online from my computer in Hong Kong, the article is complete. But when I fetch it from a VPS in the USA, the content is incomplete.
I don't know why.
Old 09-09-2024, 02:12 AM   #5
unkn0wn
https://github.com/kovidgoyal/calibr...95637aa856352a

It'll work now.
Old 09-09-2024, 02:23 AM   #6
dmiming
It's ok.
Thank you very much.