04-17-2024, 03:52 AM | #1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
Help to finish the recipe of my favorite news site
Hi all,
I am trying to create a working news recipe for elcorreo.com, a Spanish news site. There is a built-in recipe, but it only downloads the links and nothing else, not even the cover. Link to the official news site: www.elcorreo.com
I found that if you open any article and replace ".html" with "_amp.html" in the URL, you can open the 'immersive reader' in Edge and read the full article. Inspecting the "_amp.html" page, you can find a script tag holding JSON with the content of the article.
So I started writing a custom recipe, but I can only get so far. I managed to retrieve the cover, title, subtitle and main image, and to delete the rest that is not relevant. I still need help adding the article content: rewriting the article URLs to the "_amp.html" version and finding the script tag that contains the JSON with the article content. This is the code that I have so far: Spoiler:
I really tried with the calibre API and by reviewing other recipes, but I cannot manage to do it. I think I need to do something in the preprocess_html function, but I have no clue, really. Can someone with extensive recipe knowledge help? |
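For anyone following along, here is a minimal sketch of the "find the script tag and read the JSON" step, written with the standard library only so it can be tried outside calibre. It assumes the JSON sits in a `<script type="application/ld+json">` tag and stores the text under an `articleBody` key; neither detail is confirmed in this thread, so check the real "_amp.html" source first. The function name `extract_article_body` and the sample HTML are made up for illustration.

```python
import json
import re

# Assumption: the article JSON is in a <script type="application/ld+json"> tag.
SCRIPT_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_article_body(html):
    """Return the first 'articleBody' found in a JSON script tag, or None."""
    for match in SCRIPT_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except ValueError:
            continue  # this script tag does not hold valid JSON, skip it
        # The JSON may be a single object or a list of objects.
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            # Assumption: the text lives under the 'articleBody' key.
            if isinstance(item, dict) and item.get('articleBody'):
                return item['articleBody']
    return None


# Hypothetical sample page, not taken from elcorreo.com.
sample = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "NewsArticle", "articleBody": "Texto del articulo."}'
    '</script></head><body></body></html>'
)
print(extract_article_body(sample))
```

Inside a recipe, the same logic could be applied to the raw page source before calibre parses it, with the extracted text then written into the page body.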
04-17-2024, 07:24 AM | #2 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
I think I found how to replace the desktop URL with the mobile URL adding this code:
    # replace desktop url with mobile url
    def get_article_url(self, article):
        desktopUrl = BasicNewsRecipe.get_article_url(self, article)
        mobileUrl = desktopUrl.replace(".html", "_amp.html")
        return mobileUrl
But now I can't retrieve the article image, and I am still missing the article content in the JSON inside the script tag. |
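A slightly more defensive variant of the URL rewrite, as a sketch only: a plain `str.replace` changes every ".html" in the string and would turn an already rewritten URL into "_amp_amp.html". The helper below (the name `to_amp_url` and the example article path are hypothetical) only touches a trailing ".html" in the path and leaves any query string alone.

```python
from urllib.parse import urlsplit, urlunsplit


def to_amp_url(url):
    """Rewrite '.../name.html' to '.../name_amp.html', leaving other URLs as they are."""
    parts = urlsplit(url)
    path = parts.path
    # Only rewrite a trailing '.html', and never rewrite twice.
    if path.endswith('.html') and not path.endswith('_amp.html'):
        path = path[:-len('.html')] + '_amp.html'
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hypothetical article URL, used only to show the behaviour.
print(to_amp_url('https://www.elcorreo.com/bizkaia/noticia.html?ref=rss'))
```

In the recipe, `get_article_url` could return `to_amp_url(desktopUrl)` instead of calling `replace` directly.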
04-17-2024, 08:05 AM | #3 |
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
|
Is the built-in recipe not working?
|
04-17-2024, 08:49 AM | #4 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
|
04-18-2024, 03:59 AM | #5 |
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
|
https://github.com/kovidgoyal/calibr...7b66c77715216f
I just tested this and the output is too large (>120 MB). Help me trim down the list of feeds. There are so many articles, just in the past 24 hours, from this website. Code:
feeds = [
    ('Portada', 'http://www.elcorreo.com/rss/atom/portada'),
    ('Mundo', 'http://www.elcorreo.com/rss/atom/?section=internacional'),
    ('Bizkaia', 'http://www.elcorreo.com/rss/atom/?section=bizkaia'),
    ('Opinión', 'https://www.elcorreo.com/rss/atom/?section=opinion'),
    ('Internacional', 'https://www.elcorreo.com/rss/atom/?section=internacional'),
    ('Ciencia', 'https://www.elcorreo.com/rss/atom/?section=ciencia'),
    ('Guipuzkoa', 'http://www.elcorreo.com/rss/atom/?section=gipuzkoa'),
    ('Araba', 'http://www.elcorreo.com/rss/atom/?section=araba'),
    ('La Rioja', 'http://www.elcorreo.com/rss/atom/?section=larioja'),
    ('Miranda', 'http://www.elcorreo.com/rss/atom/?section=miranda'),
    ('Economía', 'http://www.elcorreo.com/rss/atom/?section=economia'),
    ('Culturas', 'http://www.elcorreo.com/rss/atom/?section=culturas'),
    ('Politica', 'http://www.elcorreo.com/rss/atom/?section=politica'),
    ('De tiendas', 'https://www.elcorreo.com/rss/atom/?section=de-tiendas'),
    ('Deportes', 'https://www.elcorreo.com/rss/atom/?section=deportes'),
    ('Elecciones', 'https://www.elcorreo.com/rss/atom/?section=elecciones'),
    ('Sociedad', 'https://www.elcorreo.com/rss/atom/?section=sociedad'),
    ('Vivir', 'https://www.elcorreo.com/rss/atom/?section=vivir'),
    ('Tecnología', 'http://www.elcorreo.com/rss/atom/?section=tecnologia'),
    ('Gente - Estilo', 'http://www.elcorreo.com/rss/atom/?section=gente-estilo'),
    ('Planes', 'http://www.elcorreo.com/rss/atom/?section=planes'),
    ('Athletic', 'http://www.elcorreo.com/rss/atom/?section=athletic'),
    ('Alavés', 'http://www.elcorreo.com/rss/atom/?section=alaves'),
    ('Bilbao Basket', 'http://www.elcorreo.com/rss/atom/?section=bilbaobasket'),
    ('Baskonia', 'http://www.elcorreo.com/rss/atom/?section=baskonia'),
    ('Deportes', 'http://www.elcorreo.com/rss/atom/?section=deportes'),
    ('Jaiak', 'http://www.elcorreo.com/rss/atom/?section=jaiak'),
    ('La Blanca', 'http://www.elcorreo.com/rss/atom/?section=la-blanca-vitoria'),
    ('Aste Nagusia', 'http://www.elcorreo.com/rss/atom/?section=aste-nagusia-bilbao'),
    ('Semana Santa', 'http://www.elcorreo.com/rss/atom/?section=semana-santa'),
    ('Festivales', 'http://www.elcorreo.com/rss/atom/?section=festivales')
]
|
04-18-2024, 10:55 AM | #6 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
Hi,
I left the most important ones. There can't be that many articles in 24 hours. I guess a lot are duplicates that appear in more than one feed. This is not a big newspaper. Also, maybe the images are not optimized.
feeds = [
    ('Portada', 'http://www.elcorreo.com/rss/atom/portada'),
    ('Mundo', 'http://www.elcorreo.com/rss/atom/?section=internacional'),
    ('Bizkaia', 'http://www.elcorreo.com/rss/atom/?section=bizkaia'),
    ('Opinión', 'https://www.elcorreo.com/rss/atom/?section=opinion'),
    ('Internacional', 'https://www.elcorreo.com/rss/atom/?section=internacional'),
    ('Ciencia', 'https://www.elcorreo.com/rss/atom/?section=ciencia'),
    ('Economía', 'http://www.elcorreo.com/rss/atom/?section=economia'),
    ('Politica', 'http://www.elcorreo.com/rss/atom/?section=politica'),
    ('Deportes', 'https://www.elcorreo.com/rss/atom/?section=deportes'),
    ('Tecnología', 'http://www.elcorreo.com/rss/atom/?section=tecnologia'),
    ('Deportes', 'http://www.elcorreo.com/rss/atom/?section=deportes'),
]
|
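On the duplicates guess: the list as posted does point at the same section more than once ('Mundo' and 'Internacional', and 'Deportes' twice with http and https). A small sketch of how such entries could be dropped before handing the list to the recipe; `dedupe_feeds` is a hypothetical helper, and treating http and https as the same feed is an assumption.

```python
def dedupe_feeds(feeds):
    """Keep the first (title, url) pair for each feed URL, ignoring http vs https."""
    seen = set()
    unique = []
    for title, url in feeds:
        key = url.split('://', 1)[-1]  # strip the scheme so http and https compare equal
        if key not in seen:
            seen.add(key)
            unique.append((title, url))
    return unique


# A few entries copied from the list in this thread.
feeds = [
    ('Mundo', 'http://www.elcorreo.com/rss/atom/?section=internacional'),
    ('Internacional', 'https://www.elcorreo.com/rss/atom/?section=internacional'),
    ('Deportes', 'https://www.elcorreo.com/rss/atom/?section=deportes'),
    ('Deportes', 'http://www.elcorreo.com/rss/atom/?section=deportes'),
]
for title, url in dedupe_feeds(feeds):
    print(title, url)
```

This only removes repeated feeds. Articles that appear in several different feeds are a separate matter; calibre's `ignore_duplicate_articles = {'title', 'url'}` recipe setting is meant for that case.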
04-18-2024, 11:35 AM | #7 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
Btw, I just had a look at the new recipe code and used it to download the newspaper. I don't know how you did it, but it looks perfect. Thank you very much!
And you were right, there are a lot of articles and they are not duplicated. Any ideas on how we could reduce the size so it can be sent via email to the Kindle? Maybe optimize the images? Thanks again! I really appreciate it. |
04-18-2024, 12:20 PM | #8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
Ok, by adding the following I managed to downsize the EPUB to a decent size to send via email:
    max_articles_per_feed = 10  # articles
    compress_news_images = True
|
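For reference, a sketch of where those two settings sit in a recipe class, together with two related size controls. The class name `ElCorreoSmall` and the values for `oldest_article` and `compress_news_images_max_size` are illustrative assumptions, not taken from the built-in recipe. The `try/except` is only there so the snippet can be run outside calibre; a real recipe just imports `BasicNewsRecipe`.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch runs outside calibre; not needed in a real recipe.
    class BasicNewsRecipe(object):
        pass


class ElCorreoSmall(BasicNewsRecipe):  # hypothetical name
    title = 'El Correo (small)'

    # Settings from the post above.
    max_articles_per_feed = 10   # cap on articles taken from each feed
    compress_news_images = True  # re-compress downloaded images

    # Extra knobs, with assumed values.
    oldest_article = 1                   # days; skip anything older
    compress_news_images_max_size = 50   # target upper bound per image, in KB

    feeds = [
        ('Portada', 'http://www.elcorreo.com/rss/atom/portada'),
    ]
```

With a cap of 10 articles per feed, the total is bounded by 10 times the number of feeds, which makes the output size much easier to predict.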
06-27-2024, 03:00 AM | #9 |
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2024
Device: Kindle paperwhite 2022
|
Hello,
Sorry for reopening the thread, but the built-in recipe for 'El Correo' stopped working. I get the following error message: Spoiler:
Last edited by theducks; 06-27-2024 at 04:14 AM. Reason: SPOILER LOG files |
Tags |
elcorreo, elcorreo.com, recipe, recipe broken, recipe request |