05-03-2022, 12:31 PM | #1 | |
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
|
Foreign Affairs cover fails
Quote:
I removed line 156 and changed the replace calls on line 157 (or just use replace('small', 'large')).
https://cdn-live.foreignaffairs.com/...over_large.jpg.webp?itok=LUFlkUCK
If we remove .webp, the link stops working.
Last edited by unkn0wn; 05-03-2022 at 12:33 PM. |
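The replace('small', 'large') idea can be tried outside calibre first. This is only a minimal sketch, assuming the size marker sits in the file name part of the URL; the helper name and the sample URL are hypothetical and are not the real (truncated) link above.

```python
def enlarge_cover_url(url):
    """Swap the 'small' size marker for 'large' in the file name only."""
    # Split off the query string first so the itok token is never rewritten.
    base, qsep, query = url.partition('?')
    # Only touch the last path segment, not the host or the directories.
    head, sep, name = base.rpartition('/')
    return head + sep + name.replace('small', 'large') + qsep + query


# Hypothetical URL shaped like the one in the post (.jpg.webp plus a token)
sample = 'https://example.com/styles/cover_small.jpg.webp?itok=abc123'
print(enlarge_cover_url(sample))
```

The .webp suffix and the query string are left alone, which matches the observation that the link breaks once .webp is removed.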
|
05-03-2022, 02:24 PM | #2 |
MIT Tech Review: the cover image fails to load
Code:
self.cover_url = soup.find(
    "div",
    attrs={"class": lambda name: name.startswith("magazineHero__image") if name else False}
).find(
    "img",
    src=True,
    attrs={"class": lambda x: x.startswith('image__img') if x else False}
)['src']

also

remove_attributes = ['height', 'width']

Last edited by unkn0wn; 05-03-2022 at 02:31 PM. |
05-03-2022, 02:33 PM | #3 |
https://github.com/kovidgoyal/calibr...agazine.recipe
Cover fails
Code:
def get_cover_url(self):
    cover_url = None
    soup = self.index_to_soup('https://www.india-seminar.com/')
    citem = soup.find('img', src=lambda x: x and 'covers' in x)
    if citem:
        cover_url = "https://www.india-seminar.com/" + citem['src']
    return cover_url

remove_attributes = ['style', 'height', 'width']
|
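Joining the site root and the img src by string concatenation works as long as src is a relative path without a leading slash. A hedged alternative, using only the Python standard library, is urllib.parse.urljoin, which also copes with absolute paths and full URLs. The helper name and the src values below are made up for illustration.

```python
from urllib.parse import urljoin

BASE = 'https://www.india-seminar.com/'


def absolute_cover_url(src):
    """Resolve an <img> src value against the site root."""
    return urljoin(BASE, src)


# Hypothetical src values: relative, root-relative, and already absolute
print(absolute_cover_url('covers/issue.jpg'))
print(absolute_cover_url('/covers/issue.jpg'))
print(absolute_cover_url('https://cdn.example.com/covers/issue.jpg'))
```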
06-01-2022, 02:20 AM | #4 | |
India Seminar
https://github.com/kovidgoyal/calibr...agazine.recipe
Import re and add these lines (from line 42) to skip a URL when tag_to_string returns an empty string. At present, such articles show up in the ToC without titles.
Quote:
Last edited by unkn0wn; 06-01-2022 at 02:24 AM. |
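The quoted lines did not survive on this page, so the following is not the poster's code. It is a minimal sketch of the same idea, assuming the links have already been extracted as (title, url) pairs; the function name and sample data are hypothetical.

```python
import re


def keep_titled(links):
    """Return only the (title, url) pairs whose title has visible text."""
    kept = []
    for title, url in links:
        # Collapse whitespace runs; image-only links end up as ''
        title = re.sub(r'\s+', ' ', title).strip()
        if not title:
            continue  # skip the url: nothing to show in the ToC
        kept.append((title, url))
    return kept


# Hypothetical extracted links
links = [('  The  Problem ', 'a.htm'), ('', 'b.htm'), ('\n\t', 'c.htm')]
print(keep_titled(links))
```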
|
06-01-2022, 11:57 AM | #5 | |
https://github.com/kovidgoyal/calibr...s_today.recipe
Business Today's default magazine page is for the next edition, and they keep adding articles to it. I changed the recipe to pick the present edition rather than the future edition that is still under construction. From line 28:
Code:
def parse_index(self):
    soup = self.index_to_soup('https://www.businesstoday.in/magazine')
    issue = soup.find(attrs={'class': 'view-id-latest_issue_magzine'})
    # The first matching link is the upcoming issue; take the second (present) one
    a = issue.findAll('a', href=lambda x: x and x.startswith('/magazine/issue/'))[1]
    url = a['href']
    self.log('issue =', url)
    soup = self.index_to_soup('https://www.businesstoday.in' + url)
    tag = soup.find(attrs={'class': 'issue-image'})
    if tag:
        self.cover_url = tag.find('img')['src']
    section = None
    sections = {}
Quote:
|
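Indexing with [1] raises IndexError if the page ever lists fewer than two issue links. A small sketch of a guarded version follows, assuming the hrefs have already been collected into a list; the helper name and the sample hrefs are hypothetical.

```python
def pick_current_issue(hrefs, prefix='/magazine/issue/'):
    """Return the second href starting with prefix, or None if absent."""
    matches = [h for h in hrefs if h and h.startswith(prefix)]
    # Index 0 is assumed to be the upcoming issue, index 1 the present one
    return matches[1] if len(matches) > 1 else None


# Hypothetical hrefs as they might appear on the magazine page
hrefs = ['/magazine/issue/next', None, '/about', '/magazine/issue/present']
print(pick_current_issue(hrefs))
```

Returning None lets the caller log a message and fall back instead of aborting the whole download.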
|
06-02-2022, 01:39 AM | #6 |
https://github.com/kovidgoyal/calibr...merican.recipe
Scientific American: cover and tags, from line 14
Code:
keep_classes = {
    'article-header', 'article-content', 'article-media', 'article-author',
    'article-text', 'feature-article--header', 'feature-article--header-title',
    'opinion-article__header-title', 'author-bio'
}
remove_classes = {
    'aside-banner', 'moreToExplore', 'article-footer', 'flex-column--25',
    'article-author__suggested'
}
Code:
select = Select(self.index_to_soup(url, as_tree=True))
cover = [x.get('src', '') for x in select('main .product-detail__image img')][0].split('?')[0]
self.cover_url = cover + '?w=800'
feeds = []

and

masthead_url = 'https://static.scientificamerican.com/sciam/assets/Image/newsletter/salogo.png'
|
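The split('?')[0] + '?w=800' step drops whatever query string the image had and asks for an 800-pixel-wide version. The same step can be sketched with urllib.parse, assuming the image server accepts a w parameter as the code above implies; the helper name and sample URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def cover_with_width(src, width=800):
    """Replace any existing query string with a single w=<width> parameter."""
    parts = urlsplit(src)
    # Keep scheme, host and path; discard the old query and any fragment
    return urlunsplit((parts.scheme, parts.netloc, parts.path, 'w=%d' % width, ''))


# Hypothetical image URL with an existing query string
print(cover_with_width('https://static.example.com/img/cover.jpg?w=300&h=200'))
```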
07-03-2022, 05:15 AM | #7 | ||
Foreign Affairs
The comments section and the issue section contain the same articles. I think adding ignore duplicates is much easier.
Quote:
Quote:
|
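calibre recipes have an ignore_duplicate_articles class attribute for this (for example ignore_duplicate_articles = {'title', 'url'}). What it does can be sketched roughly in plain Python, assuming the list of (section, articles) tuples that parse_index returns. The function name and the feed data below are made up, and this is an illustration of the idea, not calibre's implementation.

```python
def drop_duplicates(feeds):
    """Remove articles whose URL already appeared in an earlier section."""
    seen = set()
    result = []
    for section, articles in feeds:
        unique = []
        for article in articles:
            if article['url'] in seen:
                continue  # already listed in a previous section
            seen.add(article['url'])
            unique.append(article)
        if unique:  # drop sections that end up empty
            result.append((section, unique))
    return result


# Hypothetical index with the same article in two sections
feeds = [
    ('Comments', [{'title': 'A', 'url': '/a'}, {'title': 'B', 'url': '/b'}]),
    ('Issue', [{'title': 'A', 'url': '/a'}, {'title': 'C', 'url': '/c'}]),
]
print(drop_duplicates(feeds))
```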
||
07-03-2022, 05:18 AM | #8 |
Nautilus https://github.com/kovidgoyal/calibr...autilus.recipe
Cover method change. I think oldest_article needs to be 60:
oldest_article = 60  # days
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.presspassnow.com/nautilus/issues/')
    div = soup.find('div', **classes('image-fade_in_back'))
    if div:
        self.cover_url = div.find('img', src=True)['src']
    # Fall back to None when no cover was found
    return getattr(self, 'cover_url', None)
|
07-03-2022, 05:21 AM | #9 |
Swarajya mag https://github.com/kovidgoyal/calibr...warajya.recipe
Adding a description
Code:
    # Fragment from inside the loop over article links
    if url.startswith('/'):
        url = 'https://swarajyamag.com' + url
    title = self.tag_to_string(a)
    # Default to an empty description so desc is always defined
    desc = ''
    d = a.find_previous_sibling('a', **classes('_2nEd_'))
    if d:
        desc = 'By ' + self.tag_to_string(d)
    self.log(title, ' at ', url, '\n', desc)
    ans.append({'title': title, 'url': url, 'description': desc})
return [('Articles', ans)]
|
|