Old 05-03-2022, 12:31 PM   #1
unkn0wn
Fanatic
Posts: 543
Karma: 82944
Join Date: May 2021
Device: kindle
Foreign Affairs cover fails

Quote:
self.cover_url = soup.find(**classes('subscribe-callout-image'))['data-src'].split("|")[-1]
self.cover_url = self.cover_url.split('?')[0]
self.cover_url = self.cover_url.replace('_webp_issue_small_2x', '_webp_issue_large_2x')
https://github.com/kovidgoyal/calibr...affairs.recipe

I removed line 156 and changed the replace() strings on line 157 (or just use replace('small', 'large')).

https://cdn-live.foreignaffairs.com/...over_large.jpg .webp ? itok=LUFlkUCK

If we remove .webp, the link stops working.
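
A minimal sketch of the rewrite described above, pulled out into a standalone function so it can be tried outside calibre. The function name `large_cover_url` and the sample `data-src` value are hypothetical; only the `split("|")` step and the small-to-large replacement come from the post.

```python
def large_cover_url(data_src):
    """Return the large-cover variant of a data-src attribute value.

    Assumption: data-src may hold several '|'-separated URLs and the
    last one is the cover, as in the recipe line quoted above.
    """
    url = data_src.split("|")[-1]
    # The .webp suffix is left alone: the post reports that the link
    # stops working without it. Keeping the query string is an assumption.
    return url.replace("small", "large")


# Hypothetical attribute value, for illustration only.
sample = (
    "https://example.com/a_small.jpg"
    "|https://example.com/cover_small.jpg.webp?itok=abc"
)
print(large_cover_url(sample))
```

In the recipe this would be used as something like self.cover_url = large_cover_url(tag['data-src']), where tag is the element found by soup.find.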

Last edited by unkn0wn; 05-03-2022 at 12:33 PM.
Old 05-03-2022, 02:24 PM   #2
unkn0wn
MIT Technology Review: cover image fails to load

Code:
self.cover_url = soup.find(
    "div",
    attrs={"class": lambda name: name.startswith("magazineHero__image") if name else False}
).find(
    "img",
    src=True,
    attrs={"class": lambda x: x.startswith("image__img") if x else False}
)["src"]
absurl is not required, and the img class needs to be specified.

Also set remove_attributes = ['height', 'width']
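
The two lambdas above do the same job, so here is a small sketch of a shared helper. The name `class_startswith` and the suffix in the sample class name are made up for illustration; the prefixes are the ones from the code above.

```python
def class_startswith(prefix):
    """Build a predicate for BeautifulSoup-style attribute matching."""
    def matcher(value):
        # BeautifulSoup passes None when the tag has no class attribute,
        # so guard against that before calling startswith.
        return bool(value) and value.startswith(prefix)
    return matcher


is_hero = class_startswith("magazineHero__image")
print(is_hero("magazineHero__image--x1"))  # True
print(is_hero("footer__image"))            # False
print(is_hero(None))                       # False
```

Assuming the page structure is as described, it would be passed as attrs={"class": class_startswith("magazineHero__image")} in the find() call.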

Last edited by unkn0wn; 05-03-2022 at 02:31 PM.
Old 05-03-2022, 02:33 PM   #3
unkn0wn
https://github.com/kovidgoyal/calibr...agazine.recipe

Cover fails
Code:
def get_cover_url(self):
    cover_url = None
    soup = self.index_to_soup('https://www.india-seminar.com/')
    citem = soup.find('img', src=lambda x: x and 'covers' in x)
    if citem:
        cover_url = 'https://www.india-seminar.com/' + citem['src']
    return cover_url
and

remove_attributes = ['style', 'height', 'width']
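
The string concatenation above assumes the img src is always a relative path without a leading slash. A hedged alternative, in case the site ever switches to absolute or root-relative paths, is urljoin from the standard library. The helper name `absolute_cover_url` and the sample paths are hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://www.india-seminar.com/"


def absolute_cover_url(src, base=BASE):
    """Resolve an img src against the site root.

    urljoin handles relative paths, root-relative paths and
    already-absolute URLs without doubling slashes or hosts.
    """
    return urljoin(base, src)


# Hypothetical src values, for illustration only.
for src in ("covers/cover.jpg", "/covers/cover.jpg",
            "https://cdn.example.com/covers/cover.jpg"):
    print(absolute_cover_url(src))
```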
Old 06-01-2022, 02:20 AM   #4
unkn0wn
India Seminar
https://github.com/kovidgoyal/calibr...agazine.recipe

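Add this import:

import re

Then add these lines (from line 42) to skip the URL when tag_to_string returns an empty title. At present such articles show up in the ToC without titles.
Quote:
title = self.tag_to_string(a)
title = re.sub(r'\s+', ' ', title).strip()
# "title is empty" compared object identity and could miss blank titles; test for an empty string instead
if not title:
    url = ''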
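
A minimal standalone sketch of the cleanup-and-skip step, so it can be checked without running the recipe. The names `clean_title` and `keep_titled` and the sample entries are hypothetical, and it drops untitled entries outright instead of blanking the URL.

```python
import re


def clean_title(raw):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def keep_titled(entries):
    """Keep only (title, url) pairs whose cleaned title is non-empty."""
    kept = []
    for raw_title, url in entries:
        title = clean_title(raw_title)
        if not title:
            # Nothing left after cleanup: skip the link entirely.
            continue
        kept.append({"title": title, "url": url})
    return kept


# Hypothetical link texts and URLs, for illustration only.
entries = [("  The  Problem\n", "a.htm"), (" \n ", "b.htm")]
print(keep_titled(entries))
```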

Last edited by unkn0wn; 06-01-2022 at 02:24 AM.
Old 06-01-2022, 11:57 AM   #5
unkn0wn
https://github.com/kovidgoyal/calibr...s_today.recipe
The default Business Today magazine page shows the next edition, and they keep adding articles to it. I changed the recipe to pick the current edition instead of the future edition that is still under construction.

From line 28:
Code:
def parse_index(self):
    soup = self.index_to_soup('https://www.businesstoday.in/magazine')
    issue = soup.find(attrs={'class': 'view-id-latest_issue_magzine'})
    # index [1] picks the current edition; [0] is the upcoming one
    a = issue.findAll('a', href=lambda x: x and x.startswith('/magazine/issue/'))[1]
    url = a['href']
    self.log('issue =', url)
    soup = self.index_to_soup('https://www.businesstoday.in' + url)

    tag = soup.find(attrs={'class': 'issue-image'})
    if tag:
        self.cover_url = tag.find('img')['src']
    section = None
    sections = {}
and
Quote:
extra_css = 'a[href^="https://www.businesstoday.in/videos"]{display: none;}'
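
The [1] index raises IndexError if the page ever lists only one issue link. A hedged sketch of the same selection with a fallback; the function name `pick_current_issue`, the fallback behaviour and the sample hrefs are my assumptions, not part of the recipe.

```python
def pick_current_issue(hrefs, prefix="/magazine/issue/"):
    """Return the second issue link, falling back to the first.

    Assumption: the first matching link is the upcoming edition and the
    second is the current one, as described in the post.
    """
    issue_links = [h for h in hrefs if h and h.startswith(prefix)]
    if not issue_links:
        return None
    return issue_links[1] if len(issue_links) > 1 else issue_links[0]


# Hypothetical href values, for illustration only.
hrefs = ["/magazine", "/magazine/issue/next-edition",
         "/magazine/issue/current-edition", None]
print(pick_current_issue(hrefs))
```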
Old 06-02-2022, 01:39 AM   #6
unkn0wn
https://github.com/kovidgoyal/calibr...merican.recipe

Scientific American cover and tags
Line 14:
Code:
keep_classes = {
    'article-header', 'article-content', 'article-media', 'article-author',
    'article-text', 'feature-article--header', 'feature-article--header-title',
    'opinion-article__header-title', 'author-bio'
}
remove_classes = {
    'aside-banner', 'moreToExplore', 'article-footer', 'flex-column--25',
    'article-author__suggested'
}
Remove line 60 and add the lines below after line 63 (there is a better cover on the issue page):
Code:
        select = Select(self.index_to_soup(url, as_tree=True))
        cover = [x.get('src', '') for x in select('main .product-detail__image img')][0].split('?')[0]
        self.cover_url = cover + '?w=800'

        feeds = []
The + '?w=800' is there to reduce the size. The actual image is around 8k resolution, a 1 MB file.
Also set masthead_url = 'https://static.scientificamerican.com/sciam/assets/Image/newsletter/salogo.png'
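
For anyone who wants to try the resize trick outside the recipe, here is a small sketch that drops whatever query string the src has and sets the width parameter. The helper name `resized_cover_url` and the sample URL are hypothetical; that the image server honours w= is taken from the post, not verified here.

```python
from urllib.parse import urlsplit, urlunsplit


def resized_cover_url(src, width=800):
    """Replace the query string of src with a single w=<width> parameter."""
    parts = urlsplit(src)
    # Keep scheme, host and path; discard the old query and any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, f"w={width}", ""))


# Hypothetical image URL, for illustration only.
print(resized_cover_url("https://example.com/img/cover.jpg?w=8000&h=1"))
```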
Old 07-03-2022, 05:15 AM   #7
unkn0wn
Foreign Affairs
The comments section and the issue section contain the same articles. I think adding ignore_duplicate_articles is much easier.
Quote:
ignore_duplicate_articles = {'title', 'url'}
remove_empty_feeds = True
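
Roughly what those two settings ask for, as a standalone sketch. This is only an illustration of the idea, not calibre's actual implementation; the function name `drop_duplicates` and the sample feeds are made up.

```python
def drop_duplicates(feeds, keys=("title", "url")):
    """Drop articles already seen in an earlier section, then drop empty sections.

    feeds is a list of (section_name, [article_dict, ...]) pairs.
    An article counts as a duplicate if any of the keys matches.
    """
    seen = {key: set() for key in keys}
    result = []
    for section, articles in feeds:
        unique = []
        for article in articles:
            if any(article.get(k) in seen[k] for k in keys):
                continue
            for k in keys:
                seen[k].add(article.get(k))
            unique.append(article)
        if unique:  # same effect as remove_empty_feeds = True
            result.append((section, unique))
    return result


# Hypothetical feeds, for illustration only.
feeds = [
    ("Issue", [{"title": "A", "url": "/a"}, {"title": "B", "url": "/b"}]),
    ("Comments", [{"title": "A", "url": "/a"}]),
]
print(drop_duplicates(feeds))
```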
Foreign Policy cover: it loads an older edition's cover image. Change it to:
Quote:
img = soup.find('img', attrs={'data-lazy-src': lambda x: x and '-cover' in x})
self.cover_url = img['data-lazy-src']
Old 07-03-2022, 05:18 AM   #8
unkn0wn
Nautilus https://github.com/kovidgoyal/calibr...autilus.recipe
Cover method change. I also think oldest_article needs to be 60:
oldest_article = 60 # days
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.presspassnow.com/nautilus/issues/')
    div = soup.find('div', **classes('image-fade_in_back'))
    if div:
        self.cover_url = div.find('img', src=True)['src']
    # fall back to None when no cover was found
    return getattr(self, 'cover_url', None)
Old 07-03-2022, 05:21 AM   #9
unkn0wn
Swarajya mag https://github.com/kovidgoyal/calibr...warajya.recipe

Adding a description:
Code:
            if url.startswith('/'):
                url = 'https://swarajyamag.com' + url
            title = self.tag_to_string(a)
            d = a.find_previous_sibling('a', **classes('_2nEd_'))
            # default to an empty description so desc is always defined
            desc = ''
            if d:
                desc = 'By ' + self.tag_to_string(d)
            self.log(title, ' at ', url, '\n', desc)
            ans.append({'title': title, 'url': url, 'description': desc})
        return [('Articles', ans)]
All times are GMT -4.