Old 05-03-2022, 11:31 AM   #1
unkn0wn
Guru
unkn0wn understands the Henderson-Hasselbalch Equation.
 
Posts: 615
Karma: 85520
Join Date: May 2021
Device: kindle
Foreign Affairs cover fails

Quote:
self.cover_url = soup.find(**classes('subscribe-callout-image'))['data-src'].split("|")[-1]
self.cover_url = self.cover_url.split('?')[0]
self.cover_url = self.cover_url.replace('_webp_issue_small_2x', '_webp_issue_large_2x')
https://github.com/kovidgoyal/calibr...affairs.recipe

I removed line 156 and changed the replace() arguments on line 157 (or just use replace('small', 'large')).

https://cdn-live.foreignaffairs.com/...over_large.jpg .webp ? itok=LUFlkUCK

If we remove .webp, the link stops working.
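
For illustration, here is the string handling on its own as a minimal sketch. The function name and the sample data-src value are made up, and it assumes the query string is left in place, which may or may not match what the removed line 156 did.

```python
def large_cover_url(data_src):
    # data-src can list several candidates separated by '|'; use the last one
    url = data_src.split('|')[-1]
    # swap the small rendition for the large one, leaving '.webp' and the query untouched
    return url.replace('small', 'large')


# hypothetical data-src value, not taken from the live site
sample = ('https://example.com/a_small.jpg.webp?itok=x'
          '|https://example.com/cover_small.jpg.webp?itok=abc')
print(large_cover_url(sample))
```

A plain replace('small', 'large') would also touch any other "small" in the URL, so the longer tag-specific replace is the safer of the two options.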

Last edited by unkn0wn; 05-03-2022 at 11:33 AM.
Old 05-03-2022, 01:24 PM   #2
unkn0wn
MIT Technology Review: cover image fails to load

Code:
self.cover_url = soup.find(
    "div",
    attrs={"class": lambda name: name.startswith("magazineHero__image") if name else False}
).find(
    "img",
    src=True,
    attrs={"class": lambda x: x.startswith("image__img") if x else False}
)["src"]
absurl is not required, and the img class needs to be specified.

Also add remove_attributes = ['height', 'width']
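
The "if name else False" guard in the lambdas matters because, as far as I know, BeautifulSoup calls a function matcher with None for tags that lack the attribute. A small sketch of the same predicate as a reusable helper (class_startswith is a made-up name, and the sample class value is hypothetical):

```python
def class_startswith(prefix):
    """Return a predicate that could be passed as attrs={'class': ...}."""
    def match(value):
        # value is None for tags without a class, so guard before startswith
        return bool(value) and value.startswith(prefix)
    return match


is_hero = class_startswith('magazineHero__image')
print(is_hero('magazineHero__image--x9f2'))  # hypothetical generated class name
print(is_hero(None))
```

With such a helper the two lambdas could be written as attrs={'class': class_startswith('magazineHero__image')} and attrs={'class': class_startswith('image__img')}.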

Last edited by unkn0wn; 05-03-2022 at 01:31 PM.
Old 05-03-2022, 01:33 PM   #3
unkn0wn
https://github.com/kovidgoyal/calibr...agazine.recipe

Cover fails
Code:
def get_cover_url(self):
    cover_url = None
    soup = self.index_to_soup('https://www.india-seminar.com/')
    citem = soup.find('img', src=lambda x: x and 'covers' in x)
    if citem:
        cover_url = "https://www.india-seminar.com/" + citem['src']
    return cover_url
and

remove_attributes = ['style', 'height', 'width']
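
The plain string concatenation in get_cover_url works while the src is a relative path without a leading slash. If the site ever switches to root-relative or absolute image URLs, urljoin from the standard library handles all three cases. A sketch, with made-up sample paths:

```python
from urllib.parse import urljoin

BASE = 'https://www.india-seminar.com/'


def absolute_cover(src):
    # urljoin resolves relative, root-relative and already-absolute URLs
    return urljoin(BASE, src)


# hypothetical src values, only to show the three cases
for src in ('covers/2022.jpg', '/covers/2022.jpg', 'https://cdn.example.com/c.jpg'):
    print(absolute_cover(src))
```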
Old 06-01-2022, 01:20 AM   #4
unkn0wn
India Seminar
https://github.com/kovidgoyal/calibr...agazine.recipe

import re

and add these lines (from line 42) to skip the URL when tag_to_string() returns an empty title. At present such articles show up in the ToC without titles.
Quote:
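title = self.tag_to_string(a)
# collapse runs of whitespace and trim the ends
title = re.sub(r'\s+', ' ', title).strip()
# 'is' compares identity, not equality, so test for emptiness directly
if not title:
    url = ''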
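
An empty-title check is easiest to get right with a truthiness test after normalising whitespace (comparing strings with "is" tests identity, not equality). A standalone sketch of that filter; clean_title and keep_titled are made-up helper names, not part of the recipe:

```python
import re


def clean_title(raw):
    # collapse runs of whitespace, then trim the ends
    return re.sub(r'\s+', ' ', raw).strip()


def keep_titled(links):
    # links: iterable of (raw_title, url); drop entries whose title is empty
    cleaned = ((clean_title(t), u) for t, u in links)
    return [(t, u) for t, u in cleaned if t]


# hypothetical link data
print(keep_titled([('  A \n B ', '/a'), (' \n ', '/b')]))
```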

Last edited by unkn0wn; 06-01-2022 at 01:24 AM.
Old 06-01-2022, 10:57 AM   #5
unkn0wn
https://github.com/kovidgoyal/calibr...s_today.recipe
The default Business Today magazine page shows the next edition, and they keep adding articles to it. I changed the recipe to pick the present edition rather than the future edition that's still under construction.

From line 28:
Code:
def parse_index(self):
    soup = self.index_to_soup('https://www.businesstoday.in/magazine')
    issue = soup.find(attrs={'class': 'view-id-latest_issue_magzine'})
    # index 1 skips the first match, which (per the note above) is the upcoming edition
    a = issue.findAll('a', href=lambda x: x and x.startswith('/magazine/issue/'))[1]
    url = a['href']
    self.log('issue =', url)
    soup = self.index_to_soup('https://www.businesstoday.in' + url)

    tag = soup.find(attrs={'class': 'issue-image'})
    if tag:
        self.cover_url = tag.find('img')['src']
    section = None
    sections = {}
and
Quote:
extra_css = 'a[href^="https://www.businesstoday.in/videos"]{display: none;}'
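
The [1] index in parse_index raises IndexError if the page ever lists only one issue link. A hedged sketch of the selection with a fallback; pick_current_issue is a made-up helper, the hrefs are invented, and falling back to the first link is my assumption rather than anything the site guarantees:

```python
def pick_current_issue(hrefs, prefix='/magazine/issue/'):
    # keep only issue links, in page order
    issues = [h for h in hrefs if h and h.startswith(prefix)]
    if len(issues) > 1:
        # second link: the present edition, the first being the upcoming one
        return issues[1]
    # assumption: with a single link, use it rather than fail
    return issues[0] if issues else None


# hypothetical hrefs as they might appear on the magazine page
print(pick_current_issue(['/videos/x', '/magazine/issue/next', '/magazine/issue/current']))
```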
Old 06-02-2022, 12:39 AM   #6
unkn0wn
https://github.com/kovidgoyal/calibr...merican.recipe

Scientific American: cover and tags
Line 14:
Code:
keep_classes = {
    'article-header', 'article-content', 'article-media', 'article-author',
    'article-text', 'feature-article--header', 'feature-article--header-title',
    'opinion-article__header-title', 'author-bio'
}
remove_classes = {
    'aside-banner', 'moreToExplore', 'article-footer', 'flex-column--25',
    'article-author__suggested'
}
Remove line 60 and add the lines below after line 63 (the issue page has a better cover):
Code:
        select = Select(self.index_to_soup(url, as_tree=True))
        cover = [x.get('src', '') for x in select('main .product-detail__image img')][0].split('?')[0]
        self.cover_url = cover + '?w=800'

        feeds = []
The + '?w=800' is there to reduce the size: the actual image is around 8K resolution, a 1 MB file.
Also set masthead_url = 'https://static.scientificamerican.com/sciam/assets/Image/newsletter/salogo.png'
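
The cover URL handling, pulled out as a small sketch. sized_cover is a made-up name and the sample URL is invented; it assumes the image server honours a w query parameter, as the ?w=800 trick implies:

```python
def sized_cover(src, width=800):
    # drop whatever query string the page used, then request a fixed width
    base = src.split('?')[0]
    return '{}?w={}'.format(base, width)


# hypothetical src attribute from the issue page
print(sized_cover('https://example.com/cover.jpg?w=4000&h=1'))
```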
Old 07-03-2022, 04:15 AM   #7
unkn0wn
Foreign Affairs
The comments section and the issue section contain the same articles. I think adding ignore duplicates is much easier:
Quote:
ignore_duplicate_articles = {'title', 'url'}
remove_empty_feeds = True
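
To show what those two settings are meant to achieve, here is a rough sketch of the idea in plain Python. It is not calibre's implementation; drop_duplicates and the sample feeds are made up for illustration:

```python
def drop_duplicates(feeds, keys=('title', 'url')):
    # feeds: list of (section_name, [article dicts]) in ToC order
    seen = {k: set() for k in keys}
    result = []
    for section, articles in feeds:
        kept = []
        for art in articles:
            # an article is a duplicate if its title OR url was already seen
            if any(art[k] in seen[k] for k in keys):
                continue
            for k in keys:
                seen[k].add(art[k])
            kept.append(art)
        if kept:  # same effect as remove_empty_feeds = True
            result.append((section, kept))
    return result


# hypothetical feeds: the second section only repeats the first
feeds = [
    ('Issue', [{'title': 'A', 'url': '/a'}]),
    ('Comments', [{'title': 'A', 'url': '/a?ref=comments'}]),
]
print(drop_duplicates(feeds))
```

The section that ends up empty after de-duplication is dropped, which is why the two settings go together.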
Foreign Policy cover: it loads an older edition's cover image. Change to:
Quote:
img = soup.find('img', attrs={'data-lazy-src': lambda x: x and '-cover' in x})
self.cover_url = img['data-lazy-src']
Old 07-03-2022, 04:18 AM   #8
unkn0wn
Nautilus https://github.com/kovidgoyal/calibr...autilus.recipe
Cover method change. I think oldest_article needs to be 60:
oldest_article = 60 # days
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.presspassnow.com/nautilus/issues/')
    div = soup.find('div', **classes('image-fade_in_back'))
    if div:
        self.cover_url = div.find('img', src=True)['src']
    # fall back to None when no cover was found or set
    return getattr(self, 'cover_url', None)
Old 07-03-2022, 04:21 AM   #9
unkn0wn
Swarajya mag https://github.com/kovidgoyal/calibr...warajya.recipe

adding description
Code:
            if url.startswith('/'):
                url = 'https://swarajyamag.com' + url
            title = self.tag_to_string(a)
            # default to an empty description so desc is always defined
            desc = ''
            d = a.find_previous_sibling('a', **classes('_2nEd_'))
            if d:
                desc = 'By ' + self.tag_to_string(d)
            self.log(title, ' at ', url, '\n', desc)
            ans.append({'title': title, 'url': url, 'description': desc})
        return [('Articles', ans)]
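
The description needs a default for links that have no author sibling, otherwise desc is undefined (or left over from the previous article). A standalone sketch of building one article entry; make_article is a made-up helper and the sample values are invented:

```python
def make_article(title, url, author=None, base='https://swarajyamag.com'):
    # make root-relative links absolute
    if url.startswith('/'):
        url = base + url
    # empty description when no author element was found
    desc = 'By ' + author if author else ''
    return {'title': title, 'url': url, 'description': desc}


# hypothetical values, only to show both branches
print(make_article('Some headline', '/politics/some-headline', 'A. Writer'))
print(make_article('No byline', 'https://swarajyamag.com/x'))
```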