Built-in Russian recipes
FIXED RUSSIAN RECIPES
Improved built-in 3DNews: Daily Digital Digest recipe (3dnews.recipe): HTTPS, revised RSS feeds.
```python
#!/usr/bin/env python
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class News(BasicNewsRecipe):
    title = '3DNews: Daily Digital Digest'
    __author__ = 'bugmen00t'
    description = '\u041D\u0435\u0437\u0430\u0432\u0438\u0441\u0438\u043C\u043E\u0435 \u0440\u043E\u0441\u0441\u0438\u0439\u0441\u043A\u043E\u0435 \u043E\u043D\u043B\u0430\u0439\u043D-\u0438\u0437\u0434\u0430\u043D\u0438\u0435, \u043F\u043E\u0441\u0432\u044F\u0449\u0435\u043D\u043D\u043E\u0435 \u0446\u0438\u0444\u0440\u043E\u0432\u044B\u043C \u0442\u0435\u0445\u043D\u043E\u043B\u043E\u0433\u0438\u044F\u043C'
    publisher = '3DNews'
    category = 'news'
    cover_url = u'http://www.3dnews.ru/assets/images/logo.png'
    language = 'ru'
    auto_cleanup = True
    oldest_article = 15
    max_articles_per_feed = 60

    feeds = [
        ('\u0412\u0430\u0436\u043D\u044B\u0435 \u043D\u043E\u0432\u043E\u0441\u0442\u0438','https://3dnews.ru/breaking/rss/'),
        ('\u0412\u0441\u0435 \u043D\u043E\u0432\u043E\u0441\u0442\u0438','https://3dnews.ru/news/rss/'),
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438 - \u0445\u0430\u0440\u0434','https://3dnews.ru/hardware-news/rss'),
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438 - \u0433\u0430\u0434\u0436\u0435\u0442\u044B','https://3dnews.ru/gadgets/rss/'),
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438 - \u0441\u043E\u0444\u0442','https://3dnews.ru/software-news/rss/'),
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438 - \u0438\u0433\u0440\u044B','https://3dnews.ru/games/rss/'),
        ('\u0423\u043C\u043D\u044B\u0435 \u0412\u0435\u0449\u0438','https://3dnews.ru/smart-things/rss/'),
        ('\u0410\u043D\u0430\u043B\u0438\u0442\u0438\u043A\u0430','https://3dnews.ru/editorial/rss/'),
        ('\u041F\u0440\u043E\u0446\u0435\u0441\u0441\u043E\u0440\u044B \u0438 \u043F\u0430\u043C\u044F\u0442\u044C','https://3dnews.ru/cpu/rss/'),
        ('\u041C\u0430\u0442\u0435\u0440\u0438\u043D\u0441\u043A\u0438\u0435 \u043F\u043B\u0430\u0442\u044B','https://3dnews.ru/motherboard/rss/'),
        ('\u041A\u043E\u0440\u043F\u0443\u0441\u0430, \u0411\u041F \u0438 \u043E\u0445\u043B\u0430\u0436\u0434\u0435\u043D\u0438\u0435','https://3dnews.ru/cooling/rss/'),
        ('\u0412\u0438\u0434\u0435\u043E\u043A\u0430\u0440\u0442\u044B','https://3dnews.ru/video/rss/'),
        ('\u041C\u043E\u043D\u0438\u0442\u043E\u0440\u044B \u0438 \u043F\u0440\u043E\u0435\u043A\u0442\u043E\u0440\u044B','https://3dnews.ru/display/rss/'),
        ('\u041D\u0430\u043A\u043E\u043F\u0438\u0442\u0435\u043B\u0438','https://3dnews.ru/storage/rss/'),
        ('\u0426\u0438\u0444\u0440\u043E\u0432\u043E\u0439 \u0430\u0432\u0442\u043E\u043C\u043E\u0431\u0438\u043B\u044C','https://3dnews.ru/auto/rss/'),
        ('\u0421\u043E\u0442\u043E\u0432\u0430\u044F \u0441\u0432\u044F\u0437\u044C','https://3dnews.ru/phone/rss/'),
        ('\u041F\u0435\u0440\u0438\u0444\u0435\u0440\u0438\u044F','https://3dnews.ru/peripheral/rss/'),
        ('\u041D\u043E\u0443\u0442\u0431\u0443\u043A\u0438 \u0438 \u041F\u041A','https://3dnews.ru/mobile/rss/'),
        ('\u041F\u043B\u0430\u043D\u0448\u0435\u0442\u044B','https://3dnews.ru/tablets/rss/'),
        ('\u0417\u0432\u0443\u043A \u0438 \u0430\u043A\u0443\u0441\u0442\u0438\u043A\u0430','https://3dnews.ru/multimedia/rss/'),
        ('\u0426\u0438\u0444\u0440\u043E\u0432\u043E\u0435 \u0444\u043E\u0442\u043E \u0438 \u0432\u0438\u0434\u0435\u043E','https://3dnews.ru/digital/rss/'),
        ('\u0421\u0435\u0442\u0438 \u0438 \u043A\u043E\u043C\u043C\u0443\u043D\u0438\u043A\u0430\u0446\u0438\u0438','https://3dnews.ru/communication/rss/'),
        ('\u041F\u0440\u043E\u0433\u0440\u0430\u043C\u043C\u043D\u043E\u0435 \u043E\u0431\u0435\u0441\u043F\u0435\u0447\u0435\u043D\u0438\u0435','https://3dnews.ru/software/rss/'),
        ('Off-\u0441\u044F\u043D\u043A\u0430','https://3dnews.ru/offsyanka/rss/'),
        ('\u041C\u0430\u0441\u0442\u0435\u0440\u0441\u043A\u0430\u044F','https://3dnews.ru/workshop/rss/'),
        ('ServerNews - \u0441\u0442\u0430\u0442\u044C\u0438','https://servernews.ru/rss'),
        ('ServerNews - \u043D\u043E\u0432\u043E\u0441\u0442\u0438','https://servernews.ru/news/rss')
    ]

    def print_version(self, url):
        # Download the printer-friendly version of each article instead of the full page
        return url + '/print'
```
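`print_version` simply appends `/print` to each article URL. If a feed link ever ends in a slash or carries tracking parameters (I haven't checked whether 3DNews links do), plain concatenation would give `//print` or `?utm_source=rss/print`. A minimal sketch of a more defensive variant; the helper name `make_print_url` is hypothetical, and dropping the query string assumes the print page doesn't need it:

```python
from urllib.parse import urlsplit, urlunsplit


def make_print_url(url):
    """Return <article path>/print for an article link (hypothetical helper).

    Trailing slashes are stripped from the path so the result never contains
    '//print'; the query string and fragment are dropped (assumption: the
    print page does not need them).
    """
    parts = urlsplit(url)
    path = parts.path.rstrip('/') + '/print'
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


# Inside the recipe class this would be used as:
#     def print_version(self, url):
#         return make_print_url(url)
```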
Improved built-in 7x7 recipe (7x7.recipe): new domain, revised RSS feeds. I couldn't fetch the lazy-loaded images, so the output is text-only for now.
Bonus: favicon
```python
#!/usr/bin/env python
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class News(BasicNewsRecipe):
    title = '7x7'
    __author__ = 'bugmen00t'
    description = '7x7 - \u043C\u0435\u0436\u0440\u0435\u0433\u0438\u043E\u043D\u0430\u043B\u044C\u043D\u044B\u0439 \u0438\u043D\u0442\u0435\u0440\u043D\u0435\u0442-\u0436\u0443\u0440\u043D\u0430\u043B'
    publisher = '7x7-journal.ru'
    category = 'news'
    cover_url = u'https://semnasem.org/site-specific/7x7-journal.ru/images/frontend/logo/logo-header.svg'
    language = 'ru'
    no_stylesheets = True
    remove_javascript = True
    auto_cleanup = False
    oldest_article = 14
    max_articles_per_feed = 30

    feeds = [
        ('7x7', 'https://semnasem.org/rss/default.xml'),
    ]

    remove_tags_before = dict(name='article', attrs={'class': 'article'})
    remove_tags_after = dict(name='div', attrs={'class': 'article__footer-wrap'})
    remove_tags = [
        dict(name='div', attrs={'class': 'article__footer-wrap'}),
        dict(name='div', attrs={'class': 'promolink-widget'})
    ]
```
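One approach that sometimes works for lazy-loaded images (the Izvestia and TJournal recipes in this post use the same idea) is to copy the real image URL from a data attribute into `src` in `preprocess_html`. I haven't checked which attribute 7x7 actually uses, so the names in `LAZY_ATTRS` are guesses, and `promote_lazy_src` is a hypothetical helper. It only relies on `.get()` and item assignment, which both plain dicts and BeautifulSoup tags support:

```python
# Attributes that lazy-loading scripts often use to hold the real image URL.
# These names are assumptions, not verified against the 7x7 pages.
LAZY_ATTRS = ('data-src', 'data-lazy-src', 'data-original')


def promote_lazy_src(img, lazy_attrs=LAZY_ATTRS):
    """Copy the first non-empty lazy-load attribute into img['src'].

    `img` can be any mapping-like object with .get() and item assignment,
    e.g. a dict or a BeautifulSoup Tag. Returns True if src was changed.
    """
    for attr in lazy_attrs:
        value = img.get(attr)
        if value:
            img['src'] = value
            return True
    return False


# Possible use inside the recipe class (untested against 7x7):
#     def preprocess_html(self, soup):
#         for img in soup.findAll('img'):
#             promote_lazy_src(img)
#         return soup
```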
Fixed built-in Izvestia recipe (izvestia.recipe): HTTPS, revised RSS feeds.
```python
#!/usr/bin/env python
# vim:fileencoding=utf-8
__license__ = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
izvestia.ru
'''
from calibre.web.feeds.news import BasicNewsRecipe


class Izvestia(BasicNewsRecipe):
    title = 'Izvestia'
    __author__ = 'Darko Miletic (with fixes by bugmen00t)'
    description = 'News from Russia'
    publisher = 'Izvestia'
    category = 'news, politics, Russia'
    oldest_article = 5
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True
    use_embedded_content = False
    language = 'ru'
    publication_type = 'newspaper'
    cover_url = u'https://cdn.iz.ru/profiles/portal/themes/purple/images/favicons/apple-icon-180x180.png'

    remove_tags_before = dict(name='div', attrs={'role': 'article'})
    remove_tags_after = dict(name='div', attrs={'role': 'article'})
    remove_tags = [
        dict(name='div', attrs={'class': 'article_page__left__top__views'}),
        dict(name='div', attrs={'class': 'hash_tags'}),
        dict(name='div', attrs={'class': 'get_yandex_subscription_links'}),
        dict(name='div', attrs={'class': 'article_buttons_block'}),
        dict(name='div', attrs={'class': 'rubrics_btn'}),
        dict(name='div', attrs={'class': 'hidden'}),
        dict(name='div', attrs={'class': 'share_bottom2'}),
        dict(name='div', attrs={'class': 'recommendation-block'}),
        dict(name='div', attrs={'class': 'plug-text'}),
        dict(name='div', attrs={'class': 'get_news_link'}),
        dict(name='div', attrs={'itemprop': 'address'})
    ]

    feeds = [
        (u'Новости', u'https://iz.ru/xml/rss/all.xml')]

    def preprocess_html(self, soup):
        # Images are lazy-loaded: copy the real URL from data-src into src
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src']
        return soup
```
Fixed built-in Kommersant recipe (kommersant.recipe): HTTPS, revised RSS feeds. I couldn't figure out how to keep the images, though; I'd be grateful if anyone could.
```python
#!/usr/bin/env python
# vim:fileencoding=utf-8
__license__ = 'GPL v3'
__copyright__ = '2010-2013, Darko Miletic <darko.miletic at gmail.com>'
'''
www.kommersant.ru
'''
from calibre.web.feeds.news import BasicNewsRecipe


class Kommersant_ru(BasicNewsRecipe):
    title = 'Kommersant'
    __author__ = 'Darko Miletic (with fixes by bugmen00t)'
    description = 'News from Russia'
    publisher = 'Kommersant'
    category = 'news, politics, Russia'
    oldest_article = 7
    max_articles_per_feed = 50
    no_stylesheets = True
    use_embedded_content = False
    language = 'ru'
    publication_type = 'newspaper'
    cover_url = 'https://iv.kommersant.ru/ContentFlex/images/logo.png'

    remove_tags_before = dict(name='header', attrs={'class': 'doc_header'})
    remove_tags_after = dict(name='div', attrs={'class': 'doc__text document_authors'})
    remove_tags = [
        dict(name='ul', attrs={'class': 'crumbs'}),
        dict(name='div', attrs={'class': 'hide_desktop'}),
        dict(name='div', attrs={'class': 'incut incut--right'}),
        dict(name='div', attrs={'class': 'incut incut--left'}),
        dict(name='div', attrs={'class': 'incut incut--center'}),
        dict(name='div', attrs={'class': 'ba'}),
        dict(name='div', attrs={'id': 'lenta'}),
        dict(name='div', attrs={'class': 'layout basement_news__body'}),
        dict(name='footer', attrs={'class': 'footer'}),
        dict(name='div', attrs={'class': 'ui-modal'}),
        dict(name='section', attrs={'class': 'potd'}),
        dict(name='footer', attrs={'class': 'doc_footer'}),
        dict(name='div', attrs={'class': 'adv_interscroll hide_desktop'})
    ]

    feeds = [
        ('\u0413\u043B\u0430\u0432\u043D\u043E\u0435','https://www.kommersant.ru/rss/main.xml'),
        ('\u0413\u0430\u0437\u0435\u0442\u0430 "\u041A\u043E\u043C\u043C\u0435\u0440\u0441\u0430\u043D\u0442"','https://www.kommersant.ru/rss/daily.xml'),
        ('\u041B\u0435\u043D\u0442\u0430 \u043D\u043E\u0432\u043E\u0441\u0442\u0435\u0439','https://www.kommersant.ru/RSS/news.xml'),
        ('\u041C\u0430\u0442\u0435\u0440\u0438\u0430\u043B\u044B \u0441 \u0441\u0430\u0439\u0442\u0430','https://www.kommersant.ru/RSS/corp.xml'),
        ('\u0420\u0430\u0434\u0438\u043E \u041A\u043E\u043C\u043C\u0435\u0440\u0441\u0430\u043D\u0442\u044A-FM','https://www.kommersant.ru/RSS/radio.xml'),
        ('\u0422\u0435\u043C\u0430\u0442\u0438\u0447\u0435\u0441\u043A\u0438\u0435 \u043F\u0440\u0438\u043B\u043E\u0436\u0435\u043D\u0438\u044F','https://www.kommersant.ru/RSS/tema.xml'),
        ('\u0416\u0443\u0440\u043D\u0430\u043B \u00AB\u041E\u0413\u041E\u041D\u0401\u041A\u00BB','https://www.kommersant.ru/RSS/ogoniok.xml'),
        ('\u0416\u0443\u0440\u043D\u0430\u043B \u00AB\u041A\u043E\u043C\u043C\u0435\u0440\u0441\u0430\u043D\u0442\u044A WEEKEND\u00BB','https://www.kommersant.ru/RSS/weekend.xml'),
        ('\u0416\u0443\u0440\u043D\u0430\u043B \u00AB\u041A\u043E\u043C\u043C\u0435\u0440\u0441\u0430\u043D\u0442\u044A \u0410\u0412\u0422\u041E\u041F\u0418\u041B\u041E\u0422\u00BB','https://www.kommersant.ru/RSS/auto.xml'),
        ('\u041F\u043E\u043B\u0438\u0442\u0438\u043A\u0430','https://www.kommersant.ru/rss/section-politics.xml'),
        ('\u042D\u043A\u043E\u043D\u043E\u043C\u0438\u043A\u0430','https://www.kommersant.ru/RSS/section-economics.xml'),
        ('\u0411\u0438\u0437\u043D\u0435\u0441','https://www.kommersant.ru/rss/section-business.xml'),
        ('\u0412 \u043C\u0438\u0440\u0435','https://www.kommersant.ru/rss/section-world.xml'),
        ('\u041F\u0440\u043E\u0438\u0441\u0448\u0435\u0441\u0442\u0432\u0438\u044F','https://www.kommersant.ru/rss/section-accidents.xml'),
        ('\u041E\u0431\u0449\u0435\u0441\u0442\u0432\u043E','https://www.kommersant.ru/rss/section-society.xml'),
        ('\u041A\u0443\u043B\u044C\u0442\u0443\u0440\u0430','https://www.kommersant.ru/rss/section-culture.xml'),
        ('\u0421\u043F\u043E\u0440\u0442','https://www.kommersant.ru/rss/section-sport.xml'),
        ('Hi-Tech','https://www.kommersant.ru/RSS/section-hitech.xml'),
        ('\u0410\u0432\u0442\u043E','https://www.kommersant.ru/RSS/Autopilot_on.xml'),
        ('\u0421\u0442\u0438\u043B\u044C','https://www.kommersant.ru/RSS/section-style.xml'),
        ('\u0421\u0430\u043D\u043A\u0442-\u041F\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433','https://www.kommersant.ru/rss/regions/piter_all.xml'),
        ('\u0412\u043E\u0440\u043E\u043D\u0435\u0436','https://www.kommersant.ru/rss/regions/vrn_all.xml'),
        ('\u0415\u043A\u0430\u0442\u0435\u0440\u0438\u043D\u0431\u0443\u0440\u0433','https://www.kommersant.ru/rss/regions/ekaterinburg_all.xml'),
        ('\u0418\u0436\u0435\u0432\u0441\u043A','https://www.kommersant.ru/rss/regions/izhevsk_all.xml'),
        ('\u041A\u0430\u0437\u0430\u043D\u044C','https://www.kommersant.ru/rss/regions/kazan_all.xml'),
        ('\u041A\u0440\u0430\u0441\u043D\u043E\u0434\u0430\u0440','https://www.kommersant.ru/rss/regions/krasnodar_all.xml'),
        ('\u041A\u0440\u0430\u0441\u043D\u043E\u044F\u0440\u0441\u043A','https://www.kommersant.ru/rss/regions/krasnoyarsk_all.xml'),
        ('\u041D\u0438\u0436\u043D\u0438\u0439 \u041D\u043E\u0432\u0433\u043E\u0440\u043E\u0434','https://www.kommersant.ru/rss/regions/nnov_all.xml'),
        ('\u041D\u043E\u0432\u043E\u0441\u0438\u0431\u0438\u0440\u0441\u043A','https://www.kommersant.ru/rss/regions/novosibirsk_all.xml'),
        ('\u041F\u0435\u0440\u043C\u044C','https://www.kommersant.ru/rss/regions/perm_all.xml'),
        ('\u0420\u043E\u0441\u0442\u043E\u0432-\u043D\u0430-\u0414\u043E\u043D\u0443','https://www.kommersant.ru/rss/regions/rostov_all.xml'),
        ('\u0421\u0430\u043C\u0430\u0440\u0430','https://www.kommersant.ru/rss/regions/samara_all.xml'),
        ('\u0421\u0430\u0440\u0430\u0442\u043E\u0432','https://www.kommersant.ru/rss/regions/saratov_all.xml'),
        ('\u0423\u0444\u0430','https://www.kommersant.ru/rss/regions/ufa_all.xml'),
        ('\u0427\u0435\u043B\u044F\u0431\u0438\u043D\u0441\u043A','https://www.kommersant.ru/rss/regions/chelyabinsk_all.xml')
    ]
```
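The regional Kommersant feeds all share one URL pattern, `https://www.kommersant.ru/rss/regions/<slug>_all.xml`. A sketch that builds those entries from (title, slug) pairs instead of writing out every URL; `REGIONS` and `regional_feeds` are hypothetical names, and only the first three regions from the list are shown:

```python
# URL pattern shared by all regional feeds in the Kommersant recipe.
REGIONAL_RSS = 'https://www.kommersant.ru/rss/regions/{}_all.xml'

# (feed title, URL slug) pairs copied from the regional part of the feed list.
REGIONS = [
    ('\u0421\u0430\u043D\u043A\u0442-\u041F\u0435\u0442\u0435\u0440\u0431\u0443\u0440\u0433', 'piter'),
    ('\u0412\u043E\u0440\u043E\u043D\u0435\u0436', 'vrn'),
    ('\u0415\u043A\u0430\u0442\u0435\u0440\u0438\u043D\u0431\u0443\u0440\u0433', 'ekaterinburg'),
]


def regional_feeds(regions):
    """Turn (title, slug) pairs into calibre-style (title, url) feed tuples."""
    return [(title, REGIONAL_RSS.format(slug)) for title, slug in regions]


# In the recipe, the national feeds could then be extended with:
#     feeds = [ ...national feeds... ] + regional_feeds(REGIONS)
```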
Fixed built-in RBC.ru recipe (rbc_ru.recipe): HTTPS, page cleanup, revised RSS feeds.
```python
from calibre.web.feeds.news import BasicNewsRecipe


class RBC_ru(BasicNewsRecipe):
    title = u'RBC.ru'
    __author__ = 'A. Chewi (with fixes by bugmen00t)'
    description = u'\u0420\u043E\u0441\u0441\u0438\u0439\u0441\u043A\u043E\u0435 \u0438\u043D\u0444\u043E\u0440\u043C\u0430\u0446\u0438\u043E\u043D\u043D\u043E\u0435 \u0430\u0433\u0435\u043D\u0442\u0441\u0442\u0432\u043E \u00AB\u0420\u043E\u0441\u0411\u0438\u0437\u043D\u0435\u0441\u041A\u043E\u043D\u0441\u0430\u043B\u0442\u0438\u043D\u0433\u00BB (\u0420\u0411\u041A) - \u043B\u0435\u043D\u0442\u044B \u043D\u043E\u0432\u043E\u0441\u0442\u0435\u0439 \u043F\u043E\u043B\u0438\u0442\u0438\u043A\u0438, \u044D\u043A\u043E\u043D\u043E\u043C\u0438\u043A\u0438 \u0438 \u0444\u0438\u043D\u0430\u043D\u0441\u043E\u0432, \u0430\u043D\u0430\u043B\u0438\u0442\u0438\u0447\u0435\u0441\u043A\u0438\u0435 \u043C\u0430\u0442\u0435\u0440\u0438\u0430\u043B\u044B, \u043A\u043E\u043C\u043C\u0435\u043D\u0442\u0430\u0440\u0438\u0438 \u0438 \u043F\u0440\u043E\u0433\u043D\u043E\u0437\u044B, \u0442\u0435\u043C\u0430\u0442\u0438\u0447\u0435\u0441\u043A\u0438\u0435 \u0441\u0442\u0430\u0442\u044C\u0438' # noqa
    needs_subscription = False
    cover_url = 'https://pics.rbc.ru/img/fp_v4/skin/img/logo.gif'
    cover_margins = (80, 160, '#ffffff')
    oldest_article = 20
    max_articles_per_feed = 50
    summary_length = 200
    remove_empty_feeds = True
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    conversion_options = {'linearize_tables': True}
    auto_cleanup = True
    language = 'ru'
    timefmt = ' [%a, %d %b, %Y]'

    feeds = [
        (u'RSS \u043D\u043E\u0432\u043E\u0441\u0442\u0438', u'https://rssexport.rbc.ru/rbcnews/news/30/full.rss'),
        (u'\u0413\u043B\u0430\u0432\u043D\u044B\u0435\u0020\u043D\u043E\u0432\u043E\u0441\u0442\u0438', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/news.rss'),
    ]
```
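`timefmt` is a `strftime` pattern that calibre uses to render the issue date. What the RBC.ru setting produces can be checked with plain Python; `format_issue_date` is a hypothetical helper, and since `%a` and `%b` are locale-dependent, the English names assume Python's default C locale (calibre may localize the date differently):

```python
from datetime import date

TIMEFMT = ' [%a, %d %b, %Y]'  # value taken from the RBC.ru recipe


def format_issue_date(d, fmt=TIMEFMT):
    """Render an issue date with a strftime-style timefmt pattern."""
    return d.strftime(fmt)


# format_issue_date(date(2022, 7, 23)) gives ' [Sat, 23 Jul, 2022]' in the C locale
```

Note the comma before the year; the more common `' [%a, %d %b %Y]'` would drop it.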
Fixed built-in RIA Novosti - Russian recipe (ria_ru.recipe): HTTPS, revised RSS feeds.
```python
#!/usr/bin/env python
# vim:fileencoding=utf-8
__license__ = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.ria.ru
'''
from calibre.web.feeds.news import BasicNewsRecipe


class RIANovosti(BasicNewsRecipe):
    title = 'RIA Novosti - Russian'
    __author__ = 'Darko Miletic (with fixes by bugmen00t)'
    description = 'News from Russia'
    # Stray U+2028 (LINE SEPARATOR) removed from the publisher name
    publisher = '\u041C\u0418\u0410 \u00AB\u0420\u043E\u0441\u0441\u0438\u044F \u0441\u0435\u0433\u043E\u0434\u043D\u044F\u00BB (MIA Russia Today)'
    category = 'news, politics, Russia'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'utf8'
    language = 'ru'
    publication_type = 'newsportal'
    cover_url = 'https://oldimg.ria.ru/i/ria_social.png'

    remove_tags_before = dict(name='div', attrs={'class': 'article__header'})
    remove_tags_after = dict(name='div', attrs={'class': 'article__userbar'})
    remove_tags = [
        dict(name='div', attrs={'class': 'article__userbar'}),
        dict(name='div', attrs={'class': 'article__title'}),
        dict(name='div', attrs={'class': 'article__aggr'}),
        dict(name='div', attrs={'class': 'article__article-info'})
    ]

    feeds = [
        (u'\u041B\u0435\u043D\u0442\u0430 \u043D\u043E\u0432\u043E\u0441\u0442\u0435\u0439', u'https://ria.ru/export/rss2/archive/index.xml')
    ]
```
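Long `\uXXXX`-escaped strings like the ones in these recipes make stray invisible characters easy to miss; for example, U+2028 (LINE SEPARATOR) can come along when text is copied from a web page. A small stdlib-only check that could be run over a recipe's `title`, `publisher` and `description`; `find_invisible` and the chosen category set are my own assumptions, not part of calibre:

```python
import unicodedata

# Unicode general categories that are usually unwanted in one-line metadata:
# line/paragraph separators (Zl, Zp), format characters (Cf), controls (Cc).
SUSPICIOUS_CATEGORIES = {'Zl', 'Zp', 'Cf', 'Cc'}


def find_invisible(text):
    """Return (index, character name) for each suspicious character in text."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
            # Control characters have no Unicode name, so fall back to U+XXXX
            hits.append((i, unicodedata.name(ch, 'U+%04X' % ord(ch))))
    return hits


# e.g. find_invisible(RIANovosti.publisher) inside a calibre environment
```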
Improved built-in TJournal recipe (tjournal.recipe): article cleanup, revised RSS feeds. In-article images still come out as ugly placeholders, which is a shame.
Bonus: updated favicon
```python
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class TJournal(BasicNewsRecipe):
    title = u'TJournal'
    __author__ = 'bug_me_not (with fixes by bugmen00t)'
    description = 'TJournal: \u0438\u0437\u0434\u0430\u043D\u0438\u0435 \u043E \u043C\u0435\u0434\u0438\u0430, \u0442\u0435\u0445\u043D\u043E\u043B\u043E\u0433\u0438\u044F\u0445 \u0438 \u0442\u0440\u0435\u043D\u0434\u0430\u0445'
    publisher = 'tjournal.ru'
    category = 'news'
    language = 'ru'
    no_stylesheets = False
    remove_javascript = True
    oldest_article = 30
    max_articles_per_feed = 100
    cover_url = 'https://tjournal.ru/static/build/tjournal.ru/images/search_logo.png'

    remove_tags_before = dict(
        name='div', attrs={'class': 'content-title'})
    remove_tags_after = dict(
        name='div', attrs={'class': 'content-footer content-footer--full l-island-a'})
    remove_tags = [
        dict(name='div', attrs={'class': 'content-footer content-footer--full l-island-a'}),
        dict(name='div', attrs={'air-module': 'module.distributionFloating'}),
        dict(name='span', attrs={'class': 'content-editorial-tick'}),
        dict(name='vue'),
        dict(name='div', attrs={'class': 'comments'}),
        dict(name='div', attrs={'class': 'propaganda'}),
        dict(name='div', attrs={'class': 'propaganda propaganda--with-footer'}),
        dict(name='div', attrs={'air-module': 'module.gallery'}),
        dict(name='div', attrs={'class': 'content-container'}),
        dict(name='div', attrs={'class': 'content-header__item content-header-number'}),
        dict(name='span', attrs={'class': 'views__value'}),
        dict(name='span', attrs={'class': 'views__label'})
    ]

    feeds = [
        ('\u041F\u043E\u043F\u0443\u043B\u044F\u0440\u043D\u043E\u0435','https://tjournal.ru/rss'),
        ('\u041D\u043E\u0432\u043E\u0441\u0442\u0438','https://tjournal.ru/rss/news'),
        ('\u0421\u0432\u0435\u0436\u0435\u0435','https://tjournal.ru/rss/new'),
        ('\u0422\u0435\u0445\u043D\u043E\u043B\u043E\u0433\u0438\u0438','https://tjournal.ru/rss/tech'),
        ('\u0420\u0430\u0437\u0431\u043E\u0440\u044B','https://tjournal.ru/rss/analysis'),
        ('\u0418\u043D\u0442\u0435\u0440\u043D\u0435\u0442','https://tjournal.ru/rss/internet')
    ]

    def preprocess_html(self, soup):
        # The real image URL is kept in data-image-src; copy it into src
        for img in soup.findAll('img', attrs={'data-image-src': True}):
            img['src'] = img['data-image-src']
        return soup
```
Fixed built-in The Insider recipe (the_insider.recipe): HTTPS, revised RSS feeds. Bonus: favicon
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class TheInsider(BasicNewsRecipe):
    title = u'The Insider'
    cover_url = u'https://s3-us-west-2.amazonaws.com/anchor-generated-image-bank/production/podcast_uploaded_nologo400/10331708/10331708-1604408816914-d03520fb339d5.jpg'
    __author__ = 'bugmen00t'
    description = '\u0420\u0430\u0441\u0441\u043B\u0435\u0434\u043E\u0432\u0430\u043D\u0438\u044F \u0420\u0435\u043F\u043E\u0440\u0442\u0430\u0436\u0438 \u0410\u043D\u0430\u043B\u0438\u0442\u0438\u043A\u0430'
    publisher = 'theins.ru'
    category = 'news'
    language = 'ru'
    no_stylesheets = True
    remove_javascript = True
    oldest_article = 300
    max_articles_per_feed = 100

    remove_tags_before = dict(name='div', attrs={'id': 'wrapper'})
    remove_tags_after = dict(name='p', attrs={'style': ' color: #999999;'})
    remove_tags = [
        dict(name='div', attrs={'class': 'post-share'}),
        dict(name='div', attrs={'class': 'post-share fixed-likes'}),
        dict(name='div', attrs={'class': 'topads'}),
        dict(name='div', attrs={'class': 'pre-content-line'}),
        dict(name='div', attrs={'class': 'author-opinions'}),
        dict(name='div', attrs={'class': 'content-banner'}),
        dict(name='div', attrs={'id': 'sidebar'})
    ]

    feeds = [
        (u'\u041D\u043E\u0432\u043E\u0441\u0442\u0438', u'https://theins.ru/feed')
    ]
```
Improved built-in iXBT.com recipe (ixbt.recipe): revised RSS feeds.
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class Ixbt(BasicNewsRecipe):
    title = 'iXBT.com'
    __author__ = 'bugmen00t'
    description = '\u0421\u043F\u0435\u0446\u0438\u0430\u043B\u0438\u0437\u0438\u0440\u043E\u0432\u0430\u043D\u043D\u044B\u0439 \u0440\u043E\u0441\u0441\u0438\u0439\u0441\u043A\u0438\u0439 \u0438\u043D\u0444\u043E\u0440\u043C\u0430\u0446\u0438\u043E\u043D\u043D\u043E-\u0430\u043D\u0430\u043B\u0438\u0442\u0438\u0447\u0435\u0441\u043A\u0438\u0439 \u0441\u0435\u0440\u0432\u0435\u0440, \u043E\u0441\u0432\u0435\u0449\u0430\u044E\u0449\u0438\u0439 \u0432\u043E\u043F\u0440\u043E\u0441\u044B \u0430\u043F\u043F\u0430\u0440\u0430\u0442\u043D\u043E\u0433\u043E \u043E\u0431\u0435\u0441\u043F\u0435\u0447\u0435\u043D\u0438\u044F \u043F\u0435\u0440\u0441\u043E\u043D\u0430\u043B\u044C\u043D\u044B\u0445 \u043A\u043E\u043C\u043F\u044C\u044E\u0442\u0435\u0440\u043E\u0432, \u043A\u043E\u043C\u043C\u0443\u043D\u0438\u043A\u0430\u0446\u0438\u0439 \u0438 \u0441\u0435\u0440\u0432\u0435\u0440\u043E\u0432, 3D-\u0433\u0440\u0430\u0444\u0438\u043A\u0438 \u0438 \u0437\u0432\u0443\u043A\u0430, \u0446\u0438\u0444\u0440\u043E\u0432\u043E\u0433\u043E \u0444\u043E\u0442\u043E \u0438 \u0432\u0438\u0434\u0435\u043E, Hi-Fi \u0430\u043F\u043F\u0430\u0440\u0430\u0442\u0443\u0440\u044B \u0438 \u043F\u0440\u043E\u0435\u043A\u0442\u043E\u0440\u043E\u0432, \u043C\u043E\u0431\u0438\u043B\u044C\u043D\u043E\u0439 \u0441\u0432\u044F\u0437\u0438 \u0438 \u043F\u0435\u0440\u0438\u0444\u0435\u0440\u0438\u0438, \u0438\u0433\u0440\u043E\u0432\u044B\u0445 \u043F\u0440\u0438\u043B\u043E\u0436\u0435\u043D\u0438\u0439 \u0438 \u043C\u043D\u043E\u0433\u043E\u0433\u043E \u0434\u0440\u0443\u0433\u043E\u0433\u043E.'
    publisher = 'www.ixbt.com'
    category = 'news'
    cover_url = u'https://www.ixbt.com/images/ixbt-logo-new.jpg'
    language = 'ru'
    auto_cleanup = True
    oldest_article = 30
    max_articles_per_feed = 100

    remove_tags_before = dict(name='div', attrs={'class': 'content'})
    remove_tags_after = dict(name='ul', attrs={'id': 'soc_ShareBlock'})

    feeds = [
        (u'\u0421\u0442\u0430\u0442\u044C\u0438', 'http://www.ixbt.com/export/articles.rss'),
        (u'\u041D\u043E\u0432\u043E\u0441\u0442\u0438', 'http://www.ixbt.com/export/news.rss'),
        (u'\u0421\u0432\u0435\u0436\u0438\u0435 \u043D\u043E\u0432\u043E\u0441\u0442\u0438 DVD \u0438 \u0434\u043E\u043C\u0430\u0448\u043D\u0438\u0445 \u043A\u0438\u043D\u043E\u0442\u0435\u0430\u0442\u0440\u043E\u0432', 'http://www.ixbt.com/export/dvdnews.rss'),
        (u'\u0421\u0432\u0435\u0436\u0438\u0435 \u043D\u043E\u0432\u043E\u0441\u0442\u0438 \u0438\u0437 \u043C\u0438\u0440\u0430 Apple', 'http://www.ixbt.com/export/applenews.rss'),
        (u'\u041F\u0440\u043E\u0446\u0435\u0441\u0441\u043E\u0440\u044B', 'http://www.ixbt.com/export/sec_cpu.rss'),
        (u'\u0421\u0438\u0441\u0442\u0435\u043C\u043D\u044B\u0435 \u043F\u043B\u0430\u0442\u044B, \u043F\u0430\u043C\u044F\u0442\u044C \u0438 \u0447\u0438\u043F\u0441\u0435\u0442\u044B', 'http://www.ixbt.com/export/sec_mainboard.rss'),
        (u'D-\u0412\u0438\u0434\u0435\u043E \u0438 TV-\u0442\u044E\u043D\u0435\u0440\u044B', 'http://www.ixbt.com/export/sec_video.rss'),
        (u'\u0421\u0435\u0442\u0438 \u0438 \u0421\u0435\u0440\u0432\u0435\u0440\u044B', 'http://www.ixbt.com/export/sec_comm.rss'),
        (u'\u041E\u043F\u0442\u0438\u0447\u0435\u0441\u043A\u0438\u0435 \u043F\u0440\u0438\u0432\u043E\u0434\u044B \u0438 \u043D\u043E\u0441\u0438\u0442\u0435\u043B\u0438 \u0438\u043D\u0444\u043E\u0440\u043C\u0430\u0446\u0438\u0438', 'http://www.ixbt.com/export/sec_optical.rss'),
        (u'\u041F\u0440\u0438\u043D\u0442\u0435\u0440\u044B \u0438 \u041C\u0424\u0423', 'http://www.ixbt.com/export/sec_printer.rss'),
        (u'\u041C\u043E\u043D\u0438\u0442\u043E\u0440\u044B', 'http://www.ixbt.com/export/sec_monitor.rss'),
        (u'\u041D\u043E\u0441\u0438\u0442\u0435\u043B\u0438 \u0438\u043D\u0444\u043E\u0440\u043C\u0430\u0446\u0438\u0438', 'http://www.ixbt.com/export/sec_storage.rss'),
        (u'\u0426\u0438\u0444\u0440\u043E\u0432\u043E\u0439 \u0437\u0432\u0443\u043A', 'http://www.ixbt.com/export/sec_multimedia.rss'),
        (u'ProAudio', 'http://www.ixbt.com/export/sec_proaudio.rss'),
        (u'\u0418\u0437\u043E\u0431\u0440\u0430\u0436\u0435\u043D\u0438\u0435 \u0432 \u0447\u0438\u0441\u043B\u0430\u0445', 'http://www.ixbt.com/export/sec_digimage.rss'),
        (u'\u041F\u0440\u043E\u0435\u043A\u0442\u043E\u0440\u044B, \u043A\u0438\u043D\u043E \u0438 \u0434\u043E\u043C\u0430\u0448\u043D\u0438\u0435 \u043A\u0438\u043D\u043E\u0442\u0435\u0430\u0442\u0440\u044B', 'http://www.ixbt.com/export/sec_dvd.rss'),
        (u'\u0426\u0438\u0444\u0440\u043E\u0432\u043E\u0435 \u0432\u0438\u0434\u0435\u043E', 'http://www.ixbt.com/export/sec_divideo.rss'),
        (u'\u041C\u043E\u0431\u0438\u043B\u044C\u043D\u044B\u0435 \u041F\u041A', 'http://www.ixbt.com/export/sec_portopc.rss'),
        (u'\u041C\u043E\u0431\u0438\u043B\u044C\u043D\u044B\u0435 \u0443\u0441\u0442\u0440\u043E\u0439\u0441\u0442\u0432\u0430', 'http://www.ixbt.com/export/sec_pda.rss'),
        (u'\u0412\u0441\u0435\u0433\u0434\u0430 \u043D\u0430 \u0441\u0432\u044F\u0437\u0438', 'http://www.ixbt.com/export/sec_mobile.rss'),
        (u'\u041A\u043E\u0440\u043F\u0443\u0441\u0430, \u0441\u0438\u0441\u0442\u0435\u043C\u044B \u043F\u0438\u0442\u0430\u043D\u0438\u044F \u0438 \u043E\u0445\u043B\u0430\u0436\u0434\u0435\u043D\u0438\u044F', 'http://www.ixbt.com/export/sec_power.rss'),
        (u'\u041A\u043E\u043B\u043E\u043D\u043A\u0430 \u0440\u0435\u0434\u0430\u043A\u0442\u043E\u0440\u0430', 'http://www.ixbt.com/export/sec_editorial.rss'),
        (u'iXBT Live', 'https://www.ixbt.com/live/rss/index/')
    ]
```
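Most iXBT feed URLs above still use plain `http://`. If the site serves the same export paths over HTTPS (I have not verified this for every feed), the list could be upgraded mechanically. A sketch with a hypothetical helper `force_https`:

```python
def force_https(feeds):
    """Return (title, url) feed tuples with http:// rewritten to https://.

    Assumption: every feed is reachable at the same path over HTTPS.
    URLs that already use https:// (or another scheme) are left unchanged.
    """
    upgraded = []
    for title, url in feeds:
        if url.startswith('http://'):
            url = 'https://' + url[len('http://'):]
        upgraded.append((title, url))
    return upgraded


# In the recipe: feeds = force_https([ ...feed tuples as above... ])
```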
Improved built-in Идеальный пиксель recipe (id_pixel.recipe): HTTPS, article cleanup. Bonus: favicon
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class IdPixel(BasicNewsRecipe):
    title = '\u0418\u0434\u0435\u0430\u043B\u044C\u043D\u044B\u0439 \u043F\u0438\u043A\u0441\u0435\u043B\u044C'
    cover_url = u'https://idpixel.ru/i/logo2x.png'
    description = '\u041D\u043E\u0432\u043E\u0441\u0442\u043D\u043E\u0439 \u043F\u0440\u043E\u0435\u043A\u0442 \u043E \u0440\u0435\u0442\u0440\u043E-\u0438\u0433\u0440\u0430\u0445 \u0438 \u0440\u0435\u0442\u0440\u043E-\u0442\u0435\u0445\u043D\u0438\u043A\u0435. \u0412\u043E\u0441\u044C\u043C\u0438\u0431\u0438\u0442\u043D\u044B\u0435 \u0438\u0433\u0440\u044B, \u0448\u0435\u0441\u0442\u043D\u0430\u0434\u0446\u0430\u0442\u0438\u0431\u0438\u0442\u043D\u044B\u0435 \u043A\u043E\u043D\u0441\u043E\u043B\u0438, \u0434\u043E\u043C\u0430\u0448\u043D\u0438\u0435 \u043A\u043E\u043C\u043F\u044C\u044E\u0442\u0435\u0440\u044B \u0441 \u0438\u0433\u0440\u0430\u043C\u0438 \u043D\u0430 \u043A\u0430\u0441\u0441\u0435\u0442\u0430\u0445 \u0438 \u0442\u0430\u043A \u0434\u0430\u043B\u0435\u0435. \u041C\u044B \u0438\u0449\u0435\u043C \u0440\u0435\u0442\u0440\u043E-\u043D\u043E\u0432\u043E\u0441\u0442\u0438 \u043F\u043E \u0432\u0441\u0435\u043C\u0443 \u0441\u0432\u0435\u0442\u0443 \u0438 \u0434\u043E\u043D\u043E\u0441\u0438\u043C \u0438\u0445 \u0434\u043E \u0432\u0430\u0441.' # noqa
    publisher = '\u041C\u0438\u0445\u0430\u0438\u043B \u0421\u0443\u0434\u0430\u043A\u043E\u0432'
    category = 'news'
    __author__ = 'bugmen00t'
    language = 'ru'
    no_stylesheets = False
    remove_javascript = True
    auto_cleanup = True
    oldest_article = 100
    max_articles_per_feed = 50

    remove_tags_before = dict(name='div', attrs={'class': 'blog-post'})
    remove_tags_after = dict(name='div', attrs={'style': 'margin: 20px 0 0 2px;font-size: 16px;'})
    remove_tags = [
        dict(name='div', attrs={'class': ' likely__widget likely__widget_vkontakte'}),
        dict(name='div', attrs={'class': ' likely__widget likely__widget_twitter'}),
        dict(name='div', attrs={'class': ' likely__widget likely__widget_facebook'}),
        dict(name='div', attrs={'class': ' likely__widget likely__widget_telegram'}),
        dict(name='div', attrs={'class': ' likely__widget likely__widget_odnoklassniki'}),
        dict(name='div', attrs={'class': 'comments_input_disabled'}),
        dict(name='div', attrs={'id': 'comments'})
    ]

    feeds = [(u'\u041D\u043E\u0432\u043E\u0441\u0442\u0438', u'https://idpixel.ru/rss/news.rss')]
```
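The five `likely__widget` rules in the Идеальный пиксель recipe differ only in the social-network suffix, so they could be generated instead of written out by hand. A sketch that builds exactly the same dictionaries (including the leading space the recipe uses in each class value); `LIKELY_NETWORKS` and `likely_rules` are names introduced here:

```python
# Social networks whose share widgets the recipe strips.
LIKELY_NETWORKS = ('vkontakte', 'twitter', 'facebook', 'telegram', 'odnoklassniki')

# One remove_tags rule per share widget, identical to the hand-written ones.
likely_rules = [
    dict(name='div', attrs={'class': ' likely__widget likely__widget_' + network})
    for network in LIKELY_NETWORKS
]

# In the recipe:
#     remove_tags = likely_rules + [
#         dict(name='div', attrs={'class': 'comments_input_disabled'}),
#         dict(name='div', attrs={'id': 'comments'}),
#     ]
```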
Fixed built-in Компьютерра recipe (kompiutierra.recipe): HTTPS, revised RSS feeds. Bonus: updated favicon
Fixed built-in МедиаЗона recipe (media_zone.recipe): revised RSS feeds. Bonus: favicon
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class MediaZona(BasicNewsRecipe):
    title = '\u041c\u0435\u0434\u0438\u0430\u0417\u043e\u043d\u0430'
    __author__ = 'bugmen00t'
    description = '\u041E\u0431\u0449\u0435\u0441\u0442\u0432\u0435\u043D\u043D\u043E-\u043F\u043E\u043B\u0438\u0442\u0438\u0447\u0435\u0441\u043A\u043E\u0435 \u0438\u0437\u0434\u0430\u043D\u0438\u0435, \u0441\u0434\u0435\u043B\u0430\u0432\u0448\u0435\u0435 \u0430\u043A\u0446\u0435\u043D\u0442 \u043D\u0430 \u0444\u0443\u043D\u043A\u0446\u0438\u043E\u043D\u0438\u0440\u043E\u0432\u0430\u043D\u0438\u0438 \u0437\u0430\u043A\u043E\u043D\u0430 \u0432 \u0420\u043E\u0441\u0441\u0438\u0438. \u041F\u043E \u043C\u043D\u0435\u043D\u0438\u044E \u0430\u0432\u0442\u043E\u0440\u0438\u0442\u0435\u0442\u043D\u044B\u0445 \u043C\u0435\u0434\u0438\u0430\u044D\u043A\u0441\u043F\u0435\u0440\u0442\u043E\u0432, \u043F\u043E \u0446\u0438\u0442\u0438\u0440\u0443\u0435\u043C\u043E\u0441\u0442\u0438 \u0438 \u043F\u043E\u0441\u0435\u0449\u0430\u0435\u043C\u043E\u0441\u0442\u0438 \u0444\u043E\u0440\u043C\u0430\u0442 \u00AB\u041C\u0435\u0434\u0438\u0430\u0437\u043E\u043D\u044B\u00BB \u043E\u043A\u0430\u0437\u0430\u043B\u0441\u044F \u0432\u0435\u0434\u0443\u0449\u0438\u043C \u0444\u043E\u0440\u043C\u0430\u0442\u043E\u043C \u043D\u043E\u0432\u043E\u0441\u0442\u043D\u043E\u0433\u043E \u0438\u0437\u0434\u0430\u043D\u0438\u044F \u0432 \u0420\u043E\u0441\u0441\u0438\u0438 2015 \u0433\u043E\u0434\u0430. \u00AB\u041C\u0435\u0434\u0438\u0430\u0437\u043E\u043D\u0430\u00BB \u043F\u0438\u0448\u0435\u0442 \u043E \u0440\u0435\u0430\u043B\u044C\u043D\u043E \u043F\u0440\u043E\u0438\u0441\u0445\u043E\u0434\u044F\u0449\u0435\u043C \u0432 \u0420\u043E\u0441\u0441\u0438\u0438, \u043F\u0435\u0440\u0432\u043E\u0439 \u0443\u043B\u0430\u0432\u043B\u0438\u0432\u0430\u044F \u0432\u0435\u043A\u0442\u043E\u0440\u044B \u0440\u0430\u0437\u0432\u0438\u0442\u0438\u044F \u043E\u0431\u0449\u0435\u0441\u0442\u0432\u0430.' # noqa
    publisher = 'zona.media'
    category = 'news'
    cover_url = u'https://zona.media/s/share/default_mz.png'
    language = 'ru'
    no_stylesheets = False
    remove_javascript = True
    auto_cleanup = True
    oldest_article = 30
    max_articles_per_feed = 100

    remove_tags_before = dict(name='section', attrs={'class': 'mz-layout-content__row pt0 clearfix'})
    remove_tags_after = dict(name='div', attrs={'class': 'mz-publish__wrapper'})
    remove_tags = [
        dict(name='div', attrs={'class': 'mz-agent-banner'}),
        dict(name='section', attrs={'data-share-id': 'post'})
    ]

    feeds = [
        ('\u041C\u0435\u0434\u0438\u0430\u0437\u043E\u043D\u0430 ', 'https://zona.media/rss'),
        ('\u0411\u0435\u043B\u0430\u0440\u0443\u0441\u044C', 'https://mediazona.by/rss'),
        ('\u0426\u0435\u043D\u0442\u0440\u0430\u043B\u044C\u043D\u0430\u044F \u0410\u0437\u0438\u044F', 'https://mediazona.ca/rss'),
    ]
```
Fixed built-in Правда.RU recipe (pravda_ru.recipe): HTTPS, revised RSS feeds.
N.B.: the site seems to be geo-restricted, so the recipe probably won't work from non-Russian IP addresses. Bonus: updated favicon
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
__license__ = 'GPL v3'
__copyright__ = '2012, Darko Miletic <darko.miletic at gmail.com>'
'''
www.pravda.ru
'''
from calibre.web.feeds.news import BasicNewsRecipe


class Pravda_ru(BasicNewsRecipe):
    title = u'\u041F\u0440\u0430\u0432\u0434\u0430'
    __author__ = 'Darko Miletic (with fixes by bugmen00t)'
    description = u'\u041F\u0440\u0430\u0432\u0434\u0430.\u0420\u0443: \u0410\u043D\u0430\u043B\u0438\u0442\u0438\u043A\u0430 \u0438 \u043D\u043E\u0432\u043E\u0441\u0442\u0438'
    publisher = 'PRAVDA.Ru'
    category = 'news, politics, Russia'
    language = 'ru'
    publication_type = 'newspaper'
    cover_url = 'http://www.pravda.ru/pix/logo.gif'
    oldest_article = 7
    max_articles_per_feed = 50
    auto_cleanup = True

    remove_tags_before = dict(name='div', attrs={'class': 'full article full-article'})
    remove_tags_after = dict(name='div', attrs={'class': 'authors-block'})
    remove_tags = [
        dict(name='div', attrs={'class': 'breadcumbs'})
    ]

    feeds = [
        (u'\u041F\u0440\u0430\u0432\u0434\u0430.RU', 'https://www.pravda.ru/export.xml'),
        (u'\u0421\u0442\u0430\u0442\u044C\u0438', 'https://www.pravda.ru/export-articles.xml'),
        (u'\u041D\u043E\u0432\u043E\u0441\u0442\u0438', 'https://www.pravda.ru/export-news.xml')
    ]
```
Improved built-in Троицкий вариант recipe (trv.recipe): HTTPS, article cleanup. I decided to keep the comments, since they're often more interesting than the articles themselves; they can be disabled completely by uncommenting one line in the remove_tags section.
```python
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class TrvScience(BasicNewsRecipe):
    title = u'\u0422\u0440\u043e\u0438\u0446\u043a\u0438\u0439 \u0432\u0430\u0440\u0438\u0430\u043d\u0442'
    language = 'ru'
    __author__ = 'Vadim Dyadkin (with fixes by bugmen00t)'
    oldest_article = 30
    max_articles_per_feed = 100
    recursion = 4
    no_stylesheets = True
    simultaneous_downloads = 1
    # cover_url = 'https://i0.wp.com/trv-science.ru/uploads/logo_trv2-e1573805568596-1.png'
    cover_url = 'https://i0.wp.com/trv-science.ru/uploads/cropped-trv_neur-1024.png'

    remove_tags_before = dict(name='main', attrs={'id': 'main'})
    remove_tags_after = dict(name='div', attrs={'class': 'wpdiscuz-comment-pagination'})
    remove_tags = [
        dict(name='span', attrs={'class': 'fa fa-user'}),
        dict(name='h4'),
        dict(name='svg'),
        dict(name='ul', attrs={'class': 'st-related-posts'}),
        dict(name='footer', attrs={'class': 'entry-meta'}),
        # dict(name='div', attrs={'id': 'comments'}),
        dict(name='div', attrs={'class': 'wpd-vote'}),
        dict(name='div', attrs={'class': 'mistape_caption'}),
        dict(name='div', attrs={'class': 'wpd-comment-share wpd-hidden wpd-tooltip wpd-top'}),
        dict(name='div', attrs={'class': 'wpd-comment-left '}),
        dict(name='div', attrs={'class': 'wpd-space'}),
        dict(name='div', attrs={'class': 'wpd-reply-button'}),
        dict(name='div', attrs={'class': 'wpd-comment-link wpd-hidden'}),
        dict(name='div', attrs={'class': 'wpd-comment-last-edited'}),
        dict(name='div', attrs={'class': 'wpd-comment-date'}),
        dict(name='div', attrs={'class': 'wpd-comment-info-bar'}),
        dict(name='div', attrs={'class': 'wpd-form-wrap'})
    ]

    feeds = [(u'\u0422\u0440\u043e\u0438\u0446\u043a\u0438\u0439 \u0432\u0430\u0440\u0438\u0430\u043d\u0442',
              u'https://trv-science.ru/feed/')]
```
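Instead of uncommenting a line to drop the comments from Троицкий вариант, the choice could be driven by a flag. A sketch, assuming a hypothetical `keep_comments` switch; `build_remove_tags` and `COMMENTS_RULE` are names introduced here, and `base_rules` stands for the recipe's other `remove_tags` entries:

```python
# The rule that hides the whole comment section (the commented-out line above).
COMMENTS_RULE = dict(name='div', attrs={'id': 'comments'})


def build_remove_tags(base_rules, keep_comments=True):
    """Return a new remove_tags list, adding the comments rule if requested.

    base_rules is copied, so the caller's list is never modified.
    """
    rules = list(base_rules)
    if not keep_comments:
        rules.append(COMMENTS_RULE)
    return rules


# In the recipe (hypothetical):
#     remove_tags = build_remove_tags([ ...other rules... ], keep_comments=False)
```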
Last edited by bugmen00t; 07-23-2022 at 03:54 AM.
Reason: small fix to the Медиазона and Троицкий Вариант recipes