Old 11-25-2021, 02:35 PM   #1
MoloBolo
Device: Kobo Libra H20
Mediapart recipe doesn't work anymore

Hi,

The website Mediapart was updated recently and since then the recipe doesn't work. The ePub created seems normal but the articles are blank, with just the titles.
It worked fine before the update. I had this issue a few times before but I just had to redownload a second time and that was it.

I think this is the recipe.

Is there anything I can do about this ? I checked the URLs (pretty much the only thing I understand lol). The feed in the recipe is HTTP instead of HTTPS (I changed it in a custom recipe to be sure but it didn't change anything) and the rest is ok.

Thanks !
Old 11-26-2021, 07:58 AM   #2
kovidgoyal
creator of calibre
You will likely need to change the keep_only_tags setting.
Old 11-26-2021, 11:45 AM   #3
MoloBolo
Okay so I deleted lines 45 to 49

Code:
keep_only_tags = [
        dict(name='h1'),
        dict(name='div', **classes('author')),
        classes('introduction content-article')
    ]
I used Calibre's ereader and it worked (although it downloaded pretty much the whole web page instead of just the article, but I imagine that's the point).

The issue is that when I transfer the ePub to my Libra I can't access the articles. Going to the next page, or even choosing a particular page, sends me to the next article's first page (the one with links to the table of contents).
The ePub is 1700 pages long (!!) when it used to be ~250.

The articles are there, it just won't let me read them

Thanks for your help !
Old 11-26-2021, 01:23 PM   #4
kovidgoyal
You will need to figure out what to replace those with to extract the article's contents. See https://manual.calibre-ebook.com/news.html for an overview of the process.
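One way to look for replacement selectors is to save an article page and see which class names wrap the most text. The following is only a rough sketch using the standard library, not anything taken from calibre. The sample HTML is invented for illustration (only the class name `news__body__center__article` appears later in this thread; `menu` is hypothetical), and real pages with unclosed tags would confuse the simple stack used here.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassTextCounter(HTMLParser):
    """Tally how many text characters sit under each CSS class name."""

    # Elements that never get a closing tag, so they must not be stacked
    VOID = {'br', 'img', 'hr', 'meta', 'link', 'input'}

    def __init__(self):
        super().__init__()
        self.stack = []         # class-name lists of the currently open elements
        self.chars = Counter()  # class name -> characters of text beneath it

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        self.stack.append((dict(attrs).get('class') or '').split())

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        n = len(data.strip())
        for names in self.stack:
            for name in names:
                self.chars[name] += n


# Invented sample page, standing in for a saved article
sample = '''
<body>
  <div class="menu"><a href="/">Home</a></div>
  <div class="news__body__center__article">
    <p>First paragraph of the article.</p>
    <p>Second paragraph of the article.</p>
  </div>
</body>
'''

counter = ClassTextCounter()
counter.feed(sample)
best = counter.chars.most_common(1)[0][0]
print(best)
```

The class names that come out on top are candidates to try in keep_only_tags; they still need checking against the actual page in a browser's inspector.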
Old 11-27-2021, 07:37 AM   #5
MoloBolo
The print version looks great but its URL doesn't seem to be derivable from the article URL the way it is in the BBC example.

Article : https://www.mediapart.fr/journal/france/261121/les-lecons-de-l-affaire-nicolas-hulot

Print version : https://www.mediapart.fr/tools/print/996766

I can roughly see where the article is in the code but I'm not sure how to use keep/remove tags.
Old 12-04-2021, 07:22 AM   #6
senacra
Device: Kindle 4
Hi,

I just noticed the same issue.
After some analysis, I think I found a fix:
keep_only_tags = [
    dict(name='h1'),
    dict(name='div', **classes('author')),
    classes('news__heading__top__intro news__body__center__article')
]

Working for me
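To see why swapping the class names is enough, here is a small sketch of how the `classes` helper from the recipe behaves. The helper is copied from the recipe posted in this thread; the assumption is that the callable it returns is handed an element's `class` attribute as a string (or `None` when the attribute is missing). The sample class strings, such as `wide`, are made up.

```python
def classes(classes):
    # Same helper as in the Mediapart recipe: match any element whose
    # class attribute shares at least one name with the given list
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


matcher = classes(
    'news__heading__top__intro news__body__center__article'
)['attrs']['class']

# An element carrying one of the new class names is kept
hits_new = bool(matcher('news__body__center__article wide'))
# An element that only has the old class name no longer matches
hits_old = bool(matcher('content-article'))
# An element without a class attribute never matches
hits_missing = bool(matcher(None))

print(hits_new, hits_old, hits_missing)
```

So a keep_only_tags entry built with the old names simply matches nothing once the site renames its classes, which fits the blank articles described at the start of the thread.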
Old 12-04-2021, 11:51 AM   #7
MoloBolo
Thanks a lot, sadly it's not working for me. It's perfect in Calibre's ereader but the articles are randomly cut off on my Libra.
They are always cut just before the links to other articles ("à lire aussi", i.e. "read also"). Kepub conversion didn't help so I ran a book check and got 19 "Parsing failed: redefinition of the xmlns prefix is forbidden" errors (screenshot attached).

Here's the full recipe :

Spoiler:
#!/usr/bin/env python
# vim:fileencoding=utf-8
#
# 11 Jan 2021 -  L. Houpert - Major changes in the Mediapart recipe:
#   1) Summaries of the articles are now available
#   2) Additional sections International, France, Economie and Culture have
# been added through custom entries in the function my_parse_index.
#   3) Fix the cover image so it doesn't disappear from the Kindle menu
# (cover image format is changed to .jpeg)
# 14 Jan 2021 - Add Mediapart Logo url as masthead_url and change cover
#   by overlaying the date on top of the Mediapart cover
from __future__ import unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2021, Loïc Houpert <houpertloic at gmail .com>. Adapted from: 2016, Daniel Bonnery; 2009, Mathieu Godlewski; 2010-2012, Louis Gesbert'  # noqa
'''
Mediapart
'''

import re
from datetime import date, datetime, timezone, timedelta
from calibre.web.feeds import feeds_from_index
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


class Mediapart(BasicNewsRecipe):
    title = 'Mediapart'
    __author__ = 'Loïc Houpert'
    description = 'Global news in French from news site Mediapart'
    publication_type = 'newspaper'
    language = 'fr'
    needs_subscription = True
    oldest_article = 2

    use_embedded_content = False
    no_stylesheets = True

    keep_only_tags = [
        dict(name='h1'),
        dict(name='div', **classes('author')),
        classes('news__heading__top__intro news__body__center__article')
    ]
    remove_tags = [classes('login-subscribe print-source_url')]
    conversion_options = {'smarten_punctuation': True}

    masthead_url = "https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart_masthead.png"
    # cover_url = 'https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart.jpeg'

    # --

    # Get date in french time zone format
    today = datetime.now(timezone.utc) + timedelta(hours=1)
    oldest_article_date = today - timedelta(days=oldest_article)

    feeds = [
        ('La Une', 'http://www.mediapart.fr/articles/feed'),
    ]

    # The feed at 'http://www.mediapart.fr/articles/feed' only displays the 10
    # latest elements, so the articles are indexed on specific pages
    # in the function my_parse_index. In this function the articles are parsed
    # using the function get_articles and the dict values dict_article_sources

    def parse_feeds(self):
        feeds = super(Mediapart, self).parse_feeds()
        feeds += feeds_from_index(self.my_parse_index(feeds))
        return feeds

    def my_parse_index(self, la_une):

        dict_article_sources = [
            {
                'type': 'Brèves',
                'webpage': 'https://www.mediapart.fr/journal/fil-dactualites',
                'separador': {
                    'page': 'ul',
                    'thread': 'li'
                }
            },
            {
                'type': 'International',
                'webpage': 'https://www.mediapart.fr/journal/international',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'France',
                'webpage': 'https://www.mediapart.fr/journal/france',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'Économie',
                'webpage': 'https://www.mediapart.fr/journal/economie',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'Culture',
                'webpage': 'https://www.mediapart.fr/journal/culture-idees',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
        ]

        def get_articles(
            type_of_article, webpage, separador_page='ul', separador_thread='li'
        ):

            specific_articles = []

            webpage_article = []
            soup = self.index_to_soup(webpage)
            page = soup.find('main', {'class': 'global-wrapper'})
            fils = page.find(separador_page, {'class': 'post-list universe-journal'})

            all_articles = fils.findAll(separador_thread)
            for article in all_articles:
                try:
                    title = article.find('h3', recursive=False)
                    if title is None or ''.join(title['class']) == 'title-specific':
                        # print(f"[BAD title entry] Print value of title:\n {title}")
                        continue
                    # print(f"\n[OK title entry] Print value of title:\n {title}\n")

                    try:
                        article_mot_cle = article.find(
                            'a', {
                                'href': re.compile(r'.*\/mot-cle\/.*')
                            }
                        ).renderContents().decode('utf-8')
                    except Exception:
                        article_mot_cle = ''

                    try:
                        article_type = article.find(
                            'a', {
                                'href': re.compile(r'.*\/type-darticles\/.*')
                            }
                        ).renderContents().decode('utf-8')
                    except Exception:
                        article_type = ''

                    for s in title('span'):
                        s.replaceWith(s.renderContents().decode('utf-8') + "\n")
                    url = title.find('a', href=True)['href']

                    date = article.find('time', datetime=True)['datetime']
                    article_date = datetime.strptime(date, '%Y-%m-%d')
                    # Add French timezone to date of the article for date check
                    article_date = article_date.replace(tzinfo=timezone.utc) + timedelta(hours=1)
                    if article_date < self.oldest_article_date:
                        print("article_date < self.oldest_article_date\n")
                        continue

                    # print("-------- Recent article added to the list ------- \n")
                    all_authors = article.findAll(
                        'a', {'class': re.compile(r'\bjournalist\b')}
                    )
                    authors = [self.tag_to_string(a) for a in all_authors]
                    # print(f"Authors in tag <a>: {authors}")

                    # If no link to the author profile is available the
                    # html separador is a span tag
                    if not all_authors:
                        try:
                            all_authors = article.findAll(
                                'span', {'class': re.compile(r'\bjournalist\b')}
                            )
                            authors = [self.tag_to_string(a) for a in all_authors]
                            # print(f"Authors in tag <span>: {authors}")
                        except Exception:
                            # Keep a list so that ', '.join(authors) below works
                            authors = ['unknown']

                    description = article.find('p').renderContents().decode('utf-8')
                    # print(f" <p> in article : {self.tag_to_string(description).strip()} ")

                    summary = {
                        'title': self.tag_to_string(title).strip(),
                        'description': description,
                        'date': article_date.strftime("%a, %d %b, %Y %H:%M"),
                        'author': ', '.join(authors),
                        'article_type': article_type,
                        'mot_cle': article_mot_cle.capitalize(),
                        'url': 'https://www.mediapart.fr' + url,
                    }

                    webpage_article.append(summary)
                except Exception:
                    pass

            specific_articles += [(type_of_article,
                                   webpage_article)] if webpage_article else []
            return specific_articles

        articles = []

        for category in dict_article_sources:
            articles += get_articles(
                category['type'], category['webpage'], category['separador']['page'],
                category['separador']['thread']
            )

        return articles

    # non-locale specific date parse (strptime("%d %b %Y",s) would work with
    # french locale)
    def parse_french_date(self, date_str):
        date_arr = date_str.lower().split()
        return date(
            day=int(date_arr[0]),
            year=int(date_arr[2]),
            month=[
                None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
                'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'
            ].index(date_arr[1])
        )

    def get_browser(self):
        # -- Handle login

        def is_form_login(form):
            return "id" in form.attrs and form.attrs['id'] == "logFormEl"

        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('https://www.mediapart.fr/login')
            br.select_form(predicate=is_form_login)
            br['name'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def default_cover(self, cover_file):
        '''
        Create a generic cover for recipes that don't have a cover
        '''
        from PyQt5.Qt import QImage, QPainter, QPen, Qt, QFont, QRect
        from calibre.gui2 import ensure_app, load_builtin_fonts, pixmap_to_data

        def init_environment():
            ensure_app()
            load_builtin_fonts()

        def create_cover_mediapart(date):
            ' Create a cover for mediapart adding the date on Mediapart Cover'
            init_environment()
            # Get data
            image_url = 'https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart.jpeg'
            data = self.index_to_soup(image_url, raw=True)
            # Get date and hour corresponding to french time zone
            today = datetime.now(timezone.utc) + timedelta(hours=1)
            wkd = today.weekday()
            french_weekday={0:'Lun',1:'Mar',2:'Mer',3:'Jeu',4:'Ven',5:'Sam',6:'Dim'}
            day = french_weekday[wkd]+'.'
            date = day + ' ' + today.strftime('%d %b. %Y')
            edition = today.strftime('Édition de %Hh')

            # Get Cover data
            img = QImage()
            img.loadFromData(data)

            # Overlay date on cover
            p = QPainter(img)
            pen = QPen(Qt.black)
            pen.setWidth(6)
            p.setPen(pen)
            font = QFont()
            font.setFamily('Times')
            font.setPointSize(78)
            p.setFont(font)
            r = QRect(0, 600, 744,100)
            p.drawText(r, Qt.AlignmentFlag.AlignJustify | Qt.AlignmentFlag.AlignVCenter | Qt.AlignmentFlag.AlignCenter, date)
            p.end()

            # Overlay edition information on cover
            p = QPainter(img)
            pen = QPen(Qt.black)
            pen.setWidth(4)
            p.setPen(pen)
            font = QFont()
            font.setFamily('Times')
            font.setItalic(True)
            font.setPointSize(66)
            p.setFont(font)
            # Add date
            r = QRect(0, 720, 744,100)
            p.drawText(r, Qt.AlignmentFlag.AlignJustify | Qt.AlignmentFlag.AlignVCenter | Qt.AlignmentFlag.AlignCenter, edition)
            p.end()
            return pixmap_to_data(img)

        try:
            today=datetime.today()
            date = today.strftime('%d %b %Y')
            img_data = create_cover_mediapart(date)
            cover_file.write(img_data)
            cover_file.flush()
        except Exception:
            self.log.exception('Failed to generate default cover')
            return False
        return True


calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'


I'm assuming those links are the issue, at least with Kobo Libra (that's where it cuts and where the errors are), but I have no idea how to fix this.
Attached screenshot: Sans titre.jpg
Old 12-05-2021, 12:50 AM   #8
kovidgoyal
Just add svg to remove_tags in the recipe.
Old 12-05-2021, 07:35 AM   #9
MoloBolo
I added this :

Code:
remove_tags = [dict(name='svg')]
and ran a book check and now I have 150+ new errors

I tried to add stuff like "class" which didn't help, and eventually decided to test it on my Libra anyway...

It's working

I didn't notice any cuts and the table of contents is functional.
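A side note on how the svg entry gets added: it matters whether it goes into the existing remove_tags list or onto a new `remove_tags = ...` line, because a second assignment in a class body replaces the first one. The sketch below shows this with plain Python classes. They are stand-ins, not real `BasicNewsRecipe` subclasses, and the plain dict for `login-subscribe` is a simplified placeholder for the recipe's `classes(...)` entry.

```python
class Overwritten:
    # Two separate assignments: only the last one survives
    remove_tags = [{'attrs': {'class': 'login-subscribe'}}]
    remove_tags = [dict(name='svg')]


class Combined:
    # One list holding both rules
    remove_tags = [
        {'attrs': {'class': 'login-subscribe'}},
        dict(name='svg'),
    ]


print(Overwritten.remove_tags)
print(len(Combined.remove_tags))
```

With the two-line form the svg rule still applies, which is why the fix works either way, but the earlier rule is silently dropped.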

The full recipe :

Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
#
# 11 Jan 2021 -  L. Houpert - Major changes in the Mediapart recipe:
#   1) Summaries of the articles are now available
#   2) Additional sections International, France, Economie and Culture have
# been added through custom entries in the function my_parse_index.
#   3) Fix the cover image so it doesn't disappear from the Kindle menu
# (cover image format is changed to .jpeg)
# 14 Jan 2021 - Add Mediapart Logo url as masthead_url and change cover
#   by overlaying the date on top of the Mediapart cover
from __future__ import unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2021, Loïc Houpert <houpertloic at gmail .com>. Adapted from: 2016, Daniel Bonnery; 2009, Mathieu Godlewski; 2010-2012, Louis Gesbert'  # noqa
'''
Mediapart
'''

import re
from datetime import date, datetime, timezone, timedelta
from calibre.web.feeds import feeds_from_index
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


class Mediapart(BasicNewsRecipe):
    title = 'Mediapart'
    __author__ = 'Loïc Houpert'
    description = 'Global news in French from news site Mediapart'
    publication_type = 'newspaper'
    language = 'fr'
    needs_subscription = True
    oldest_article = 2

    use_embedded_content = False
    no_stylesheets = True

    keep_only_tags = [
        dict(name='h1'),
        dict(name='div', **classes('author')),
        classes('news__heading__top__intro news__body__center__article')
    ]
    # One list: a second `remove_tags = ...` line would replace the first
    remove_tags = [
        classes('login-subscribe print-source_url'),
        dict(name='svg'),
    ]
    conversion_options = {'smarten_punctuation': True}

    masthead_url = "https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart_masthead.png"
    # cover_url = 'https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart.jpeg'

    # --

    # Get date in french time zone format
    today = datetime.now(timezone.utc) + timedelta(hours=1)
    oldest_article_date = today - timedelta(days=oldest_article)

    feeds = [
        ('La Une', 'http://www.mediapart.fr/articles/feed'),
    ]

    # The feed at 'http://www.mediapart.fr/articles/feed' only displays the 10
    # latest elements, so the articles are indexed on specific pages
    # in the function my_parse_index. In this function the articles are parsed
    # using the function get_articles and the dict values dict_article_sources

    def parse_feeds(self):
        feeds = super(Mediapart, self).parse_feeds()
        feeds += feeds_from_index(self.my_parse_index(feeds))
        return feeds

    def my_parse_index(self, la_une):

        dict_article_sources = [
            {
                'type': 'Brèves',
                'webpage': 'https://www.mediapart.fr/journal/fil-dactualites',
                'separador': {
                    'page': 'ul',
                    'thread': 'li'
                }
            },
            {
                'type': 'International',
                'webpage': 'https://www.mediapart.fr/journal/international',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'France',
                'webpage': 'https://www.mediapart.fr/journal/france',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'Économie',
                'webpage': 'https://www.mediapart.fr/journal/economie',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'Culture',
                'webpage': 'https://www.mediapart.fr/journal/culture-idees',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
        ]

        def get_articles(
            type_of_article, webpage, separador_page='ul', separador_thread='li'
        ):

            specific_articles = []

            webpage_article = []
            soup = self.index_to_soup(webpage)
            page = soup.find('main', {'class': 'global-wrapper'})
            fils = page.find(separador_page, {'class': 'post-list universe-journal'})

            all_articles = fils.findAll(separador_thread)
            for article in all_articles:
                try:
                    title = article.find('h3', recursive=False)
                    if title is None or ''.join(title['class']) == 'title-specific':
                        # print(f"[BAD title entry] Print value of title:\n {title}")
                        continue
                    # print(f"\n[OK title entry] Print value of title:\n {title}\n")

                    try:
                        article_mot_cle = article.find(
                            'a', {
                                'href': re.compile(r'.*\/mot-cle\/.*')
                            }
                        ).renderContents().decode('utf-8')
                    except Exception:
                        article_mot_cle = ''

                    try:
                        article_type = article.find(
                            'a', {
                                'href': re.compile(r'.*\/type-darticles\/.*')
                            }
                        ).renderContents().decode('utf-8')
                    except Exception:
                        article_type = ''

                    for s in title('span'):
                        s.replaceWith(s.renderContents().decode('utf-8') + "\n")
                    url = title.find('a', href=True)['href']

                    date = article.find('time', datetime=True)['datetime']
                    article_date = datetime.strptime(date, '%Y-%m-%d')
                    # Add French timezone to date of the article for date check
                    article_date = article_date.replace(tzinfo=timezone.utc) + timedelta(hours=1)
                    if article_date < self.oldest_article_date:
                        print("article_date < self.oldest_article_date\n")
                        continue

                    # print("-------- Recent article added to the list ------- \n")
                    all_authors = article.findAll(
                        'a', {'class': re.compile(r'\bjournalist\b')}
                    )
                    authors = [self.tag_to_string(a) for a in all_authors]
                    # print(f"Authors in tag <a>: {authors}")

                    # If no link to the author profile is available the
                    # html separador is a span tag
                    if not all_authors:
                        try:
                            all_authors = article.findAll(
                                'span', {'class': re.compile(r'\bjournalist\b')}
                            )
                            authors = [self.tag_to_string(a) for a in all_authors]
                            # print(f"Authors in tag <span>: {authors}")
                        except Exception:
                            # Keep a list so that ', '.join(authors) below works
                            authors = ['unknown']

                    description = article.find('p').renderContents().decode('utf-8')
                    # print(f" <p> in article : {self.tag_to_string(description).strip()} ")

                    summary = {
                        'title': self.tag_to_string(title).strip(),
                        'description': description,
                        'date': article_date.strftime("%a, %d %b, %Y %H:%M"),
                        'author': ', '.join(authors),
                        'article_type': article_type,
                        'mot_cle': article_mot_cle.capitalize(),
                        'url': 'https://www.mediapart.fr' + url,
                    }

                    webpage_article.append(summary)
                except Exception:
                    pass

            specific_articles += [(type_of_article,
                                   webpage_article)] if webpage_article else []
            return specific_articles

        articles = []

        for category in dict_article_sources:
            articles += get_articles(
                category['type'], category['webpage'], category['separador']['page'],
                category['separador']['thread']
            )

        return articles

    # non-locale specific date parse (strptime("%d %b %Y",s) would work with
    # french locale)
    def parse_french_date(self, date_str):
        date_arr = date_str.lower().split()
        return date(
            day=int(date_arr[0]),
            year=int(date_arr[2]),
            month=[
                None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
                'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'
            ].index(date_arr[1])
        )

    def get_browser(self):
        # -- Handle login

        def is_form_login(form):
            return "id" in form.attrs and form.attrs['id'] == "logFormEl"

        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('https://www.mediapart.fr/login')
            br.select_form(predicate=is_form_login)
            br['name'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def default_cover(self, cover_file):
        '''
        Create a generic cover for recipes that don't have a cover
        '''
        from PyQt5.Qt import QImage, QPainter, QPen, Qt, QFont, QRect
        from calibre.gui2 import ensure_app, load_builtin_fonts, pixmap_to_data

        def init_environment():
            ensure_app()
            load_builtin_fonts()

        def create_cover_mediapart(date):
            ' Create a cover for mediapart adding the date on Mediapart Cover'
            init_environment()
            # Get data
            image_url = 'https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart.jpeg'
            data = self.index_to_soup(image_url, raw=True)
            # Get date and hour corresponding to french time zone
            today = datetime.now(timezone.utc) + timedelta(hours=1)
            wkd = today.weekday()
            french_weekday={0:'Lun',1:'Mar',2:'Mer',3:'Jeu',4:'Ven',5:'Sam',6:'Dim'}
            day = french_weekday[wkd]+'.'
            date = day + ' ' + today.strftime('%d %b. %Y')
            edition = today.strftime('Édition de %Hh')

            # Get Cover data
            img  = QImage()
            img.loadFromData(data)

            # Overlay date on cover
            p = QPainter(img)
            pen = QPen(Qt.black)
            pen.setWidth(6)
            p.setPen(pen)
            font = QFont()
            font.setFamily('Times')
            font.setPointSize(78)
            p.setFont(font)
            r = QRect(0, 600, 744,100)
            p.drawText(r, Qt.AlignmentFlag.AlignJustify | Qt.AlignmentFlag.AlignVCenter | Qt.AlignmentFlag.AlignCenter, date)
            p.end()

            # Overlay edition information on cover
            p = QPainter(img)
            pen = QPen(Qt.black)
            pen.setWidth(4)
            p.setPen(pen)
            font = QFont()
            font.setFamily('Times')
            font.setItalic(True)
            font.setPointSize(66)
            p.setFont(font)
            # Add date
            r = QRect(0, 720, 744,100)
            p.drawText(r, Qt.AlignmentFlag.AlignJustify | Qt.AlignmentFlag.AlignVCenter | Qt.AlignmentFlag.AlignCenter, edition)
            p.end()
            return pixmap_to_data(img)

        try:
            today=datetime.today()
            date = today.strftime('%d %b %Y')
            img_data = create_cover_mediapart(date)
            cover_file.write(img_data)
            cover_file.flush()
        except Exception:
            self.log.exception('Failed to generate default cover')
            return False
        return True


calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
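
For anyone reading the recipe, the `parse_french_date` method can be tried on its own outside calibre. This is a standalone sketch of the same month lookup, written as a plain function; the "day month year" input shape is inferred from the method itself, and the sample strings are made up.

```python
from datetime import date

# Index 0 is a placeholder so that list positions match month numbers
FRENCH_MONTHS = [
    None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
    'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'
]


def parse_french_date(date_str):
    # Expects 'day month year', e.g. '26 novembre 2021'; case is ignored
    day, month_name, year = date_str.lower().split()
    return date(
        year=int(year),
        month=FRENCH_MONTHS.index(month_name),
        day=int(day),
    )


parsed = parse_french_date('26 Novembre 2021')
print(parsed.isoformat())
```

An unknown month name raises `ValueError` from `list.index`, so a caller would need to handle that.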


I can post screenshots of the errors if anyone is interested, errors are :
  • Missing standard property 'border-bottom-right-radius' to go along with '-webkit-border-bottom-right-radius'. [stylesheet.css]
  • Link points to a location not present in the target file [feed_0/article_0/index_u27.html]
  • The linked resource 'w.colibris-lemouvement.org' does not exist [feed_0/article_2/index_u1.html]

Thanks a lot for the help !
Old 12-11-2021, 10:03 AM   #10
UniversalRead
Device: Kobo Touch
Recipe updated in Calibre ?

Thank you all, worked for me too.

Oddly, I didn't make any change in the code myself, but I can see that the recipe has been updated within Calibre, because the code is different from the January 2021 version (see GitHub page: https://github.com/lhoupert/calibre_...diapart.recipe).

Is that possible, even though I didn't update the whole app (Calibre)? @Kovid?
Old 12-11-2021, 10:09 AM   #11
kovidgoyal
Recipes are updated automatically. You don't need to update calibre for it.
Old 12-11-2021, 10:16 AM   #12
UniversalRead
Forget it, the versioning system of Github indicates that Kovid officially made the update (here).
Thank you Kovid.
Old 12-11-2021, 10:19 AM   #13
UniversalRead
Quote:
Originally Posted by kovidgoyal View Post
Recipes are updated automatically. You don't need to update calibre for it.
This is another cool feature of your app.
Old 03-26-2022, 08:01 AM   #14
MoloBolo
Hi,

I just got this error message today (the line in French means "conversion failed"):

Spoiler:
calibre, version 5.39.1 (win32, embedded-python: True)
Erreur lors de la conversion: Échoué: Récupérer des actualités à partir de Mediapart

Récupérer des actualités à partir de Mediapart
Conversion options changed from defaults:
output_profile: 'tablet'
verbose: 2
Resolved conversion options
calibre version: 5.39.1
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_split_on_page_breaks': True,
'duplicate_links_in_toc': False,
'embed_all_fonts': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'epub_inline_toc': False,
'epub_toc_at_end': False,
'epub_version': '2',
'expand_css': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x000001D758B8B070>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.TabletOutput object at 0x000001D758B8BA00>,
'page_breaks_before': None,
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'search_replace': None,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'transform_css_rules': None,
'transform_html_rules': None,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: Recipe Input running
Downloading recipe urn: custom:1000
Using user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36
Traceback (most recent call last):
File "runpy.py", line 194, in _run_module_as_main
File "runpy.py", line 87, in _run_code
File "site.py", line 82, in <module>
File "site.py", line 77, in main
File "site.py", line 49, in run_entry_point
File "calibre\utils\ipc\worker.py", line 215, in main
File "calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_recipe
File "calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "calibre\ebooks\conversion\plumber.py", line 1108, in run
File "calibre\customize\conversion.py", line 242, in __call__
File "calibre\ebooks\conversion\plugins\recipe_input.py", line 137, in convert
File "calibre\web\feeds\news.py", line 1056, in download
File "calibre\web\feeds\news.py", line 1233, in build_index
File "<string>", line 74, in parse_feeds
File "<string>", line 215, in my_parse_index
File "<string>", line 131, in get_articles
AttributeError: 'NoneType' object has no attribute 'find'


I'm not sure what the issue is. It was working fine yesterday and I haven't changed anything since.

Thanks!
Old 03-26-2022, 11:32 PM   #15
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 44,715
Karma: 24967300
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It means the website has changed and the recipe needs to be updated.
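
For anyone wondering why a site change surfaces as this particular AttributeError, here is a minimal sketch of the failure pattern. It is not the Mediapart recipe's code (line 131 of the recipe is not shown in this thread). It uses the standard library's xml.etree.ElementTree as a stand-in for the HTML parser a recipe would use, and the page markup and function names are made up for illustration. The idea is the same: a lookup for a container element returns None once the site stops using that element, and the next .find() call on that None fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "old" layout wraps the link in a <div>,
# the "new" layout uses a <section> instead.
OLD_PAGE = '<html><body><div id="list"><a href="/a1">One</a></div></body></html>'
NEW_PAGE = '<html><body><section id="list"><a href="/a1">One</a></section></body></html>'


def first_link_unguarded(page):
    """Assumes the <div> container always exists."""
    root = ET.fromstring(page)
    container = root.find(".//div")  # None when the page has no <div>
    # If container is None, the next line raises:
    # AttributeError: 'NoneType' object has no attribute 'find'
    return container.find("a").get("href")


def first_link_guarded(page):
    """Returns None instead of crashing when the layout has changed."""
    root = ET.fromstring(page)
    container = root.find(".//div")
    if container is None:
        return None
    link = container.find("a")
    return link.get("href") if link is not None else None


if __name__ == "__main__":
    print(first_link_unguarded(OLD_PAGE))
    print(first_link_guarded(NEW_PAGE))
    try:
        first_link_unguarded(NEW_PAGE)
    except AttributeError as err:
        print("unguarded lookup failed:", err)
```

A guard like this only stops the crash. The recipe still has to be pointed at whatever element the site uses now, otherwise it will simply find no articles.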

All times are GMT -4.