Mediapart recipe doesn't work anymore

MoloBolo · 11-25-2021, 01:35 PM

Hi,

The website Mediapart was updated recently and since then the recipe doesn't work. The ePub created seems normal but the articles are blank, with just the titles.
It worked fine before the update. I had this issue a few times before but I just had to redownload a second time and that was it.

I think this is the recipe.

Is there anything I can do about this? I checked the URLs (pretty much the only thing I understand lol). The feed URL in the recipe is HTTP instead of HTTPS (I changed it in a custom recipe to be sure, but it didn't change anything) and the rest is OK.

Thanks !

kovidgoyal · 11-26-2021, 06:58 AM

You will likely need to change the keep_only_tags setting.

MoloBolo · 11-26-2021, 10:45 AM

Okay, so I deleted lines 45 to 49 of the recipe:

Code:

keep_only_tags = [
        dict(name='h1'),
        dict(name='div', **classes('author')),
        classes('introduction content-article')
    ]

I used Calibre's ereader and it worked (although it downloaded pretty much the whole web page instead of just the article, but I imagine that's the point).

The issue is that when I transfer the ePub to my Libra I can't access the articles. Going to the next page, or even choosing a particular page, sends me to the next article's first page (the one with links to the table of contents).
The ePub is 1700 pages long (!!) when it used to be ~250.

The articles are there, it just won't let me read them.

Thanks for your help !

kovidgoyal · 11-26-2021, 12:23 PM

You will need to figure out what to replace those with to extract the article's contents. See https://manual.calibre-ebook.com/news.html for an overview of the process.

MoloBolo · 11-27-2021, 06:37 AM

The print version looks great, but its URL can't be derived from the article URL the way it can in the BBC example.

Article : https://www.mediapart.fr/journal/france/261121/les-lecons-de-l-affaire-nicolas-hulot

Print version : https://www.mediapart.fr/tools/print/996766

I can roughly see where the article is in the code but I'm not sure how to use keep/remove tags.
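
For anyone else unsure how these entries work: each item in keep_only_tags or remove_tags is a dict of search criteria (a tag name and/or attributes). Here is a minimal standalone sketch of the recipe's classes() helper, showing which class attributes it accepts. The sample class strings passed to the matcher are made up for illustration; only the helper and the 'introduction content-article' names come from the recipe.

```python
def classes(classes):
    # Same helper as in the recipe: it matches any tag whose class
    # attribute shares at least one class with the given names.
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


# Pull out the function that is applied to each tag's class attribute.
matcher = classes('introduction content-article')['attrs']['class']

# Hypothetical class attribute values, for illustration only.
print(bool(matcher('content-article wide')))  # shares 'content-article'
print(bool(matcher('sidebar')))               # no class in common
print(bool(matcher(None)))                    # tag without a class attribute
```

So when the site renames its CSS classes, none of the tags match any more, and the names passed to classes() have to be replaced with the ones used by the new page layout.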

senacra · 12-04-2021, 06:22 AM

Hi,

I just noticed the same issue.
After analysis, I think I found a fix:

keep_only_tags = [
    dict(name='h1'),
    dict(name='div', **classes('author')),
    classes('news__heading__top__intro news__body__center__article')
]

Working for me

MoloBolo · 12-04-2021, 10:51 AM

Thanks a lot, but sadly it's not working for me. It's perfect with Calibre's ereader but the articles are randomly cut off on my Libra.
They always cut off just before the links to other articles ("à lire aussi", i.e. "read also"). Kepub conversion didn't help, so I ran a book check and got 19 "Parsing failed: redefinition of the xmlns prefix is forbidden" errors (screen attached).

Here's the full recipe :

Spoiler:

I'm assuming those links are the issue, at least with Kobo Libra (that's where it cuts and where the errors are), but I have no idea how to fix this.

kovidgoyal · 12-04-2021, 11:50 PM

Just add svg to remove_tags in the recipe.
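
A minimal sketch of what that could look like, assuming the existing class-based entry should stay in place. Both entries go in one list, because assigning remove_tags a second time would replace the first list rather than extend it. In the recipe this is a class attribute of Mediapart; it is shown at module level here only so the snippet runs on its own.

```python
def classes(classes):
    # Helper copied from the recipe.
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


remove_tags = [
    classes('login-subscribe print-source_url'),  # entry already in the recipe
    dict(name='svg'),                             # new: drop inline SVG elements
]

print(len(remove_tags))
```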

MoloBolo · 12-05-2021, 06:35 AM

I added this :

Code:

remove_tags = [dict(name='svg')]

and ran a book check and now I have 150+ new errors

I tried to add stuff like "class" which didn't help, and eventually decided to test it on my Libra anyway...

It's working

I didn't notice any cuts and the table of contents is functional.

The full recipe :

Spoiler:

Code:

#!/usr/bin/env python
# vim:fileencoding=utf-8
#
# 11 Jan 2021 -  L. Houpert - Major changes in the Mediapart recipe:
#   1) Summaries of the articles are now available
#   2) Additional sections International, France, Economie and Culture have
# been added through custom entries in the function my_parse_index.
#   3) Fix the cover image so it doesn't disappear from the Kindle menu
# (cover image format is changed to .jpeg)
# 14 Jan 2021 - Add Mediapart Logo url as masthead_url and change cover
#   by overlaying the date on top of the Mediapart cover
from __future__ import unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2021, Loïc Houpert <houpertloic at gmail .com>. Adapted from: 2016, Daniel Bonnery; 2009, Mathieu Godlewski; 2010-2012, Louis Gesbert'  # noqa
'''
Mediapart
'''

import re
from datetime import date, datetime, timezone, timedelta
from calibre.web.feeds import feeds_from_index
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


class Mediapart(BasicNewsRecipe):
    title = 'Mediapart'
    __author__ = 'Loïc Houpert'
    description = 'Global news in French from news site Mediapart'
    publication_type = 'newspaper'
    language = 'fr'
    needs_subscription = True
    oldest_article = 2

    use_embedded_content = False
    no_stylesheets = True

    keep_only_tags = [
        dict(name='h1'),
        dict(name='div', **classes('author')),
        classes('news__heading__top__intro news__body__center__article')
    ]
    # A single list: a second "remove_tags = ..." assignment would
    # silently replace the first one.
    remove_tags = [
        classes('login-subscribe print-source_url'),
        dict(name='svg'),
    ]
    conversion_options = {'smarten_punctuation': True}

    masthead_url = "https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart_masthead.png"
    # cover_url = 'https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart.jpeg'

    # --

    # Get date in french time zone format
    today = datetime.now(timezone.utc) + timedelta(hours=1)
    oldest_article_date = today - timedelta(days=oldest_article)

    feeds = [
        ('La Une', 'http://www.mediapart.fr/articles/feed'),
    ]

    # The feed at 'http://www.mediapart.fr/articles/feed' only displays the 10
    # latest items, so the articles are indexed from specific section pages
    # in the function my_parse_index. In this function the articles are parsed
    # using the function get_articles and the entries of dict_article_sources

    def parse_feeds(self):
        feeds = super(Mediapart, self).parse_feeds()
        feeds += feeds_from_index(self.my_parse_index(feeds))
        return feeds

    def my_parse_index(self, la_une):

        dict_article_sources = [
            {
                'type': 'Brèves',
                'webpage': 'https://www.mediapart.fr/journal/fil-dactualites',
                'separador': {
                    'page': 'ul',
                    'thread': 'li'
                }
            },
            {
                'type': 'International',
                'webpage': 'https://www.mediapart.fr/journal/international',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'France',
                'webpage': 'https://www.mediapart.fr/journal/france',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'Économie',
                'webpage': 'https://www.mediapart.fr/journal/economie',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
            {
                'type': 'Culture',
                'webpage': 'https://www.mediapart.fr/journal/culture-idees',
                'separador': {
                    'page': 'div',
                    'thread': 'div'
                }
            },
        ]

        def get_articles(
            type_of_article, webpage, separador_page='ul', separador_thread='li'
        ):

            specific_articles = []

            webpage_article = []
            soup = self.index_to_soup(webpage)
            page = soup.find('main', {'class': 'global-wrapper'})
            fils = page.find(separador_page, {'class': 'post-list universe-journal'})

            all_articles = fils.findAll(separador_thread)
            for article in all_articles:
                try:
                    title = article.find('h3', recursive=False)
                    if title is None or ''.join(title['class']) == 'title-specific':
                        # print(f"[BAD title entry] Print value of title:\n {title}")
                        continue
                    # print(f"\n[OK title entry] Print value of title:\n {title}\n")

                    try:
                        article_mot_cle = article.find(
                            'a', {
                                'href': re.compile(r'.*\/mot-cle\/.*')
                            }
                        ).renderContents().decode('utf-8')
                    except Exception:
                        article_mot_cle = ''

                    try:
                        article_type = article.find(
                            'a', {
                                'href': re.compile(r'.*\/type-darticles\/.*')
                            }
                        ).renderContents().decode('utf-8')
                    except Exception:
                        article_type = ''

                    for s in title('span'):
                        s.replaceWith(s.renderContents().decode('utf-8') + "\n")
                    url = title.find('a', href=True)['href']

                    date = article.find('time', datetime=True)['datetime']
                    article_date = datetime.strptime(date, '%Y-%m-%d')
                    # Add French timezone to date of the article for date check
                    article_date = article_date.replace(tzinfo=timezone.utc) + timedelta(hours=1)
                    if article_date < self.oldest_article_date:
                        print("article_date < self.oldest_article_date\n")
                        continue

                    # print("-------- Recent article added to the list ------- \n")
                    all_authors = article.findAll(
                        'a', {'class': re.compile(r'\bjournalist\b')}
                    )
                    authors = [self.tag_to_string(a) for a in all_authors]
                    # print(f"Authors in tag <a>: {authors}")

                    # If no link to the author profile is available, the
                    # HTML element is a span tag
                    if not all_authors:
                        try:
                            all_authors = article.findAll(
                                'span', {'class': re.compile(r'\bjournalist\b')}
                            )
                            authors = [self.tag_to_string(a) for a in all_authors]
                            # print(f"Authors in tag <span>: {authors}")
                        except Exception:
                            # Keep a list so ', '.join(authors) below works
                            authors = ['unknown']

                    description = article.find('p').renderContents().decode('utf-8')
                    # print(f" <p> in article : {self.tag_to_string(description).strip()} ")

                    summary = {
                        'title': self.tag_to_string(title).strip(),
                        'description': description,
                        'date': article_date.strftime("%a, %d %b, %Y %H:%M"),
                        'author': ', '.join(authors),
                        'article_type': article_type,
                        'mot_cle': article_mot_cle.capitalize(),
                        'url': 'https://www.mediapart.fr' + url,
                    }

                    webpage_article.append(summary)
                except Exception:
                    pass

            specific_articles += [(type_of_article,
                                   webpage_article)] if webpage_article else []
            return specific_articles

        articles = []

        for category in dict_article_sources:
            articles += get_articles(
                category['type'], category['webpage'], category['separador']['page'],
                category['separador']['thread']
            )

        return articles

    # non-locale specific date parse (strptime("%d %b %Y",s) would work with
    # french locale)
    def parse_french_date(self, date_str):
        date_arr = date_str.lower().split()
        return date(
            day=int(date_arr[0]),
            year=int(date_arr[2]),
            month=[
                None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
                'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'
            ].index(date_arr[1])
        )

    def get_browser(self):
        # -- Handle login

        def is_form_login(form):
            return "id" in form.attrs and form.attrs['id'] == "logFormEl"

        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('https://www.mediapart.fr/login')
            br.select_form(predicate=is_form_login)
            br['name'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def default_cover(self, cover_file):
        '''
        Create a generic cover for recipes that don't have a cover
        '''
        from PyQt5.Qt import QImage, QPainter, QPen, Qt, QFont, QRect
        from calibre.gui2 import ensure_app, load_builtin_fonts, pixmap_to_data

        def init_environment():
            ensure_app()
            load_builtin_fonts()

        def create_cover_mediapart(date):
            ' Create a cover for mediapart adding the date on Mediapart Cover'
            init_environment()
            # Get data
            image_url = 'https://raw.githubusercontent.com/lhoupert/calibre_contrib/main/mediapart.jpeg'
            data = self.index_to_soup(image_url, raw=True)
            # Get date and hour corresponding to french time zone
            today = datetime.now(timezone.utc) + timedelta(hours=1)
            wkd = today.weekday()
            french_weekday = {0: 'Lun', 1: 'Mar', 2: 'Mer', 3: 'Jeu', 4: 'Ven', 5: 'Sam', 6: 'Dim'}
            day = french_weekday[wkd]+'.'
            date = day + ' ' + today.strftime('%d %b. %Y')
            edition = today.strftime('Édition de %Hh')

            # Get Cover data
            img  = QImage()
            img.loadFromData(data)

            # Overlay date on cover
            p = QPainter(img)
            pen = QPen(Qt.black)
            pen.setWidth(6)
            p.setPen(pen)
            font = QFont()
            font.setFamily('Times')
            font.setPointSize(78)
            p.setFont(font)
            r = QRect(0, 600, 744,100)
            p.drawText(r, Qt.AlignmentFlag.AlignJustify | Qt.AlignmentFlag.AlignVCenter | Qt.AlignmentFlag.AlignCenter, date)
            p.end()

            # Overlay edition information on cover
            p = QPainter(img)
            pen = QPen(Qt.black)
            pen.setWidth(4)
            p.setPen(pen)
            font = QFont()
            font.setFamily('Times')
            font.setItalic(True)
            font.setPointSize(66)
            p.setFont(font)
            # Add date
            r = QRect(0, 720, 744,100)
            p.drawText(r, Qt.AlignmentFlag.AlignJustify | Qt.AlignmentFlag.AlignVCenter | Qt.AlignmentFlag.AlignCenter, edition)
            p.end()
            return pixmap_to_data(img)

        try:
            today=datetime.today()
            date = today.strftime('%d %b %Y')
            img_data = create_cover_mediapart(date)
            cover_file.write(img_data)
            cover_file.flush()
        except Exception:
            self.log.exception('Failed to generate default cover')
            return False
        return True


calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
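
As a side note on the parse_french_date helper in the recipe above, here is a standalone sketch of the same lookup-table approach, written as a plain function rather than a method so it can be run outside calibre. The sample input string is an assumption about the date format, not something taken from the Mediapart site.

```python
from datetime import date

# Index in this list == month number; None pads index 0.
FRENCH_MONTHS = [
    None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
    'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre'
]


def parse_french_date(date_str):
    # Expects "<day> <French month name> <year>", e.g. "26 novembre 2021".
    day, month_name, year = date_str.lower().split()
    return date(int(year), FRENCH_MONTHS.index(month_name), int(day))


print(parse_french_date('26 Novembre 2021'))
```

Because the month is looked up in a fixed list, this does not depend on the system locale, which is the point made in the comment above the method.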

I can post screenshots of the errors if anyone is interested. The errors are:

Missing standard property 'border-bottom-right-radius' to go along with '-webkit-border-bottom-right-radius'. [stylesheet.css]
Link points to a location not present in the target file [feed_0/article_0/index_u27.html]
The linked resource 'w.colibris-lemouvement.org' does not exist [feed_0/article_2/index_u1.html]

Thanks a lot for the help !

UniversalRead · 12-11-2021, 09:03 AM

Thank you all, worked for me too.

Oddly, I didn't make any changes to the code myself, but I can see that the recipe has been updated within Calibre, because the code is different from the January 2021 version (see Github page : https://github.com/lhoupert/calibre_...diapart.recipe).

Is that possible, even though I didn't update the whole app (Calibre)? @Kovid ?

kovidgoyal · 12-11-2021, 09:09 AM

Recipes are updated automatically. You don't need to update calibre for that.

UniversalRead · 12-11-2021, 09:16 AM

Forget it, the versioning system of Github indicates that Kovid officially made the update (here).
Thank you Kovid.

UniversalRead · 12-11-2021, 09:19 AM

Quote:

Originally Posted by kovidgoyal

Recipes are updated automatically. You don't need to update calibre for that.

This is another cool feature of your app.

MoloBolo · 03-26-2022, 07:01 AM

Hi,

I just got this error message today (the line in French means "conversion failed"):

Spoiler:

calibre, version 5.39.1 (win32, embedded-python: True)
Erreur lors de la conversion: Échoué: Récupérer des actualités à partir de Mediapart

Récupérer des actualités à partir de Mediapart
Conversion options changed from defaults:
  output_profile: 'tablet'
  verbose: 2
Resolved conversion options
calibre version: 5.39.1
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0,
 'book_producer': None, 'change_justification': 'original', 'chapter': None,
 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None,
 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False,
 'dont_download_recipe': False, 'dont_split_on_page_breaks': True,
 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None,
 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False,
 'epub_toc_at_end': False, 'epub_version': '2', 'expand_css': False, 'extra_css': None,
 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'flow_size': 260,
 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x000001D758B8B070>,
 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False,
 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None,
 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0,
 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0,
 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True,
 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False,
 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.TabletOutput object at 0x000001D758B8BA00>,
 'page_breaks_before': None, 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None,
 'publisher': None, 'rating': None, 'read_metadata_from_opf': None,
 'remove_fake_margins': True, 'remove_first_image': False,
 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None,
 'series': None, 'series_index': None, 'smarten_punctuation': False,
 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '',
 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None,
 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None,
 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6,
 'toc_title': None, 'transform_css_rules': None, 'transform_html_rules': None,
 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: Recipe Input running
Downloading recipe urn: custom:1000
Using user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36
Traceback (most recent call last):
  File "runpy.py", line 194, in _run_module_as_main
  File "runpy.py", line 87, in _run_code
  File "site.py", line 82, in <module>
  File "site.py", line 77, in main
  File "site.py", line 49, in run_entry_point
  File "calibre\utils\ipc\worker.py", line 215, in main
  File "calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_recipe
  File "calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "calibre\ebooks\conversion\plumber.py", line 1108, in run
  File "calibre\customize\conversion.py", line 242, in __call__
  File "calibre\ebooks\conversion\plugins\recipe_input.py", line 137, in convert
  File "calibre\web\feeds\news.py", line 1056, in download
  File "calibre\web\feeds\news.py", line 1233, in build_index
  File "<string>", line 74, in parse_feeds
  File "<string>", line 215, in my_parse_index
  File "<string>", line 131, in get_articles
AttributeError: 'NoneType' object has no attribute 'find'

Not sure what's the issue ? It was working fine yesterday and I didn't change anything since.

Thanks !

kovidgoyal · 03-26-2022, 10:32 PM

It means the website has changed and the recipe needs to be updated.
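
The AttributeError in the traceback comes from get_articles calling .find on a lookup result that was None, most likely because the section pages no longer contain a main element with the class global-wrapper. Below is a minimal sketch of a guard for that case. It uses a tiny stand-in class instead of a real soup object so it runs without calibre: FakeTag and find_article_list are hypothetical names, while the selectors are the ones from the recipe.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed HTML tag, for illustration only."""

    def __init__(self, children=None):
        self.children = children or {}

    def find(self, name, attrs=None):
        # Return the child registered under this tag name, or None.
        return self.children.get(name)


def find_article_list(soup, separador_page='ul'):
    # Same two lookups as in get_articles, but stop early when the outer
    # container is missing instead of calling .find on None.
    page = soup.find('main', {'class': 'global-wrapper'})
    if page is None:
        return None
    return page.find(separador_page, {'class': 'post-list universe-journal'})


old_layout = FakeTag({'main': FakeTag({'ul': FakeTag()})})
new_layout = FakeTag()  # no <main class="global-wrapper"> any more

print(find_article_list(old_layout) is not None)
print(find_article_list(new_layout) is None)
```

In the recipe, get_articles could then return an empty list when the result is None, so that one changed section page does not abort the whole download. The selectors would still need to be updated to match the new site layout before any articles come back.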
