07-07-2023, 10:08 AM | #1 |
Connoisseur
Posts: 64
Karma: 10
Join Date: May 2016
Device: Koreader running on Kobo Libra 2
|
Remove-tag when class contains a whitespace
Hello,
How can I remove this div tag:
<div class="actions-under-image js-paywall-remove-element">
How do I deal with the whitespace between the two class names?
Thank you,
Villard
|
07-07-2023, 10:17 PM | #2 |
creator of calibre
Posts: 44,017
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Spaces are handled automatically; just specify the class name you want to remove.
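To illustrate why a single class name is enough: an HTML class attribute is a whitespace-separated list of class names, so a matcher can compare against each name individually. The sketch below is a stand-alone model of that idea, not calibre's actual code (calibre hands the matching to its bundled HTML parser, which may accept more forms than this). The helper names `class_tokens` and `matches` are hypothetical.

```python
def class_tokens(class_attr):
    """Split an HTML class attribute into its individual class names."""
    return class_attr.split()


def matches(tag_name, class_attr, spec):
    """Return True if a tag matches a remove_tags-style spec dict.

    Hypothetical helper for illustration only; it models the class
    attribute as a list of tokens and ignores every other attribute.
    """
    # A spec without 'name' matches any tag name.
    if spec.get('name') not in (None, tag_name):
        return False
    wanted = spec.get('attrs', {}).get('class', [])
    if isinstance(wanted, str):
        wanted = [wanted]
    tokens = class_tokens(class_attr)
    # Match if no class was requested, or any requested class is a token.
    return not wanted or any(w in tokens for w in wanted)


html_class = 'actions-under-image js-paywall-remove-element'
spec = dict(name='div', attrs={'class': ['actions-under-image']})
print(matches('div', html_class, spec))
```

In this simplified model the full two-name string is not itself a token, so specifying one of the two class names is the safer choice.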
|
07-08-2023, 03:51 PM | #3 |
Connoisseur
|
Thanks, Kovid, for your reply.
I tried:
Code:
remove_tags = [dict(name='div', class_='actions-under-image js-paywall-remove-element')]
but it does not work. I must have an error somewhere else, so I will investigate further to find it.
Villard
|
07-09-2023, 12:31 AM | #4 |
creator of calibre
|
Code:
dict(name='div', attrs={'class':['actions-under-image']})
|
07-09-2023, 03:08 AM | #5 |
Connoisseur
|
It does not work.
I think the original website may have some problems: when I run the recipe with only the remove_tags instruction (see below), I get no news at all. The section page is blank and no article is included, so this remove_tags instruction affects the section page too!
If I replace the remove_tags instruction with another one, such as remove_tags = [dict(name='div', class_='read-also')], it works fine: the section page is correct and the article page is cleaned as required.
I will investigate more.
Villard
Code:
class LaCroix(BasicNewsRecipe):
    title = 'La Croix'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True
    needs_subscription = 'optional'
    language = 'fr'
    feeds = [
        ('Actualités : France', 'https://www.la-croix.com/RSS/UNIVERS_WFRA'),
    ]
    remove_tags = dict(name='div', attrs={'class':['actions-under-image js-paywall-remove-element']})
Last edited by Villard; 07-09-2023 at 04:17 AM.
|
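One visible difference between the working and failing variants in post #5: the working one wraps the dict in a list, while the recipe assigns a bare dict to remove_tags. calibre documents remove_tags as a list of dicts, so this may be what blanks the section page. A sketch with the spec wrapped in a list and narrowed to a single class name, as suggested in post #4, might look like the following. It has not been tested against the live site, and the ImportError fallback is only an assumption added so the snippet also loads outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this sketch can be loaded outside calibre (assumption,
    # not part of a real recipe).
    class BasicNewsRecipe:
        pass


class LaCroix(BasicNewsRecipe):
    title = 'La Croix'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True
    needs_subscription = 'optional'
    language = 'fr'

    feeds = [
        ('Actualités : France', 'https://www.la-croix.com/RSS/UNIVERS_WFRA'),
    ]

    # remove_tags is a list of dicts; each dict describes one kind of tag.
    # Only one of the two class names is given, per the advice in the thread.
    remove_tags = [
        dict(name='div', attrs={'class': ['actions-under-image']}),
    ]
```

If the section page is still blank with this form, the cause is likely elsewhere in the recipe or on the site.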