Old 05-13-2019, 02:48 PM   #1
Leonatus
Wizard
Posts: 1,033
Karma: 11123121
Join Date: Mar 2013
Location: Guben, Brandenburg, Germany
Device: Kobo Clara 2E, Tolino Shine 3
Replacement of Replacement Character

While I'm adjusting my news download, I have one small question. In the online original, my news source uses quotation marks of this sort:
Code:
„...“
.
In the downloaded news they are replaced by the replacement character:
Code:
�...�
.
No big problem, but ... ugly.
Is it possible to edit the recipe so that it replaces the replacement characters with quotation marks (of any kind)?
The original site is encoded in ISO-8859-1, and the recipe uses the same encoding. I changed the recipe's encoding to utf-8, but this didn't help.
Old 05-14-2019, 02:32 AM   #2
kovidgoyal
creator of calibre
Posts: 44,303
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Make sure the encoding field in the recipe matches the encoding of the website and you will be fine. If you want to do search and replace in the recipe, you can use preprocess_regexps.
Old 05-14-2019, 02:49 AM   #3
Leonatus
Wizard
This was the first thing I tried, in spite of my technical ignorance: I checked whether the encoding of the website the news comes from matches the encoding of the recipe, and to my astonishment it did. So this does not seem to be the culprit.

How do I use preprocess_regexps, step by step, please? (I'm really technically ignorant, sorry.)

Edit: In the meantime I noticed that in some articles the quotation marks are displayed correctly, although their source code looks the same as that of the other articles. Hm, this is getting interesting.

Edit 2: There is one difference, however: in the articles with the replacement character, quotes are represented by „...“, whereas in the correctly displayed articles they are "...".

Last edited by Leonatus; 05-14-2019 at 03:22 AM.
Old 05-14-2019, 06:18 AM   #4
Leonatus
Wizard
I read in Calibre's documentation that preprocess_regexps should look like this:
Code:
preprocess_regexps = [
   (re.compile(r'<!--Article ends here-->.*</body>', re.DOTALL|re.IGNORECASE),
    lambda match: '</body>'),
]
Unfortunately, I have no idea how to proceed in order to replace all „ and “ with ". Could one of the pros here please give me a hint on how to do this?
Old 05-14-2019, 02:14 PM   #5
siebert
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Untested:

Code:
preprocess_regexps = [
   (re.compile(r'[„“]'),
    lambda match: '"'),
]
Old 05-14-2019, 02:56 PM   #6
Leonatus
Wizard
Thank you, but it doesn't work. The replacement characters still appear.
Old 05-14-2019, 03:05 PM   #7
siebert
Developer
I don't think I have ever used Unicode in regular expressions. Did you just copy my code, or did you try replacing the „“ characters in it with the ones copied from the source webpage?

Otherwise this variant might work better:

Code:
preprocess_regexps = [
   (re.compile(r'„|“'),
    lambda match: '"'),
]
Or you could post the whole recipe here, so I can test it.
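A quick way to check whether the two patterns differ at all is to run them outside calibre. The sketch below assumes Python 3 and uses a made-up sample sentence. It suggests that the character class and the alternation behave identically on the real quote characters, and that neither touches text in which the quotes have already become U+FFFD (the replacement character).

```python
import re

# Made-up sample text containing the German quotes U+201E and U+201C.
sample = "Er sagte: \u201eGuten Tag\u201c."

char_class = re.compile(r'[„“]')   # first suggestion
alternation = re.compile(r'„|“')   # second suggestion

result_class = char_class.sub('"', sample)
result_alt = alternation.sub('"', sample)
print(result_class)
print(result_alt)

# If the quotes were already turned into U+FFFD, neither pattern matches.
damaged = "Er sagte: \ufffdGuten Tag\ufffd."
result_damaged = char_class.sub('"', damaged)
print(result_damaged == damaged)
```

So if both variants fail in the recipe, the likely reason is that the text no longer contains „ or “ when the pattern is applied, not the way the pattern is written.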
Old 05-14-2019, 03:14 PM   #8
Leonatus
Wizard
The variant didn't work either. I had simply copied and pasted the code from your post; the characters reproduced in #1 were originally copied from the website and from Calibre's ebook-viewer, respectively (the display is the same as on my reader).
The original recipe is this:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1295262156(BasicNewsRecipe):
    title = u'kath.net'
    __author__ = 'Bobus'
    description = u'Katholische Nachrichten'
    oldest_article = 7
    language = 'de'
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'

    feeds = [(u'kath.net', u'https://www.kath.net/2005/xml/index.xml')]

    def print_version(self, url):
        return url + "/print/yes"

    def get_browser(self, *a, **kwargs):
        kwargs['verify_ssl_certificates'] = False
        return BasicNewsRecipe.get_browser(self, *a, **kwargs)

    extra_css = 'td.textb {font-size: medium;}'
Thank you for testing!
Old 05-14-2019, 04:12 PM   #9
siebert
Developer
Sorry, all the things I googled and tried didn't work. I'm running out of ideas.
Old 05-15-2019, 12:49 AM   #10
kovidgoyal
creator of calibre
You need to replace the replacement character, not the quote, since the quote will already have been replaced by the replacement character by the time preprocess_regexps runs.
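As a rough illustration of that advice, the sketch below targets U+FFFD instead of the quotes. It keeps the (pattern, function) tuple layout from the calibre documentation example quoted earlier in the thread. The helper apply_regexps and the sample HTML are hypothetical and exist only so the snippet runs on its own; whether the text inside calibre really contains U+FFFD at that stage is an assumption taken from the post above, not something verified here.

```python
import re

# Same list-of-tuples layout as the documented preprocess_regexps example,
# but matching the replacement character U+FFFD rather than the quotes.
preprocess_regexps = [
    (re.compile('\ufffd'), lambda match: '"'),
]


def apply_regexps(html, rules):
    """Hypothetical stand-in for the step that applies the rules in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up input in which the quotes have already been lost.
damaged = '<p>\ufffdGuten Tag\ufffd</p>'
fixed = apply_regexps(damaged, preprocess_regexps)
print(fixed)
```

Note that this replaces every U+FFFD with a straight quote, including any that stood for other characters, so it hides the symptom rather than fixing the decoding.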
Old 05-15-2019, 02:06 AM   #11
Leonatus
Wizard
Quote:
Originally Posted by kovidgoyal
You need to replace the replacement character, not the quote, since the quote will already have been replaced by the replacement character by the time preprocess_regexps runs.
Hm, I had thought of that too, but it didn't work either, at least not when following siebert's suggestion. Anyway, thanks for the help!
Old 05-15-2019, 10:21 AM   #12
Leonatus
Wizard
Should I perhaps escape the replacement character, and how do I do this?
Old 05-15-2019, 10:51 AM   #13
theducks
Well trained by Cats
Posts: 30,341
Karma: 58032210
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Leonatus
Should I perhaps escape the replacement character, and how do I do this?
The backslash is the 'escape' character; \\ allows the \ itself to be the target.
In theory you could escape any character: \e\s\c\a\p\e
(If in doubt, I escape the symbols I search for. Not all of them really need to be escaped.)
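To make the effect of escaping concrete, here is a small sketch using Python's re module, which is what calibre recipes use. The sample strings are made up. It shows that escaping matters for characters with a special meaning such as the dot, changes nothing for the quote „, and that in Python's re a backslash before a letter can change the meaning instead of making the letter literal.

```python
import re

text = 'a.b „quote“ a b'

# An unescaped dot matches any character; an escaped dot matches only '.'.
unescaped = re.findall(r'a.b', text)
escaped = re.findall(r'a\.b', text)

# The quote „ has no special meaning, so escaping it changes nothing.
plain_quote = re.findall('„', text)
escaped_quote = re.findall('\\„', text)

# Escaping a letter is different: \s means "any whitespace", not the letter s.
whitespace = re.findall(r'\s', 'so so')

print(unescaped, escaped, plain_quote, escaped_quote, whitespace)
```

In other words, escaping the quote or the replacement character is harmless but should not be expected to make a non-matching pattern start matching.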
Old 05-15-2019, 11:11 AM   #14
Leonatus
Wizard
Quote:
Originally Posted by theducks
The backslash is the 'escape' character; \\ allows the \ itself to be the target.
In theory you could escape any character: \e\s\c\a\p\e
(If in doubt, I escape the symbols I search for. Not all of them really need to be escaped.)
I did this, but to no avail. My thought now is that perhaps the ISO 8859-1 code for the replacement character should be searched for, but this is far beyond my abilities.
Edit: In Wikipedia Specials (Unicode block) I found this: "... It has become increasingly common for software to interpret invalid UTF-8 by guessing the bytes are in another byte-based encoding such as ISO-8859-1."
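The Wikipedia passage concerns invalid UTF-8, which may also explain why switching the recipe to utf-8 did not help. The sketch below is only an illustration under an assumption: that the page delivers each quote as a single byte (0x84 and 0x93, made-up sample data). Such bytes are not valid UTF-8 on their own, so a lenient decoder substitutes U+FFFD for each of them.

```python
# Assumed sample bytes: one byte per quotation mark, plain ASCII in between.
raw = b'\x84Guten Tag\x93'

# A lenient UTF-8 decode replaces each invalid byte with U+FFFD.
as_utf8 = raw.decode('utf-8', errors='replace')
print(as_utf8)

# A strict UTF-8 decode refuses the same bytes instead of substituting.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)
```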

Last edited by Leonatus; 05-15-2019 at 11:19 AM.
Old 05-15-2019, 07:40 PM   #15
lui1
Enthusiast
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
According to Wikipedia (see ISO-8859-1 and Windows-1252), web pages and emails are commonly mislabeled as ISO-8859-1 when they are actually encoded in Windows-1252. Most web browsers and email clients therefore treat ISO-8859-1 as Windows-1252. This practice is so prevalent that it became part of the HTML5 specification. So any webpage that claims to be encoded in ISO-8859-1 should be treated as being encoded in Windows-1252.

Code:
encoding = 'windows-1252'
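The difference between the two encodings can be seen with plain Python, outside calibre. In the sketch below the sample bytes are made up. In Windows-1252 the bytes 0x84 and 0x93 are the quotes „ and “, while ISO-8859-1 maps the same bytes to the invisible control characters U+0084 and U+0093 and has no way to represent the typographic quotes at all. How calibre or the reader then renders those control characters is not shown here and is an assumption.

```python
# Made-up sample: the two quote bytes around plain ASCII text.
raw = b'\x84Guten Tag\x93'

# Decoded as Windows-1252, the bytes become the German quotation marks.
as_cp1252 = raw.decode('windows-1252')
print(as_cp1252)

# Decoded as ISO-8859-1, the same bytes become C1 control characters.
as_latin1 = raw.decode('iso-8859-1')
print([hex(ord(ch)) for ch in (as_latin1[0], as_latin1[-1])])

# ISO-8859-1 cannot encode the typographic quote in the first place.
try:
    '„'.encode('iso-8859-1')
    latin1_has_quote = True
except UnicodeEncodeError:
    latin1_has_quote = False
print(latin1_has_quote)
```

In the recipe posted above, this would mean changing the single line encoding = 'iso-8859-1' to the value shown here.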

Last edited by lui1; 05-15-2019 at 07:51 PM. Reason: fix typos