MobileRead Forums > E-Book Software > Calibre > Development

08-24-2024, 12:59 PM #1
thiago.eec
Wizard
Posts: 1,069
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
Cloudflare problem

Hi everyone.

I have a plugin called Skoob Sync. It uses web scraping to sync the user's info. Now the site has started using Cloudflare on the login page, and my login request gets 'HTTP Error 403: Forbidden'. The metadata plugin (Skoob Books) is not affected, as it does not need to log in.

I know this is not a calibre question, but since many plugin developers use web scraping, I thought I might get some help here. Is there any way to bypass the Cloudflare check?

This is the relevant code for login:

Code:
            self.opener = six.moves.urllib.request.build_opener(six.moves.urllib.request.HTTPCookieProcessor(self.cj))

            # Install our opener (note that this changes the global opener to the one
            # we just made, but you can also just call opener.open() if you want)
            six.moves.urllib.request.install_opener(self.opener)

            # Authentication page
            authentication_url = 'https://api.skoob.com.br/login'

            # Credentials
            payload = {
                'data[Usuario][email]': self.prefs['user'],
                'data[Usuario][senha]': self.key_password,
            }

            # Use urllib to encode the payload
            data = six.moves.urllib.parse.urlencode(payload).encode()

            # Build our Request object (supplying 'data' makes it a POST)
            login_req = six.moves.urllib.request.Request(authentication_url, data, headers=random_ua())

            login = six.moves.urllib.request.urlopen(login_req)

Code:
# Get a random user agent from calibre. This is used on Skoob access.
def random_ua():
    try:
        from calibre import random_user_agent
        try:
            hdr = {'User-Agent': random_user_agent(allow_ie=False)}
            return hdr
        except TypeError:
            hdr = {'User-Agent': random_user_agent()}
            return hdr
    except ImportError:
        hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT .1; Win64; x64)'}
        return hdr
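A Python 3-only version of the login request above, using just the standard library (http.cookiejar and urllib.request) instead of six, might look like the sketch below. It is a minimal sketch under assumptions: the helper names make_opener, build_login_request and login are hypothetical, and it calls opener.open() directly instead of install_opener(), as the comment in the original code suggests, so the global opener is left alone. It only reproduces the request the plugin already sends; it does nothing about the Cloudflare check itself.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Login endpoint from the original post.
AUTHENTICATION_URL = 'https://api.skoob.com.br/login'


def make_opener(cookie_jar):
    """Return an opener that stores cookies in cookie_jar and resends them."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))


def build_login_request(email, password, headers):
    """Return the form-encoded POST request carrying the credentials."""
    payload = {
        'data[Usuario][email]': email,
        'data[Usuario][senha]': password,
    }
    # urlencode() returns str; the request body must be bytes.
    data = urllib.parse.urlencode(payload).encode('utf-8')
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(AUTHENTICATION_URL, data=data, headers=headers)


def login(email, password, headers, timeout=30):
    """Send the login through a private opener (no install_opener call).

    Returns the cookie jar (holding any session cookies) and the response
    body. A 403 such as the Cloudflare block raises urllib.error.HTTPError.
    """
    cookie_jar = http.cookiejar.CookieJar()
    opener = make_opener(cookie_jar)
    request = build_login_request(email, password, headers)
    with opener.open(request, timeout=timeout) as response:
        return cookie_jar, response.read()
```

In the plugin, headers would be the dict returned by random_ua().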

Last edited by thiago.eec; 08-24-2024 at 01:49 PM.
08-26-2024, 07:24 PM #2
JimmXinu
Plugin Developer
Posts: 6,590
Karma: 4600349
Join Date: Dec 2011
Location: Midwest USA
Device: Kindle Paperwhite(10th)
I'll offer a few pointers, but I haven't researched it recently.

Cloudflare can run at more than one level of blocking for a given site.

Some of the lower levels can be bypassed with cloudscraper.

FlareSolverr is a proxy that runs a headless browser to handle requests. As I understand it, it uses its own proxy API, not a standard one. Also, it works for web pages, but not for images or binaries. I don't believe it works at Cloudflare's highest "under attack" level.

FanFicFare has code that can use either of these, as well as code that can read cached pages out of your regular browser's cache directory. That approach is a pain, because you have to load the page in your browser first to cache it, and not all pages are cached.

The actual cache reading code isn't mine and I don't pretend to understand it more than superficially.
08-27-2024, 12:43 AM #3
kovidgoyal
creator of calibre
Posts: 44,509
Karma: 24495778
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@JimmXinu: Just FYI, as of calibre 7.17 there is a headless browser (Chromium, via Qt WebEngine) in calibre that you can use to fetch URLs. See the WebEngineBrowser class in scraper/qt.py.

Last edited by kovidgoyal; 08-27-2024 at 04:03 AM.
08-27-2024, 08:38 AM #4
thiago.eec
Wizard
Quote:
Originally Posted by JimmXinu
Some lower levels can be bypassed with cloudscraper.

FlareSolverr is a proxy that runs a headless browser to handle requests.
Thank you for your help. I did try cloudscraper and FlareSolverr, but could not make either of them work.

Quote:
Originally Posted by kovidgoyal
@JimmXinu: Just FYI, as of calibre 7.17 there is a headless browser (Chromium, via Qt WebEngine) in calibre that you can use to fetch URLs. See the WebEngineBrowser class in scraper/qt.py.
I tried using WebEngineBrowser to access the login page (protected by Cloudflare), but it seems JavaScript is not enabled by default. The returned HTML has this message: 'Enable JavaScript and cookies to continue'.
How can I enable full JavaScript and cookie support for a WebEngineBrowser instance?

This is necessary because the Cloudflare check basically gives the browser a few JS calculations to solve before allowing the user to see the login page.
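Whatever the workaround, a plugin can at least recognise the challenge and show the user a clear message instead of a bare 403. A minimal heuristic sketch, assuming the challenge response carries either a cf-mitigated: challenge header (which, to my knowledge, Cloudflare sets on challenge responses) or a Server: cloudflare header together with the 'Enable JavaScript and cookies to continue' text quoted above. The names is_cloudflare_challenge, CloudflareChallengeError and open_or_explain are hypothetical, and the code only detects the challenge; it does not solve it.

```python
import urllib.error

# Text seen in the challenge HTML (reported in this thread).
CHALLENGE_MARKER = 'Enable JavaScript and cookies to continue'


class CloudflareChallengeError(Exception):
    """Raised when the site answers with a challenge page instead of content."""


def is_cloudflare_challenge(status, headers, body):
    """Heuristically decide whether a response is a Cloudflare challenge.

    status  -- HTTP status code (int)
    headers -- any mapping with .items(), e.g. http.client.HTTPMessage
    body    -- decoded response body (str)
    """
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: Cloudflare marks challenge responses with cf-mitigated.
    if lowered.get('cf-mitigated', '').lower() == 'challenge':
        return True
    # Fallback: a Cloudflare-served error page containing the known text.
    served_by_cloudflare = lowered.get('server', '').lower() == 'cloudflare'
    return (served_by_cloudflare and status in (403, 503)
            and CHALLENGE_MARKER in body)


def open_or_explain(opener, request, timeout=30):
    """Open request; turn a Cloudflare challenge into a clearer exception."""
    try:
        return opener.open(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        body = err.read().decode('utf-8', errors='replace')
        if is_cloudflare_challenge(err.code, err.headers, body):
            raise CloudflareChallengeError(
                'The site answered with a Cloudflare JavaScript challenge, '
                'which a plain urllib request cannot complete.') from err
        # Any other HTTP error is passed through unchanged.
        raise
```

The plugin could catch CloudflareChallengeError and tell the user that syncing is currently blocked, rather than surfacing a generic HTTP error.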
08-27-2024, 12:09 PM #5
kovidgoyal
creator of calibre
It doesn't support JavaScript. Once you enable JavaScript, detecting that the browser is not a real browser is pretty easy, so it won't help you with Cloudflare.
08-27-2024, 02:03 PM #6
thiago.eec
Wizard
Quote:
Originally Posted by kovidgoyal
It doesn't support JavaScript. Once you enable JavaScript, detecting that the browser is not a real browser is pretty easy, so it won't help you with Cloudflare.
I see. I'm out of options, then.
08-27-2024, 02:59 PM #7
JSWolf
Resident Curmudgeon
Posts: 76,335
Karma: 136006010
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by thiago.eec
I see. I'm out of options, then.
See my reply in the Skoob thread.
08-27-2024, 06:27 PM #8
gbm
Wizard
Posts: 2,125
Karma: 8796706
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Why don't you use a little courtesy and post a link?

bernie
Quote:
Originally Posted by JSWolf
See my reply in the Skoob thread.
08-27-2024, 06:48 PM #9
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by gbm
Why don't you use a little courtesy and post a link?

bernie
My reply is in the Skoob plugin thread in the calibre forum, as this isn't the correct thread to reply in.

https://www.mobileread.com/forums/sh...d.php?t=320753
08-27-2024, 08:12 PM #10
thiago.eec
Wizard
Thank you, guys.
I took a look at the Koob plugin, as @JSWolf suggested in the Skoob Books thread. The developer is using cloudscraper, but I already tried that and it didn't work. However, the developer did mention something about changing the JavaScript interpreter. I'll contact him and see if I can get a workaround.
09-02-2024, 01:20 AM #11
Bradles
Connoisseur
Posts: 80
Karma: 26914
Join Date: Nov 2020
Location: Perth, Western Australia
Device: Apple Books & Kobo Libra H20
LibraryThing also started using Cloudflare a while ago. For my plugin (LibraryThing Match) I had to change the user agent twice to avoid 403 errors. This is what it is currently:

'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
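For a urllib-based plugin, sending that fixed User-Agent is just a matter of passing it in the request headers. A minimal sketch; build_request and the example URL are hypothetical, and, as the replies below show, a different User-Agent alone did not get past Skoob's login challenge.

```python
import urllib.request

# The User-Agent Bradles reports using in LibraryThing Match.
FIXED_USER_AGENT = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/35.0.1916.47 Safari/537.36'
)


def build_request(url, user_agent=FIXED_USER_AGENT):
    """Return a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})
```

urllib normalises stored header names with str.capitalize(), so the header is read back as 'User-agent'; the value sent is unchanged.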
09-02-2024, 08:09 AM #12
thiago.eec
Wizard
Quote:
Originally Posted by Bradles
LibraryThing also started using Cloudflare a while ago. For my plugin (LibraryThing Match) I had to change the user agent twice to avoid 403 errors. This is what it is currently:

'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
Thanks for the tip. I tried it here just to be sure, but it didn't work.

A couple of years ago I started sending complete headers, as Skoob was blocking requests with the default header. Then I moved to calibre's random_user_agent (via my random_ua helper), so I wouldn't always hit the server with the same header. This worked for more than four years, but it fails now.

When you access the login page, it lands on a Cloudflare page where you need to wait a few seconds before proceeding. No need to click anything. From what I could gather, Cloudflare uses some JavaScript code to create a challenge for the 'browser' to solve. If you are using standard Python libraries (urllib or requests), you have no JavaScript capabilities and get blocked. A headless browser could solve it, but it doesn't seem to be that simple. The calibre headless browser, for instance, has no JavaScript support, as Kovid stated.

FlareSolverr and cloudscraper didn't work either. I even tried a paid service, https://scrapeops.io/. It worked for accessing the login page, but I couldn't make a successful POST request (even with their support's help). They have a js_scenario option that lets you manipulate the form and press the button instead of sending a POST request, but that didn't work either.

Anyway, looks like a lost cause.
09-02-2024, 08:14 AM #13
JSWolf
Resident Curmudgeon
It could get to the point where any site with Cloudflare will be inaccessible from a plugin.
09-03-2024, 06:19 PM #14
Quoth
the rook, bossing Never.
Posts: 12,328
Karma: 90943357
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Quote:
Originally Posted by JSWolf
It could get to the point where any site with Cloudflare will be inaccessible from a plugin.
And maybe from only four browsers, without customisation. They are a pain. Too aggressive.
09-03-2024, 08:00 PM #15
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by Quoth
And maybe from only four browsers, without customisation. They are a pain. Too aggressive.
If I find I'm unable to access a website because of Cloudflare, I am going to be mighty annoyed, especially if it's a site I use a lot. I run my own customized version of Firefox, and I don't want it excluded because Cloudflare is being an ass.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.