Old 08-24-2024, 11:59 AM   #1
thiago.eec
Guru
Cloudflare problem

Hi everyone.

I have a plugin called Skoob Sync. It uses web scraping to sync the user info. The site has now started using Cloudflare on the login page, so my login request fails with 'HTTP Error 403: Forbidden'. The metadata plugin (Skoob Books) is not affected, as it does not need to log in.

I know this is not a calibre question, but since many plugin developers use web scraping, I thought I might get some help here. Is there any way to get past the Cloudflare check?

This is the relevant code for login:

Code:
            self.opener = six.moves.urllib.request.build_opener(six.moves.urllib.request.HTTPCookieProcessor(self.cj))

            # Install our opener (note that this changes the global opener to the one
            # we just made, but you can also just call opener.open() if you want)
            six.moves.urllib.request.install_opener(self.opener)

            # Authentication page
            authentication_url = 'https://api.skoob.com.br/login'

            # Credentials
            payload = {
                'data[Usuario][email]': self.prefs['user'],
                'data[Usuario][senha]': self.key_password,
            }

            # Use urllib to encode the payload
            data = six.moves.urllib.parse.urlencode(payload).encode()

            # Build our Request object (supplying 'data' makes it a POST)
            login_req = six.moves.urllib.request.Request(authentication_url, data, headers=random_ua())

            login = six.moves.urllib.request.urlopen(login_req)

Code:
# Get a random user agent from calibre. This is used on Skoob access.
def random_ua():
    try:
        from calibre import random_user_agent
        try:
            hdr = {'User-Agent': random_user_agent(allow_ie=False)}
            return hdr
        except TypeError:
            hdr = {'User-Agent': random_user_agent()}
            return hdr
    except ImportError:
        hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT .1; Win64; x64)'}
        return hdr
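
To confirm that the 403 really comes from a Cloudflare challenge and not from the site itself, the response headers on the error can be inspected. What follows is only a minimal sketch, assuming Python 3 (plain urllib instead of six.moves) and assuming that Cloudflare marks challenge pages with a 'cf-mitigated: challenge' header, which is how I understand their documentation. The names classify_http_error and fake_error are hypothetical helpers, not part of the plugin.

```python
import io
import urllib.error
from email.message import Message


def classify_http_error(err):
    """Return 'challenge', 'via-cloudflare' or 'other' for an HTTPError."""
    headers = err.headers
    # Assumption: Cloudflare marks challenge pages with 'cf-mitigated: challenge'.
    if (headers.get('cf-mitigated') or '').lower() == 'challenge':
        return 'challenge'
    # 'Server: cloudflare' only shows the response passed through Cloudflare;
    # the 403 itself may still have been produced by the site behind it.
    if (headers.get('Server') or '').lower() == 'cloudflare':
        return 'via-cloudflare'
    return 'other'


def fake_error(extra_headers):
    """Build an HTTPError by hand so the sketch can be tried offline."""
    hdrs = Message()
    for name, value in extra_headers.items():
        hdrs[name] = value
    return urllib.error.HTTPError(
        'https://example.invalid/login', 403, 'Forbidden', hdrs, io.BytesIO(b''))
```

In the login code above, this would mean wrapping the urlopen(login_req) call in try/except urllib.error.HTTPError and passing the caught exception to classify_http_error.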

Last edited by thiago.eec; 08-24-2024 at 12:49 PM.
Old Yesterday, 06:24 PM   #2
JimmXinu
Plugin Developer
I'll offer a few pointers, but I haven't researched it recently.

Cloudflare has more than one level of blocking that it can run for a given site.

Some lower levels can be bypassed with cloudscraper.

FlareSolverr is a proxy that runs a headless browser to handle requests. As I understand it, this uses its own proxy API, not a standard one. Also, this proxy works for web pages, but not images or binaries. I don't believe it works for the highest "under attack" levels of Cloudflare.

FanFicFare has code that can use either of these, as well as code that can read cached pages out of your regular browser's cache directory. This is a pain, because you have to load the page in your browser first to cache it--and not all pages are cached.

The actual cache reading code isn't mine and I don't pretend to understand it more than superficially.
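
For reference, a rough sketch of what "its own proxy API" means in practice for FlareSolverr: the caller sends a JSON command to the FlareSolverr service instead of pointing an ordinary HTTP proxy setting at it. The field names below ('cmd', 'url', 'postData', 'maxTimeout', 'solution') and the default endpoint are taken from my recollection of the FlareSolverr README and should be checked against the version in use. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: FlareSolverr running locally on its usual default port.
FLARESOLVERR_URL = 'http://localhost:8191/v1'


def build_post_command(url, form_fields, max_timeout_ms=60000):
    """Build the JSON command asking FlareSolverr to POST a form."""
    return {
        'cmd': 'request.post',
        'url': url,
        # The form is sent as a urlencoded string, not as a JSON object.
        'postData': urlencode(form_fields),
        'maxTimeout': max_timeout_ms,
    }


def extract_solution(reply):
    """Return (html, cookies) from a decoded FlareSolverr reply."""
    if reply.get('status') != 'ok':
        raise RuntimeError('FlareSolverr failed: %s' % reply.get('message'))
    solution = reply['solution']
    return solution['response'], solution.get('cookies', [])


def send_command(command, endpoint=FLARESOLVERR_URL):
    """Send a command to the FlareSolverr service and decode its JSON reply."""
    req = Request(
        endpoint,
        data=json.dumps(command).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

A caller would chain these as extract_solution(send_command(build_post_command(url, fields))). Whether this gets past a given site still depends on the Cloudflare level that site runs.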
Old Yesterday, 11:43 PM   #3
kovidgoyal
creator of calibre
@JimmXinu: Just FYI as of calibre 7.17 there is a headless browser (chromium, via Qt WebEngine) in calibre you can use to get URLs. See the WebEngineBrowser class in scraper/qt.py

Last edited by kovidgoyal; Today at 03:03 AM.
Old Today, 07:38 AM   #4
thiago.eec
Guru
Quote:
Originally Posted by JimmXinu
I'll offer a few pointers, but I haven't researched it recently.

Cloudflare has more than one level of blocking that it can run for a given site.

Some lower levels can be bypassed with cloudscraper.

FlareSolverr is a proxy that runs a headless browser to handle requests. As I understand it, this uses its own proxy API, not a standard one. Also, this proxy works for web pages, but not images or binaries. I don't believe it works for the highest "under attack" levels of Cloudflare.

FanFicFare has code that can use either of these, as well as code that can read cached pages out of your regular browser's cache directory. This is a pain, because you have to load the page in your browser first to cache it--and not all pages are cached.

The actual cache reading code isn't mine and I don't pretend to understand it more than superficially.
Thank you for your help. I did try cloudscraper and FlareSolverr, but could not make either of them work.

Quote:
Originally Posted by kovidgoyal
@JimmXinu: Just FYI as of calibre 7.17 there is a headless browser (chromium, via Qt WebEngine) in calibre you can use to get URLs. See the WebEngineBrowser class in scraper/qt.py
I tried using WebEngineBrowser to access the login page (protected by Cloudflare), but it seems JavaScript is not enabled by default. The HTML returned has this message: 'Enable JavaScript and cookies to continue'.
How can I enable full JavaScript and cookie support for a WebEngineBrowser instance?

This is necessary because the Cloudflare check basically gives the browser a few JS calculations to solve before allowing the user to see the login page.
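
When a fetch returns HTML without raising an error, as happened here, the plugin can at least recognise the challenge page and report it clearly instead of failing later while parsing. This is a minimal sketch: the only marker used is the message quoted in this post, and the names CHALLENGE_MARKERS and is_challenge_page are hypothetical. Cloudflare may change that wording at any time, so treat the check as a heuristic.

```python
# Marker text taken from the message quoted in the post above.
# Assumption: the wording may change; extend this tuple as needed.
CHALLENGE_MARKERS = (
    'Enable JavaScript and cookies to continue',
)


def is_challenge_page(html):
    """Return True if the HTML looks like a JavaScript challenge page."""
    if isinstance(html, bytes):
        # Decode leniently: only a plain ASCII marker needs to be found.
        html = html.decode('utf-8', 'replace')
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in CHALLENGE_MARKERS)
```

Detecting the page does not solve the challenge. It only lets the plugin tell the user that the login was blocked, not that the credentials were wrong.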
Old Today, 11:09 AM   #5
kovidgoyal
creator of calibre
It doesn't support JavaScript. Once you enable JavaScript, detecting that the browser is not a real browser is pretty easy, so it won't help you with Cloudflare.
Old Today, 01:03 PM   #6
thiago.eec
Guru
Quote:
Originally Posted by kovidgoyal
It doesn't support JavaScript. Once you enable JavaScript, detecting that the browser is not a real browser is pretty easy, so it won't help you with Cloudflare.
I see. I'm out of options, then.
Old Today, 01:59 PM   #7
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by thiago.eec
I see. I'm out of options, then.
See my reply in the Skoob thread.
Old Today, 05:27 PM   #8
gbm
Wizard
Why don't you use a little courtesy and post a link?

bernie
Quote:
Originally Posted by JSWolf
See my reply in the Skoob thread.
Old Today, 05:48 PM   #9
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by gbm
Why don't you use a little courtesy and post a link?

bernie
My reply is in the Skoob plugin thread in the calibre forum, as this isn't the correct thread to reply in.

https://www.mobileread.com/forums/sh...d.php?t=320753
All times are GMT -4. The time now is 06:29 PM.