08-24-2024, 12:59 PM | #1 |
Wizard
Posts: 1,073
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
Cloudflare problem
Hi everyone.
I have a plugin called Skoob Sync. It uses web scraping to sync the user's info. The site has now started using Cloudflare on the login page, which gives me an 'HTTP Error 403: Forbidden'. The metadata plugin (Skoob Books) is not affected, as it does not need to log in. I know this is not a calibre question, but since many plugin developers use web scraping, I thought I might get some help here. Is there any way to bypass the Cloudflare check? This is the relevant code for login:
Spoiler:
Last edited by thiago.eec; 08-24-2024 at 01:49 PM. |
08-26-2024, 07:24 PM | #2 |
Plugin Developer
Posts: 6,590
Karma: 4600349
Join Date: Dec 2011
Location: Midwest USA
Device: Kindle Paperwhite(10th)
|
I'll offer a few pointers, but I haven't researched it recently.
Cloudflare can run at more than one level of blocking for a given site. Some of the lower levels can be bypassed with cloudscraper.
FlareSolverr is a proxy that runs a headless browser to handle requests. As I understand it, it uses its own proxy API, not a standard one. It also works for web pages, but not for images or binaries. I don't believe it works for the highest "under attack" levels of Cloudflare.
FanFicFare has code that can use either of these, as well as code that can read cached pages out of your regular browser's cache directory. This is a pain, because you have to load the page in your browser first to cache it, and not all pages are cached. The actual cache reading code isn't mine and I don't pretend to understand it more than superficially. |
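To illustrate what "its own proxy API" means in practice, here is a minimal sketch of how a FlareSolverr call might be built with only the standard library. It assumes a FlareSolverr instance is listening on its default local endpoint (http://localhost:8191/v1) and uses the JSON fields described in the FlareSolverr README (`cmd`, `url`, `maxTimeout`, and a `solution` object in the reply). The helper names `build_flaresolverr_request` and `extract_solution` are hypothetical, and nothing is sent over the network here.

```python
import json
import urllib.request

# Assumption: a FlareSolverr instance is running locally on its default port.
FLARESOLVERR_ENDPOINT = "http://localhost:8191/v1"


def build_flaresolverr_request(target_url, endpoint=FLARESOLVERR_ENDPOINT,
                               max_timeout_ms=60000):
    """Build (but do not send) a FlareSolverr 'request.get' call."""
    payload = {
        "cmd": "request.get",          # ask the headless browser to GET a page
        "url": target_url,
        "maxTimeout": max_timeout_ms,  # how long to wait for the challenge
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_solution(response_body):
    """Pull the page HTML, cookies and user agent out of a FlareSolverr reply."""
    reply = json.loads(response_body)
    if reply.get("status") != "ok":
        raise RuntimeError(reply.get("message", "FlareSolverr request failed"))
    solution = reply["solution"]
    return (solution["response"],
            solution.get("cookies", []),
            solution.get("userAgent"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen()` and feeding the response body to `extract_solution()`. Because the target URL travels inside a JSON body rather than through a normal HTTP proxy setting, existing scraping code has to be adapted to call it explicitly.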
08-27-2024, 12:43 AM | #3 |
creator of calibre
Posts: 44,520
Karma: 24495784
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@JimmXinu: Just FYI, as of calibre 7.17 there is a headless browser (Chromium, via Qt WebEngine) in calibre that you can use to fetch URLs. See the WebEngineBrowser class in scraper/qt.py
Last edited by kovidgoyal; 08-27-2024 at 04:03 AM. |
08-27-2024, 08:38 AM | #4 | ||
Wizard
Posts: 1,073
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
How can I enable full JavaScript and cookie support for a WebEngineBrowser instance? This is necessary because the Cloudflare check basically gives the browser a few JS calculations to solve before allowing the user to see the login page. |
||
08-27-2024, 12:09 PM | #5 |
creator of calibre
Posts: 44,520
Karma: 24495784
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It doesn't support JavaScript. Once you enable JavaScript, detecting that the browser is not a real browser is pretty easy, so it won't help you with Cloudflare.
|
08-27-2024, 02:03 PM | #6 |
Wizard
Posts: 1,073
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
|
08-27-2024, 02:59 PM | #7 |
Resident Curmudgeon
Posts: 76,354
Karma: 136006198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
08-27-2024, 06:27 PM | #8 |
Wizard
Posts: 2,125
Karma: 8796706
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
|
|
08-27-2024, 06:48 PM | #9 |
Resident Curmudgeon
Posts: 76,354
Karma: 136006198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Here is the plugin thread for Skoob in the calibre forum, as this isn't the correct thread to reply in.
https://www.mobileread.com/forums/sh...d.php?t=320753 |
08-27-2024, 08:12 PM | #10 |
Wizard
Posts: 1,073
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
Thank you, guys.
I took a look at the Koob plugin, as @JSWolf suggested in the Skoob Books thread. The developer is using cloudscraper, but I already tried that and it didn't work. The developer did mention something about changing the JavaScript interpreter, though. I'll contact him and see if I can get a workaround. |
09-02-2024, 01:20 AM | #11 |
Connoisseur
Posts: 80
Karma: 26914
Join Date: Nov 2020
Location: Perth, Western Australia
Device: Apple Books & Kobo Libra H20
|
LibraryThing also started using Cloudflare a while ago. For my plugin (LibraryThing Match) I had to change the user agent twice to avoid 403 errors. This is what it is currently:
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36' |
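For anyone wanting to try the same approach, this is a minimal sketch of attaching that header to a request using only the standard library. The helper name `build_request` is hypothetical, and whether a given User-Agent gets past Cloudflare depends on how the site is configured, so treat it as something to experiment with rather than a fix.

```python
import urllib.request

# The User-Agent string quoted in the post above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a urllib Request that sends a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# Sending it would look like:
#   urllib.request.urlopen(build_request(url), timeout=30)
```

Without an explicit header, urllib identifies itself as "Python-urllib/x.y", which is an easy target for blocking.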
09-02-2024, 08:09 AM | #12 | |
Wizard
Posts: 1,073
Karma: 1221485
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite
|
A couple of years ago I started using complete headers, as Skoob was blocking requests with the default header. Then I moved to calibre's random_user_agent method, so I wouldn't always hit the server with the same header. This worked for more than four years, but fails now.
When you access the login page, it lands on a Cloudflare page, where you need to wait a few seconds before proceeding. No need to click anything. From what I could gather, Cloudflare uses some JavaScript code to create a challenge for the 'browser' to solve. If you are using standard Python libraries (urllib or requests), you have no JavaScript capabilities and get blocked.
A headless browser could solve it, but it doesn't seem to be so simple. calibre's headless browser, for instance, has no JavaScript enabled, as Kovid stated. FlareSolverr and cloudscraper didn't work either. I even tried a paid service, https://scrapeops.io/. It managed to access the login page, but I couldn't make a successful POST request (even with their support's help). They have a js_scenario option that you can use to fill in the form and press the button instead of sending a POST request, but that didn't work either. Anyway, it looks like a lost cause. |
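Since a plain 403 can mean many things, it may help a plugin to at least recognise the Cloudflare challenge and show the user a clearer message. The sketch below is a heuristic only and rests on assumptions: that challenge responses carry a `cf-mitigated: challenge` header (as Cloudflare's documentation describes), or failing that a `Server: cloudflare` header together with a 403/503 status and the "Just a moment" interstitial text. The function name is hypothetical.

```python
def looks_like_cloudflare_challenge(status, headers, body=""):
    """Heuristic: True when a response appears to be a Cloudflare challenge."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}

    # Assumption: Cloudflare marks challenge responses with this header.
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True

    # Fallback assumption: 403/503 served by Cloudflare with the usual
    # "Just a moment..." interstitial text in the page body.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "").lower()
    return (served_by_cloudflare
            and status in (403, 503)
            and "Just a moment" in body)
```

With urllib, a 403 surfaces as an `urllib.error.HTTPError`; its `code`, `headers` and `read()` can be passed to this function to distinguish a Cloudflare block from an ordinary login failure.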
|
09-02-2024, 08:14 AM | #13 |
Resident Curmudgeon
Posts: 76,354
Karma: 136006198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
It could get to the point where any site behind Cloudflare will be inaccessible from a plugin.
|
09-03-2024, 06:19 PM | #14 |
the rook, bossing Never.
Posts: 12,338
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
|
09-03-2024, 08:00 PM | #15 |
Resident Curmudgeon
Posts: 76,354
Karma: 136006198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
If I find I'm unable to enter a website because of Cloudflare, I am going to be mighty annoyed, especially if it's a site I use a lot. I run my own customized version of Firefox, and I don't want it excluded because Cloudflare are being an ass.
|
|