Old 10-11-2024, 03:10 AM   #1
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
[SOLVED] Download web page and turn it into EPUB?

Hello,

I'd like to download a web page that's too long to read on a computer, and have it turned into an EPUB file.

Pandoc, Calibre, and mutool all fail, because of a wrong layout, wrong characters (ligatures at least), or even a "Couldn't render this page" error.

I assumed turning HTML into EPUB (where pages are actually HTML) would be a breeze… but it looks more involved than expected.

Does anyone know of a reliable, no-brainer solution (for Windows, CLI and/or GUI)?

Thank you.

Last edited by Shohreh; 10-11-2024 at 11:23 AM.
Old 10-11-2024, 03:36 AM   #2
Karellen
Wizard
Posts: 1,361
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
You should be able to right-click at the start of the text and select Inspect.
Then move through the <div> elements in the inspector until you reach the one that highlights all the text in blue.
Then open the context menu on that element and choose Copy > Inner HTML.
Attached image: CopyInnerHTML.jpg
Old 10-11-2024, 03:38 AM   #3
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Thanks. Is there no easier solution?

Google didn't help me find the right wget options to download a web page along with everything required to turn it into a readable EPUB file, even for a no-frills, single-column web page.
Old 10-11-2024, 04:10 AM   #4
Karellen
Wizard
Posts: 1,361
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
There probably is, but I am not aware of one.
Old 10-11-2024, 06:24 AM   #5
Turtle91
A Hairy Wizard
Posts: 3,225
Karma: 19000635
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
It’s not necessarily easier in the short run, but you can learn Python (a very easy, straightforward programming language) and write a short spider/scraper that grabs the HTML from all the pages and puts it in a text file. Then you just need to do some simple massaging to format it as an EPUB.

In the long run you will know Python and be the ruler of your universe!!

In the medium run make sure you follow copyright restrictions and/or have the website/author’s permission before you scrape those pages.
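
A minimal sketch of the "grab the HTML and put it in a text file" step, using only the Python standard library. The function names `fetch_html` and `save_pages` and the default User-Agent string are illustrative assumptions, not part of any tool mentioned in this thread. It decodes each page with the charset the server declares, which may help with the "wrong characters" problem from the first post.

```python
import urllib.request


def fetch_html(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Download one page and return its HTML as text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        raw = response.read()
        # Use the charset the server declares; fall back to UTF-8 if none is given.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


def save_pages(urls: list[str], out_path: str) -> int:
    """Write the HTML of every URL to one text file; return how many were saved."""
    saved = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            # Mark where each page came from so the file is easy to split later.
            out.write(f"<!-- source: {url} -->\n")
            out.write(fetch_html(url))
            out.write("\n")
            saved += 1
    return saved
```

Usage would be something like `save_pages(["https://acme.com/2024/09/22/blah.html"], "book.txt")`, with the copyright caveat above still applying.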
Old 10-11-2024, 06:28 AM   #6
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
I know Python, but surely applications like wget etc. can download a web page and its resources (CSS, JPG/PNG) before feeding them into Pandoc etc. to get an EPUB?
Old 10-11-2024, 06:43 AM   #7
JSWolf
Resident Curmudgeon
Posts: 76,474
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
What is the URL, so maybe someone can have a go at it?
Old 10-11-2024, 06:48 AM   #8
Turtle91
A Hairy Wizard
Posts: 3,225
Karma: 19000635
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
The issue is parsing the HTML to get just the book and not all the fluff/ads. Those ads are likely what is causing the issues. Beautiful Soup and the script can do all the scraping and 99% of the massaging to output a text file with all the book contents and associated HTML tags. Then just copy/paste the contents of the output file into pandoc/Sigil/Calibre for the final EPUB massaging.

I wrote a program to do all that as a project to learn Python and made a GUI for it. That was fun! However, I’m not aware of any websites that allow its use. You are pretty much restricted to converting your own web page to an EPUB.
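
A rough sketch of the "keep the book, drop the fluff" idea. It is not the program described above. To stay runnable without extra installs it uses the standard library's `html.parser` rather than Beautiful Soup, and it assumes the text lives inside a single container tag (`article` here, which is a guess that varies by site). The class name, `SKIP_TAGS` list, and `extract_content` helper are all hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Subtrees that rarely belong in an e-book (an assumed list; adjust per site).
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "iframe", "form"}


class ContentExtractor(HTMLParser):
    """Keep the markup inside one container tag, dropping SKIP_TAGS subtrees."""

    def __init__(self, container: str = "article"):
        super().__init__(convert_charrefs=True)
        self.container = container
        self.depth = 0         # how many container tags we are nested inside
        self.skip_tag = None   # name of the skipped tag we are inside, if any
        self.skip_depth = 0    # nesting depth of that skipped tag
        self.parts: list[str] = []

    @staticmethod
    def _attrs(attrs) -> str:
        return "".join(f' {name}="{escape(value or "")}"' for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if self.depth == 0:
            if tag == self.container:
                self.depth = 1
            return
        if tag == self.container:
            self.depth += 1
        if tag in SKIP_TAGS:
            self.skip_tag, self.skip_depth = tag, 1
            return
        self.parts.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <img ... /> are kept in self-closed form.
        if self.depth and not self.skip_tag and tag not in SKIP_TAGS:
            self.parts.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        if self.depth == 0:
            return
        if tag == self.container:
            self.depth -= 1
            if self.depth == 0:
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth and not self.skip_tag:
            # The parser decoded entities already, so re-escape the text.
            self.parts.append(escape(data, quote=False))


def extract_content(html_text: str, container: str = "article") -> str:
    parser = ContentExtractor(container)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()
```

Real pages are messier than this handles (unclosed tags, content split across several containers), so the output would still need a look in Sigil or the Calibre editor.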

Last edited by Turtle91; 10-11-2024 at 06:52 AM.
Old 10-11-2024, 06:52 AM   #9
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Any web page will do, it's not a specific page.

Ads are not the problem. The problem is 1) getting the CSS and pictures, and 2) turning those into a working EPUB.

For different reasons (chopped page, wrong characters, "Couldn't render this page" in one section), neither pandoc, Calibre nor Sigil worked.

Surely, I'm not the first person to want to turn a long web article into an EPUB file to read on an e-reader.
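
For point 2 above (turning the files into a working EPUB), here is a bare-bones sketch of what an EPUB 3 container looks like when assembled by hand with Python's `zipfile`. The function name `build_epub`, the `OEBPS/` folder, and the file names are illustrative choices. It assumes the body is already well-formed XHTML and leaves out images and CSS, each of which would need its own manifest entry.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>{language}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="page" href="page.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="page"/>
  </spine>
</package>
"""

XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>{body}</body>
</html>
"""


def build_epub(title: str, body_xhtml: str, out_path: str, language: str = "en") -> None:
    """Package one XHTML body into a single-page EPUB 3 file."""
    safe_title = escape(title)
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = OPF_TEMPLATE.format(
        identifier=f"urn:uuid:{uuid.uuid4()}",
        title=safe_title,
        language=language,
        modified=modified,
    )
    nav_body = (
        '<nav epub:type="toc"><ol>'
        f'<li><a href="page.xhtml">{safe_title}</a></li></ol></nav>'
    )
    with zipfile.ZipFile(out_path, "w") as epub:
        # The mimetype entry must come first and must be stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        deflated = zipfile.ZIP_DEFLATED
        epub.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflated)
        epub.writestr("OEBPS/content.opf", opf, compress_type=deflated)
        epub.writestr("OEBPS/nav.xhtml",
                      XHTML_TEMPLATE.format(title=safe_title, body=nav_body),
                      compress_type=deflated)
        epub.writestr("OEBPS/page.xhtml",
                      XHTML_TEMPLATE.format(title=safe_title, body=body_xhtml),
                      compress_type=deflated)
```

This only shows why the conversion is more involved than it first looks: the page itself is HTML, but the package around it has its own rules. The browser extensions suggested later in the thread take care of all of that.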
Old 10-11-2024, 07:17 AM   #10
Turtle91
A Hairy Wizard
Posts: 3,225
Karma: 19000635
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
You can parse the page (e.g. with Beautiful Soup) to get the src URLs for images/CSS and download those files separately. If there are only a few files, it’d be faster to do them manually via the inspector, as Karellen mentioned.
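
As a sketch of that parsing step, the snippet below lists the image and stylesheet URLs a page refers to, resolved against the page's own address. It uses the standard library's `html.parser` instead of Beautiful Soup so it runs as-is. `ResourceCollector` and `find_resources` are made-up names, and it only looks at `<img src>` and `<link rel="stylesheet" href>`, so `srcset`, CSS `url(...)` references and the like are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images and stylesheets referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self._add(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add(attributes["href"])

    def _add(self, link: str) -> None:
        if link.startswith("data:"):
            return  # inline data, nothing to download
        absolute = urljoin(self.base_url, link)
        if absolute not in self.resources:
            self.resources.append(absolute)


def find_resources(html_text: str, base_url: str) -> list[str]:
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.resources
```

Each URL in the returned list can then be downloaded separately and the references in the HTML pointed at the local copies.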

You’re not the first to want that. As I mentioned, the program is not difficult to make, but since there are very few opportunities for legal use there isn’t a big incentive for a company to make one publicly available.

Some browsers let you use Save As if you don’t mind getting the whole page, including all the junk. But most of that junk doesn’t work in an EPUB and needs to be cleaned out.

Last edited by Turtle91; 10-11-2024 at 07:22 AM.
Old 10-11-2024, 07:24 AM   #11
Comfy.n
want to learn what I want
Posts: 1,284
Karma: 6433040
Join Date: Sep 2020
Device: Calibre E-book viewer
This Firefox add-on works fine in most cases; I love it.

https://addons.mozilla.org/en-US/fir...n/saveasebook/

There's also EpubPress: https://addons.mozilla.org/en-US/fir...e-web-offline/
Old 10-11-2024, 08:02 AM   #12
patrik
Guru
Posts: 674
Karma: 4568205
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
I miss bloxp... I typically use Pocket. It works most of the time, but fails a bit too often.
Old 10-11-2024, 10:20 AM   #13
msel
Connoisseur
Posts: 67
Karma: 143132
Join Date: Sep 2010
Device: Kindle Keyboard 3G
Three Suggestions

Hello,

1. With Firefox: use the add-on Readability based Reader View
https://addons.mozilla.org/en-US/fir...ed-reader-view
Open (and edit) the page in the Reader View and save the web page with the save button.
Edit the saved HTML file with the Calibre e-book editor. The missing pictures can be downloaded in the editor with Tools > External Links > Download external resources.
Another solution would be to use SingleFile (https://addons.mozilla.org/en-US/fir...n/single-file/). It saves the whole page, or just the selection, to one (big) HTML file with the images.
2. For Google Chrome based browsers: there is an add-on, rePub, made especially for the Remarkable, but you can also create a simple EPUB without a Remarkable.
https://chromewebstore.google.com/de...cgdapmikoaolpb
3. If you use the Pale Moon browser and have installed the Classic Add-on Archive, you can install the add-on GrabMyBooks. This is the solution I use.

Greetings, Maria
Old 10-11-2024, 11:22 AM   #14
Shohreh
Groupie
Posts: 181
Karma: 304158
Join Date: Jan 2016
Device: none
Thanks much. rePub in Chrome is perfect.

FWIW, the following wget command comes pretty close to downloading a web page and its resources, but the URLs still need to be post-edited to remove the garbage added after picture filenames (e.g. .jpeg becomes .jpeg?blah, causing errors):

Code:
wget --restrict-file-names=ascii,windows --convert-links  --random-wait -U mozilla -e robots=off --span-hosts --domains=acme.com,cdn.acme.com --page-requisites --no-parent --directory-prefix=.\mydir https://acme.com/2024/09/22/blah.html
---
Edit: I also noticed that the URLs of some pictures were not converted to point to a local file, so those pictures won't be displayed in the EPUB. Also, SumatraPDF didn't like some useless <div> section in the EPUB created by Pandoc ("Couldn't render the page"; I didn't check whether it worked on the e-reader). Bottom line: try one of the browser extensions first, before trying pandoc (or wget + Sigil/Calibre).
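
The post-editing step mentioned above (turning `.jpeg?blah` back into `.jpeg`) could be scripted roughly as follows. This is a sketch under assumptions: the list of extensions is a guess, the helper names `clean_reference` and `clean_directory` are made up, and it also treats `@` as a separator because, as far as I know, wget's `windows` file-name mode writes `@` in place of `?` in local file names. Try it on a copy of the download directory first.

```python
import re
from pathlib import Path

# An image/CSS extension followed by a query-style suffix, in any of the
# forms it may take: ?, @, or their percent-encoded versions.
SUFFIX = re.compile(
    r"(\.(?:jpe?g|png|gif|webp|svg|css))(?:\?|@|%3F|%40)[^\"'\s<>)]*",
    re.IGNORECASE,
)


def clean_reference(text: str) -> str:
    """Drop everything after the extension in each matching file reference."""
    return SUFFIX.sub(r"\1", text)


def clean_directory(root: str) -> int:
    """Rewrite HTML references and rename files under root; return the rename count."""
    renamed = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in {".html", ".htm"}:
            # Assumes UTF-8 pages; other encodings would need handling here.
            html = path.read_text(encoding="utf-8", errors="replace")
            path.write_text(clean_reference(html), encoding="utf-8")
            continue
        new_name = clean_reference(path.name)
        target = path.parent / new_name
        # Skip the rename if it would overwrite an existing file.
        if new_name != path.name and not target.exists():
            path.rename(target)
            renamed += 1
    return renamed
```

Calling `clean_directory("mydir")` after the wget run would cover the directory used in the command above. Pictures whose URLs wget left pointing at the remote site are a separate problem that this does not address.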

Last edited by Shohreh; 10-12-2024 at 04:16 AM.
Old 10-11-2024, 11:32 AM   #15
PeterT
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Quote:
Originally Posted by msel
2. for Google chrome based browser: There is an add-on rePub - especially for Remarkable, but you can also create simple epub without Remarkable.
https://chromewebstore.google.com/de...cgdapmikoaolpb
That URL is missing a character.

It's actually https://chromewebstore.google.com/de...gdapmikoaolpbl