06-18-2024, 12:51 PM | #1 |
Addict
Posts: 218
Karma: 2818790
Join Date: Nov 2015
Device: none
|
Opinions on Archive.org as free ebook source?
I'm finding that I rely on them more and more compared to other sources such as Project Gutenberg and Standard Ebooks. They don't focus on reflowable text; they mainly offer scans of the original books in as high a quality as possible. They do offer epubs in some cases, but those are often poorly formatted: the OCR is flawed and there is no proofreading. Then again, with a scan none of that is needed. On epubs, proofreading is often a must, and a problem. Corrupted pages, ink splats, or just OCR that misbehaved can all cause text errors that then get carried across various commercial or free ebooks for years. On a scanned page, mistakes are plainly visible, so the reader can see and often deduce what the corrupted word was.
Then there's another benefit: illustrations. You rarely see them even in commercial ebooks, and when they are available, they're often of low quality. On archive.org, you can choose from various editions, compare which looks the best and go with it. There are size and availability concerns, though. Some books can't be downloaded, and when they can, the files tend to be quite big. Massive compared to epubs. Also, the pdfs can be of worse quality than the processed jpegs, so I often download the jpegs instead and convert them into pdfs on my own. Last edited by poohbear_nc; 06-18-2024 at 01:05 PM. |
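For anyone comparing the download options for a scan, the file list can be inspected before downloading anything. This is a minimal sketch, assuming the Internet Archive's public metadata endpoint (`https://archive.org/metadata/<identifier>`) and a JSON `files` list whose entries carry `name`, `format`, and `size`. The helper names, the `example-item` identifier, and the sample data are hypothetical, and the format labels in the sample are only examples of what an item might list.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_metadata(identifier):
    """Fetch item metadata as a dict (network call; endpoint shape is an assumption)."""
    url = "https://archive.org/metadata/" + quote(identifier)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize_files(identifier, metadata):
    """Return (format, size in MB, download URL) rows, largest file first."""
    rows = []
    for entry in metadata.get("files", []):
        # Sizes arrive as strings and may be missing for some entries.
        size_bytes = int(entry.get("size") or 0)
        url = "https://archive.org/download/{}/{}".format(
            quote(identifier), quote(entry["name"])
        )
        rows.append((entry.get("format", "unknown"), round(size_bytes / 1e6, 1), url))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample, shaped like a metadata response, so no network is needed.
SAMPLE = {
    "files": [
        {"name": "book.pdf", "format": "Text PDF", "size": "52000000"},
        {"name": "book_jp2.zip", "format": "Single Page Processed JP2 ZIP", "size": "310000000"},
        {"name": "book.epub", "format": "EPUB", "size": "1400000"},
    ]
}

if __name__ == "__main__":
    for fmt, size_mb, url in summarize_files("example-item", SAMPLE):
        print(f"{size_mb:>8} MB  {fmt:<32} {url}")
```

To look at a real item, pass the result of `fetch_metadata("<identifier>")` in place of `SAMPLE`. Sorting by size makes the trade-off described above visible at a glance: the page-image archive is usually far larger than the pdf or epub.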
06-18-2024, 01:14 PM | #2 |
Grand Sorcerer
Posts: 11,521
Karma: 230505500
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
|
I only read reflowable ebooks; PDF is out of the question for me. Too clumsy and uncomfortable even on a computer, let alone an ereader. And I couldn't care less about illustrations. Give me pure text any day.
|
06-18-2024, 01:56 PM | #3 |
Grand Sorcerer
Posts: 5,531
Karma: 100606001
Join Date: Apr 2011
Device: pb360
|
I think of the (public domain) scans on archive.org as valuable raw material for ebook producers that can't or don't want to do the scanning themselves. (Just as Project Gutenberg can be a starting point for someone wanting to produce high quality formatting.)
|
06-18-2024, 02:27 PM | #4 |
the rook, bossing Never.
Posts: 12,368
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Archive.org is only good for PDF scans. You need to check they are really PD (public domain) and do your own clean-up, OCR, and proofreading. Their OCR is just about good enough for search, and nothing more.
You need a decent QHD screen, better than basic HD anyway, to read them. Often the quality is poor. |
06-18-2024, 03:27 PM | #5 |
Wizard
Posts: 1,361
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Yes, definitely agree.
|
06-18-2024, 03:30 PM | #6 |
Gentleman and scholar
Posts: 11,352
Karma: 110455811
Join Date: Jun 2015
Location: Space City, Texas
Device: Clara BW; Nook ST w/Glowlight, Paperwhite 3
|
I personally wouldn't enjoy reading one of those Archive.org (or OpenLibrary) book scans on my reader. They are a mess.
But they can serve as valuable raw material, and I'm glad they are doing what they are doing (even the scans of in-copyright stuff). |
06-18-2024, 04:49 PM | #7 |
Grand Sorcerer
Posts: 6,772
Karma: 26974049
Join Date: Apr 2009
Location: USA
Device: iPhone 15PM, Kindle Scribe, iPad mini 6, PocketBook InkPad Color 3
|
While a lot of it is available for ePub download, PDF is almost always better in the end. Some of the scans are pretty good, for example the ones you have to check out an hour at a time.
I've been sufficiently motivated to clean a few up and re-OCR, but it is a lot of work and difficult to automate any part of the workflow for doing so. |
06-18-2024, 05:55 PM | #8 |
Custom User Title
Posts: 9,571
Karma: 64960981
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
The PDFs scanned by the Internet Archive itself use a Luratech layering/transparency for compression reasons. I've had issues with them on my Kobo devices.
The OCR'd ePubs I consider a lost cause. |
06-18-2024, 07:45 PM | #9 |
want to learn what I want
Posts: 1,279
Karma: 6433040
Join Date: Sep 2020
Device: Calibre E-book viewer
|
Sometimes I see very low "brightness" in Archive.org scans.
Then, a while ago, when looking at one of those, I found an awesome Windows program that allows adjusting all images in a PDF file, in just one click: https://www.naps2.com/ Last edited by Comfy.n; 06-18-2024 at 07:48 PM. |
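The kind of brightness fix mentioned above can be illustrated with a gamma curve. This is not how NAPS2 does it internally; it is a sketch of one common way to lift dark scans, working on plain 8-bit grayscale values with the standard library only. The function names and the default gamma of 1.8 are assumptions chosen for illustration.

```python
def build_gamma_lut(gamma):
    """Map each 8-bit level through out = 255 * (in / 255) ** (1 / gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (level / 255) ** (1 / gamma)) for level in range(256)]


def brighten(pixels, gamma=1.8):
    """Apply the lookup table to a bytes object of grayscale values.

    gamma > 1 lifts the midtones; pure black (0) and pure white (255) stay put.
    """
    lut = build_gamma_lut(gamma)
    return bytes(lut[value] for value in pixels)


if __name__ == "__main__":
    # One hypothetical row of a dark scan.
    dark_row = bytes([0, 32, 64, 128, 255])
    print(list(brighten(dark_row)))
```

A lookup table is used because an 8-bit image has only 256 possible levels, so the curve is computed once and then applied to every pixel of every page.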
06-18-2024, 07:51 PM | #10 | |
Wizard
Posts: 1,361
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Quote:
I don't use it for the pdf adjustment, but I do use the batch scanning mode, which works really well. =edit= Thanks for the reminder. I see I am a bit out of date using v6.1. Best upgrade to the latest version. |
|
06-18-2024, 09:16 PM | #11 |
want to learn what I want
Posts: 1,279
Karma: 6433040
Join Date: Sep 2020
Device: Calibre E-book viewer
|
|
06-18-2024, 09:30 PM | #12 |
Grand Sorcerer
Posts: 45,043
Karma: 56751447
Join Date: Jan 2007
Location: Peru
Device: Kindle: Oasis 3, Voyage WiFi; Kobo: Libra 2, Aura One
|
I just do this in calibre:
VIEW (open the PDF file)
FILE
EXPORT
QUARTZ FILTER (change to BLACK & WHITE)
SAVE
I then take the saved file from Documents (Macbook Air), load it into calibre, and transfer it to my Kindle Scribe. Beautiful. The covers are in black and white, but I don't care. Then (pay attention), I READ the book as a PDF file. Most of my PDF files come from Archive.org.
WONDERFUL. SIMPLE. ENJOYABLE. PDF FILE. KINDLE SCRIBE. YEAH! Last edited by Dr. Drib; 06-18-2024 at 09:33 PM. |
06-19-2024, 11:27 AM | #13 |
Guru
Posts: 617
Karma: 12345678
Join Date: Jan 2015
Location: Canada
Device: none
|
While they are not my favorite source of Public Domain books, I have grabbed a number of their PDFs.
I use EBookDroid to read them on my tablet as it provides a number of options to improve PDF readability. |
06-19-2024, 02:59 PM | #14 | |
Addict
Posts: 241
Karma: 1600689
Join Date: Jul 2023
Device: Kindle Scribe
|
Quote:
|
|
06-19-2024, 03:08 PM | #15 |
Grand Sorcerer
Posts: 12,756
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
There seem to be some multi-core discussions at https://sourceforge.net/p/naps2/discussion/general/
|