06-19-2024, 07:28 PM | #16 |
want to learn what I want
Posts: 1,289
Karma: 6543210
Join Date: Sep 2020
Device: Calibre E-book viewer
|
Glad you liked it! I see four processes on Windows most of the time: three NAPS2.exe and one NAPS2.Worker.exe. Haven't used it intensively yet, though.
|
06-19-2024, 07:59 PM | #17 |
Addict
Posts: 245
Karma: 1600689
Join Date: Jul 2023
Device: Scribe, OA2, Glo HD, PRS-350
|
That sounds like you are using the built-in macOS Preview application.
|
06-20-2024, 07:54 AM | #18 |
Grand Sorcerer
Posts: 45,058
Karma: 56751447
Join Date: Jan 2007
Location: Peru
Device: Kindle: Oasis 3, Voyage WiFi; Kobo: Libra 2, Aura One
|
|
06-21-2024, 08:38 PM | #19 |
Addict
Posts: 221
Karma: 2818790
Join Date: Nov 2015
Device: none
|
I can't agree with that. There are books that are enriched by illustrations, and there are even some where they're essential for the story. In the first category, you have books designed with specific illustrations in mind. Often they are drawn by the authors themselves (Hobbit, Decline and Fall...), and one can argue that they offer the intended reading experience. With the latter, they're the only way to experience the whole story fully. Take Path to Rome by Belloc, for example. It was an avant-garde, modern travelogue at the time, and it still feels fresh today with its unique 'chapter' structure. The author talks to the reader, describes his drawings, his maps. Illustrations, in this case, are vital for the storytelling. The novel simply doesn't work without them.
|
06-23-2024, 04:22 PM | #20 |
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
How to make a good EPUB from an IA PDF
The Internet Archive PDFs have become most of the raw material for my retirement hobby: getting pulp magazine and similar stories (and whole issues) into good, illustrated epub books. Work yes, but with good tools I can often do a 70,000-word magazine issue or novel in a few days. (The first one I ever did probably took two months!)
General method, if anyone wants to try. I do all this on a Linux box, but most everything or an equivalent is probably available on any platform:
1. Find the PDF I want, something like Dime Detective January 1950 just as an example. (If the book is "borrow for an hour", you might need https://financial-accounting-acg2021.../download.html to help get the PDF. The Calibre de-ACSM plugin works for this without Adobe.)
2. Pull out usable images with pdftopng (https://poppler.freedesktop.org)
3. If the page images are really bad, which is rare, clean them up with ImageMagick or ScanTailor Advanced. https://imagemagick.org/script/download.php and/or https://github.com/4lex4/scantailor-advanced
4. Run OCR page by page, doing each column one at a time, avoiding ads and following "continued on page nnn" instructions: Tesseract OCR using the OCRFeeder front end. This sounds horrible but is actually quite quick, several pages a minute with practice. https://wiki.gnome.org/action/show/Apps/OCRFeeder
5. Take the page-by-page text output and copy it into LibreOffice Writer. I have made a template, an .ott file, with about 15 custom styles to handle most all needs for fiction of this type.
6. I use GIMP to extract and fix up images I want in the book. Scale large images to max 1200 px in the largest dimension, since the target is e-ink readers. In Writer, anchor images "as characters" and keep it simple.
7. Proofread and correct the Writer document, formatting it with the custom styles. I do this maybe 20 or 30 pages at a time. Each input doc seems to give repeated OCR errors due to typesetting and/or scanning, so this allows find-and-replace corrections in large chunks. (I wish Writer had saved searches like Sigil or Calibre Editor.)
8. Import the .odt file into Sigil using the new version of the ODT Import plugin; it is just terrific. I have a custom CSS file in the plugin that exactly matches the styles in my Writer template file, but with dimensions in em instead of pt or cm. Just delete the default CSS and the custom one takes over like magic. This setup is in the manual.
9. This input plugin works so well that the only work needed at the epub code level is a little fix-up on images, 3 or 4 standard code changes I like, and adding metadata. Depending on the doc, I use Sigil or the Calibre Editor for this; they have different tools, and both are very good. I almost never have to touch the actual coding of any book text.
No one-button conversion is ever going to get you a book like this from old, multi-column text full of advertisements!
Remaining problem: what to do with the resulting books? Most of the stuff I do is really old, but I'm not about to try and understand the copyright issues. You can probably find some of this if you search in likely places. |
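The proofreading step above mentions wishing for "saved searches" to handle OCR errors that repeat throughout a document. As a rough illustration only (this is not the poster's tooling, and Writer has no such feature built in), a saved-search list can be approximated in plain Python on the raw OCR text before it is pasted into Writer. The patterns and the name `apply_saved_searches` below are hypothetical examples; a real list would be built up from the errors each scan actually produces.

```python
import re

# Hypothetical "saved searches": (label, regex pattern, replacement).
# These are illustrative guesses at typical OCR slips, not a tested rule set.
SAVED_SEARCHES = [
    ("tbe -> the", r"\btbe\b", "the"),
    ("digits 11 for ll in I'll", r"\bI'11\b", "I'll"),
    ("space before punctuation", r"[ \t]+([,.;:!?])", r"\1"),
    ("runs of spaces", r" {2,}", " "),
]


def apply_saved_searches(text, searches=SAVED_SEARCHES):
    """Apply every saved search in order; return the fixed text and hit counts."""
    counts = {}
    for label, pattern, replacement in searches:
        # re.subn returns the new string plus the number of substitutions made.
        text, hits = re.subn(pattern, replacement, text)
        counts[label] = hits
    return text, counts


if __name__ == "__main__":
    sample = "He saw tbe door , and I'11  wait ."
    fixed, counts = apply_saved_searches(sample)
    print(fixed)
    for label, hits in counts.items():
        print(f"{hits:3d}  {label}")
```

The per-rule hit counts are there so a rule that suddenly matches hundreds of times can be checked by eye before trusting it; blind replacement across a whole issue is exactly where this kind of shortcut goes wrong.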
06-23-2024, 09:45 PM | #21 | |
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
Quote:
Great little program for doing some simple things very easily. Tried the "black and white" tool on a very brown and ugly magazine scan from IA. It used all 24 of my cores (Thelio box from System76 on Pop_os, an Ubuntu-based OS). Very fast. And excellent output (on that sample of one...) |
|
06-23-2024, 11:27 PM | #22 |
Addict
Posts: 245
Karma: 1600689
Join Date: Jul 2023
Device: Scribe, OA2, Glo HD, PRS-350
|
|
06-23-2024, 11:51 PM | #23 | |
Addict
Posts: 221
Karma: 2818790
Join Date: Nov 2015
Device: none
|
Quote:
Just looking at the surface of it, and that blogspot link, I'm wondering how the PDFs for borrowed books are generated. They look like they're their own PDFs that just aren't listed. I tend to see similar artefacts when comparing the PDFs of books that they have freely available with the .jp2 source files. It's making their PDFs appalling when it comes to illustrations and photos, and it's a major reason why I'm making my own.
I'll post my own workflow later, if anyone is interested in it. I'm taking the jp2.zip or ripping .jpgs from the web page, and I'm converting them into PDFs. I'm just wondering if it would be possible to get jp2 files using the same process? In the vast majority of cases jp2s are massive overkill regarding file sizes for the quality that they offer, but they're needed simply because their PDF process obliterates quality.
Example of PDF compared to web page source and jp2 zip files:
Sample book: https://archive.org/details/atlanticcrossing0000melv
PDF: https://archive.org/download/atlanti...ng0000melv.pdf
Webpage source: https://archive.org/details/atlantic...age/6/mode/2up
JP2 processed source: https://archive.org/download/atlanti...00melv_jp2.zip (224 MB file)
Comparisons in attachments. Artefacts should be plainly visible. I also included a spread of two pages in full resolution in a zip file. The ZIP is required as this site doesn't support such high resolutions. The JPEG is compressed compared to the .jp2 source to keep file size down, but quality is actually comparable to the other comparisons. Extra resolution isn't doing much, simply because the scans aren't perfect to begin with. |
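The poster's own workflow has not been posted yet, so the following is only a guess at one small, mechanical part of "taking jp2.zip ... and converting them into PDFs": getting the page images out of the archive in true page order before handing them to whatever tool builds the PDF. It uses only the Python standard library. The file names in the demo are made up, and `page_images_in_order` is a hypothetical helper name; nothing here is taken from the Internet Archive's actual file layout.

```python
import io
import re
import zipfile

# Image types worth keeping; anything else in the archive is ignored.
IMAGE_SUFFIXES = (".jp2", ".jpg", ".jpeg", ".png")


def natural_key(name):
    """Sort key that compares digit runs as numbers, so page_10 follows page_2."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions are always the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def page_images_in_order(zip_source):
    """Return the image member names of a zip (path or file object) in page order."""
    with zipfile.ZipFile(zip_source) as archive:
        names = [n for n in archive.namelist()
                 if n.lower().endswith(IMAGE_SUFFIXES)]
    return sorted(names, key=natural_key)


if __name__ == "__main__":
    # Build a tiny in-memory zip with made-up names to show the ordering.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        for member in ("scan/page_10.jp2", "scan/page_2.jp2",
                       "scan/readme.txt", "scan/page_1.jp2"):
            demo.writestr(member, b"")
    buffer.seek(0)
    for member in page_images_in_order(buffer):
        print(member)
```

Natural ordering mostly matters for images ripped from the web page, where names are often not zero-padded; a plain alphabetical sort would then put page 10 before page 2 and scramble the resulting PDF.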
|
06-24-2024, 06:43 AM | #24 | |
Fanatic
Posts: 502
Karma: 2267928
Join Date: Nov 2015
Device: none
|
Quote:
The same software could also use dictionaries to improve recognition, which Tesseract still cannot do. |
|
07-03-2024, 01:35 PM | #25 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2024
Device: Few of them....
|
Sharing Your opinion....
|
08-21-2024, 03:03 PM | #26 | |
Member
Posts: 21
Karma: 95132
Join Date: Jan 2013
Device: Nook HD+, HD (Cyanogenmod, LineageOS), Simple Touch (rooted), various
|
Quote:
https://github.com/elementdavv/inter...ive_downloader |
|
08-21-2024, 04:41 PM | #27 | |
the rook, bossing Never.
Posts: 12,387
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
No-one needs that. If it's in their library it may be pirated. |
|
08-21-2024, 04:51 PM | #28 |
Wizard
Posts: 1,367
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
|
08-21-2024, 09:57 PM | #29 | |
Custom User Title
Posts: 9,588
Karma: 65099765
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Quote:
I've noticed a lot of previously-borrowable books also going "preview only" since the lawsuit started. Last edited by ownedbycats; 08-21-2024 at 10:18 PM. |
|
08-22-2024, 09:57 AM | #30 | |
the rook, bossing Never.
Posts: 12,387
Karma: 92073397
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
If you want your own copy of a copyright work, rather than public library, Kindle Unlimited, or IA's dodgy Open Library, then you have to buy it. |
|
|