Old 06-19-2024, 06:28 PM   #16
Comfy.n
want to learn what I want
Posts: 1,248
Karma: 6426810
Join Date: Sep 2020
Device: Calibre E-book viewer
Quote:
Originally Posted by Frogm4n View Post
Thanks for linking that. I am glad it is multi-platform. However, on Linux it seems to be single-threaded. Is it the same on Windows/macOS? My main workstation has 14 cores, so seeing it do a compute-heavy task on just one of them is a bit frustrating.
Glad you liked it! I see four processes on Windows most of the time: three NAPS2.exe and one NAPS2.Worker.exe. Haven't used it intensively yet, though.
Old 06-19-2024, 06:59 PM   #17
Frogm4n
Addict
Posts: 229
Karma: 1600689
Join Date: Jul 2023
Device: Kindle Scribe
Quote:
Originally Posted by Dr. Drib View Post
I just do this in calibre:

VIEW (open the PDF file)
FILE
EXPORT
QUARTZ FILTER (change to BLACK & WHITE)
SAVE

I then take the saved file from Documents (Macbook Air) and load the file onto calibre and then transfer it to my Kindle Scribe.
That sounds like you are using the built in macOS Preview application.
Old 06-20-2024, 06:54 AM   #18
Dr. Drib
Grand Sorcerer
Posts: 45,034
Karma: 56650819
Join Date: Jan 2007
Location: Peru
Device: Kindle: Oasis 3, Voyage WiFi; Kobo: Libra 2, Aura One
Quote:
Originally Posted by Frogm4n View Post
That sounds like you are using the built in macOS Preview application.
I don't know, but if that is it, the application gets rid of the yellow tint and speeds up the PDF (when turning pages) to where it is almost like reading a regular file.
Old 06-21-2024, 07:38 PM   #19
jackm8
Addict
Posts: 211
Karma: 2818790
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by Sirtel View Post
...And I couldn't care less about illustrations. Give me pure text any day.
I can't agree with that. There are books that are enriched by illustrations, and there are even some where they're essential to the story. In the first category, you have books designed with specific illustrations in mind. Often they are drawn by the authors themselves (The Hobbit, Decline and Fall...), so one can argue that they offer the intended reading experience. With the latter, they're the only way to experience the whole story fully. Take The Path to Rome by Belloc, for example. It was an avant-garde, modern travelogue at the time, and it still feels fresh today with its unique 'chapter' structure. The author talks to the reader and describes his drawings and his maps. Illustrations, in this case, are vital to the storytelling. The novel simply doesn't work without them.
Attached thumbnails: srgb iec PNG pathtorome00bell_0158.jpg (663.2 KB); srgb iec PNG pathtorome00bell_0203.jpg (511.2 KB)
Old 06-23-2024, 03:22 PM   #20
retiredbiker
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
How to make a good EPUB from an IA PDF

The Internet Archive PDFs have become most of the raw material for my retirement hobby: getting pulp magazine and similar stories (and whole issues) into good, illustrated epub books. Work yes, but with good tools I can often do a 70,000-word magazine issue or novel in a few days. (The first one I ever did probably took two months!)

General method, if anyone wants to try. I do all this on a Linux box, but most everything or an equivalent is probably available on any platform:

Find the PDF I want, something like Dime Detective January 1950 just as an example. (If the book is "borrow for an hour", you might need https://financial-accounting-acg2021.../download.html to help get the pdf. The Calibre de-ACSM plugin works for this without Adobe.)

Pull out usable images with pdftopng (https://poppler.freedesktop.org)
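A minimal sketch of this extraction step in Python, assuming Poppler's pdftoppm is installed and on the PATH. (As far as I know, pdftopng is the Xpdf tool; Poppler's closest equivalent is pdftoppm with the -png flag, which is what is used here.) The file names and the 300 dpi default are placeholders, not the settings used in this post.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path, out_prefix, dpi=300, first=None, last=None):
    """Return the pdftoppm argument list that renders PDF pages to PNG files."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]   # first page to render
    if last is not None:
        cmd += ["-l", str(last)]    # last page to render
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def extract_pages(pdf_path, out_dir, dpi=300):
    """Render every page of pdf_path into out_dir as page-<n>.png files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdftoppm_cmd(pdf_path, out_dir / "page", dpi), check=True)
    return sorted(out_dir.glob("page-*.png"))
```

Calling extract_pages("magazine.pdf", "pages") would leave one PNG per page in the pages folder, ready for cleanup or OCR.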

If the page images are really bad, which is rare, clean them up with ImageMagick or ScanTailor Advanced. https://imagemagick.org/script/download.php and/or https://github.com/4lex4/scantailor-advanced

Run OCR page by page, doing each column one at a time, avoiding ads and following "continued on page nnn" instructions: Tesseract OCR using OCRFeeder front end. This sounds horrible but is actually quite quick, several pages a minute with practice. https://wiki.gnome.org/action/show/Apps/OCRFeeder
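For anyone who would rather script this step than use the OCRFeeder GUI, here is a rough sketch that calls the tesseract command line directly on column images that have already been cropped by hand. The file names are hypothetical, and the choice of page segmentation mode 4 (assume a single column of text of variable sizes) is an assumption, not something taken from this workflow.

```python
import subprocess


def build_tesseract_cmd(image_path, output_base, lang="eng", psm=4):
    """Return the tesseract argument list for one pre-cropped column image.

    psm 4 asks Tesseract to assume a single column of text of variable sizes.
    """
    return ["tesseract", str(image_path), str(output_base),
            "-l", lang, "--psm", str(psm)]


def ocr_column(image_path, output_base):
    """Run Tesseract on one column image; the text lands in <output_base>.txt."""
    subprocess.run(build_tesseract_cmd(image_path, output_base), check=True)
    return f"{output_base}.txt"
```

Cropping the columns and skipping the ads still has to be done by eye, which is the part a front end like OCRFeeder makes quick.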

Take the page-by-page text output and copy it into LibreOffice Writer. I have made a template, .ott file, with about 15 custom styles to handle most all needs for fiction of this type.

I use GIMP to extract and fix up images I want in the book. Scale large images to max 1200 px in largest dimension since target is e-ink readers. In Writer, anchor images "as characters" and keep it simple.
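The 1200 px rule is simple arithmetic, sketched here in Python for anyone batch-processing instead of using GIMP. The function name is made up for illustration; it only computes the target size and leaves the actual resampling to whatever image tool is in use.

```python
def fit_within(width, height, max_side=1200):
    """Return (new_width, new_height) so the longer side is at most max_side.

    Aspect ratio is preserved and images that are already small enough
    are returned unchanged (never upscaled).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

For example, a 2400 x 3600 scan comes out as 800 x 1200. If Pillow is available, Image.thumbnail((1200, 1200)) applies the same rule in place.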

Proofread and correct the Writer document, formatting it with the custom styles. I do this maybe 20 or 30 pages at a time. Each input doc seems to give repeated OCR errors due to typesetting and/or scanning, so this allows find-and-replace corrections in large chunks. (I wish Writer had saved searches like Sigil or Calibre Editor.)
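The "saved searches" idea can be approximated outside Writer with a small Python script run over the raw OCR text before it is pasted in. This is only a sketch: the patterns below are invented examples of typical misreads, and the real list would have to be built per document.

```python
import re

# Hypothetical examples of recurring OCR misreads; build the real list per document.
OCR_FIXES = [
    (r"\btbe\b", "the"),
    (r"\bwbat\b", "what"),
    # Re-join words split across line breaks. This also joins genuine
    # hyphenated compounds that happen to break at the hyphen, so proofread.
    (r"(\w)-\n(\w)", r"\1\2"),
    (r"[ \t]{2,}", " "),  # collapse runs of spaces or tabs
]


def apply_fixes(text, fixes=OCR_FIXES):
    """Apply each (pattern, replacement) pair in order and return the result."""
    for pattern, replacement in fixes:
        text = re.sub(pattern, replacement, text)
    return text
```

Keeping one such list per magazine title means the corrections found in the first issue carry over to the next.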

Import the .odt file into Sigil using the new version of the ODT Import plugin--it is just terrific. I have a custom css file in the plugin that exactly matches the styles in my Writer template file, but dimensions in em instead of pt or cm. Just delete the default css and the custom one takes over like magic. This setup is in the manual.

This input plugin works so well that the only work needed at the epub code level is a little fix-up on images, 3 or 4 standard code changes I like, and adding metadata. Depending on the doc, I use Sigil or the Calibre Editor for this--they have different tools, both are very good. I almost never have to touch the actual coding of any book text.

No one-button conversion is ever going to get you a book like this from old, multi-column text full of advertisements!

Remaining problem, what to do with the resulting books? Most of the stuff I do is really old, but I'm not about to try and understand the copyright issues. You can probably find some of this if you search in likely places.
Old 06-23-2024, 08:45 PM   #21
retiredbiker
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
Quote:
Originally Posted by Frogm4n View Post
Thanks for linking that. I am glad it is multi-platform. However, on Linux it seems to be single-threaded. Is it the same on Windows/macOS? My main workstation has 14 cores, so seeing it do a compute-heavy task on just one of them is a bit frustrating.
9 million downloads, 38,000 per week -- where have I been?
Great little program for doing some simple things very easily.
Tried the "black and white" tool on a very brown and ugly magazine scan from IA. It used all 24 of my cores (Thelio box from System76 on Pop_os, an Ubuntu based OS).
Very fast. And excellent output (on that sample of one...)
Old 06-23-2024, 10:27 PM   #22
Frogm4n
Addict
Posts: 229
Karma: 1600689
Join Date: Jul 2023
Device: Kindle Scribe
Quote:
Originally Posted by retiredbiker View Post
It used all 24 of my cores (Thelio box from System76 on Pop_os, an Ubuntu based OS).
Lucky! I tried it (and just tried again) with both the DEB download and the apt repo, as well as the Flatpak, and all of them run only a single thread on my Ubuntu 23.10 install.
Old 06-23-2024, 10:51 PM   #23
jackm8
Addict
Posts: 211
Karma: 2818790
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by retiredbiker View Post
General method, if anyone wants to try. I do all this on a Linux box, but most everything or an equivalent is probably available on any platform:
I haven't looked deeply into your post yet, but I'm sure that it'll prove a goldmine of useful information.

Just looking at the surface of it, and that blogspot link, I'm wondering how the PDFs are generated for borrowed books. They look like they're IA's own PDFs that just aren't listed. I tend to see similar artefacts when comparing the PDFs of books that they have freely available against the .jp2 source files. It makes their PDFs appalling when it comes to illustrations and photos, and it's a major reason why I'm making my own.


I'll post my own workflow later, if anyone is interested in it. I'm taking the jp2.zip, or ripping the .jpgs from the webpage, and converting them into PDFs.

I'm just wondering if it would be possible to get the jp2 files using the same process? In the vast majority of cases the jp2s are massive overkill in file size for the quality that they offer, but they're needed simply because IA's PDF process obliterates quality.



Example of PDF compared to Webpage source and jp2 zip files:

Sample book: https://archive.org/details/atlanticcrossing0000melv
PDF: https://archive.org/download/atlanti...ng0000melv.pdf
Webpage source: https://archive.org/details/atlantic...age/6/mode/2up
JP2 processed source: https://archive.org/download/atlanti...00melv_jp2.zip (224mb file)

Comparisons in attachments. Artefacts should be plainly visible. I also included a spread of two pages in full resolution in zip file. ZIP is required as this site doesn't support such high resolutions. Jpeg is compressed compared to .jp2 source to keep file size down, but quality is actually comparable to other comparisons. Extra resolution isn't doing much, simply because scans aren't perfect to begin with.
Attached thumbnails: compared2.jpg (967.0 KB); compared.jpg (1.49 MB)
Attached files: original res.zip (3.26 MB)
Old 06-24-2024, 05:43 AM   #24
Sarmat89
Fanatic
Posts: 502
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by retiredbiker View Post
Run OCR page by page, doing each column one at a time, avoiding ads and following "continued on page nnn" instructions: Tesseract OCR using OCRFeeder front end.
Just to add some context, there was OCR software with interactive correction back in 1990.
The same software could also use dictionaries to improve recognition, which Tesseract still cannot do.
Old 07-03-2024, 12:35 PM   #25
toronto92
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2024
Device: Few of them....
Quote:
Originally Posted by j.p.s View Post
I think of the (public domain) scans on archive.org as valuable raw material for ebook producers that can't or don't want to do the scanning themselves. (Just as Project Gutenberg can be a starting point for someone wanting to produce high quality formatting.)
Sharing your opinion....
Old 08-21-2024, 02:03 PM   #26
Ozymango
Member
Posts: 21
Karma: 95132
Join Date: Jan 2013
Device: Nook HD+, HD (Cyanogenmod, LineageOS), Simple Touch (rooted), various
Quote:
Originally Posted by retiredbiker View Post
Find the PDF I want, something like Dime Detective January 1950 just as an example. (If the book is "borrow for an hour", you might need https://financial-accounting-acg2021.../download.html to help get the pdf. The Calibre de-ACSM plugin works for this without Adobe.)
You can save yourself a heck of a lot of work by just installing the Internet Archive Downloader extension to your browser -- then you can just directly download a PDF file or collection of JPG images of the book you want to work with.

https://github.com/elementdavv/inter...ive_downloader
Old 08-21-2024, 03:41 PM   #27
Quoth
the rook, bossing Never.
Posts: 12,325
Karma: 90943357
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Quote:
Originally Posted by Ozymango View Post
You can save yourself a heck of a lot of work by just installing the Internet Archive Downloader extension to your browser -- then you can just directly download a PDF file or collection of JPG images of the book you want to work with.

https://github.com/elementdavv/inter...ive_downloader
Trivial to download IA PDF or image collections without adding another browser extension and increasing attack surface.

No-one needs that.

If it's in their library it may be pirated.
Old 08-21-2024, 03:51 PM   #28
Karellen
Wizard
Posts: 1,351
Karma: 6794938
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Originally Posted by Quoth View Post
Trivial to download IA PDF
Really? How?
I am never able to download any pdf, no matter which book I check, it always tells me there are no options for download at this time.
Old 08-21-2024, 08:57 PM   #29
ownedbycats
Custom User Title
Posts: 9,505
Karma: 64500003
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
Quote:
Originally Posted by Karellen View Post
Really? How?
I am never able to download any pdf, no matter which book I check, it always tells me there are no options for download at this time.
For loans, you can only get a full loan (more than 1 hour) and download if there are at least two copies; it has to be the exact same edition, too.

I've noticed a lot of previously-borrowable books also going "preview only" since the lawsuit started.

Last edited by ownedbycats; 08-21-2024 at 09:18 PM.
Old 08-22-2024, 08:57 AM   #30
Quoth
the rook, bossing Never.
Posts: 12,325
Karma: 90943357
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Quote:
Originally Posted by Karellen View Post
Really? How?
I am never able to download any pdf, no matter which book I check, it always tells me there are no options for download at this time.
Then it's not PD and only in their library system which cheats authors. They think by having checkout and loans on copyright material that the laws that apply to everyone else mysteriously don't apply to them.

If you want your own copy of a copyright work, rather than public library, Kindle Unlimited, or IA's dodgy Open Library, then you have to buy it.