Old 06-19-2024, 06:28 PM   #16
Comfy.n
want to learn what I want
Posts: 1,069
Karma: 6425108
Join Date: Sep 2020
Device: Calibre E-book viewer
Quote:
Originally Posted by Frogm4n View Post
Thanks for linking that. I am glad it is multi-platform. However, on Linux it seems to be single-threaded. Is it the same on Windows/macOS? My main workstation has 14 cores, so seeing it do a compute-heavy task on just one of them is a bit frustrating.
Glad you liked it! I see four processes on Windows most of the time: three NAPS2.exe and one NAPS2.Worker.exe. Haven't used it intensively yet, though.
Old 06-19-2024, 06:59 PM   #17
Frogm4n
Zealot
Posts: 136
Karma: 1512687
Join Date: Jul 2023
Device: Kindle Scribe
Quote:
Originally Posted by Dr. Drib View Post
I just do this in calibre:

VIEW (open the PDF file)
FILE
EXPORT
QUARTZ FILTER (change to BLACK & WHITE)
SAVE

I then take the saved file from Documents (Macbook Air) and load the file onto calibre and then transfer it to my Kindle Scribe.
That sounds like you are using the built-in macOS Preview application.
Old 06-20-2024, 06:54 AM   #18
Dr. Drib
Grand Sorcerer
Posts: 44,839
Karma: 55647515
Join Date: Jan 2007
Location: Peru
Device: Kindle: Oasis 3, Voyage WiFi; Kobo: Libra 2, Aura One
Quote:
Originally Posted by Frogm4n View Post
That sounds like you are using the built-in macOS Preview application.
I don't know, but if that is it, the application gets rid of the yellow tint and speeds up the PDF (when turning pages) to where it is almost like reading a regular file.
Old 06-21-2024, 07:38 PM   #19
jackm8
Zealot
Posts: 149
Karma: 1253386
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by Sirtel View Post
...And I couldn't care less about illustrations. Give me pure text any day.
I can't agree with that. There are books that are enriched by illustrations, and there are even some where they're essential to the story. In the first category, you have books designed with specific illustrations in mind. Often they are drawn by the authors themselves (Hobbit, Decline and Fall...), so one can argue that they offer the intended reading experience. With the latter, they're the only way to experience the whole story fully. Take The Path to Rome by Belloc, for example. It was an avant-garde, modern travelogue at the time, and it still feels fresh today with its unique 'chapter' structure. The author talks to the reader, describes his drawings, his maps. Illustrations, in this case, are vital for the storytelling. The book simply doesn't work without them.
Attached thumbnails: srgb iec PNG pathtorome00bell_0158.jpg (663.2 KB), srgb iec PNG pathtorome00bell_0203.jpg (511.2 KB)
Old 06-23-2024, 03:22 PM   #20
retiredbiker
Addict
Posts: 392
Karma: 1673278
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
How to make a good EPUB from an IA PDF

The Internet Archive PDFs have become most of the raw material for my retirement hobby: getting pulp magazine and similar stories (and whole issues) into good, illustrated epub books. Work yes, but with good tools I can often do a 70,000-word magazine issue or novel in a few days. (The first one I ever did probably took two months!)

General method, if anyone wants to try. I do all this on a Linux box, but most everything or an equivalent is probably available on any platform:

Find the PDF I want, something like Dime Detective January 1950 just as an example. (If the book is "borrow for an hour", you might need https://financial-accounting-acg2021.../download.html to help get the pdf. The Calibre de-ACSM plugin works for this without Adobe.)

Pull out usable images with pdftopng (https://poppler.freedesktop.org)
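
A minimal sketch of this step, assuming the Poppler command-line tools are installed. The two Poppler utilities used here are `pdftoppm` (renders each page to an image) and `pdfimages` (extracts the embedded scans). The file name `dime_detective.pdf`, the `pages/page` output prefix, and the 300 dpi resolution are placeholders, not values from the post.

```python
import shutil
import subprocess
from pathlib import Path


def render_command(pdf: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm call that renders every page to a PNG file."""
    # -png selects PNG output, -r sets the rendering resolution in dpi.
    return ["pdftoppm", "-png", "-r", str(dpi), pdf, out_prefix]


def extract_command(pdf: str, out_prefix: str) -> list[str]:
    """Build a pdfimages call that dumps the embedded images unchanged."""
    # -all writes each image in its native format where possible.
    return ["pdfimages", "-all", pdf, out_prefix]


if __name__ == "__main__":
    pdf = "dime_detective.pdf"  # hypothetical file name
    # Only run when the tool and the input file are actually present.
    if shutil.which("pdftoppm") and Path(pdf).exists():
        Path("pages").mkdir(exist_ok=True)
        subprocess.run(render_command(pdf, "pages/page"), check=True)
```

Rendering with `pdftoppm` gives one image per page at a chosen resolution, while `pdfimages` keeps whatever resolution the scan was stored at; which is more usable probably depends on the PDF.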

If the page images are really bad, which is rare, clean them up with ImageMagick or ScanTailor Advanced. https://imagemagick.org/script/download.php and/or https://github.com/4lex4/scantailor-advanced

Run OCR page by page, doing each column one at a time, avoiding ads and following "continued on page nnn" instructions: Tesseract OCR using OCRFeeder front end. This sounds horrible but is actually quite quick, several pages a minute with practice. https://wiki.gnome.org/action/show/Apps/OCRFeeder
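
For anyone who would rather script this than click through a front end, here is a rough sketch of the column-at-a-time idea. It assumes evenly spaced columns, which real magazine pages with ads often are not, so it is no substitute for the interactive selection described above. The helper names, page dimensions, and gutter width are all hypothetical; the only real tool involved is the Tesseract command line, where `--psm 6` treats the image as a single uniform block of text.

```python
def column_boxes(width, height, columns=2, gutter=0):
    """Split a page into equal-width boxes (left, top, right, bottom)."""
    # Width left for text once the gaps between columns are removed.
    col_width = (width - gutter * (columns - 1)) // columns
    boxes = []
    for i in range(columns):
        left = i * (col_width + gutter)
        boxes.append((left, 0, left + col_width, height))
    return boxes


def tesseract_command(image, out_base, lang="eng"):
    """Build a Tesseract CLI call for one already-cropped column image."""
    # Tesseract writes its text to out_base + ".txt".
    return ["tesseract", image, out_base, "-l", lang, "--psm", "6"]


if __name__ == "__main__":
    # Hypothetical 2000 x 3000 px page with two columns and a 100 px gutter.
    for box in column_boxes(2000, 3000, columns=2, gutter=100):
        print(box)
    print(tesseract_command("page_001_col1.png", "page_001_col1"))
```

Each box would be cropped out with an image tool of your choice before the matching Tesseract call is run, so that text from neighbouring columns never ends up interleaved.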

Take the page-by-page text output and copy it into LibreOffice Writer. I have made a template, .ott file, with about 15 custom styles to handle most all needs for fiction of this type.

I use GIMP to extract and fix up images I want in the book. Scale large images to max 1200 px in largest dimension since target is e-ink readers. In Writer, anchor images "as characters" and keep it simple.
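
The 1200 px limit is simple arithmetic, sketched below in case it helps with batch work outside GIMP. The function name `fit_within` is made up for illustration; it only computes the target size and leaves the actual resampling to whatever image tool you use.

```python
def fit_within(width, height, max_side=1200):
    """Return (width, height) scaled so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: never upscale.
        return width, height
    # Scale both sides by the same factor to keep the aspect ratio.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_within(2400, 3600))  # a tall page scan
    print(fit_within(1000, 800))   # already within the limit
```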

Proofread and correct the Writer document, formatting it with the custom styles. I do this maybe 20 or 30 pages at a time. Each input doc seems to give repeated OCR errors due to typesetting and/or scanning, so this allows find-and-replace corrections in large chunks. (I wish Writer had saved searches like Sigil or Calibre Editor.)
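
One way to approximate saved searches outside Writer is to keep the recurring corrections in a small script and run it over the plain-text OCR output before pasting. This is only a sketch: the three rules below are invented examples of typical OCR slips, not ones taken from the post, and the real list would be built up per source document.

```python
import re

# Ordered (pattern, replacement) pairs; all three are hypothetical examples.
SAVED_SEARCHES = [
    # Rejoin words split by a hyphen at a line break. Note this also joins
    # genuine compounds such as "well-\nknown", so review the results.
    (r"(\w)-\n(\w)", r"\1\2"),
    # A common misreading of "the".
    (r"\btbe\b", "the"),
    # Collapse runs of spaces or tabs into a single space.
    (r"[ \t]{2,}", " "),
]


def apply_saved_searches(text, rules=SAVED_SEARCHES):
    """Apply each find-and-replace rule in order and return the result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    sample = "He opened tbe  door and con-\ntinued."
    print(apply_saved_searches(sample))
```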

Import the .odt file into Sigil using the new version of the ODT Import plugin--it is just terrific. I have a custom css file in the plugin that exactly matches the styles in my Writer template file, but dimensions in em instead of pt or cm. Just delete the default css and the custom one takes over like magic. This setup is in the manual.
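
Converting a template's pt or cm dimensions to em is a division by the body font size. The sketch below assumes a 12 pt body font, which is a guess and not a value from the post; the function names are hypothetical. It uses the fixed relation 1 inch = 72 pt = 2.54 cm.

```python
POINTS_PER_CM = 72 / 2.54  # 72 pt per inch, 2.54 cm per inch


def pt_to_em(points, base_pt=12.0):
    """Express a length in pt as a multiple of the body font size."""
    return points / base_pt


def cm_to_em(centimetres, base_pt=12.0):
    """Express a length in cm as a multiple of the body font size."""
    return pt_to_em(centimetres * POINTS_PER_CM, base_pt)


def css_em(value_em):
    """Format an em value for a stylesheet, rounded to three decimals."""
    return f"{round(value_em, 3):g}em"


if __name__ == "__main__":
    # An 18 pt heading and a 0.5 cm paragraph indent, both assumed values.
    print("font-size:", css_em(pt_to_em(18)))
    print("text-indent:", css_em(cm_to_em(0.5)))
```

Lengths in em scale with whatever font size the reader picks on the device, which is presumably why the custom css uses them instead of fixed units.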

This input plugin works so well that the only work needed at the epub code level is a little fix-up on images, 3 or 4 standard code changes I like, and adding metadata. Depending on the doc, I use Sigil or the Calibre Editor for this--they have different tools, both are very good. I almost never have to touch the actual coding of any book text.

No one-button conversion is ever going to get you a book like this from old, multi-column text full of advertisements!

Remaining problem, what to do with the resulting books? Most of the stuff I do is really old, but I'm not about to try and understand the copyright issues. You can probably find some of this if you search in likely places.
Old 06-23-2024, 08:45 PM   #21
retiredbiker
Addict
Posts: 392
Karma: 1673278
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
Quote:
Originally Posted by Frogm4n View Post
Thanks for linking that. I am glad it is multi-platform. However, on Linux it seems to be single-threaded. Is it the same on Windows/macOS? My main workstation has 14 cores, so seeing it do a compute-heavy task on just one of them is a bit frustrating.
9 million downloads, 38,000 per week -- where have I been?
Great little program for doing some simple things very easily.
Tried the "black and white" tool on a very brown and ugly magazine scan from IA. It used all 24 of my cores (Thelio box from System76 on Pop_os, an Ubuntu based OS).
Very fast. And excellent output (on that sample of one...)
Old 06-23-2024, 10:27 PM   #22
Frogm4n
Zealot
Posts: 136
Karma: 1512687
Join Date: Jul 2023
Device: Kindle Scribe
Quote:
Originally Posted by retiredbiker View Post
It used all 24 of my cores (Thelio box from System76 on Pop_os, an Ubuntu based OS).
Lucky! I tried it (and just tried again) with the DEB download, the apt repo, and the Flatpak, and all of them only run a single thread on my Ubuntu 23.10 install.
Old 06-23-2024, 10:51 PM   #23
jackm8
Zealot
Posts: 149
Karma: 1253386
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by retiredbiker View Post
General method, if anyone wants to try. I do all this on a Linux box, but most everything or an equivalent is probably available on any platform:
I haven't looked deeply into your post yet, but I'm sure that it'll prove a goldmine of useful information.

Just looking at the surface of it, and that blogspot link, I'm wondering how the PDFs are generated for borrowed books. They look like they're their own PDFs that just aren't listed. I tend to see similar artefacts when comparing the PDFs of books that they have freely available against the .jp2 source files. It makes their PDFs appalling when it comes to illustrations and photos, and it's a major reason why I'm making my own.

I'll post my own workflow later, if anyone is interested in it. I'm taking the jp2.zip or ripping .jpgs from the webpage, and converting them into PDFs.

I'm just wondering if it would be possible to get jp2 files using the same process. In the vast majority of cases jp2s are massive overkill regarding file size for the quality that they offer, but they're needed simply because their PDF process obliterates quality.

Example of PDF compared to Webpage source and jp2 zip files:

Sample book: https://archive.org/details/atlanticcrossing0000melv
PDF: https://archive.org/download/atlanti...ng0000melv.pdf
Webpage source: https://archive.org/details/atlantic...age/6/mode/2up
JP2 processed source: https://archive.org/download/atlanti...00melv_jp2.zip (224 MB file)

Comparisons are in the attachments; the artefacts should be plainly visible. I also included a two-page spread at full resolution in a zip file, since this site doesn't support such high resolutions. The JPEG is compressed relative to the .jp2 source to keep the file size down, but its quality is comparable to the other comparisons. The extra resolution isn't doing much, simply because the scans aren't perfect to begin with.
Attached thumbnails: compared2.jpg (967.0 KB), compared.jpg (1.49 MB)
Attached files: original res.zip (3.26 MB)
Old 06-24-2024, 05:43 AM   #24
Sarmat89
Evangelist
Posts: 485
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by retiredbiker View Post
Run OCR page by page, doing each column one at a time, avoiding ads and following "continued on page nnn" instructions: Tesseract OCR using OCRFeeder front end.
Just to add some context, there was OCR software with interactive correction back in 1990.
The same software could also use dictionaries to improve recognition, which Tesseract still cannot do.
All times are GMT -4.