08-14-2023, 12:41 AM | #1 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Jul 2023
Device: none
|
Converting scanned images for use in epubs.
I am converting OCR text of old magazines into epubs and have some questions.
1. How should I balance image quality and size? What size should I try to keep my epubs under so they are easily usable? The images are not fine art, and often not even good art. I have been downsampling and saving with reduced quality, but I do not want to go too far.
2. Some of the pictures have text in their blank spaces, and I don't know if I can do that in HTML. I am attaching one of the more extreme examples and would like to know what other people would do with it. It is two pages. |
08-14-2023, 01:35 PM | #2 |
Grand Sorcerer
Posts: 5,531
Karma: 100606001
Join Date: Apr 2011
Device: pb360
|
Brushing or erasing large areas of the background to pure white would reduce file size a lot.
|
08-14-2023, 03:08 PM | #3 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Jul 2023
Device: none
|
I'm no graphic artist, and I'm trying not to do things that would offend people who have an eye for art. I'm pretty sure my ham-handed attempts to change the background would be one of those things.
|
08-14-2023, 04:23 PM | #4 |
Resident Curmudgeon
Posts: 76,487
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Editing the background to make it white would give the images more contrast.
|
08-14-2023, 08:42 PM | #5 |
Grand Sorcerer
Posts: 5,531
Karma: 100606001
Join Date: Apr 2011
Device: pb360
|
|
08-15-2023, 01:13 AM | #6 |
A Hairy Wizard
Posts: 3,225
Karma: 19000635
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
There are basic 'eraser' type tools in most graphics programs that allow you to first, select the primary object, second, invert the selection so everything BUT the object is selected, and third, hit the delete key to remove everything but the object.
Then you can either leave it as a .png with a transparent channel, or fill the background with pure white and save as a .jpg. Then you can get your hands dirty and play around with changing the levels… if you mess something up, there is always 'undo'. There are also plenty of YouTube videos that demonstrate how to remove the background using whichever software you have.
Personally, I would also stitch those two images back into a single image. That makes it much easier to flow the text around or otherwise manipulate.
Last edited by Turtle91; 08-15-2023 at 01:15 AM. |
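For anyone who would rather script this than use a graphics program, here is a rough sketch of the two ideas above: pushing a near-white background to pure white, and stitching two page halves into one image. It is only an illustration. It assumes the scan has already been loaded as grayscale rows of 0-255 values (reading and writing the actual image file is left out), and the function names and the threshold of 230 are made up for the example, not recommended settings.

```python
def whiten_background(pixels, threshold=230):
    """Return a copy of a grayscale image with near-white pixels set to pure white.

    `pixels` is a list of rows, each row a list of 0-255 grayscale values.
    `threshold` is an illustrative cutoff; anything at or above it becomes 255.
    """
    return [[255 if value >= threshold else value for value in row]
            for row in pixels]


def stitch_side_by_side(left, right, fill=255):
    """Join two grayscale images into one, left page first.

    If the pages differ in height, the shorter one is padded at the
    bottom with `fill` (white by default) so every row has the same width.
    """
    height = max(len(left), len(right))
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0
    stitched = []
    for y in range(height):
        left_row = left[y] if y < len(left) else [fill] * left_width
        right_row = right[y] if y < len(right) else [fill] * right_width
        stitched.append(list(left_row) + list(right_row))
    return stitched


# Tiny made-up "pages": 250/240/245/235 stand in for yellowed paper.
left_page = [[250, 30], [240, 245]]
right_page = [[20, 235]]
spread = whiten_background(stitch_side_by_side(left_page, right_page))
```

A hard threshold like this is crude compared with adjusting levels by eye, so it is worth checking the result on a real page before running it over a whole magazine.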
08-15-2023, 12:12 PM | #7 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Jul 2023
Device: none
|
|
08-15-2023, 12:51 PM | #8 | |
Bibliophagist
Posts: 40,595
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
08-15-2023, 05:27 PM | #9 |
Resident Curmudgeon
Posts: 76,487
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Have a look at the eBook Three Men in a Boat available on MR. There are three L-shaped images and you'll be able to see the code for these images.
https://www.mobileread.com/forums/sh...ad.php?t=48377 |
08-22-2023, 03:53 PM | #10 | |||
Wizard
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Some people prefer manual image cleanup/steps. Like others already said though, it sometimes helps to:
that will severely cut down on filesize. See some of my discussion here on some potential tools/methods/examples:
That covers a lot of the "yellowed pages" and similar issues.
Personally, I have zero image editing skills, so I mostly rely on the semi-automated tools to get me an "okay" image out of the original scans.
Quote:
Like DNSB + JSWolf said, from there, you could mess with code for "L-shaped images", but those are very tricky to make work, plus each image will require manual code/tweaking.
- - -
Side Note: In the future, there might be better support for this type of thing with CSS3 Shapes + automatic shapes based on the image's alpha/transparency: but for now, that advanced code probably wouldn't work well in most ereaders.
- - -
Side Note #2: For more, similar discussion, also see:
Quote:
Then you could just plop the image into the ebook and treat it like a normal rectangle. :P
But to try to recreate some of these advanced two-page-spread layouts in an ebook... probably not the best idea!
Last edited by Tex2002ans; 08-22-2023 at 04:02 PM. |
|||
09-02-2023, 06:11 PM | #11 |
Evangelist
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
I've done a lot of old magazine OCR from Internet Archive stuff. I use OCRFeeder, a front end for Tesseract, for the OCR. It lets me select blocks of text around images, avoid adverts, and deal with the dreaded "continued on page nn" right up front, so I don't have a mess in the text file to fix.
I use Gimp to edit individual images, but I'm no expert. I size full-page images for the resulting epubs with the longest dimension around 1200 px, and around 150 px/in resolution. This gives PLENTY of quality for zooming in on my Kobo, if needed. It keeps the file size reasonable, too. Of course, if an image in the magazine is small, I just leave it that way and eyeball how to make it look on the reader; see below.
I do some cleanup; how much depends on the image. Anything muddy in the original will be just terrible on e-ink, so going to grayscale and playing with contrast are common options. Color images are completely different, and frankly I struggle there if the original is bad. But again, getting higher contrast is good for e-ink. Turning spotty backgrounds to white is definitely worth it for many old/yellow/brown images.
Those two-page title spreads I always stitch together into one image and take out any text, so the result is a plain rectangle for a title image. Never try to get text into some odd-shaped image; e-readers just won't do it.
Put the images into the epub with a CSS class that gives a % height or width, and sets the other to "auto". Never code them in with hard dimensions.
And if you are doing books for general consumption, have a heart for us old nearly blind folks... test your book on e-ink at really huge text sizes, like 24 or 36 points on the reader. That is like 3 or 4 words per line. A lot of fancy stuff that looks good at small text sizes just falls apart when you do that.
Last edited by retiredbiker; 09-02-2023 at 06:14 PM. |
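To make the sizing and CSS advice above concrete, here is a small sketch. It works out the pixel size an image would be scaled to so its longest side is at most 1200 px (small images are left alone), and builds a CSS rule with a percentage width and automatic height. The function names, the class name "illus" and the 80% figure are all invented for the example; only the 1200 px target and the "% plus auto" idea come from the post.

```python
def fit_longest_side(width, height, longest=1200):
    """Return (width, height) scaled so the longer side is at most `longest`.

    The aspect ratio is kept, and images that are already small enough
    are returned unchanged rather than being enlarged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    current = max(width, height)
    if current <= longest:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * longest / current))
    new_height = max(1, round(height * longest / current))
    return new_width, new_height


def css_rule(class_name, percent):
    """Build a CSS rule that sizes an image by percentage width, height auto."""
    return f"img.{class_name} {{ width: {percent}%; height: auto; }}"


# A hypothetical 2400 x 3600 full-page scan, and a rule for the stylesheet.
page_size = fit_longest_side(2400, 3600)
rule = css_rule("illus", 80)
```

The resulting rule would go in the epub's stylesheet and be applied with class="illus" on the img tag, with no width or height attributes on the tag itself.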
|