Old 08-14-2023, 12:41 AM   #1
jwes
Enthusiast
jwes began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Jul 2023
Device: none
Converting scanned images for use in epubs.

I am converting OCR text of old magazines into epubs and have some questions.

1. How should I balance image quality and size? What size should I try to keep my epubs under to be easily usable? The images are not fine art and often not even good art. I have been downsampling and saving with reduced quality, but do not want to go too far.

2. Some of the pictures have text in blank spaces and I don't know if I can do that in html. I am attaching one of the more extreme examples and I would like to know what other people would do with it. It is two pages.
Attached Thumbnails: left.jpg (335.9 KB), right.jpg (271.2 KB)
Old 08-14-2023, 01:35 PM   #2
j.p.s
Grand Sorcerer
j.p.s ought to be getting tired of karma fortunes by now.
 
Posts: 5,531
Karma: 100606001
Join Date: Apr 2011
Device: pb360
Brushing or erasing large areas of the background to pure white would reduce file size a lot.
Old 08-14-2023, 03:08 PM   #3
jwes
Enthusiast
jwes began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Jul 2023
Device: none
Quote:
Originally Posted by j.p.s View Post
Brushing or erasing large areas of the background to pure white would reduce file size a lot.
I'm no graphic artist, and I'm trying not to do things that would offend people who have an eye for art. I'm pretty sure my ham-handed attempts to change the background would be one of those things.
Old 08-14-2023, 04:23 PM   #4
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
 
Posts: 76,487
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by jwes View Post
I'm no graphic artist, and I'm trying not to do things that would offend people who have an eye for art. I'm pretty sure my ham-handed attempts to change the background would be one of those things.
Editing the background to make it white would give the images more contrast.
Old 08-14-2023, 08:42 PM   #5
j.p.s
Grand Sorcerer
j.p.s ought to be getting tired of karma fortunes by now.
 
Posts: 5,531
Karma: 100606001
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by JSWolf View Post
Editing the background to make it white would give the images more contrast.
Except that there are all those halftone dots all over the background.
Old 08-15-2023, 01:13 AM   #6
Turtle91
A Hairy Wizard
Turtle91 ought to be getting tired of karma fortunes by now.
 
Posts: 3,225
Karma: 19000635
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
There are basic 'eraser' type tools in most graphics programs that allow you to first, select the primary object, second, invert the selection so everything BUT the object is selected, and third, hit the delete key to remove everything but the object.

Then you can either leave it as a .png with a transparent channel, or fill the background with pure white and save as a .jpg.

Then you can get your hands dirty and play around with changing the levels… if you mess something up, there is always 'undo'.

There are also plenty of YouTube videos that demonstrate how to remove the background using whichever software you have.

Personally, I would also stitch those two images back into a single image. That makes it much easier to flow the text around or otherwise manipulate.
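
To make the "stitch" step concrete, here is a toy Python sketch of what joining a left and right page into one image amounts to. It is only an illustration: it works on nested lists of 8-bit grayscale values rather than real image files, the function name stitch_side_by_side is made up for this example, and in practice a graphics program would do this for you.

```python
def stitch_side_by_side(left, right, fill=255):
    """Join two grayscale images (lists of pixel rows) left-to-right.

    If one page is shorter than the other, the missing rows are padded
    with `fill` (255 = pure white) so every output row has equal width.
    """
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0
    rows = max(len(left), len(right))

    stitched = []
    for y in range(rows):
        left_row = left[y] if y < len(left) else [fill] * left_width
        right_row = right[y] if y < len(right) else [fill] * right_width
        stitched.append(list(left_row) + list(right_row))
    return stitched


# Two tiny 2x2 "pages" become one 2x4 image.
spread = stitch_side_by_side([[10, 20], [30, 40]], [[50, 60], [70, 80]])
print(spread)
```

The result is a single plain rectangle, which is the shape that is easiest to place in an ebook.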

Last edited by Turtle91; 08-15-2023 at 01:15 AM.
Old 08-15-2023, 12:12 PM   #7
jwes
Enthusiast
jwes began at the beginning.
 
Posts: 49
Karma: 10
Join Date: Jul 2023
Device: none
Quote:
Originally Posted by Turtle91 View Post
Personally, I would also stitch those two images back into a single image. That makes it much easier to flow the text around or otherwise manipulate.
How do I get the text to flow into the two blank areas on top right and bottom right of the combined image?
Old 08-15-2023, 12:51 PM   #8
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
 
Posts: 40,595
Karma: 157444382
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by jwes View Post
How do I get the text to flow into the two blank areas on top right and bottom right of the combined image?
I did a quick search on "L-shaped images wrap text site:mobileread.com" and it came up with multiple hits. See Text wrap around irregular shapes as an example.
Old 08-15-2023, 05:27 PM   #9
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
 
Posts: 76,487
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Have a look at the eBook Three Men in a Boat available on MR. There are three L-shaped images and you'll be able to see the code for these images.

https://www.mobileread.com/forums/sh...ad.php?t=48377
Old 08-22-2023, 03:53 PM   #10
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 2,304
Karma: 12587727
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by jwes View Post
I am converting OCR text of old magazines into epubs and have some questions.

1. How should I balance image quality and size? What size should I try to keep my epubs under to be easily usable? The images are not fine art and often not even good art. I have been downsampling and saving with reduced quality, but do not want to go too far.
There were a few topics way back in:

Some people prefer manual image cleanup/steps.

Like others already said though, it sometimes helps to:
  • Convert backgrounds to pure white
    • By adjusting the contrast/levels.
  • Convert lines to pure B&W or grayscale
    • 256 or fewer shades of gray.

That will severely cut down on file size.
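
As a rough illustration of the two steps in that list (not the algorithm of any particular tool), here is a minimal Python sketch. It assumes the image is already a plain list of 8-bit grayscale values, and the names apply_levels and quantize, along with the default black_point and white_point, are made up for this example. Real scans would need the thresholds tuned by eye.

```python
def apply_levels(pixels, black_point=40, white_point=200):
    """Stretch 8-bit grayscale values between two thresholds.

    Values at or above white_point become pure white (255), values at or
    below black_point become pure black (0), and everything in between
    is spread linearly across the full range.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for value in pixels:
        if value <= black_point:
            result.append(0)
        elif value >= white_point:
            result.append(255)
        else:
            result.append(round((value - black_point) * 255 / span))
    return result


def quantize(pixels, shades=16):
    """Snap 8-bit grayscale values to `shades` evenly spaced gray levels."""
    if not 2 <= shades <= 256:
        raise ValueError("shades must be between 2 and 256")
    step = 255 / (shades - 1)
    return [round(round(value / step) * step) for value in pixels]


# A yellowed background (around 210-230) ends up pure white.
row = [25, 120, 210, 230]
print(quantize(apply_levels(row), shades=4))
```

Large flat areas of identical white pixels are what compress well, which is why the background cleanup matters more for file size than the exact number of gray shades.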

See some of my discussion here on some potential tools/methods/examples:

That covers a lot of the "yellowed pages" and similar issues.

Personally, I have zero image editing skills, so I mostly rely on the semi-automated tools to get me an "okay" image out of the original scans.

Quote:
Originally Posted by jwes View Post
2. Some of the pictures have text in blank spaces and I don't know if I can do that in html. I am attaching one of the more extreme examples and I would like to know what other people would do with it. It is two pages.
For oddly shaped images—like one with text below the horse's hooves—it's best to completely cut the text out of the images and replace with pure blank background.

Like DNSB + JSWolf said, from there, you could mess with code for "L-shaped images", but those are very tricky to make work, plus each image will require manual code/tweaking.

- - -

Side Note: In the future, there might be better support for this type of thing with CSS3 Shapes + automatic shapes based on the image's alpha/transparency:

but for now, that advanced code probably wouldn't work well in most ereaders.

- - -

Side Note #2: For more, similar discussion, also see:

Quote:
Originally Posted by Turtle91 View Post
Personally, I would also stitch those two images back into a single image. That makes it much easier to flow the text around or otherwise manipulate.
Yep! Exactly! Merge the left/right halves of the horse together into a single image.

Then you could just plop the image into the ebook and treat it like a normal rectangle. :P

But to try to recreate some of these advanced two-page-spread layouts in an ebook... probably not the best idea!

Last edited by Tex2002ans; 08-22-2023 at 04:02 PM.
Old 09-02-2023, 06:11 PM   #11
retiredbiker
Evangelist
retiredbiker ought to be getting tired of karma fortunes by now.
 
Posts: 420
Karma: 2737916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
I've done a lot of old magazine OCR from Internet Archive stuff. I use OCRFeeder, a front end for Tesseract, for the OCR. It lets me select blocks of text around images, avoid adverts, and deal with the dreaded "continued on page nn" right up front, so I don't have a mess in the text file to fix.

I use GIMP to edit individual images, but I'm no expert. I size full-page images for the resulting epubs with the longest dimension around 1200 px, and around 150 px/in resolution. This gives PLENTY of quality for zooming in on my Kobo, if needed. It keeps the file size reasonable, too. Of course if an image in the magazine is small, I just leave it that way, and eyeball how to make it look on the reader; see below.
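
The sizing rule above is simple arithmetic; a small Python sketch of it follows. The function name fit_longest_side is made up for this example, and the 1200 px default just mirrors the figure mentioned above. It only computes target dimensions; the actual resampling would still be done in an image editor.

```python
def fit_longest_side(width, height, longest=1200):
    """Return (width, height) scaled so the longest side is `longest` px.

    Images that are already small enough are returned unchanged, matching
    the advice to leave small magazine images at their original size.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    current = max(width, height)
    if current <= longest:
        return width, height
    new_width = max(1, round(width * longest / current))
    new_height = max(1, round(height * longest / current))
    return new_width, new_height


print(fit_longest_side(2400, 3600))  # (800, 1200)
print(fit_longest_side(600, 400))    # (600, 400), left as-is
```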

I do some cleanup; how much depends on the image. Anything muddy in the original will be just terrible on e-ink. So going to grayscale and playing with contrast are common options. Color images are completely different, and frankly I struggle there, if the original is bad. But again, getting higher contrast is good for e-ink. Turning spotty backgrounds to white is definitely worth it for many old/yellow/brown images.

Those two-page title spreads I always stitch together into one image and take out any text, so the result is a plain rectangle for a title image. Never try to get text into some odd-shaped image; e-readers just won't do it.

Put the images into the epub with a CSS class that sets either the height or the width as a percentage and leaves the other dimension as "auto". Never code them in with hard pixel dimensions.
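
For anyone unsure what such a class looks like, here is a small Python sketch that just builds the CSS text. The helper name image_rule and the class name "half" are hypothetical, chosen only for this example; the generated rule would go in the epub's stylesheet and be referenced from the img tag.

```python
def image_rule(class_name, percent, axis="width"):
    """Build a CSS rule that sizes an image by percentage on one axis
    and leaves the other axis as 'auto' so the aspect ratio is kept."""
    if axis not in ("width", "height"):
        raise ValueError("axis must be 'width' or 'height'")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    other = "height" if axis == "width" else "width"
    return f"img.{class_name} {{\n  {axis}: {percent}%;\n  {other}: auto;\n}}"


# Hypothetical class for an image that takes half the screen width.
print(image_rule("half", 50))
print('<img class="half" src="images/horse.jpg" alt="Horse"/>')
```

Because the size is relative, the image scales with the screen instead of overflowing small readers or looking tiny on large ones.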

And if you are doing books for general consumption, have a heart for us old nearly blind folks...test your book on e-ink at really huge text sizes, like 24 or 36 points on the reader. That is like 3 or 4 words per line. A lot of fancy stuff that looks good at small text sizes just falls apart when you do that.

Last edited by retiredbiker; 09-02-2023 at 06:14 PM.