05-13-2006, 11:38 AM | #1 |
Fully Converged
Posts: 18,170
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Scanning books from your own library
Branko of Teleread came up with some interesting statistics suggesting that, unlike distributed proofreaders, many of us would love to digitize our own personal libraries.
If you've ever tried to scan a full-length book without access to a high-end $150k+ scanner, you'll understand why professional proofreaders who deal with books every day are not so fond of the idea of scanning their own content. Manual scanning and OCR are a pain, since both tasks are time-consuming and error-prone. Now, as many of you know, Google is working with various major libraries to digitally scan books from their collections so that users worldwide can search them online. But don't expect some poor first-year student to sit all day and night in front of a low-cost scanner flipping pages. These libraries have access to fully automated page-turning and scanning devices that produce high-quality digital images of bound materials, nondestructively, at throughput rates as high as 2400 pages per hour. It'd be great if one day you could just visit a Kinko's outlet and rent a Kirtas scanning device for a short period of time. Only then would I be willing to turn my dusty library into a bunch of e-books. |
05-13-2006, 12:23 PM | #2 |
Books and more books
Posts: 917
Karma: 69499
Join Date: Mar 2006
Location: White Plains, NY, USA
Device: Nook Color, Itouch, Nokia770, Sony 650, Sony 700(dead), Ebk(given)
|
Hi,
My experience with scanning is as follows:
- OpticBook 3600 scanner (~$250) with decent OCR software included (ABBYY)
- I scan double pages at 300 dpi, b&w, to TIF or PBM (mostly TIF, but sometimes PBM is easier to manipulate)
- I do 10 pages (5 double-page sheets) per minute for hardcovers/trade paperbacks and 14 pages per minute for paperbacks, and just watch a movie on my portable DVD player while scanning
- the PC does the OCR in about 20-30 minutes per book, and I just send the result to Word and then to text, since that eliminates most strange characters
- since everything is for my personal use, I do not bother correcting; the software is good enough for the results to be nicely readable (once you get used to several quirks, like "die" instead of "the" sometimes)
I have done maybe 20 books and read about 5 fully on my Nokia 770 or Ebookwise 1150, and parts of others. Also, you can do picture books and embed the scanned pages (maybe converted to JPG) in HTML to read on a PC/tablet with uBook, though I rarely do it since I do not like reading fiction on a PC/tablet/laptop. Hope this helps, Liviu |
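For anyone wanting to budget their time, the rates quoted above can be turned into a rough per-book estimate. This is only a sketch: the 400-page book is a made-up example, and the function names are illustrative, not part of any scanner software.

```python
def scan_minutes(pages: int, pages_per_minute: float) -> float:
    """Minutes needed to scan `pages` at a steady rate."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute


def total_minutes(pages: int, pages_per_minute: float, ocr_minutes: float) -> float:
    """Scanning time plus the unattended OCR pass, in minutes."""
    return scan_minutes(pages, pages_per_minute) + ocr_minutes


if __name__ == "__main__":
    book_pages = 400  # hypothetical book length, not from the post
    # Rates from the post: 10 pages/min (hardcover), 14 pages/min (paperback),
    # OCR taking roughly 20-30 minutes per book.
    hardcover = total_minutes(book_pages, 10, 30)
    paperback = total_minutes(book_pages, 14, 20)
    print(f"hardcover, slow OCR: about {hardcover:.0f} minutes")
    print(f"paperback, fast OCR: about {paperback:.0f} minutes")
```

At those rates a book of that size lands somewhere between roughly 50 and 70 minutes, with only the scanning part needing hands on the book.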
05-14-2006, 06:07 AM | #3 |
Addict
Posts: 314
Karma: 1002965
Join Date: Mar 2006
Location: UK
Device: ILiad. Gen 3, PocketBook 360, Kobo Aura HD, Kindle Oasis 2
|
I have scanned over 250 books from my personal library.
Eyesight problems are my main reason for doing this. I can create an eBook with a larger font size, which makes for easier reading on my eBookwise than the original paper book. I use an OpticBook 3600 or a Canon LiDE 60 to scan two pages at a time into ABBYY FineReader. After editing in FineReader for spelling and scanning errors, I send the pages to Word. It is in Word that I set a larger font size, apply special formatting for chapter headings etc., and remove page numbers. I save the file as .rtf and then convert it to the .imp format required by the eBookwise. I never save in .txt format because all formatting, such as bold and italics, is lost. Italics in particular are necessary to follow the storyline in some novels, because they often represent thought, telepathy etc. Project Gutenberg overcomes this by using all upper-case letters for emphasis, but I find this distracting. So, time-consuming, yes, but I can usually manage to produce a finished ebook in less than a day, and I also find the work very therapeutic and rewarding. This process means that when browsing in my local bookstore I don't have to put most of what interests me back on the shelf because I can't read the text. |
05-14-2006, 06:25 PM | #4 |
Addict
Posts: 270
Karma: 298
Join Date: Mar 2005
|
I just scan the images into a PDF. It takes a lot less time, and OCR errors really bug me for some reason.
|
05-14-2006, 09:22 PM | #5 | |
Recovering Gadget Addict
Posts: 5,381
Karma: 676161
Join Date: May 2004
Location: Pittsburgh, PA
Device: iPad
|
Quote:
The main problem I see with scanning to PDF without OCR comes up if you want to read on a small-screen device or if you need small file sizes. It just wouldn't seem to be useful for mobile reading unless you are using a laptop. Even the new UMPCs might be too small for a scanned book, wouldn't they? |
|
05-15-2006, 03:52 AM | #6 |
Fully Converged
Posts: 18,170
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
There's an in-depth article on (the need for) book scanning in the NYT today:
http://www.nytimes.com/2006/05/14/ma...gewanted=print "In a regime of superabundant free copies, copies lose value. They are no longer the basis of wealth. Now relationships, links, connection and sharing are. Value has shifted away from a copy toward the many ways to recall, annotate, personalize, edit, authenticate, display, mark, transfer and engage a work. Authors and artists can make (and have made) their livings selling aspects of their works other than inexpensive copies of them. They can sell performances, access to the creator, personalization, add-on information, the scarcity of attention (via ads), sponsorship, periodic subscriptions -- in short, all the many values that cannot be copied." |
05-15-2006, 12:52 PM | #7 | |
Books and more books
Posts: 917
Karma: 69499
Join Date: Mar 2006
Location: White Plains, NY, USA
Device: Nook Color, Itouch, Nokia770, Sony 650, Sony 700(dead), Ebk(given)
|
Hi,
If you do not (or cannot, due to formulas/diagrams) OCR, you can read the images directly with your favourite slideshow software, or embed them in a blank HTML file and use uBook or your favourite PC reader software. PDFs take less space, true, but until we get a portable reader that can display them properly (no scrolling or zooming necessary, one PDF page per device screen; here portable means something I can use one-handed and without mouse/pen), size does not really matter. In all of the above ways you read one image at a time, so speed is not an issue, and it actually uses less memory than reading a PDF; you just need enough hard drive space for the images.
This is how I read selected PDFs on my Nokia 770: I cut the pages (through djvudigital and ddjvu) in half (portrait) or in 4 (landscape double-page scan), make sure that each image is 800x480, and use a lower quality setting in pnmtojpeg to get a manageable size (~40 KB/image, or 80 KB/page), since the Nokia screen is good enough. The result is very nicely readable and very fast, since FBReader loads one image at a time, though I lose all navigation except page by page. But it is worth it, since even with Evince PDFs are slow and you need scrolling and so on...
Whenever you have a fast HTML reader that takes embedded images, and enough storage, this method works nicely as long as you cut to screen size and the result is readable (even on the Ebookwise it works for most scans by cutting in half and resizing to 318x448). Of course, I would rather read the PDF directly and not have to write the scripts to cut and so on...
We will have to see, but I think that the iLiad may be able to display a portrait PDF scan nicely, though not a landscape scan, while the Sony Reader will not be able to do that due to its lower resolution. It may read "reflowable" PDFs, but not scans. Liviu Quote:
|
|
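The cutting step described in the post above comes down to some simple geometry, sketched here for anyone writing their own scripts. It only computes crop boxes and target sizes; the actual cropping and JPEG conversion would still be done by the tools named in the post. The 3300x2550 scan size, the 300-page book, the column-first ordering, and all function names are assumptions made for illustration.

```python
def tile_boxes(width, height, cols, rows):
    """Split an image into cols x rows crop boxes (left, upper, right, lower).

    Boxes are listed column by column, so a double-page scan cut 2x2 comes out
    as: left page top, left page bottom, right page top, right page bottom.
    """
    if cols < 1 or rows < 1:
        raise ValueError("cols and rows must be at least 1")
    boxes = []
    for c in range(cols):
        left, right = c * width // cols, (c + 1) * width // cols
        for r in range(rows):
            upper, lower = r * height // rows, (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


def fit_within(width, height, max_width, max_height):
    """Largest size with the same aspect ratio that fits in the given screen."""
    if width * max_height >= height * max_width:
        # Width is the limiting side.
        return max_width, max(1, height * max_width // width)
    # Height is the limiting side.
    return max(1, width * max_height // height), max_height


def storage_kb(images, kb_per_image=40):
    """Rough storage need, using the ~40 KB/image figure from the post."""
    return images * kb_per_image


if __name__ == "__main__":
    # Hypothetical landscape double-page scan: 11 x 8.5 inches at 300 dpi.
    scan_w, scan_h = 3300, 2550
    for box in tile_boxes(scan_w, scan_h, cols=2, rows=2):
        tile_w, tile_h = box[2] - box[0], box[3] - box[1]
        print(box, "->", fit_within(tile_w, tile_h, 800, 480))
    # Hypothetical 300-page book = 150 double-page scans, 4 images each.
    print(storage_kb(150 * 4), "KB")
```

Note that a tile rarely matches the screen's aspect ratio exactly, so the scaled image is usually a little narrower or shorter than 800x480 unless you also crop margins or pad.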
05-17-2006, 11:21 AM | #8 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
I've wondered whether anyone else has tried to improve OCR with a two-step scanning process: that is, photocopy-enlarging the pages to letter size, then doing the scan and OCR. This has worked for me on small article scans, but I've never gone to the trouble for an entire book.
(Frankly, my head would blow up if I considered digitizing my entire library, and it's not that big!) |
05-19-2006, 07:23 AM | #9 |
Junior Member
Posts: 6
Karma: 10
Join Date: May 2006
Device: TungstenE
|
Sounds like a useful service would be one where you could post a book away and have it scanned and proofed into a format of your choosing.
Gav |
05-23-2006, 06:45 PM | #10 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
That sounds like a service that would have to be funded by a non-profit of some kind, because I can't imagine it ever being a profitable venture.
Eh? Any non-profits interested? Speak up... |
06-14-2006, 03:03 PM | #11 | |
Addict
Posts: 222
Karma: 110
Join Date: Jun 2006
Location: Malmo, Sweden
Device: iLiad, Sony PRS-505, Kindle Paperwhite & Oasis
|
Quote:
However, with a reasonably modern scanner, capable of real 300 dpi resolution, and OCR software with the functionality of, say, FineReader 8, you don't need it. You'll need to check thresholding levels (unless you go for greyscale) before you start working, and you may have to check for light levels drifting as the scanner gets warm, but apart from that it's rather plain sailing. At higher resolution and with good print work, the problem more or less goes away. I've done 600 dpi work, and had something like one misread per two pages with only one or two pages of training beforehand. |
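To make the thresholding point concrete: when a greyscale scan is reduced to black and white, every pixel is compared against one cut-off level, and a badly chosen level either thins out the letters or fills them in. The sketch below uses Otsu's method, a well-known way of picking that level automatically from the page's histogram. It is not a description of how FineReader or any scanner driver chooses its level, and the sample pixel values are invented.

```python
def otsu_threshold(pixels):
    """Pick a black/white cut-off for 8-bit grey values using Otsu's method.

    Returns t such that values <= t count as ink and values > t as paper.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("no pixels given")
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    dark_count, dark_sum = 0, 0
    for t in range(256):
        dark_count += hist[t]
        dark_sum += t * hist[t]
        light_count = total - dark_count
        if dark_count == 0:
            continue  # nothing on the dark side yet
        if light_count == 0:
            break  # everything is on the dark side from here on
        mean_dark = dark_sum / dark_count
        mean_light = (sum_all - dark_sum) / light_count
        # Between-class variance (up to a constant factor).
        score = dark_count * light_count * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(pixels, threshold):
    """Map grey values to 0 (ink) or 1 (paper)."""
    return [0 if p <= threshold else 1 for p in pixels]


if __name__ == "__main__":
    # Invented sample: 10% dark ink pixels, 90% light paper pixels.
    page = [30] * 100 + [220] * 900
    t = otsu_threshold(page)
    print("threshold:", t, "ink pixels:", binarize(page, t).count(0))
```

Running the same calculation on a page scanned at the start of a session and one scanned much later is one simple way to spot the light-level drift mentioned above: if the chosen level has moved noticeably, the lamp output has probably changed.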
|
06-14-2006, 08:08 PM | #12 | |
Addict
Posts: 260
Karma: 4256
Join Date: Feb 2006
Device: SHARP Zaurus C1000
|
Quote:
Digitization service ... hmmm ... |
|
06-14-2006, 09:29 PM | #13 |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
You think? How much would you be willing to spend to have your $6 book scanned and digitized? $10? $50? $100?
How much do you think you as the scanner would have to charge, to make it worth your while in equipment, time, manpower, etc? $100? $50? $10? I think that, unless the process becomes much more automatic, faster, and more dependable, few customers will be willing to pay the amount vendors would ask to do the work. |
06-16-2006, 12:28 AM | #14 | |
Addict
Posts: 260
Karma: 4256
Join Date: Feb 2006
Device: SHARP Zaurus C1000
|
Quote:
|
|
|