Old 01-29-2008, 06:50 PM   #31
mphuie
Member
mphuie began at the beginning.
 
Posts: 11
Karma: 16
Join Date: Nov 2007
Device: nokia 770
Quote:
Originally Posted by recycledelectron View Post
As for the GB size, my PRS-505 changes to the next pic as quickly as it flips between pages in a PDF. The zoom works MUCH better on JPEGs than it does on PDFs. I like JPEGs better than PDFs on the PRS-505.
You don't even OCR the pictures, you actually view them on your Sony? Is it even possible to read textbook-sized pages scaled down to an ebook screen? You'd have to manually zoom in and pan around to read anything.

Execution sounds highly flawed.
Old 02-06-2008, 12:38 AM   #32
Gladtobemom
Connoisseur
Gladtobemom began at the beginning.
 
Posts: 63
Karma: 10
Join Date: Feb 2008
Device: Cybook, Palm Z72
I've put about 30 of my technical references on my tablet PC.

We prepared a little room by installing two daylight ceiling fixtures (each with 4x4ft. daylight bulbs). Then DH put hooks on the ceiling and grommets on a king-sized white sheet--and slung it up to tent under the lights.

He deconstructs the books for me by taking the spines off and trimming out the signatures and the sewing. He tries to cut the pages as close to the center of the book as possible.

Then I photograph them with my Pentax K100D (I bought this camera because it takes all my old Pentax lenses).

DH and I can do the photography on a 1700 page text in about 8 hours. Yes, it's time consuming. Then I make an HTML web page of them and turn them into a PDF or a Mobi book. It works great.

I have all the texts I need for reference and teaching in my tablet PC.

I also have them in my little VAIO TR2A.

Total outlay in money: about $50 for the fixtures and lightbulbs, maybe $10 for the hooks and grommets (had the sheet). The camera was about $500, but I bought it for other reasons.

It is an investment in time. I am NOT distributing these and I own multiple copies. One advantage: I took pictures of the ones with my notes in the margin and linked each page of the clean version with its annotated version.

I've also put the 3 textbooks that I wrote on Mobi and freely offer the copies to students (after they've bought a copy) in class. I just note it on the copyright page of their copy.

Yep, I destroy the books. So far I've been keeping the copyright pages, pages 16 and 99, and the cover, just to prove that I "own" a legal copy.
Old 07-08-2010, 11:25 AM   #33
Iain
Enthusiast
Iain began at the beginning.
 
Posts: 49
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
I've just posted a similar question on another thread (before I found this).

Basically, I have 5000 paperbacks and want to scan them. What's currently the most reliable ADF scanner (ideally duplex) for this and what software would you recommend?

Iain
Old 07-17-2010, 08:30 AM   #34
nyrath
Addict
nyrath reads XML... blindfolded

Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
Based on recommendations from this forum, I got a Plustek OpticBook 3600. I've scanned four paperbacks so far, and it has worked reasonably well.

However, I have read reviews that suggest the bulb in the scanner tends to burn out quickly, though those reviews were several years old.

The main problem I found is that some paperbacks print so close to the book spine that occasionally a couple of letters get clipped from the words. This is not a problem with hardbacks or larger books.

The bundled OCR package seems to work as well as the $100 OCR program I bought years ago (TextBridge Pro 9.0). About one mis-recognized word every four pages or so.

It saves all the scanned pages on your hard drive, so you could use another OCR program if you wish. I have a one terabyte external hard drive so space is not an issue. I scan grayscale 300 dpi TIFF format, so an entire paperback can take up 400 meg or so. Of course, once you've done OCR, you can delete all the TIFF files.

The time consuming part is the post production. I scan, use OCR, it loads it into Microsoft Word, and I save it as filtered HTML (I want to keep all the italic and bold formatting). I use a text processor (UltraEdit) to strip out all the <SPAN> tags, and turn all the <P attribute1="xxx", attribute2="xxx"... tags into <P> tags. I use Calibre to turn the HTML into ePub. Then I use Sigil to put <h1> tags on the chapter headings (which generates the table of contents), manually strip out the footers/headers that say NOVEL NAME page x, and manually correct any spelling mistakes.
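The tag clean-up described above was done by hand in UltraEdit; as a rough illustration only, the same two substitutions could be scripted. This is a minimal sketch, not the poster's actual procedure: the function name `clean_word_html` and the sample line are made up for the example, and it assumes the filtered HTML keeps each tag's attributes free of a literal ">" character.

```python
import re

# Opening or closing SPAN tags. [^>]* cannot run past the tag's own ">",
# so the text between the tags is left alone.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

# Opening P tags, with or without attributes. The \b keeps <pre> and
# <param> from matching.
P_OPEN_TAG = re.compile(r"<p\b[^>]*>", re.IGNORECASE)


def clean_word_html(html: str) -> str:
    """Drop SPAN tags and reduce every opening P tag to a bare <p>."""
    html = SPAN_TAG.sub("", html)
    html = P_OPEN_TAG.sub("<p>", html)
    return html


# Hypothetical line of Word "filtered HTML" output.
sample = (
    '<P class=MsoNormal style="margin:0">'
    "<SPAN lang=EN-US>It was a <i>dark</i> night.</SPAN></P>"
)
print(clean_word_html(sample))
```

Italic and bold tags pass through untouched, which is the point of saving as filtered HTML in the first place. Closing `</P>` tags are left as they are.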

I can get a paperback up to the Sigil step in an hour or two, but proofreading and correcting can take quite a long time.

Last edited by nyrath; 08-01-2010 at 07:20 PM.
Old 07-17-2010, 10:12 AM   #35
HarryT
eBook Enthusiast
HarryT ought to be getting tired of karma fortunes by now.

Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by nyrath View Post
I can get a paperback up to the Sigil step in an hour or two, but proofreading and correcting can take quite a long time.
I'm afraid there are no shortcuts to thorough proof-reading. I'm in the process (and have been for a couple of years now) of creating a thoroughly-proofed "complete works of Dickens" here at MR. Each novel takes me about 2 months to proof-read, working at it a couple of hours a day. But that's proofing at the "every comma correct" level, which perhaps isn't required for the average paperback.
Old 07-23-2010, 09:53 AM   #36
Franky
Nameless Being
 
I've given myself a Plustek OpticBook 3600 Plus. Big word for a small scanner. I did a couple of books and I'm satisfied with the results. Not bad for such a simple scanner. It takes about 50 min to scan a book of 260 pages. That's the time-consuming part. The negative part is that the PDF is about 10 MB. That's something I'm trying to change. With Calibre you're able to convert it into ePub and add the front page to it.
Old 07-23-2010, 08:09 PM   #37
nyrath
Addict
nyrath reads XML... blindfolded

Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
I save my scanned books to Microsoft Word/WordPad, not to Adobe. This turns them into text. They wind up being about half a megabyte in size.
Old 07-23-2010, 08:54 PM   #38
charleski
Wizard
charleski ought to be getting tired of karma fortunes by now.
 
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Quote:
Originally Posted by Iain View Post
I have 5000 paperbacks and want to scan them.
Unless you have a staff of 100 people ready to work on this project full-time, the best advice I can give you is to forget it.

It's fair to say it takes around an hour to scan a book to image format (no OCR, no conversion, no corrections). That means it will take you over 208 days, working 24/7 around the clock without any breaks, to turn your collection into a huge number of jpegs. It will take several multiples of that time to turn that stack of jpegs into something that is readable, depending on the amount of proofreading you perform.
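The 208-day figure follows directly from the two numbers quoted in the thread (5000 books, roughly one hour each). A small sketch of that arithmetic is below; the two-hours-a-day case is a hypothetical spare-time schedule added for comparison, not something stated in the post.

```python
BOOKS = 5000            # size of the collection quoted in the thread
HOURS_PER_BOOK = 1.0    # rough image-only scanning time per book


def scanning_days(books: int, hours_per_book: float, hours_per_day: float = 24.0) -> float:
    """Calendar days needed when scanning for hours_per_day hours each day."""
    return books * hours_per_book / hours_per_day


# Round-the-clock scanning, as in the estimate above.
print(round(scanning_days(BOOKS, HOURS_PER_BOOK), 1))

# Hypothetical spare-time schedule of two hours a day.
print(round(scanning_days(BOOKS, HOURS_PER_BOOK, hours_per_day=2.0)))
```

Neither figure includes OCR, conversion, or proofreading, which the post says take several multiples of the scanning time.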

If you have a few highly-prized books that are out-of-print and unlikely to be released as ebooks, then scanning and converting them would be a worthwhile project that you could perform in a few months of spare time. If your goal is to scan an entire library on your own before you die of old age, then you're chasing a rainbow.
Old 07-26-2010, 02:23 PM   #39
nyrath
Addict
nyrath reads XML... blindfolded

Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
Agreed.

I can scan in a 400 page paperback in about two hours, and it takes about 15 minutes for the OCR program to convert it to text.

Doing an exceedingly rough proofreading job on it can take a week. Doing a perfect job can take months.
Old 07-30-2010, 04:24 AM   #40
Iain
Enthusiast
Iain began at the beginning.
 
Posts: 49
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
Actually, it's too quick

I'm still in R&D mode.

I have bought a book guillotine and a Fujitsu 6130 scanner.

I'm still evaluating software. I very much like FineReader, but it seems to have less automation (at the bottom end of the price range) than OmniPage.

Next step is to write some scanning software which will pull in a book at a time and check that it has the right number of pages. I am astonished at the quality of the ADF on this scanner. I don't think it misfeeds at all.
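The page-count check mentioned above could look something like the sketch below. It is only an illustration under stated assumptions: it supposes the scanning software writes one image file per scanned side into a folder per book, that the files are TIFFs, and that the expected number of sides is known in advance. The function names and the folder layout are hypothetical, and nothing here talks to the scanner itself.

```python
import tempfile
from pathlib import Path


def count_scanned_sides(scan_dir, pattern="*.tif"):
    """Count the image files in scan_dir, assuming one file per scanned side."""
    return sum(1 for _ in Path(scan_dir).glob(pattern))


def check_book(scan_dir, expected_sides, pattern="*.tif"):
    """Return (ok, found): whether the folder holds the expected number of sides."""
    found = count_scanned_sides(scan_dir, pattern)
    return found == expected_sides, found


if __name__ == "__main__":
    # Demo with a throwaway folder standing in for one scanned book.
    with tempfile.TemporaryDirectory() as book_dir:
        for side in range(1, 7):
            (Path(book_dir) / f"side_{side:04d}.tif").touch()

        ok, found = check_book(book_dir, expected_sides=6)
        print("complete" if ok else f"expected 6 sides, found {found}")
```

A mismatch would flag a misfeed or a dropped chunk before the loose pages are discarded.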

But to my title: at 600dpi, it's taking something like 1 second per page (probably half that) and I'm loading pages in chunks of 50 (100 sides). So scanning a book takes 4-5 minutes. The problem is that it takes that in 3-5 chunks, which is exactly the wrong timing. I plan to work whilst this is happening and reckon I can handle something as mechanical as throwing paper in a hopper without too much distraction. However, I'm concerned about this and am considering a robotic device which will take batches of pages from a stacker of some kind under control of my scanning program. At the moment I'm looking at (the equivalent of) a Radio Shack robotic arm and a hopper made of balsa wood - just to show the spirit of Heath Robinson still lives over here in Blighty!

So if I can bear the tedium (and with a little bit of overhead for slicing books up and management) in theory, I could do 50 books a day (so the whole lot in 6 months).
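As a rough cross-check of these figures, the sketch below works through the numbers given in the post. It assumes the "1 second per page" rate is per scanned side and that a book is 3 to 5 chunks of 50 sheets; both readings are assumptions, as is the helper naming.

```python
import math


def minutes_per_book(sheets: int, seconds_per_side: float) -> float:
    """Pure scanning time for a duplex pass over the given number of sheets."""
    return sheets * 2 * seconds_per_side / 60


def days_for_library(books: int, books_per_day: int) -> int:
    """Whole scanning days needed at a steady daily rate."""
    return math.ceil(books / books_per_day)


# 3 to 5 chunks of 50 sheets, at 0.5 to 1 second per side.
for sheets in (150, 250):
    for rate in (0.5, 1.0):
        print(f"{sheets} sheets at {rate} s/side: {minutes_per_book(sheets, rate):.1f} min")

# 5000 books at 50 books a day.
print(days_for_library(5000, 50), "scanning days")
```

The per-book times bracket the 4-5 minutes quoted. At 50 books a day the count comes to 100 scanning days, so the six-month figure presumably leaves room for days off and for the slicing and management overhead mentioned above.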

FineReader takes roughly the same time to process as the scanning does (on a quad core machine, at least), though OmniPage seems a bit slower. However, providing I can automate them (ideally without paying too much for the privilege), then they can run on overnight and tie things up.

Needless to say, this is destructive, which may not appeal to many.

I'll keep you all posted!

Iain
Old 08-01-2010, 07:19 PM   #41
nyrath
Addict
nyrath reads XML... blindfolded

Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
Quote:
Originally Posted by nyrath View Post
The bundled OCR package seems to work as well as the $100 OCR program I bought years ago (TextBridge Pro 9.0). About one mis-recognized word every four pages or so. But sometimes it loses entire sentences!
Nope, I was wrong. The bundled OCR program works just fine; it does NOT lose entire sentences.

What happened was I was doing some post-processing, and my poorly formed search-and-replace was deleting the sentences. The OCR was fine, the lost sentences were my fault.
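The post does not say what the faulty search-and-replace looked like, so the following is purely a hypothetical illustration of one common way a tag-stripping pattern can swallow text: a greedy wildcard that runs to the last ">" on the line instead of stopping at the end of the tag.

```python
import re

# Made-up line with two tagged sentences, standing in for OCR output saved as HTML.
line = (
    "<SPAN lang=EN-US>First sentence.</SPAN> "
    "<SPAN lang=EN-US>Second sentence.</SPAN>"
)

# Greedy: ".*" extends to the LAST ">" on the line, so both sentences go too.
greedy = re.sub(r"<SPAN.*>", "", line)

# Bounded: "[^>]*" stops at the tag's own ">", so only the tags are removed.
bounded = re.sub(r"</?SPAN[^>]*>", "", line)

print(repr(greedy))
print(repr(bounded))
```

Comparing a word count before and after each replace pass is a cheap way to catch this kind of loss early.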
Old 08-03-2010, 07:18 PM   #42
Lady Fitzgerald
Wizard
Lady Fitzgerald ought to be getting tired of karma fortunes by now.

Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
I'm using a Fujitsu ScanSnap S1500 ADF scanner to digitize my books. Since I'm doing over 1200 books, I'm just making PDF copies of the pages, concatenated into a single PDF by the scanner software via Adobe Acrobat Standard (included with the scanner), and dispensing with OCR. I have to cut the spines off the books to do this. I started out using a bandsaw, but that process left a friable cut edge that shed paper dust like a long-haired dog sheds hair in the spring. No amount of cleaning could get rid of that dust. Glue from the binding also got onto the blade and tires and had to be cleaned off frequently. The dust was so bad it got into the scanner cameras, and the scanner had to be professionally serviced, fortunately under warranty (although it took a bunch of esplainin'). I'm now using a guillotine-type paper cutter that can handle up to 1 1/2" at a whack (thicker books have to be split in half first). That has dramatically reduced the dust. I'm using a small vacuum to remove the dust that does get on the scanner surfaces.

I first scan the covers, inside and out, to individual PDFs using a color setting. Then I scan the book pages themselves to a single PDF, using the black and white setting at a light setting to help "filter" out specks on the page. The B&W setting also eliminates paper yellowing and gives fairly clean, clear text on a white background (heavily illustrated books would require grayscale or color settings, which would give somewhat less desirable results). I then use Adobe Acrobat 9 Standard (came with the scanner) to insert the covers into the text PDF. I also scroll through the book to be sure pages were scanned in the correct order (human error happens) and that there aren't any pages that are oversized due to added margins (happens rarely; they are easily cropped in Acrobat). The whole process averages 15 minutes per book.

The scanned books read fine in Adobe Acrobat Reader or in Acrobat Standard (I use the latter since it works just fine and there is no point in having a redundant program). I found the JetBook Lite has settings that allow the books to be read on a portable e-book reader (it fits in my purse) with some compromises. I set it to landscape and Fit to Width. That eliminates side margins. My largest books (roughly A4 page size) are readable that way, although the print size is a bit fine (and I wear trifocals). Smaller books are much easier. Scrolling down each page took a bit of getting used to because each frame overlaps the previous one a bit and the last frame may overlap considerably. I find the advantage of portability outweighs the disadvantages. If a full page has to be viewed in its entirety, a much larger viewer, like a tablet, would be needed.

Of course, one could apply OCR (at least an hour), run a spell checker, check the spell checker, then edit for scanning errors not picked up by the spell checker. Since I'm such a nitpicker, it would take me as long to edit it as to read it. I don't have that much time (or patience; having ADD makes it worse) so I'm content with the PDFs. They are readable with the right readers.

Since the original books are being destroyed in the process, this is a media change and should pose no legal problems.

Last edited by Lady Fitzgerald; 08-03-2010 at 07:21 PM.
Old 08-04-2010, 01:39 PM   #43
nyrath
Addict
nyrath reads XML... blindfolded

Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
Lady Fitzgerald, what is the average file size of your PDF ebooks?
Old 08-04-2010, 10:42 PM   #44
Lady Fitzgerald
Wizard
Lady Fitzgerald ought to be getting tired of karma fortunes by now.

Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
Huge. Off the top of my head, I would say 15MB. Granted, that is much larger than typical e-books; however, I'm estimating that, once I finish scanning my p-book library, my "e-books" will only occupy 20-30GB on the 1TB drive in my desktop computer (I still have 768GB free space on the drive). Even my mp3s (roughly the equivalent of 425 CDs) occupy 36GB (they are rather high quality rips). Hard drive space is cheap nowadays. Once I move the innards of my present computer to the new case I've been prepping, I'll have room for 5 more hard drives. With that much room, I have the potential of having more space than I'll use anytime soon, even after I start ripping my DVDs.

Obviously, no e-book reader is likely to be able to hold all my books but I don't need for them to. A 1GB card can hold roughly 25 books, more than enough to keep me busy for months since I will read from a reader only when away from the house. I use my 32" TV screen to read from when at home (a wireless mouse makes a fairly decent remote). I had been reading my p-books before cutting them up but I'm finding reading from the computer and e-book reader (currently, the JBL is the only one working) to be so convenient, I'll probably chop and scan the next one in my unread stack and read it from my TV or reader (yes, that means reading two books at the same time; doesn't bother me).
Old 08-05-2010, 12:07 AM   #45
Mr. Dalliard
Zealot
Mr. Dalliard began at the beginning.
 
Posts: 143
Karma: 35
Join Date: Jan 2009
Location: Osaka, Japan
Device: Kindle 3
If you are prepared to rip the spine off the book, your task will be a lot easier, otherwise it is a lot of work.

It is far from being impossible though.