11-07-2007, 04:46 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2007
Device: Sony Reader
|
Scanned books - a rant
I'm sorry if this isn't constructive, but of all the people I could bother with this (and expect some sort of answer from), the readers of this forum are the best bet.
I am reading "1984", the copy that came with the 500 Reader, and it has convinced me entirely that the people who supply Sony with books sometimes do so by scanning a pbook and little else. I was fairly sure of this a few weeks ago, but now my suspicions have been confirmed. "1984" has such spelling errors as "tune" instead of "time". Look at the two words and you'll see that it's fairly easy for a computer to mistake "im" for "un". Similar errors are found throughout, be it spacing, capitalization, or simply a missing period at the end of a sentence (this happens when the period is close to a letter like 'k' or 'g', or something else with a "thingy" at the bottom right).
My question (and partial rant) is this: why do the "publishers" need to scan a pbook to generate an e-book? Hasn't the industry switched to digital? Just get the .txt file (or whatever) and port it to the Reader format. It's not hard. Why scan a book when, I'm assuming, there's a perfectly "functional" digital copy of it somewhere?
And why do the "publishers" not check the book? I'm not even saying that a human should read the digitized version (that would be asking too much, wouldn't it?), but they could at least run the copy through Word or something. I've seen two instances in "1984" where the OCR software spat out numbers, so instead of "Winston" it read something like "Win-148-on". Argh! We're paying money for these books (not "1984", but others digitized in this fashion, like "Flatland"), and the "publishers" don't even have the decency to use a good source or double-check their work?!
That being said, I still love my Reader and will not stop carrying it around everywhere I go. I'll just continue to curse Sony and the half-wits at Rosetta Books any time I see an idiotic error that looks like an OCR screw-up.
11-07-2007, 05:37 AM | #2 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
|
fuzzy said:
> My question (and partial rant) is this:
> why do the "publishers" need to
> scan a pbook to generate an e-book?
> Hasn't the industry switched to digital?
well, of course they've switched to digital. however, their workflow is still targeted at creating paper-books, not electronic-books.
and creating paper-books is often piecemeal, in the sense that they will "run out a page" when they need to make a correction to it, and not use the file containing the whole book. indeed, they might not even _have_ one file which contains the entire book, since they might have had each chapter in its own file.
and because separate chapters might come in from the author at different times, and/or be sent out to different copy-editors or desktop-publishers, they might not even be created in the same _program_. one might be done in quark, another in indesign, and another in ms-word. some might be sent to people who can do illustrations, while others don't need that. some pages will be done independently because they have illustrations which require color-seps...
so it's usually just a big mish-mash. sometimes, it's a miracle that the book comes together at all. and yeah, that's a really stupid way to work, but things can be too fragmented to do otherwise... and the truth is _most_ publishers are like that.
when they started their "look inside the book" program at amazon, they quickly discovered that scanning an actual physical copy was the best way to get a coherent digital version. sad but true...
-bowerbird
p.s. of course, this doesn't excuse the fact that it's just poor form to not even do a _spellcheck_ on your o.c.r. output. that's just sloppy work...
11-07-2007, 06:22 AM | #3 |
Guru
Posts: 834
Karma: 102419
Join Date: Sep 2007
Location: Vienna, Austria
Device: iPhone
|
maybe they bought one of those "2000 ebooks" cds available on ebay and actually gave you illegal scans?
|
11-07-2007, 07:46 AM | #4 |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
I think the illegal versions can be better proofread since, as I understand it, there is a system for producing them and a lot of man-hours are spent on each book.
|
11-07-2007, 09:57 AM | #5 |
Groupie
Posts: 179
Karma: 28
Join Date: May 2007
Device: Sony PRS-505
|
With public domain books, there isn't necessarily an existing file you can simply copy and paste. Sure, many large publishers who also sell these titles have a digital file that they created and hand-groomed. But that file is their intellectual property (even though the source material is in the public domain), and they have no obligation to share their handiwork.
What you are seeing is Sony's fault. They obviously didn't put proper care into creating their $2 digital versions of public domain books.
11-07-2007, 10:21 AM | #6 |
Resident Curmudgeon
Posts: 76,038
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Check the Project Gutenberg (PG) version and see if it has the same mistakes.
|
11-07-2007, 10:45 AM | #7 |
Reborn Paper User
Posts: 8,616
Karma: 15446734
Join Date: May 2006
Location: Que Nada
Device: iPhone8, iPad Air
|
I agree that a classic being resold should be manually retyped, not just OCR'd. That way they could justify charging for it.
|
11-07-2007, 11:08 AM | #8 | |
Wizard
Posts: 2,999
Karma: 300001
Join Date: Jan 2007
Location: Citrus Heights, California
Device: TWO Kindle 2s, one each Bookeen Cybook Gen3, Sony PRS-500, Axim X51V
|
Quote:
Nothing could be further from the truth. The big publishing firms are so wedded to their 'traditional' methods that many of them still have word-processors whose whole job is to convert the submitted electronic manuscripts (*IF* they bother to accept electronic submissions) to paper, from which they are typed back into Quark.
And does everyone understand that once the MMPB, TPB or HC layout has been finalized, the galleys have been approved by the author and the print runs set up, those Quark files are trashed!?! Yep. That's right. That means many a popular HC title must be *RE*-entered into Quark to be reformatted to MMPB or TPB format for the *next* release! There is no reason for this, but it's 'just the way things are done' for most major publishers, and even some of the newer ones.
Back in the late 90s I worked for Prima Publishing, known for being cutting-edge and for having a lot more 'automation' in the publishing process. You guessed it, they ran it just like the 'big boys' and got rid of their Quark-format files after they'd finished setting up the print runs. If they wanted another format or wanted to come back to a title after a few years, they had to start the whole process over again!
Derek
|
11-07-2007, 12:10 PM | #9 |
When's Doughnut Day?
Posts: 10,059
Karma: 13675475
Join Date: Jul 2007
Location: Houston, TX, US
Device: Sony PRS-505, iPad
|
Sigh. Reminds me of the good old days of cuneiform impressions in clay tablets. Are the digital copies discarded because they're too difficult to keep and manage or is this a practice designed to protect employment? Sorry if I'm cynical.
|
11-07-2007, 12:28 PM | #10 | |
Grand Sorcerer
Posts: 8,478
Karma: 5171130
Join Date: Jan 2006
Device: none
|
Quote:
Maybe, as demand for digital copies grows, companies that print books will think twice about discarding electronic files, and might even set up a coherent system to pass those files on to their e-book departments or sell them to e-book vendors. Sony, for one, ought to be in every publisher's office suggesting exactly this, though it isn't likely to help with public domain works.
And I agree, it was Sony's job to vet those files. Their lack of dedication to the tasks required to sell proper e-books isn't encouraging.
|
11-07-2007, 12:29 PM | #11 | |
Wizard
Posts: 2,999
Karma: 300001
Join Date: Jan 2007
Location: Citrus Heights, California
Device: TWO Kindle 2s, one each Bookeen Cybook Gen3, Sony PRS-500, Axim X51V
|
Quote:
Derek |
|
11-07-2007, 12:31 PM | #12 | |
Wizard
Posts: 2,999
Karma: 300001
Join Date: Jan 2007
Location: Citrus Heights, California
Device: TWO Kindle 2s, one each Bookeen Cybook Gen3, Sony PRS-500, Axim X51V
|
Quote:
Derek |
|
11-07-2007, 12:55 PM | #13 | |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
Quote:
Many years ago I did some work for a small publisher and have kept in touch with them over the years. Early on they moved to digital for all phases of their work. They kept one digital master that was used for their "print on demand" publishing to reduce their inventory, and used the same master as the base input for their RocketBook offerings. (They were one of the first to offer their books in the RB format.) They are still around but have not ventured into the newer ebook formats and seem to be winding down the business.
|
11-07-2007, 01:46 PM | #14 |
Reborn Paper User
Posts: 8,616
Karma: 15446734
Join Date: May 2006
Location: Que Nada
Device: iPhone8, iPad Air
|
Hey, I'd love to have a copy of the OCR for those!
|
11-07-2007, 02:27 PM | #15 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
|
vivaldirules said:
> Are the digital copies discarded because
> they're too difficult to keep and manage or
> is this a practice designed to protect employment?
um, neither. frankly, it's just because they're stupid.
but, still, even if they have been smart enough to keep the files, since "the book" is spread out over dozens of files, many of them in different formats, it can get maddeningly unclear which of them are the "latest" ones -- especially if _none_ seem to be exactly what actually appeared in hard-copy (because what got printed was cut-and-paste) -- so even a straightforward "reassembly" is difficult, and it's often less time-consuming in the long run to start over from scratch.
and yes, that's stupid -- massively so -- but that's how it happens...
we look at "the book" and see it as a coherent entity. but often, from a producer's standpoint, it is a chaotic collection of bits and pieces that are thrown together...
so, actually, a good analogy is a _movie_. with a film, we know it wasn't one seamless shot. they did scenes out of order, and sent some segments out for f.x., and had wire-removal done on others, so it's a mish-mash.
-bowerbird
|