09-16-2009, 11:42 PM | #1 | ||
Storm Surge'n
Posts: 5,779
Karma: 8213195
Join Date: Nov 2008
Location: Lobster Capital
Device: Sony PRS-300/350/505/700/T1
|
Google uses anti-fraud tool to help digitize books
Google acquires Carnegie Mellon's anti-fraud tool (Associated Press)
The next time you key in one of those skewed words to enter a website or complete an online transaction, you may be helping Google digitize a word in an eBook.
|
||
09-17-2009, 04:28 AM | #2 |
Guru
Posts: 820
Karma: 11012
Join Date: Nov 2007
Location: Warsaw, Poland
Device: Bookeen Cybook
|
If it doesn't know what the answer is, how does it know if one passed the captcha test or not?
|
09-17-2009, 04:45 AM | #3 |
Groupie
Posts: 181
Karma: 2460
Join Date: Jul 2009
Device: Cybook
|
That's a good question... I think probability comes into the process somewhere, in comparing all the answers and taking the average or the most frequent one, but that doesn't fully answer your question...
|
09-17-2009, 07:17 AM | #4 |
Wizard
Posts: 1,244
Karma: 3439432
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
|
I would do this by merging a couple of images: several known letters, plus a spliced-in unknown letter or word. You're using the known letters for the "captcha" aspect, and collecting all responses on the unknown letter or word, determining what it is based on a weighted average of the responses.
William |
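A minimal Python sketch of the "weighted average of the responses" idea, for illustration only. The function name `weighted_vote` and the per-response weights are assumptions (a weight might reflect how reliably that user typed the known letters); nothing here is confirmed as how reCAPTCHA actually works.

```python
from collections import defaultdict

def weighted_vote(responses):
    """Pick the reading with the largest total weight.

    responses: list of (typed_text, weight) pairs. The weights are
    hypothetical, e.g. a trust score earned on the known letters.
    Returns None when there are no responses.
    """
    totals = defaultdict(float)
    for text, weight in responses:
        # Normalise case and whitespace so "E" and "e " count as one reading.
        totals[text.strip().lower()] += weight
    return max(totals, key=totals.get) if totals else None
```

With equal weights this reduces to a plain majority vote; unequal weights let a few trusted typists outweigh several careless ones.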
09-17-2009, 08:31 AM | #5 |
Groupie
Posts: 184
Karma: 300001
Join Date: May 2009
Device: 505
|
It uses one known and one unknown word. It assumes that if you typed the known word correctly, you probably also typed the unknown word correctly. Also, each word will be tested multiple times.
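A rough Python sketch of that known-word/unknown-word check, under stated assumptions: the names `check_challenge` and `guesses`, the image id, and the case-insensitive comparison are all hypothetical, not the real service's API.

```python
from collections import defaultdict

# Hypothetical store: unknown-word image id -> guesses typed by users.
guesses = defaultdict(list)

def check_challenge(known_answer, unknown_id, typed_known, typed_unknown):
    """Pass or fail on the known word only.

    The unknown word cannot be graded, so its guess is simply recorded,
    and only when the known word was typed correctly.
    """
    passed = typed_known.strip().lower() == known_answer.strip().lower()
    if passed:
        guesses[unknown_id].append(typed_unknown.strip().lower())
    return passed
```

Because only the known word is graded, one correct pass says nothing certain about the unknown word; that is why each unknown word needs to be shown to several people.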
|
09-17-2009, 08:50 AM | #6 |
Exwyzeeologist
Posts: 535
Karma: 3261
Join Date: Jun 2009
Device: :PRS-505::iPod touch:
|
If I'm not mistaken, the ReCAPTCHA system was being used to digitize the back issues of the NYTimes, among other things. I wonder how Google's acquisition affects projects that have been using this for a while.
|
09-17-2009, 09:45 AM | #7 |
Wizard
Posts: 1,479
Karma: 3846231
Join Date: Apr 2009
Location: Edinburgh, Scotland
Device: Kindle 3, Samsung Galaxy
|
Which means that a savvy user would only need to type in the "known" word (presumably the more obvious of the two) and type any old keystrokes for the other, and they would pass the test.
|
09-17-2009, 09:49 AM | #8 | |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Google can have their internal employees do 1,000 of these to start with... ought to be trivial, given the number of employees they have. Then, after that, have unknown words (once identified the same way by X number of people) added to the pool of known words. - Ahi |
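The promotion step could be sketched in Python like this. The threshold `min_agreement` stands in for the "X number of people" above, and `promote_if_agreed` and `known_pool` are made-up names for illustration, not anything Google has documented.

```python
from collections import Counter

def promote_if_agreed(word_guesses, min_agreement=3):
    """Return the agreed reading once enough identical guesses exist.

    min_agreement is the hypothetical "X"; returns None when no
    reading has reached it yet.
    """
    if not word_guesses:
        return None
    word, count = Counter(word_guesses).most_common(1)[0]
    return word if count >= min_agreement else None

known_pool = {}  # hypothetical: image id -> accepted text

reading = promote_if_agreed(["upon", "upon", "up0n", "upon"])
if reading is not None:
    # Enough people agreed, so this image can now serve as a known word.
    known_pool["img42"] = reading
```

A stray or deliberately wrong answer like "up0n" is simply outvoted, as long as most people type the word honestly.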
|
09-17-2009, 11:46 AM | #9 | |
Wizard
Posts: 1,101
Karma: 4388403
Join Date: Oct 2007
Device: Palm>Ebookman>IPaq>Axim>Cybook>Kndl2>IPAD>Kndl3SO>Voyager>Oasis
|
The 'secondary' use is to decipher words that the OCR software has low certainty on. In this case it assumes the correct answer is the mode of people's answers. You could put anything in if you are an anarchist and feel like spreading chaos, I guess. But why? |
|
09-17-2009, 11:50 AM | #10 | |
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
|
http://arstechnica.com/old/content/2...anuscripts.ars |
|
09-17-2009, 01:38 PM | #11 |
zeldinha zippy zeldissima
Posts: 27,827
Karma: 921169
Join Date: Dec 2007
Location: Paris, France
Device: eb1150 & is that a nook in her pocket, or she just happy to see you?
|
i think reCAPTCHA is a brilliant initiative and i'm glad it's getting more attention. their slogan is "stop spam, read books" which seems like a worthy goal. plus hopefully this will improve google's ocr quality which has been frequently reported as variable at best. i do hope google won't divert it from previous projects though. i'm sure there are enough sites needing bot-protection to go around.
|
09-18-2009, 07:44 AM | #12 |
Wizard
Posts: 1,479
Karma: 3846231
Join Date: Apr 2009
Location: Edinburgh, Scotland
Device: Kindle 3, Samsung Galaxy
|
My reason for suggesting that 'a savvy user would only need to type in the "known" word' had nothing to do with enabling spam. It was more my way of getting a tiny measure of revenge against the growing nuisance of these devices.
But I didn't mean it to be taken seriously. |
09-18-2009, 10:26 AM | #13 | |
Wizard
Posts: 1,101
Karma: 4388403
Join Date: Oct 2007
Device: Palm>Ebookman>IPaq>Axim>Cybook>Kndl2>IPAD>Kndl3SO>Voyager>Oasis
|
In fact, the Internet in general would be a much more pleasant experience if only we didn't have to guard against spam, viruses, adware and the like. It's amazing when you consider the aggravation tax that we all must bear because of the actions of a few graffiti-spraying kids and a handful of sleazy 'business' people. |
|
09-18-2009, 02:30 PM | #14 |
Wizard
Posts: 1,479
Karma: 3846231
Join Date: Apr 2009
Location: Edinburgh, Scotland
Device: Kindle 3, Samsung Galaxy
|
I take your point, Emellaich. In fact, I don't object to the traditional Captcha mechanism. My irritation is directed against ReCaptcha in particular (and, by the way, this has been around for quite a while now, long before Google took an interest).
My point about ReCaptcha is that one of the two words is, by definition, "difficult". That's why it's there. So you have to use quite a lot more effort to negotiate the system than you would with the normal sort of Captcha (which is typically based on just four or five letters, displayed fairly clearly). This makes the whole experience that much more trying. Now, if that extra difficulty were directed at defeating spam, I wouldn't object. But its main purpose is to help the company that promotes it to reduce its proof-reading costs. By all means, make your prospective customers work to give you their business, if you can show that it's in their interest, but not if it's just for the convenience of a third party. |
09-18-2009, 08:51 PM | #15 |
Wizard
Posts: 1,279
Karma: 1002683
Join Date: Nov 2008
Location: New York
Device: PRS-700
|
Out of all the captchas I come up against, I prefer ReCaptchas. Some captchas I have to try 4 or 5 times or more to get because they are stupidly difficult. ReCaptchas are quick and easy because they are actual words. The captcha system used at annual credit report .com, for instance, took me about 5 tries when I tried to do it a couple of weeks ago.
I prefer ReCaptchas first, followed by number-only captchas. I usually get ReCaptchas the first time, and if I mess up, always on the second. |
|