07-19-2008, 09:14 PM | #1 |
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
|
Automated dump of Gutenberg in torrent format, 11,000+ books in 7.4 Gb
Hi all,
I stumbled over FangornUK's Gutenberg2lrf converter a while ago, and I was really happy with the end results: proper chapters, images, text formatting. So I automated it to pull in and convert most of Project Gutenberg's HTML files (and of course TXT in a pinch). Then I loaded the whole mess into Calibre, cleaned it up a little, stripped various magazines, journals, and multi-volume histories, added a sprinkling of Creative Commons SF, and exported it before publishing it as a torrent.
Over 11,000 books and some 7.4 Gb in total, various languages, and as far as I know legal to download in most countries (in a nutshell, if reading Gutenberg books is legal for you, this torrent should be legal too). I might make a second torrent to pick up the books I missed.
LINKS DELETED.
I checked the forums and this seemed to be the most appropriate place for this post, although it is technically not an upload. Apologies if I should have posted this somewhere else. |
07-19-2008, 10:36 PM | #2 |
creator of calibre
Posts: 44,391
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Interesting, as far as I know you're now the largest scale user of calibre. How did it perform with that many books?
|
07-20-2008, 12:08 AM | #3 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Wow that is a huge effort. I think you might want to contact Project Gutenberg and see if they would like to host the individual files. They are broadening their supported formats.
Dale |
07-20-2008, 06:39 AM | #4 | |
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
|
Quote:
I'm on a decent Core Duo system with around 2 Gb of RAM. Opening, working with, and reading books works fine, as does editing the metadata of a single file. Sorting by name/author/date and editing the metadata of a group (which causes a re-sort, I think) makes the program 'hang' for about 1 to 2 minutes. Other than that it is very stable; I have experienced no real crashes with this load.
The torrent is alive! I just need some people to step up and start peering/seeding.
@Dale: it was not such a big effort, a couple of days to pull everything in and a couple of days to sort, select, and rename. I could send them a ping, I suppose.
Last edited by acidzebra; 07-20-2008 at 06:48 AM. |
|
07-20-2008, 07:12 AM | #5 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Question: Did you go through and remove the items that are under copyright? A number of the translated works on PG are under copyright; you need the author's permission to distribute them in a torrent.
|
07-20-2008, 07:24 AM | #6 |
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
|
When you say "translated works", do you mean "translated from" or "translated to" English? In the first case I'm okay, in the second I might have a lot of emails to write.
|
07-20-2008, 07:49 AM | #7 |
Fully Converged
Posts: 18,171
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
You may want to look at the official CD/DVD distribution from PG... they are released under a CC that allows sharing:
http://www.gutenberg.org/wiki/Gutenb...nd_DVD_Project
My guess is that they've already sorted out those works that are not OK to be distributed. |
07-20-2008, 07:51 AM | #8 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Both to and from English. Any translated work has its own copyright that is completely separate from the original copyright. Even if the original is in the public domain, the translated work is not.
|
07-20-2008, 08:10 AM | #9 |
Wizard
Posts: 2,366
Karma: 12000
Join Date: Jan 2008
Location: Texas, USA
Device: Kindle; Sony PRS 505; Blackberry 8700C
|
The translated work may not be -- depending on the date when the translation was published.
|
07-20-2008, 08:27 AM | #10 |
Resident Curmudgeon
Posts: 76,019
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
acidzebra, can you please delete the torrent you've created and remake it without using RAR? For torrents, there is no need for RAR. Just torrent the books as is. Once you do that, I'll gladly jump on the torrent and help seed once I've downloaded it all. But as it stands, in order to do that I'd need to waste twice the space: once for the RAR files, and once for the unRARed eBooks. So please fix up this torrent. RAR inside a torrent is just a mess.
|
07-20-2008, 08:59 AM | #11 |
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
|
With all due respect, there is nothing to "fix up".
Compressed with RAR: 7.44 Gb
Uncompressed on disk: 8.70 Gb
Saved bandwidth per complete copy transferred: 1.26 Gb, and I hope to distribute a lot of copies.
If you can't spare 8 Gb, then don't download it. For me, bandwidth is a scarcer resource than disk space right now. |
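For anyone weighing the two sides of the RAR argument above, here is a rough sketch of the arithmetic in Python. The two sizes are the figures quoted in the post; the function names and the copy counts are made up for illustration.

```python
# Sizes quoted in the post above, in Gb.
COMPRESSED_GB = 7.44
UNCOMPRESSED_GB = 8.70


def bandwidth_saved_gb(copies: int) -> float:
    """Upload bandwidth saved by shipping the RAR set instead of raw files."""
    return round((UNCOMPRESSED_GB - COMPRESSED_GB) * copies, 2)


def disk_needed_to_seed_gb() -> float:
    """Disk space a seeder needs to keep both the RARs and the extracted books."""
    return round(COMPRESSED_GB + UNCOMPRESSED_GB, 2)


if __name__ == "__main__":
    # Hypothetical copy counts, purely for illustration.
    for copies in (1, 10, 100):
        print(f"{copies} copies: {bandwidth_saved_gb(copies)} Gb saved")
    print(f"Disk to seed and read: {disk_needed_to_seed_gb()} Gb")
```

So the saving is 1.26 Gb per complete copy, while a seeder who also wants to read the books holds both the archives and the extracted files at once.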
07-20-2008, 02:05 PM | #12 |
creator of calibre
Posts: 44,391
Karma: 23798586
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Glad to hear it. Yeah MySQL would have been faster but it would have made calibre much harder to distribute and deploy.
|
07-27-2008, 06:32 AM | #13 |
Fetch that LRF
Posts: 63
Karma: 44
Join Date: Jul 2008
Device: Sony Reader PRS-505, Treo 650
|
Really great 'distribution'! That makes it so much easier, I hate the original formatting on PG. I'm now downloading (at the snail-est pace) and will gladly seed it, like, a month from now when it's done
|
07-28-2008, 10:31 PM | #14 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2008
Device: deciding between Sony & Kindle
|
Verifying the torrent
I finished downloading the 32 rar files and unfortunately one had a bad cluster which the Windows disk utility fixed (I think it was file 11). In addition files 31 and 32 appear to be empty. How many files are archived? How many bytes should file 32 be?
Thanks, Chuck |
07-29-2008, 01:44 AM | #15 |
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
|
chuckSF, your disk drive has bad clusters - files don't. I just extracted everything and it works fine. It sounds like the disk utility fixed your drive by destroying some information.
The last file should be 59,1 Mb. I recommend you re-download at least the last two files and find out whether part 11 was indeed corrupted and re-download that, too. |
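A quick way to spot empty parts like the ones reported above is to list the size of every archive part in the download folder. This is only a sketch: the `*.rar` pattern and the function names are assumptions, since the actual file names in the torrent are not given in this thread.

```python
import sys
from pathlib import Path


def part_sizes(folder: str, pattern: str = "*.rar") -> dict[str, int]:
    """Map each matching file name in `folder` to its size in bytes."""
    return {p.name: p.stat().st_size for p in sorted(Path(folder).glob(pattern))}


def empty_parts(sizes: dict[str, int]) -> list[str]:
    """Return the names of parts that are zero bytes long."""
    return [name for name, size in sizes.items() if size == 0]


if __name__ == "__main__":
    # Folder to inspect; defaults to the current directory.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    sizes = part_sizes(folder)
    for name, size in sizes.items():
        print(f"{name}: {size / 1_000_000:.1f} MB")
    for name in empty_parts(sizes):
        print(f"EMPTY: {name} (re-download this part)")
```

A matching size does not prove a part is intact. BitTorrent clients check each piece against the hashes stored in the .torrent file, so forcing a recheck in the client (most have such an option) is the more reliable test for a silently damaged part.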