11-10-2006, 10:30 AM | #1 |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Yet Another Gutenberg Book converter
Following my odyssey to find a near perfect Gutenberg automatic conversion method for the Sony Reader...
gutlrf.pl is a Perl script that, together with libprs500 (the library that does the hard work), converts Gutenberg HTML books into BBeB LRF (Sony's eBook format) books for the Sony Reader. The process is designed so that no manual editing of the HTML files is required; it even downloads the files for you. It can also convert text-based Gutenberg books with the help of Gutenmark.
The gutlrf.pl script retrieves, unpacks and cleans the book, extracts the author and book title, and then calls HTML2LRF (which does the hard work) to convert it into a BBeB LRF file with support for markup, images and contents. It tries to put new chapters onto new pages, usually based on the H2 HTML tag. Gutenberg ZIP files don't always contain the correct directory structure; gutlrf.pl fixes this automatically. There are full instructions inside the ZIP file. Download it from here.
Last edited by FangornUK; 05-29-2007 at 12:44 PM. |
11-10-2006, 11:31 AM | #2 |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Here's a sample Gutenberg HTML book (19695) converted using these tools.
|
11-10-2006, 11:36 AM | #3 |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Also, here's a Gutenberg text book (number 36) that was run through Gutenmark to generate an HTML file and then run through my scripts (only splitbook.pl).
Last edited by FangornUK; 11-10-2006 at 04:53 PM. |
11-10-2006, 11:42 AM | #4 |
Fanatic
Posts: 556
Karma: 1057213
Join Date: Sep 2006
Location: North Eastern U.S.
Device: Sony Reader
|
The same question I asked igorsk and never got an answer to: is html2lrf able to process HTML books larger than about 600KB? I had pretty good luck with smaller books, but the larger ones seemed to crash html2lrf.
|
11-10-2006, 12:31 PM | #5 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
As a workaround, try splitting the HTML into several files. All the HTML parsing code is inside LBParser.dll; I don't have anything to do with it.
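To illustrate the workaround, here is a minimal Python sketch of splitting one large HTML book into smaller pieces at chapter headings. It is not the code from splitbook.pl (which is Perl and not shown in this thread). The function name `split_at_h2` and the `max_bytes` default are made up for the example, with the default simply chosen to sit below the roughly 600KB figure mentioned above.

```python
import re

# Hypothetical helper for illustration; not taken from splitbook.pl.
H2_OPEN = re.compile(r"<h2\b", re.IGNORECASE)


def split_at_h2(html: str, max_bytes: int = 500_000) -> list[str]:
    """Split html into chunks that begin at <h2> tags.

    Whole chapters are packed together for as long as the chunk stays
    under max_bytes (measured as UTF-8). A single chapter larger than
    max_bytes is left as one oversized chunk.
    """
    # Cut positions: start of document, every <h2> opening tag, end of document.
    cuts = [0] + [m.start() for m in H2_OPEN.finditer(html) if m.start() > 0]
    cuts.append(len(html))
    pieces = [html[a:b] for a, b in zip(cuts, cuts[1:])]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Start a new chunk when adding this chapter would exceed the limit.
        if current and len((current + piece).encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Note that the chunks are raw slices of the original markup. A real splitter would also have to wrap each piece in its own html/head/body elements before handing it to a converter, which this sketch leaves out.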
|
11-10-2006, 03:47 PM | #6 | |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Quote:
|
|
11-10-2006, 04:34 PM | #7 |
Fanatic
Posts: 556
Karma: 1057213
Join Date: Sep 2006
Location: North Eastern U.S.
Device: Sony Reader
|
All right, great! Is it possible then to modify your script so it takes a direct link to the HTML, and not just a ZIP file? It shouldn't take more than a couple of lines of code to check the extension of the URL/file and, if it is not ZIP, treat it as an already unpacked archive. Thanks!
UPD: I see that gutlrf.pl actually only gets and unzips the HTML; I then have to use splitbook.pl to split it and feed it to html2lrf. Sounds like everything is already there, no new changes required.
Last edited by porkupan; 11-10-2006 at 04:44 PM. |
11-11-2006, 10:30 AM | #8 |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
I've just added support to gutlrf.pl for specifying a ZIP file (for an already downloaded Gutenberg file). gutlrf.pl does more than just download and unzip the files: it cleans the Gutenberg file (and adds the title and author) in preparation for splitbook.pl and HTML2LRF.
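For readers wondering what the extension check discussed above might look like, here is a small Python sketch. It is only an illustration, not gutlrf.pl's actual logic: the function name `classify_source`, the returned labels and the example.org URL are all invented for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

HTML_SUFFIXES = {".htm", ".html"}


def classify_source(source: str) -> tuple[str, str]:
    """Return (location, kind) for a book source given on the command line.

    location is "remote" for http/https/ftp URLs and "local" otherwise;
    kind is "zip" or "html", decided purely by the file extension.
    """
    parsed = urlparse(source)
    is_remote = parsed.scheme in ("http", "https", "ftp")
    # For URLs look only at the path part, so query strings are ignored.
    path = parsed.path if is_remote else source
    # Normalise Windows separators so the suffix is found either way.
    suffix = PurePosixPath(path.replace("\\", "/")).suffix.lower()

    if suffix == ".zip":
        kind = "zip"
    elif suffix in HTML_SUFFIXES:
        kind = "html"
    else:
        raise ValueError(f"unsupported source type: {source!r}")
    return ("remote" if is_remote else "local"), kind


# Hypothetical inputs; the URL is a placeholder, not a real download link.
print(classify_source("http://example.org/files/19695-h.zip"))
print(classify_source("19695-h/19695-h.htm"))
```

A caller would download and unzip for ("remote", "zip"), only unzip for ("local", "zip"), and skip straight to cleaning for ("local", "html").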
Last edited by FangornUK; 11-13-2006 at 06:39 AM. |
11-17-2006, 01:50 PM | #10 |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Update: Added support for an already unzipped Gutenberg HTML book. Added an option to gutlrf.pl to run splitbook.pl automatically. Leading spaces are now stripped from the author and title.
|
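As an illustration of the author/title step, here is a Python sketch of pulling both out of an HTML file and stripping stray whitespace. It assumes the `<title>` element looks like "The Project Gutenberg eBook of Some Title, by Some Author", which is an assumption about the input rather than something stated in this thread, and it is not how gutlrf.pl itself does it. The name `extract_title_author` is hypothetical.

```python
import html
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Assumed boilerplate prefix; files that lack it are handled as well.
PREFIX_RE = re.compile(
    r"^The\s+Project\s+Gutenberg\s+e-?(?:Book|Text)\s+of\s+", re.IGNORECASE
)


def extract_title_author(page: str) -> tuple[str, str]:
    """Return (title, author) taken from the <title> element of page.

    Either value is an empty string when it cannot be found.
    """
    match = TITLE_RE.search(page)
    if not match:
        return "", ""
    # Decode entities, then collapse runs of whitespace and trim the ends.
    text = " ".join(html.unescape(match.group(1)).split())
    text = PREFIX_RE.sub("", text)
    # Split on the last ", by " so commas inside the title survive.
    title, sep, author = text.rpartition(", by ")
    if not sep:
        return text.strip(), ""
    return title.strip(), author.strip()


sample = "<title>\n  The Project Gutenberg eBook of Some Title, by Some Author\n</title>"
print(extract_title_author(sample))
```

Splitting on the last ", by " is a design choice that keeps titles containing commas intact; it would still misfire on a title that itself ends in ", by ..." with no author given.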
11-18-2006, 08:25 AM | #11 |
Gadget Force®
Posts: 705
Karma: 2733
Join Date: Jun 2006
Location: The Netherlands
Device: Sony PRS-300 + Cybook with funny screen :P
|
I tried it but I get this with different files around 800Kb:
|
11-18-2006, 02:34 PM | #12 |
Member
Posts: 14
Karma: 10
Join Date: Jun 2005
|
I tried debugging the source code in Visual Studio 2005, but the project fails on a call to CreateNewBook(). I don't have the PDB files to find out why CreateNewBook fails, so if anyone has any suggestions, please post.
|
11-18-2006, 05:09 PM | #13 | |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Quote:
|
|
11-19-2006, 07:13 AM | #14 | |
Gadget Force®
Posts: 705
Karma: 2733
Join Date: Jun 2006
Location: The Netherlands
Device: Sony PRS-300 + Cybook with funny screen :P
|
Quote:
Thanks, I will give that a try! |
|
11-30-2006, 03:04 PM | #15 |
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
|
Some more updates: Option to pass chapter split from gutlrf to splitbook. Better Chapter name extraction. Some bug fixes.
|