11-10-2006, 10:30 AM #1
FangornUK
Addict
Posts: 205
Karma: 317
Join Date: Oct 2006
Location: England
Device: Sony PRS-505, iPad, Kindle 3
Yet Another Gutenberg Book converter

Following my odyssey to find a near perfect Gutenberg automatic conversion method for the Sony Reader...

This program, gutlrf.pl (a Perl script), works with libprs500 (that library does the hard work) to convert Gutenberg HTML books into BBeB LRF books (Sony's eBook format) for the Sony Reader. The process is designed so that no manual editing of the HTML files is required; it even downloads the files for you. It can also convert plain-text Gutenberg books with the help of Gutenmark.

The gutlrf.pl script retrieves and extracts the book, cleans it, extracts the author and book title, and then calls HTML2LRF (which does the hard work) to convert it into a BBeB LRF file with support for markup, images and a table of contents. It tries to start each new chapter on a new page, usually based on the H2 HTML tag. Gutenberg ZIP files don't always contain the correct directory structure; gutlrf.pl fixes this automatically.
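As a rough illustration of the "extract the author and book title" step, here is a minimal Python sketch. It is not the code inside gutlrf.pl (which is Perl); it assumes the Gutenberg file carries `Title:` and `Author:` lines at the start of a line in its header, before the `*** START` marker, and the helper name `extract_title_author` is hypothetical.

```python
import re

# Assumption: the Project Gutenberg header contains lines such as
# "Title: The War of the Worlds" and "Author: H. G. Wells".
TITLE_RE = re.compile(r"^\s*Title:\s*(.+?)\s*$", re.MULTILINE)
AUTHOR_RE = re.compile(r"^\s*Author:\s*(.+?)\s*$", re.MULTILINE)


def extract_title_author(text: str) -> tuple[str, str]:
    """Return (title, author) from a Gutenberg header (hypothetical helper).

    Falls back to "Unknown" when a field is missing. Leading and trailing
    whitespace (including a stray "\\r") is stripped from both values.
    """
    # Only search the header, i.e. everything before the "*** START" marker.
    header = text.split("*** START", 1)[0]
    title = TITLE_RE.search(header)
    author = AUTHOR_RE.search(header)
    return (
        title.group(1) if title else "Unknown",
        author.group(1) if author else "Unknown",
    )
```

Stripping whitespace inside the regex, rather than afterwards, keeps leading spaces out of the book metadata in one place.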

There are full instructions inside the ZIP file.

Download it from here

Last edited by FangornUK; 05-29-2007 at 12:44 PM.
11-10-2006, 11:31 AM #2
FangornUK
Here's a sample Gutenberg HTML book (etext 19695) converted using these tools.
Attached file: Fortyone Thieves.zip (182.4 KB)
11-10-2006, 11:36 AM #3
FangornUK
Also, here's a Gutenberg plain-text book (etext 36) that was run through Gutenmark to generate an HTML file and then through my scripts (only splitbook.pl).
Attached file: War of the Worlds.zip (333.2 KB)

Last edited by FangornUK; 11-10-2006 at 04:53 PM.
11-10-2006, 11:42 AM #4
porkupan
Fanatic
Posts: 556
Karma: 1057213
Join Date: Sep 2006
Location: North Eastern U.S.
Device: Sony Reader
The same question I asked igorsk and never got an answer to: is html2lrf able to process HTML books larger than about 600 KB? I had pretty good luck with smaller books, but the larger ones seemed to crash html2lrf.
11-10-2006, 12:31 PM #5
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
As a workaround, try splitting the HTML into several files. All the HTML parsing code is inside LBParser.dll; I don't have anything to do with it.
11-10-2006, 03:47 PM #6
FangornUK
Quote:
Originally Posted by porkupan
The same question I asked igorsk and never got an answer to: is html2lrf able to process HTML books larger than about 600 KB? I had pretty good luck with smaller books, but the larger ones seemed to crash html2lrf.
I'd never noticed it before, but I tried a 1.4 MB HTML file (Gutenberg etext 19725) and it did crash html2lrf. After I ran it through my scripts, which split it into separate chapter files, it converted fine with html2lrf.
11-10-2006, 04:34 PM #7
porkupan
All right, great! Is it possible, then, to modify your script so it takes a direct link to the HTML and not just a ZIP file? It shouldn't take more than a couple of lines of code to check the extension of the URL/file and, if it isn't a ZIP, treat it as an already unpacked archive. Thanks!

UPD: I see that gutlrf.pl actually only fetches and unzips the HTML; then I have to use splitbook.pl to split it and feed it to html2lrf. Sounds like everything is already there, so no changes are required.

Last edited by porkupan; 11-10-2006 at 04:44 PM.
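The extension check porkupan suggests could look roughly like this Python sketch. It is not how gutlrf.pl does it: the helper name `locate_html` is hypothetical, and rather than guessing at gutlrf.pl's own directory-structure fix, it simply extracts the archive as-is and searches every folder level for the first `.htm`/`.html` file.

```python
import zipfile
from pathlib import Path


def locate_html(source: Path, workdir: Path) -> Path:
    """Return the path of the book's main HTML file (hypothetical helper).

    A .zip archive is extracted into workdir and searched for the first
    .htm/.html file at any depth; any other file is assumed to be an
    already unpacked HTML book and is returned unchanged.
    """
    if source.suffix.lower() != ".zip":
        return source  # already unpacked: use the HTML file as given

    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(source) as zf:
        # Keep the archive's own layout so relative image links still work.
        zf.extractall(workdir)

    # Search at any depth, since the folder layout inside the ZIP varies.
    candidates = sorted(
        p for p in workdir.rglob("*")
        if p.is_file() and p.suffix.lower() in (".htm", ".html")
    )
    if not candidates:
        raise FileNotFoundError(f"no HTML file found in {source}")
    return candidates[0]
```

Searching for the HTML file instead of moving files around leaves relative paths such as `images/...` intact.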
11-11-2006, 10:30 AM #8
FangornUK
I've just added support to gutlrf.pl for specifying a ZIP file (for an already downloaded Gutenberg book). gutlrf.pl does more than just download and unzip the files: it also cleans the Gutenberg file (and adds the title and author) in preparation for splitbook.pl and HTML2LRF.

Last edited by FangornUK; 11-13-2006 at 06:39 AM.
11-13-2006, 12:42 PM #9
FangornUK
Just a hint for HTML files that crash HTML2LRF: if you have a file that doesn't look too big but still crashes HTML2LRF, try running it through Tidy.
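For a batch of files, the Tidy step can be scripted. This Python sketch assumes the HTML Tidy command-line program is installed and on the PATH as `tidy`; `-q` (quiet) and `-o` (output file) are standard Tidy options, while the helper names `tidy_command` and `run_tidy` are hypothetical. Tidy exits with 0 for a clean run, 1 when it only issued warnings and 2 on errors, so 1 is treated as usable output here.

```python
import subprocess
from pathlib import Path


def tidy_command(src: Path, dst: Path) -> list[str]:
    """Build the command line that tidies src into dst (hypothetical helper)."""
    # -q suppresses non-essential output; -o writes the cleaned HTML to dst.
    return ["tidy", "-q", "-o", str(dst), str(src)]


def run_tidy(src: Path, dst: Path) -> bool:
    """Run HTML Tidy on src and report whether dst is worth converting."""
    result = subprocess.run(tidy_command(src, dst), capture_output=True, text=True)
    # Exit status: 0 = clean, 1 = warnings only, 2 = errors (no output written).
    return result.returncode in (0, 1)
```

The tidied file (`dst`) is then what you would hand to HTML2LRF instead of the original.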
11-17-2006, 01:50 PM #10
FangornUK
Update: Added support for an already unzipped Gutenberg HTML book.
Added an option to gutlrf.pl to run splitbook.pl automatically.
Leading spaces are now stripped from the author and title.
11-18-2006, 08:25 AM #11
diabloNL
Gadget Force®
Posts: 705
Karma: 2733
Join Date: Jun 2006
Location: The Netherlands
Device: Sony PRS-300 + Cybook with funny screen :P
I tried it, but I get this with several different files of around 800 KB:
Attached thumbnail: Image1.jpg (41.8 KB)
11-18-2006, 02:34 PM #12
susall
Member
Posts: 14
Karma: 10
Join Date: Jun 2005
I tried debugging the source code in Visual Studio 2005, but the project fails on a call to CreateNewBook(). I don't have the PDB files to find out why CreateNewBook fails, so if anyone has any suggestions, please post.
11-18-2006, 05:09 PM #13
FangornUK
Quote:
Originally Posted by diabloNL
I tried it, but I get this with several different files of around 800 KB:
splitbook.pl didn't find anything to split the chapters on, so it was still one big file. By default it splits on the HTML <H2> tag.
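splitbook.pl itself is Perl, but the idea of splitting at each `<h2>` heading can be sketched in a few lines of Python. This is a regex-based approximation, not splitbook.pl's actual logic: it assumes chapter headings are plain `<h2>...</h2>` elements, and it returns the whole document as a single piece when no `<h2>` is found, which is the one-big-file situation described above. The function name `split_on_h2` is hypothetical.

```python
import re

# Zero-width match right before every opening <h2> or <h2 ...> tag (any case).
H2_START = re.compile(r"(?=<h2[\s>])", re.IGNORECASE)
# The heading element itself, used to pull out a chapter name.
H2_TEXT = re.compile(r"<h2(?:\s[^>]*)?>(.*?)</h2>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def split_on_h2(html: str) -> list[tuple[str, str]]:
    """Split an HTML document into (chapter name, chapter HTML) pairs.

    Text before the first <h2> becomes a "Front matter" piece. If there is
    no <h2> at all, the whole document comes back as one piece.
    """
    chapters = []
    # re.split accepts zero-width patterns since Python 3.7.
    for part in H2_START.split(html):
        if not part.strip():
            continue
        heading = H2_TEXT.search(part)
        if heading:
            # Chapter name = heading text with inner tags and extra spaces removed.
            name = " ".join(TAG.sub("", heading.group(1)).split())
        else:
            name = "Front matter"
        chapters.append((name, part))
    return chapters
```

Each piece would still need its own `<html>`/`<body>` wrapper before being written out as a separate file for HTML2LRF.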
11-19-2006, 07:13 AM #14
diabloNL
Quote:
Originally Posted by FangornUK
splitbook.pl didn't find anything to split the chapters on, so it was still one big file. By default it splits on the HTML <H2> tag.

Thanks, I will give that a try!
11-30-2006, 03:04 PM #15
FangornUK
Some more updates: an option to pass the chapter split setting from gutlrf.pl to splitbook.pl, better chapter name extraction, and some bug fixes.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
HTML from Project Gutenberg? | Rcartes | Sony Reader | 10 | 04-21-2009 08:26 PM
html to bbeb converter ? | bugsbunny14 | Sony Reader | 10 | 11-07-2008 11:50 PM
Book Processor - Anything to LRF and HTML converter | LittleDragon | Sony Reader | 11 | 05-13-2008 05:31 PM
JafSoft AscToRTF - A GREAT Gutenberg Book/Ascii/RTF converter | Prince Bertram | Sony Reader | 11 | 11-25-2006 07:29 AM
Mazarin - Gutenberg in HTML | Alexander Turcic | Deals and Resources (No Self-Promotion or Affiliate Links) | 0 | 05-25-2004 04:11 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.