03-03-2009, 05:25 AM | #1 |
Zealot
Posts: 106
Karma: 450
Join Date: Feb 2009
Location: Abu Dhabi, United Arab Emirates
Device: Palm Centro, Acer Aspire One
|
Mass Batch conversion of HTML-Single-File ebooks to .mobi ebooks
Hi all,
First a warning: looong post ahead!

As part of my migration from Plucker to Mobipocket I was faced with mass-converting approx. 500 ebooks from single-file HTML format to Mobipocket .mobi/.prc format. Actually, a lot of the ebooks were originally in text, lit and pdf format and had been converted to text format some time back for reading on a Nokia smartphone, using tools such as ABC Amber Lit Converter where appropriate.

I did not simply want to drag and drop all the text files into the Windows Mobipocket Reader, as I want to have at least the title and author tags properly set. Dragging and dropping a bunch of files will not do that. Quite the opposite: the file name becomes the title of the resulting mobi ebook, and the author is either left empty (if you are lucky) or set to some random value (if you are unlucky, depending on your circumstances).

So I tried to mass convert the text files with mobiperl or mobigen instead, but the text files proved unsuitable for direct conversion with either of those two tools. I therefore downloaded "Easy Text to HTML Converter" and batched my ~500 text files for conversion into HTML using said tool's default template. That was slow but steady: the job was finished ~24 hours later, with some other unrelated stuff like DVD burning going on on the conversion machine (see also below). That netted me my ~500 HTML ebooks. So far, so good.

At this point let me remark: never delete your original lit/pdf/any-format source ebook files like I did in the past. You never know when you might need them again! And don't be cocksure about what can be deleted: I was ...

For the HTML mass conversion I decided to write a script. I started out with mobiperl and ended up using the win32 executable of html2mobi in a single-line "Windows Command Processor" (cmd) command, which converted my HTML to .mobi files just fine.
The only problem was that almost every one of the generated ebooks showed up in the list of the Mobipocket Windows Reader just fine but could not be opened: the reader reported a file corruption error. The ebooks concerned were the files generated from the text-to-HTML conversion using "Easy Text to HTML Converter"'s default template. No twiddling would change this result, so I abandoned mobiperl because it has obvious problems with the shitty/complicated/whatever-it-is HTML generated by that default template. Based on my experience, I recommend that everybody stay away from "Easy Text to HTML Converter".

My next approach for mass conversion was to use mobigen. But an opf project file is needed for every ebook to be generated if one wants the author and title properly set ... I fired up Mobipocket Creator, converted a single HTML file to Mobipocket and looked at the resulting .opf file. To my surprise it was simply XML serialized into a single-line text file ... tadaaa. Now I knew that I was almost home free if mobigen could handle the "Easy Text to HTML Converter" output. I ran mobigen on the opf file generated by Mobipocket Creator and the result was, to my delight, a "rather usable" Mobipocket ebook which worked in the Mobipocket Windows Reader.

I then wrote a Visual Basic Script that generates appropriate opf files and runs mobigen for the conversion. So this is what I did in the directory where my ebook HTML files are stored:

(0) Change all file extensions .htm to .html. You can use LUPAS Rename 2000 for this.

(1) Preparation of the HTML files' file names (this is an optional step): I used "LUPAS Rename 2000" to clean up the file names of my HTML files. For me this step included replacing "_" with white space, replacing sequences of two or more white spaces with a single white space and removing angle brackets from the file names. The result is a bunch of files with file names of the form
Code:
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>.html
In the opf template, the script replaces %1 with the title, %2 with the author and %3 with the HTML file name.

(2) Manual creation of a list of the ebooks to be converted, named 00-booklist.txt:
Code:
dir /B /O:GNE *.html > 00-booklist.txt
notepad 00-booklist.txt
Edit the list in Notepad so that no line ends in .html; the script appends the extension itself. This results in a file 00-booklist.txt where each line contains one ebook entry of the form
Code:
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>

(3) Make sure mobigen.exe can be found from your ebook directory (e.g. via the PATH) and that Microsoft Windows Scripting Host is installed (version 5.6 or 5.7).

(4) Make sure the files 00-template.opf, 00-2mobi.vbs and 00-cover.jpg are in your ebook directory.

(5) In your ebook directory run:
Code:
cscript 00-2mobi.vbs

Here is the script 00-2mobi.vbs:
Code:
REM 00-2mobi.vbs: Mass conversion of HTML Pages to Mobipocket
REM Version 0.1/03-FEB-2009
REM Released under the respective current version of the GPL by cklammer
Option Explicit

Main()
WScript.Quit 0

Sub Main()
    Const ForReading = 1
    Const SEPARATOR = " - "

    Dim booklistfile, book, bindestrich, author, title
    Dim opffile, opftemplate, opfcontent, opftemplatefile, opffilename
    Dim FSO, oShell

    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set oShell = WScript.CreateObject("WScript.Shell")

    REM The template is a single line, so one ReadLine reads all of it
    Set opftemplatefile = FSO.OpenTextFile("00-template.opf", ForReading)
    opftemplate = opftemplatefile.ReadLine
    opftemplatefile.Close

    Set booklistfile = FSO.OpenTextFile("00-booklist.txt", ForReading)
    Do While Not booklistfile.AtEndOfStream
        book = booklistfile.ReadLine

        REM bindestrich (German for "hyphen"): position of the author/title separator
        bindestrich = InStr(book, SEPARATOR)
        If bindestrich = 0 Then
            author = "Unknown"
            title = book
        Else
            author = Trim(Left(book, bindestrich - 1))
            title = Trim(Mid(book, bindestrich + Len(SEPARATOR)))
        End If

        REM Fill the placeholders: %1 = title, %2 = author, %3 = HTML file name
        opfcontent = Replace(opftemplate, "%1", title)
        opfcontent = Replace(opfcontent, "%2", author)
        opfcontent = Replace(opfcontent, "%3", book & ".html")
        opffilename = book & ".opf"

        Set opffile = FSO.CreateTextFile(opffilename, True)
        opffile.WriteLine opfcontent
        opffile.Close

        REM Run mobigen on the new opf file and wait until it has finished
        oShell.Run "mobigen " & """" & opffilename & """", 1, True
    Loop
    booklistfile.Close

    Set FSO = Nothing
    Set oShell = Nothing
End Sub

Here is the opf template file 00-template.opf. Keep it on a single line, because the script reads only the first line of the template:
Code:
<?xml version="1.0" encoding="utf-8"?><package unique-identifier="uid"><metadata><dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/"><dc:Title>%1</dc:Title><dc:Language>en</dc:Language><dc:Identifier id="uid">0FC99EFF4B</dc:Identifier><dc:Creator>%2</dc:Creator></dc-metadata><x-metadata><output encoding="Windows-1252"></output><EmbeddedCover>00-cover.jpg</EmbeddedCover></x-metadata></metadata><manifest><item id="item1" media-type="text/x-oeb1-document" href="%3"></item></manifest><spine><itemref idref="item1"/></spine><tours></tours><guide></guide></package>

The source file for the example is "Obama, Barack Hussein - Inaugural Presidential Address". Unpack the HTML file inside into your ebook directory and rename it to
Code:
Obama, Barack Hussein - Inaugural Presidential Address.html

Have fun and good luck,
cklammer

Last edited by cklammer; 03-03-2009 at 05:27 AM. Reason: I fucked up. not enough code tags |
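For readers who would rather not use VBScript, the core of the script above (splitting each booklist entry into author and title, then filling the %1/%2/%3 placeholders) can be sketched in Python. This is only a minimal sketch, not cklammer's code: the function names split_entry and fill_template are made up for illustration, and the XML escaping of characters such as & is an addition that the original script does not perform.

```python
from xml.sax.saxutils import escape

SEPARATOR = " - "  # same author/title separator as in 00-2mobi.vbs


def split_entry(entry):
    """Split '<author> - <title>' at the first separator.

    Mirrors the script's fallback: without a separator the author
    becomes "Unknown" and the whole entry is used as the title.
    """
    author, sep, title = entry.partition(SEPARATOR)
    if not sep:
        return "Unknown", entry
    return author.strip(), title.strip()


def fill_template(template, entry):
    """Fill %1 (title), %2 (author) and %3 (HTML file name).

    Assumption: values are XML-escaped here so that a '&' in a title
    does not produce an invalid opf file; the VBScript does not do this.
    """
    author, title = split_entry(entry)
    html_name = entry + ".html"
    return (template
            .replace("%1", escape(title))
            .replace("%2", escape(author))
            .replace("%3", escape(html_name, {'"': "&quot;"})))


if __name__ == "__main__":
    # Shortened stand-in for the one-line 00-template.opf
    template = "<dc:Title>%1</dc:Title><dc:Creator>%2</dc:Creator>"
    entry = "Obama, Barack Hussein - Inaugural Presidential Address"
    print(fill_template(template, entry))
```

Because the split happens at the first " - ", a title that itself contains " - " stays intact, which matches the behaviour of InStr in the VBScript.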
03-03-2009, 05:59 AM | #2 |
Zealot
Posts: 106
Karma: 450
Join Date: Feb 2009
Location: Abu Dhabi, United Arab Emirates
Device: Palm Centro, Acer Aspire One
|
00-mob.zip contains all files.
Hi all,
I wrote my OP on a locked-down machine without zip archive creation capability. All files referred to in the OP are now attached as
Code:
00-2mobi.zip
cklammer |
|
03-26-2009, 10:21 AM | #3 |
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
|
Hey, good work. I am thinking about migrating my AportisDoc files to Mobi. I think your approach might work there, too (although I would have to mass convert the pdbs to txt first).
|
03-26-2009, 04:23 PM | #4 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Mobi2IMP will convert PalmDoc (Text/Read) .pdb ebooks and leave behind the .HTML and .opf!!!
|
03-26-2009, 04:37 PM | #5 |
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
|
|
|
03-27-2009, 02:35 PM | #6 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
BTW, this post explains how to get Mobi2IMP to convert many PalmDoc .pdb files in a directory, recursively.
You can use the supplied prc2imp.bat and edit it to include the /r at the beginning of the for statement, or just use this line at the DOS prompt (inside a .bat file, write %%i instead of %i): Code:
for /r %i in (*.pdb) do mobi2imp.exe --verbose "%i" "%~ni" |
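The same recursive walk can be sketched in Python for anyone who prefers a script over the cmd one-liner. This is a rough sketch under assumptions: the function name mobi2imp_commands is made up, and the argument order simply copies the for line above (full path first, then the file name without extension, i.e. "%i" and "%~ni"). It only builds the command lines, so they can be checked before anything is run.

```python
from pathlib import Path


def mobi2imp_commands(root):
    """Return one mobi2imp argument list per .pdb file under root.

    Walks root recursively, like "for /r" does in cmd.
    """
    commands = []
    for pdb in sorted(Path(root).rglob("*.pdb")):
        # str(pdb) corresponds to "%i", pdb.stem to "%~ni"
        commands.append(["mobi2imp.exe", "--verbose", str(pdb), pdb.stem])
    return commands


if __name__ == "__main__":
    for cmd in mobi2imp_commands("."):
        print(cmd)
        # To actually convert, one could run each command, e.g.:
        # subprocess.run(cmd, check=True)  # needs "import subprocess"
```

Printing the list first is a cheap dry run; the conversion itself is still done by mobi2imp.exe.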
03-27-2009, 07:56 PM | #7 |
Evangelist
Posts: 488
Karma: 258
Join Date: Mar 2009
Device: kindle
|
I fell asleep, I'm sorry :0
|
08-08-2009, 11:09 PM | #8 |
Enthusiast
Posts: 32
Karma: 2204
Join Date: Jul 2009
Device: none
|
Thank you for your helpful tips, Masters!
But does anyone know how to make MobiGen run faster? I think if MobiGen used RAM for storing its temporary files, it would be much faster. |
11-18-2009, 01:10 AM | #9 |
Junior Member
Posts: 6
Karma: 10
Join Date: Nov 2009
Device: kindle
|
MORE HTML FILES TO SINGLE MOBI FILE
Hi
I have nearly 200 auto-generated HTML files that I want to convert into a single MOBI file. I tried using Mobipocket Creator, but only part of the content is displayed. How do I generate the table of contents? Thanks, Velu |
11-20-2009, 04:00 AM | #10 | |
Zealot
Posts: 106
Karma: 450
Join Date: Feb 2009
Location: Abu Dhabi, United Arab Emirates
Device: Palm Centro, Acer Aspire One
|
You need a separate HTML TOC
Quote:
This worked pretty well: find your document in the TOC, jump to it, read it until you are done and then use the "Back" function in the Mobipocket Reader until you are back in the TOC. Good luck, cklammer P.S.: Don't hesitate to ask, but keep in mind that I am at GMT+4. |
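The post does not show how the separate HTML TOC was built, so the following is only one plausible form of it: a minimal Python sketch that writes a TOC page with one link per HTML file. The function name build_toc and the file names in the example are hypothetical, and how the TOC page and the documents are then combined into one MOBI file is not covered here.

```python
from html import escape
from pathlib import Path


def build_toc(html_files, heading="Table of Contents"):
    """Return an HTML page with one link per file in html_files.

    The link text is the file name without its extension.
    """
    items = []
    for name in html_files:
        href = escape(name, quote=True)
        label = escape(Path(name).stem)
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return (
        f"<html><head><title>{escape(heading)}</title></head><body>\n"
        f"<h1>{escape(heading)}</h1>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical file names, only for illustration
    files = ["chapter1.html", "chapter2.html"]
    Path("toc.html").write_text(build_toc(files), encoding="utf-8")
```

With such a page in place, the reading pattern described above applies: follow a link from the TOC, read, then use "Back" to return.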