11-26-2007, 06:49 AM | #16 |
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
|
The source of a program called "pdbshred" may help you. It can extract the HTML and image files from Mobipocket and Peanut ebooks. You can find the program and its C source by Googling for "pdbshred source". I would post a direct link, but because of some additional functionality in the program, some here may not like a direct link.
A similar program is called "makedoc", but it doesn't extract the images. |
11-26-2007, 08:45 AM | #17 |
Cynic
Posts: 86
Karma: 514
Join Date: Jul 2007
Location: Edinburgh, Scotland
Device: Lots, started with a Psion 3 circa 1998
|
Looks extremely cute ...
Are you planning on packaging this and sticking it on CPAN when it's stable? |
11-26-2007, 09:48 AM | #18 |
Member
Posts: 20
Karma: 65
Join Date: Nov 2007
Device: Amazon Kindle
|
Well done, Tompe! Looks like you beat me to the punch.
Would you mind if I bolted an XSL backend onto your code, effectively making it "xml2html2mobi"? It's a mouthful, but would be quite useful. God I love Perl. |
11-26-2007, 10:25 AM | #19 | |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Quote:
I will release the html2mobi script here in a day or two so I can get some feedback. I have to fix one serious bug and write some documentation. |
|
11-26-2007, 10:32 AM | #20 | |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Quote:
I should probably write some packages to make it easier to do an xml2html2mobi. I wanted to have just one file to make it easier to use, but maybe I should just split it up and submit it to CPAN. That can be the next step after it works and has been tested more. I used XML::Parser::Lite::Tree to parse the OPF file, but I am not sure this was a good idea. Do you know of any better library for OPF files or for XML? I really liked HTML::Element and HTML::TreeBuilder, so something similar for XML would be nice. Or a specific OPF file library. |
|
11-26-2007, 02:42 PM | #21 |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
I have a problem. My converter generates a mobi file that is not entirely correct. It works perfectly in FBReader. On my Gen3 it works, but the number of pages is 650 when it should be around 25. There are a lot of empty pages at the end. My Palm T5 refuses to load the file and says corrupt database 0x0209 (2).
What I wonder is whether this is a problem with the PalmDoc container or with the HTML that I packed in the PalmDoc format. I may have forgotten to set some parameter in the Palm::PDB package, but I tried loading a working mobi file and then replacing the text, and that did not work either. Ideas? |
11-26-2007, 05:01 PM | #22 |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
I realised that I had not written any Mobipocket header in record 0 at all, and I was fooled by it working so well with FBReader. Is there a specification anywhere of the data that should be in record 0? I have googled for it but cannot find one.
|
11-26-2007, 06:07 PM | #23 | |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
No spec. A few fields are documented in pdbshred but they're probably not what you need. I'm working on a more or less complete doc but here's what you should be able to get away with:
Quote:
|
|
11-26-2007, 08:09 PM | #24 | |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Quote:
When I unpacked a mobi file I saw three records after the last image, with sizes 36, 52 and 4. What are these? One contained the string FLIS and one the string FCIS. Maybe the end of the document is not detected because I have not written these records. |
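To inspect trailing records like these, a small dump of record sizes and leading bytes can help. The following is a minimal sketch, not part of the thread's scripts: it assumes the standard Palm database layout (a 78-byte header with a big-endian record count at offset 76, followed by 8-byte record entries that start with a 4-byte offset), and `list_pdb_records` is a hypothetical helper name.

```python
import struct

PDB_HEADER_SIZE = 78    # fixed-size Palm database header
RECORD_ENTRY_SIZE = 8   # 4-byte offset, 1-byte attributes, 3-byte unique id


def list_pdb_records(data: bytes):
    """Return (index, size, first four bytes) for every record in a PDB image."""
    (count,) = struct.unpack_from(">H", data, 76)  # big-endian record count
    offsets = [
        struct.unpack_from(">I", data, PDB_HEADER_SIZE + i * RECORD_ENTRY_SIZE)[0]
        for i in range(count)
    ]
    # A record runs up to the next record's offset; the last one runs to end of file.
    ends = offsets[1:] + [len(data)]
    return [
        (i, end - start, data[start:start + 4])
        for i, (start, end) in enumerate(zip(offsets, ends))
    ]
```

Calling it on the bytes of a known-good file and on the generated file, then comparing the two listings, should show which trailing records (for example the ones starting with FLIS or FCIS) are missing.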
|
11-26-2007, 08:14 PM | #25 |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
How long must the MOBI header be?
At position 0xF4 I see the string EXTH, and after that follow some strings which indicate that the author and title are stored there. Does this belong to the header? |
11-26-2007, 09:24 PM | #26 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
I believe FCIS and FLIS have something to do with dictionary indices. Do you set the unpacked size and number of records in the PalmDoc header correctly?
|
11-26-2007, 09:36 PM | #27 | |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Quote:
The number of records is correct, because when I tried to include the image records in that number FBReader started to display garbage after the end of the text. I will double check the unpacked size. I have not set this pointer to the first image either. Now I have the strange phenomenon that the images are correct in FBReader but on my Gen3 they seem to be shifted. The "library" image seems to work. I just put it in the last record and it was displayed correctly on the Gen3. The change I made was to set the record "id" to an increasing number for the text content instead of using 0. Well, it is moving forward. Hopefully I will fix the problem with the size and the image order soon so I have a first alpha version of the scripts. |
|
11-26-2007, 09:53 PM | #28 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
The "number of records" in the PalmDoc header (word at 0x8) needs to be set to the number of records containing only text (no pictures). E.g. if you have compressed text in records 1, 2 and 3, then set it to 3. The uncompressed size (dword at 0x4) has to be the full uncompressed size of all text.
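The two fields described above can be sketched as follows. This is an illustration under assumptions, not the converter's code: the 16-byte layout, the 4096-byte record size and the compression and encryption values are the commonly described PalmDoc conventions rather than anything stated in this post, and `palmdoc_header` is a hypothetical name.

```python
import struct

RECORD_SIZE = 4096  # assumed maximum uncompressed bytes per text record


def palmdoc_header(text: bytes, compression: int = 1) -> bytes:
    """Build a 16-byte PalmDoc header for the start of record 0."""
    text_records = -(-len(text) // RECORD_SIZE)  # ceiling division
    return struct.pack(
        ">HHIHHHH",
        compression,   # 0x0: assumed 1 = uncompressed, 2 = PalmDoc compression
        0,             # 0x2: unused
        len(text),     # 0x4: full uncompressed size of all text
        text_records,  # 0x8: number of text records only, images excluded
        RECORD_SIZE,   # 0xA: assumed maximum size of each text record
        0,             # 0xC: assumed encryption type, 0 = none
        0,             # 0xE: unused
    )
```

For 10000 bytes of text this gives a record count of 3, regardless of how many image records follow the text.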
By the way, I was wrong. Mobi format 3 needs the MOBI header to be 0x74 bytes long, not 0x18. The fields are mostly irrelevant except for the number of the first record with images I mentioned above (at 0x5C). There are also DATP records that contain a mapping from uncompressed offsets to record numbers, but I haven't figured out their format yet and I'm not sure if they're mandatory... |
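A minimal sketch of such a header, using only the fields named in this thread: the length at offset 0x4 and the first image record at 0x5C. The "MOBI" identifier at offset 0 is an assumption, as is leaving every other field zero; whether a given reader accepts that is not established here. `mobi_header` is a hypothetical name.

```python
import struct

MOBI_HEADER_LEN = 0x74  # length reported in this thread for Mobi format 3


def mobi_header(first_image_record: int) -> bytes:
    """Build a mostly zeroed MOBI header with the fields named in the thread."""
    header = bytearray(MOBI_HEADER_LEN)  # every field not set below stays zero
    header[0:4] = b"MOBI"                # identifier (assumed)
    struct.pack_into(">I", header, 0x04, MOBI_HEADER_LEN)     # header length
    struct.pack_into(">I", header, 0x5C, first_image_record)  # first image record
    return bytes(header)
```

The offsets are relative to the start of the MOBI header itself, so in record 0 they are shifted by the length of whatever precedes it.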
11-26-2007, 10:23 PM | #29 |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Got it nearly working on my Gen3 when I extended the MOBI header. The only problem now is that the title says "libc-2.3.6" and the header information is wrong...
Strangely enough the library image works without me including it. Maybe it takes the first record with an image and uses this.
# 4 DWord dwSize //including first two fields (put 0x18 here)
If I put 0x18 here it does not work. If I put 0xE4 here as in my example document then it works but the title did not work. So what does this number mean?
Last edited by tompe; 11-26-2007 at 10:25 PM. |
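The numbers in this thread are consistent with dwSize being the length of the MOBI header: a 0x10-byte PalmDoc header plus a 0xE4-byte MOBI header ends at 0xF4, which is where the EXTH string was seen. A minimal sketch of that check, assuming the MOBI header starts right after the PalmDoc header, begins with "MOBI", and that EXTH, when present, follows it directly (`exth_offset` is a hypothetical name):

```python
import struct

PALMDOC_HEADER_LEN = 0x10  # assumed: PalmDoc header fills the first 16 bytes of record 0


def exth_offset(record0: bytes):
    """Return the offset of the EXTH block in record 0, or None if it is absent."""
    start = PALMDOC_HEADER_LEN
    if record0[start:start + 4] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    # dwSize at offset 4 of the MOBI header, counted from the header's own start
    (mobi_len,) = struct.unpack_from(">I", record0, start + 4)
    offset = start + mobi_len
    return offset if record0[offset:offset + 4] == b"EXTH" else None
```

If that reading is right, writing 0xE4 as the size while supplying a shorter header would make a reader look for title and author data in the wrong place.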
11-26-2007, 10:57 PM | #30 | |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Quote:
Maybe I should try to set this pointer to 0 and see if that means that this block does not exist. |
|