12-18-2008, 12:35 AM | #1 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200, EBW1150, T1, NSTG, iLiad_v2, NC, Asus_TF, Next1, WPDN
Reverse-engineering the .IMP format
A primer on the .IMP specification has never been published, but a very detailed explanation of the .IMP file format can be found here. It was reverse-engineered by Jeffrey Kraus-yao back in 2002. Jeffrey explained that he reverse-engineered the .imp format by building test .oeb ebooks with eBook Publisher and dropping each .oeb onto a desktop shortcut to the IMP viewer.exe. While the viewer was still running, he would examine his temp folder, where he noticed that one of four .RES folders was complete. He wrote down the changed bits and then repeatedly made small changes to the .oeb ebook. He started all of this before he even owned an REB1200.
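Jeffrey's "change one thing, see which bits changed" method is easy to automate once you have a before and an after copy of the same .RES record. Below is a minimal C sketch of just that comparison step (diff_streams and diff_files are hypothetical helper names, not part of any existing tool); it prints each differing offset along with the XOR of the two bytes, which shows exactly which bits flipped.

```c
#include <stdio.h>

/* Compare two binary streams byte by byte and report every offset where they
 * differ, with the XOR of the two bytes showing which bits flipped.
 * Bytes present in only one stream (length mismatch) are reported too.
 * Pass out == NULL to count differences without printing them. */
long diff_streams(FILE *a, FILE *b, FILE *out)
{
    long offset = 0, ndiff = 0;

    for (;; offset++) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca == EOF && cb == EOF)
            break;                      /* both streams exhausted */
        if (ca == cb)
            continue;                   /* identical byte, nothing to report */

        ndiff++;
        if (out == NULL)
            continue;
        if (ca == EOF)
            fprintf(out, "0x%06lX: -- -> %02X (only in second)\n", offset, cb);
        else if (cb == EOF)
            fprintf(out, "0x%06lX: %02X -> -- (only in first)\n", offset, ca);
        else
            fprintf(out, "0x%06lX: %02X -> %02X (bits changed: %02X)\n",
                    offset, ca, cb, ca ^ cb);
    }
    return ndiff;
}

/* Convenience wrapper: open two files (e.g. the same .RES record captured
 * before and after a one-word change to the .oeb) and diff them to stdout.
 * Returns -1 if either file cannot be opened. */
long diff_files(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long n = -1;

    if (a != NULL && b != NULL)
        n = diff_streams(a, b, stdout);
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return n;
}
```

For example, diff_files("before/DATA.FRK", "after/DATA.FRK") would list every changed byte in that record (the paths are only illustrative).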
It was quite the accomplishment! Keep in mind that back then only the REB1200 used the .IMP format, as there weren't many GEB1150s (the predecessor to the EBW1150) in 2002/3. Oh, how the tables have turned: now, for every REB1200 in use, there are tens or hundreds of EBW1150s! Jeffrey's website is a great start to understanding the .IMP file format, but lacks such basic information as:
I would like to build here a knowledge-base for the "definitive" understanding of the .IMP file format. As others have already told me about their own forays into .imp "nuts & bolts" investigations, I propose to start off this knowledge-base with my preliminary findings, written as a Perl script. That script is imp_dump.pl (along with its required support files), and it can be used to explode any unencrypted .imp ebook into its (decompressed) text and image components.

Now, take note that I said text and images, NOT .html and images. The original html is not stored in the .imp file. Only the basic components are: one record tells you where all the font/style changes are located in the file, another indicates where to end each line so that it doesn't spill over the screen size of that .imp, and other records store the images, the hyperlinks used, etc. Basically, all the building blocks are there (scattered), and those components need to be re-assembled somehow into .html!

BTW, release v4.0 of EBook-Tools should have basic .imp support for .html generation with image linking, but will initially lack table/hyperlink/style support. Those are planned for future releases.

I plan to collect postings from this thread and compile a wiki page with the relevant parts of the .IMP file format specification as reverse-engineered by ALL of us! Below are all the .RES filetypes that exist (thus far); volunteers can pick the undocumented .RES filetypes on a "first come, first served" basis.
Code:
.IMP file comprises these groups of .RES filetypes:
    text page_line page_header_footer links misc_info formatting tables images markups form_data
where:
text:
    '!!cm'
    '!!ky'
    DATA.FRK - decompressor written in Perl, C and soon to be C#.
page_line:
    'BPgz'
    'BPgZ'
    'ImRn' - written in Perl (see imp_dump.pl)
    'Pcz0' - written in Perl (see imp_dump.pl)
    'PcZ0' - written in Perl (see imp_dump.pl)
    'Pcz1' - written in Perl (see imp_dump.pl)
    'PcZ1' - written in Perl (see imp_dump.pl)
page_header_footer:
    'HfPz'
    'HfPZ'
links:
    'AncT' - written in Perl (see imp_dump.pl)
    'AnTg'
    'Lnks'
    'eLnk'
misc_info:
    'Batr'
    'Binf'
    'BGcl' - written in Perl (see imp_dump.pl)
    'BPos'
    'Clos'
    'Devm' - written in Perl (see imp_dump.pl)
    'Dict'
    'FRgs'
    'Glos'
    'MASK'
    'Mrgn' - written in Perl (see imp_dump.pl)
    'Hyp2'
    'Hyph'
    'Offs'
    'pInf' - written in Perl (see imp_dump.pl)
    'Pc31'
    'PPic' - written in Perl (see imp_dump.pl)
    'SKtb'
    'SMnu'
    'stbd'
    '!!sw' - written in Perl (see imp_dump.pl)
formatting:
    'ESts' - written in Perl (see imp_dump.pl)
    'HRle'
    'Styl'
    'StRn' - written in Perl (see imp_dump.pl)
    'StR#'
    'StR2'
tables:
    'Tabl'
    'TCel'
    'TRow'
images:
    'GIF ' - written in Perl (see imp_dump.pl)
    'JPEG' - written in Perl (see imp_dump.pl)
    'PIC2' - written in Perl (see imp_dump.pl)
    'PICT' - written in Perl (see imp_dump.pl)
    'PNG ' - written in Perl (see imp_dump.pl)
markups:
    'MRPs'
    'Ano2'
    'Hlts'
    'BTok'
    'BMks'
form_data:
    'TGNt'
    'Form'
    'FItm'
    'FIDt'
    'FrDt'

What you'll need is a test .imp, a good binary/hex editor (I use XVi32 Edit), and a lot of elbow grease and desire. Post here what you find out and I'll update the "undocumented" list above to reflect it! Thanks in advance!

p.s. After unzipping the attachment, just place any and all of your .imp files in the folder therein called 'place imp file here' and run 'extract imp files.bat'. Look at the generated file 'imp_dump.output.txt' for the parsing output for all the .imp files placed in that folder.
Then look in that folder for a directory for each .imp, containing the compressed & decompressed text and any images. Have fun HEX-exploring!

EDIT: 06Jan2009: added a compiled Windows executable to a separate .zip

Last edited by nrapallo; 01-06-2009 at 04:49 PM. Reason: added compiled windows executable to a separate .zip
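The grouping of .RES filetypes above can be kept as a lookup table when writing a parser. A minimal C sketch follows (res_type, RES_TYPES and res_group are hypothetical names). It compares type codes as exactly four bytes and case-sensitively, since some codes end in a space ('GIF ', 'PNG ') and some pairs differ only in case ('BPgz'/'BPgZ', 'Pcz0'/'PcZ0'). DATA.FRK is a file name rather than a 4-character type, so it is left out.

```c
#include <stddef.h>
#include <string.h>

/* One row of the .RES type table: a 4-character type code and its group.
 * Some codes end in a space ('GIF ', 'PNG '), and some pairs differ only
 * in case ('BPgz'/'BPgZ'), so comparisons must be exact and case-sensitive. */
struct res_type {
    char code[5];          /* 4 chars + terminator, for readability */
    const char *group;
};

static const struct res_type RES_TYPES[] = {
    {"!!cm", "text"}, {"!!ky", "text"},
    {"BPgz", "page_line"}, {"BPgZ", "page_line"}, {"ImRn", "page_line"},
    {"Pcz0", "page_line"}, {"PcZ0", "page_line"},
    {"Pcz1", "page_line"}, {"PcZ1", "page_line"},
    {"HfPz", "page_header_footer"}, {"HfPZ", "page_header_footer"},
    {"AncT", "links"}, {"AnTg", "links"}, {"Lnks", "links"}, {"eLnk", "links"},
    {"Batr", "misc_info"}, {"Binf", "misc_info"}, {"BGcl", "misc_info"},
    {"BPos", "misc_info"}, {"Clos", "misc_info"}, {"Devm", "misc_info"},
    {"Dict", "misc_info"}, {"FRgs", "misc_info"}, {"Glos", "misc_info"},
    {"MASK", "misc_info"}, {"Mrgn", "misc_info"}, {"Hyp2", "misc_info"},
    {"Hyph", "misc_info"}, {"Offs", "misc_info"}, {"pInf", "misc_info"},
    {"Pc31", "misc_info"}, {"PPic", "misc_info"}, {"SKtb", "misc_info"},
    {"SMnu", "misc_info"}, {"stbd", "misc_info"}, {"!!sw", "misc_info"},
    {"ESts", "formatting"}, {"HRle", "formatting"}, {"Styl", "formatting"},
    {"StRn", "formatting"}, {"StR#", "formatting"}, {"StR2", "formatting"},
    {"Tabl", "tables"}, {"TCel", "tables"}, {"TRow", "tables"},
    {"GIF ", "images"}, {"JPEG", "images"}, {"PIC2", "images"},
    {"PICT", "images"}, {"PNG ", "images"},
    {"MRPs", "markups"}, {"Ano2", "markups"}, {"Hlts", "markups"},
    {"BTok", "markups"}, {"BMks", "markups"},
    {"TGNt", "form_data"}, {"Form", "form_data"}, {"FItm", "form_data"},
    {"FIDt", "form_data"}, {"FrDt", "form_data"},
};

/* Look up the group for a 4-character type code. Exactly 4 bytes are
 * compared, so the code need not be NUL-terminated.
 * Returns NULL for a code that is not in the table. */
const char *res_group(const char type[4])
{
    for (size_t i = 0; i < sizeof RES_TYPES / sizeof RES_TYPES[0]; i++)
        if (memcmp(RES_TYPES[i].code, type, 4) == 0)  /* case-sensitive */
            return RES_TYPES[i].group;
    return NULL;
}
```

For example, res_group("PNG ") returns "images", while res_group("PNG_") and res_group("bpgz") return NULL.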
12-18-2008, 10:06 AM | #2 |
GuteBook/Mobi2IMP Creator
DATA.FRK and LZSS decompression
Please refer to my original posting of deimp.exe here for some background info.
When an .IMP file is exploded with unimp.exe, one of the component records in the .RES directory is the DATA.FRK file. It contains the basic text used in the ebook and is the same for both the Color VGA (REB 1200) and Grayscale Half-VGA (EBW 1150) .IMP files. If DATA.FRK was LZSS-compressed when the ebook was created, deimp.exe decompresses it and also substitutes/expands the control characters (see below).

DATA.FRK File
Element text is extracted and placed in this file. Element tags are replaced with control characters. This file can be compressed and encrypted, with compression occurring before encryption. This file is compressed when the element <meta name="x-SBP-compress" content="on"/> is included in the <x-metadata> element of the package file. The compression algorithm used is LZSS. This file is encrypted when the element <meta name="x-SBP-encrypt" content="on"/> is included in the <x-metadata> element of the package file. The encryption algorithm used is DES. The 8-byte encryption key is in the SoftBook Edition Encryption Key File (.key) at offset 0x0C. Characters less than 0x20 are removed, except for line breaks, which are replaced with 0x20. Multiple consecutive 0x20 characters are replaced with a single 0x20.

Control characters
Code:
0x0A  end of document, forced page break
0x0B  start of element except <span>
0x0D  line break element <br/>
0x0E  start of table element <table>
0x0F  image element <img/>
0x13  end of table cell </td> tag
0x14  horizontal rule element <hr/>
0x15  before and after page header content
0x16  before and after page footer content

In addition to those control characters above, characters to "substitute/convert" would be:
Code:
HEX  => Should be (actual char)
0x8E => "é"
0xA0 => " " (no-break space)
0xA5 => "•"
0xA8 => "®"
0xA9 => "©"
0xAA => "™"
0xAE => "Æ"
0xC7 => "«"
0xC8 => "»"
0xC9 => "…"
0xD0 => "–"
0xD1 => "—"
0xD2 => "“"
0xD3 => "”"
0xD4 => "‘"
0xD5 => "’"
0xE1 => "·"

p.s. As an exercise, would anyone want to try tweaking this code to allow LZSS (re-)compression of text for use as the DATA.FRK in the .imp?

Last edited by nrapallo; 12-18-2008 at 11:46 PM. Reason: added actual character to substituted characters
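The LZSS decompression step described above can be sketched in C. This is a classic LZSS decoder using the parameters from Haruhiko Okumura's widely copied LZSS.C: a 4096-byte window pre-filled with spaces, writing starting at position 4096 - 18, flag bytes read least-significant bit first (1 = literal byte), and 2-byte back-references holding a 12-bit window position and a 4-bit length. Whether DATA.FRK uses exactly these parameters is an assumption of this sketch, not something stated in this thread, so check its output against deimp.exe or imp_dump.pl before relying on it. The name lzss_decode is hypothetical, and it only applies to unencrypted files, since DES decryption would have to come first.

```c
#include <stddef.h>
#include <string.h>

#define LZSS_N         4096   /* ring-buffer (window) size                  */
#define LZSS_F         18     /* longest match that can be encoded          */
#define LZSS_THRESHOLD 2      /* matches of <= 2 bytes are stored literally */

/* Decode an Okumura-style LZSS stream from src into dst (assumed format).
 * Each flag byte governs the next 8 items, least-significant bit first:
 *   bit = 1 -> one literal byte follows;
 *   bit = 0 -> two bytes follow: a 12-bit window position and a 4-bit
 *              length code (code + THRESHOLD + 1 = 3..18 bytes to copy).
 * Returns the number of bytes written, stopping early if dst fills up. */
size_t lzss_decode(const unsigned char *src, size_t src_len,
                   unsigned char *dst, size_t dst_cap)
{
    unsigned char window[LZSS_N];
    size_t in = 0, out = 0;
    unsigned r = LZSS_N - LZSS_F;     /* first write position in the window */
    unsigned flags = 0;               /* bit 8 and up track bits remaining  */

    memset(window, ' ', LZSS_N - LZSS_F);       /* spaces, as in LZSS.C     */
    memset(window + LZSS_N - LZSS_F, 0, LZSS_F);

    for (;;) {
        flags >>= 1;
        if ((flags & 0x100) == 0) {   /* all 8 flag bits used: read a new one */
            if (in >= src_len)
                break;
            flags = src[in++] | 0xFF00u;
        }
        if (flags & 1) {              /* literal byte */
            if (in >= src_len || out >= dst_cap)
                break;
            unsigned char c = src[in++];
            dst[out++] = c;
            window[r] = c;
            r = (r + 1) & (LZSS_N - 1);
        } else {                      /* (position, length) back-reference */
            if (in + 1 >= src_len)
                break;
            unsigned pos = src[in] | ((src[in + 1] & 0xF0u) << 4);
            unsigned len = (src[in + 1] & 0x0Fu) + LZSS_THRESHOLD + 1;
            in += 2;
            for (unsigned k = 0; k < len; k++) {
                if (out >= dst_cap)
                    return out;
                unsigned char c = window[(pos + k) & (LZSS_N - 1)];
                dst[out++] = c;       /* copies may overlap, which is fine */
                window[r] = c;
                r = (r + 1) & (LZSS_N - 1);
            }
        }
    }
    return out;
}
```

As a usage check, the 6-byte input {0x07, 'A', 'B', 'C', 0xEE, 0xF3} (three literals, then a 6-byte copy from window position 0xFEE) decodes to "ABCABCABC". Decoding a known DATA.FRK this way is also a cheap way to confirm the parameters before attempting the (re-)compression exercise.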
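Once DATA.FRK is decompressed, the control-character and substitution tables from the DATA.FRK post can be applied byte by byte. The following minimal C sketch (frk_replacement and frk_to_html are hypothetical names) turns the control characters into rough HTML markers and the substituted bytes into UTF-8; the page-break comment marker and the escaping of '&', '<' and '>' are choices of this sketch, and it does not attempt the real re-assembly into .html, which needs the style, table, link and image records. As an aside, most of the substitution codes coincide with the Mac OS Roman character set, 0xA0 being the exception (Mac OS Roman puts the dagger there), so a Mac OS Roman table may be a reasonable first guess for high bytes not listed; the thread itself does not confirm this.

```c
#include <stddef.h>
#include <string.h>

/* Replacement text for one decompressed DATA.FRK byte, or NULL to copy the
 * byte unchanged. Codes that only mark where an element starts (0x0B, 0x0E)
 * or delimit header/footer content (0x15, 0x16) are dropped here, because
 * the element details live in other .RES records. */
static const char *frk_replacement(unsigned char c)
{
    switch (c) {
    case 0x0A: return "\n<!-- page break -->\n"; /* end of document, forced page break */
    case 0x0B: return "";                       /* start of element except <span> */
    case 0x0D: return "<br/>\n";
    case 0x0E: return "";                       /* start of <table> */
    case 0x0F: return "<img/>";                 /* src must come from the image records */
    case 0x13: return "</td>";
    case 0x14: return "<hr/>\n";
    case 0x15: return "";                       /* page header delimiter */
    case 0x16: return "";                       /* page footer delimiter */
    case '&':  return "&amp;";                  /* HTML escaping (sketch's addition) */
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case 0x8E: return "\xC3\xA9";               /* e acute */
    case 0xA0: return "\xC2\xA0";               /* no-break space */
    case 0xA5: return "\xE2\x80\xA2";           /* bullet */
    case 0xA8: return "\xC2\xAE";               /* registered sign */
    case 0xA9: return "\xC2\xA9";               /* copyright sign */
    case 0xAA: return "\xE2\x84\xA2";           /* trade mark sign */
    case 0xAE: return "\xC3\x86";               /* AE ligature */
    case 0xC7: return "\xC2\xAB";               /* left guillemet */
    case 0xC8: return "\xC2\xBB";               /* right guillemet */
    case 0xC9: return "\xE2\x80\xA6";           /* ellipsis */
    case 0xD0: return "\xE2\x80\x93";           /* en dash */
    case 0xD1: return "\xE2\x80\x94";           /* em dash */
    case 0xD2: return "\xE2\x80\x9C";           /* left double quote */
    case 0xD3: return "\xE2\x80\x9D";           /* right double quote */
    case 0xD4: return "\xE2\x80\x98";           /* left single quote */
    case 0xD5: return "\xE2\x80\x99";           /* right single quote */
    case 0xE1: return "\xC2\xB7";               /* middle dot */
    default:   return NULL;                     /* copy the byte as-is */
    }
}

/* Expand decompressed DATA.FRK bytes into dst as UTF-8 with simple HTML
 * markers. Other high bytes are copied unchanged (they would need their own
 * mapping). NUL-terminates dst; returns the length written, or (size_t)-1
 * if dst is too small. */
size_t frk_to_html(const unsigned char *src, size_t src_len,
                   char *dst, size_t dst_cap)
{
    size_t out = 0;

    if (dst_cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < src_len; i++) {
        const char *rep = frk_replacement(src[i]);
        char one[2] = { (char)src[i], '\0' };
        const char *piece = rep ? rep : one;
        size_t n = strlen(piece);
        if (out + n >= dst_cap) {              /* keep room for the NUL */
            dst[out] = '\0';
            return (size_t)-1;
        }
        memcpy(dst + out, piece, n);
        out += n;
    }
    dst[out] = '\0';
    return out;
}
```

For example, the bytes 'A', 0xD2, 'h', 'i', 0xD3, 0x0D come out as A, a left double quote, "hi", a right double quote and "<br/>".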
12-18-2008, 06:51 PM | #3 |
Enthusiast
Posts: 42
Karma: 370
Join Date: Dec 2008
Device: ebookwise, sony
Attempt at reverse-engineering IMP with C#
Hello All,
With Nick's help, I was able to put together a simple de-imp program to pull the text out of an imp file. I apologize for the lack of documentation and plan to add it to the code soon. Please feel free to submit any changes to the code. I am working on a GUI front end so you can select a single file or folder and the program will de-imp the files to text and, hopefully with some help, to Sony's LRF file format. Thanks to all. -Michael
12-18-2008, 10:38 PM | #4 | |
GuteBook/Mobi2IMP Creator
Nice to see you are already adding to our knowledge-base! Thanks again for sharing this! Last edited by nrapallo; 12-18-2008 at 11:35 PM. |
12-19-2008, 01:28 AM | #5 |
Enthusiast
Reverse IMP GUI v.1.0
Nick,
I agree with you on the lack of something to look at besides the console, so here is a GUI version of my previous ConvertIMP. I know it is not much, but it is something to start with. I will try to put acknowledgements in the next update to the program, but until then I would like to thank Nick and Michael Dipperstein for their libraries, which helped me produce the program. -Michael

New addition to v.1.0.1 - Image viewing
New addition to v.1.0.2 - Image viewing (including PNG, GIF, JPEG)
New addition to v.1.1.0 - Editing book properties. --- I am removing this pending further testing. Sorry for the delay.

Last edited by mscott161; 12-22-2008 at 12:49 PM. Reason: Update to program
12-19-2008, 01:59 AM | #6 |
GuteBook/Mobi2IMP Creator
12-19-2008, 12:55 PM | #7 |
Enthusiast
Metadata...
What would you like to edit?
--Michael
12-19-2008, 01:44 PM | #8 |
GuteBook/Mobi2IMP Creator
Where do I begin (as shouts can be heard from around the world from .imp users!):

Per the .imp specs:

Edit - Book properties start at offset 0x30
Yes* - ID: null-terminated C string
YES  - Bookshelf Category: null-terminated C string
N/A  - Subcategory: null-terminated C string, not displayed on REB1200
YES  - Title: null-terminated C string
N/A  - Last name: null-terminated C string
N/A  - Middle name: null-terminated C string
YES  - First name: null-terminated C string

Note: * = there is a way to auto-generate this ID; N/A = not allowed; YES = allow edits

Afterwards, the length of Book Properties (including the 7 nulls) needs to be updated so that BytesRemainingInHeader is set to the length of Book Properties + 24!

Also, it would be nice if the name of the .RES directory could be changed to the .imp filename (minus the extension) or even auto-generated as 'Author-Title', as you used for the (decompressed) text filename. The DictionaryLength (length of the directory name) in the 48-byte header would then have to be updated as well. (p.s. You called it Dictionary, but I think you meant Directory...)

That's it for now... Check please!

Last edited by nrapallo; 12-19-2008 at 01:48 PM.
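The book-properties edit boils down to rewriting seven consecutive null-terminated strings and recomputing one length. Here is a minimal C sketch, assuming the field order listed above (the enum and the functions pack_book_properties, unpack_book_properties and bytes_remaining_in_header are hypothetical names). Since this post doesn't say where BytesRemainingInHeader sits in the 48-byte header or how wide it is, the sketch only computes its new value rather than writing it.

```c
#include <stddef.h>
#include <string.h>

/* The seven book-property strings, in the order they are stored starting
 * at offset 0x30 (i.e. right after the 48-byte header). */
enum { PROP_ID, PROP_CATEGORY, PROP_SUBCATEGORY, PROP_TITLE,
       PROP_LAST_NAME, PROP_MIDDLE_NAME, PROP_FIRST_NAME, PROP_COUNT };

/* Split an existing property block into its seven strings; the pointers
 * refer into src. Returns the bytes consumed (including all seven NULs),
 * or 0 if fewer than seven NULs are found within src_len. */
size_t unpack_book_properties(const unsigned char *src, size_t src_len,
                              const char *props[PROP_COUNT])
{
    size_t pos = 0;
    for (int i = 0; i < PROP_COUNT; i++) {
        const unsigned char *nul = memchr(src + pos, '\0', src_len - pos);
        if (nul == NULL)
            return 0;
        props[i] = (const char *)(src + pos);
        pos = (size_t)(nul - src) + 1;       /* skip past this string's NUL */
    }
    return pos;
}

/* Serialize the (possibly edited) properties as consecutive NUL-terminated
 * strings into dst. A NULL entry is written as an empty string (just its
 * NUL). Returns the block length, or 0 if dst is too small. */
size_t pack_book_properties(const char *const props[PROP_COUNT],
                            unsigned char *dst, size_t dst_cap)
{
    size_t len = 0;
    for (int i = 0; i < PROP_COUNT; i++) {
        const char *s = props[i] ? props[i] : "";
        size_t n = strlen(s) + 1;            /* include the terminating NUL */
        if (len + n > dst_cap)
            return 0;
        memcpy(dst + len, s, n);
        len += n;
    }
    return len;
}

/* Rule from this post: BytesRemainingInHeader must be set to the length
 * of the Book Properties block + 24. */
unsigned long bytes_remaining_in_header(size_t props_len)
{
    return (unsigned long)props_len + 24;
}
```

With example values ID "ID1", category "Fiction", title "Dracula" and first name "Bram Stoker" (the other three fields empty), the block packs into 35 bytes, so BytesRemainingInHeader would become 59.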
12-19-2008, 01:49 PM | #9 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Dale
12-19-2008, 01:55 PM | #10 |
Enthusiast
ConvertIMPGUI
I have updated the ConvertIMPGUI program to include viewing of the images in the IMP file. Please look at the attachments in a previous message for the update.
I will look into the category, author, and title changes. --Michael
12-19-2008, 01:58 PM | #11 | |
GuteBook/Mobi2IMP Creator
Basically, users choose the category (when editing the metadata) from a drop-down list that includes the predefined categories plus any the user has previously used/defined. They do say that imitation is the sincerest form of flattery... Just food for thought!
12-19-2008, 02:24 PM | #12 |
Enthusiast
Update to GUI
Sorry about the image viewing; before, I only included PNG. I have updated the code in the previous message (see the new attachment to that message for v.1.0.2), which now includes PNG, GIF, and JPEG.
--Michael
12-21-2008, 10:47 PM | #13 |
Enthusiast
Editing Book Properties
Nick and Everyone,
I have added book property editing in release v.1.1.0. Hope you enjoy it. I am all ears if something does not work correctly, or if you have ideas on the GUI arrangement. If you have a specific idea for the GUI, send a screenshot of what you want it to look like and I will take a look. I updated the ConvertIMPGUI post above with the new version. Happy Holidays --Michael
12-22-2008, 09:47 AM | #14 | |
GuteBook/Mobi2IMP Creator
Caution and use common sense
Please proceed with caution with these early releases!
12-22-2008, 03:28 PM | #15 |
Enthusiast
New Thread for the Convert / Editing IMP GUI Program
I have started a new thread with Nick's help. I will post all releases and changes to it.
-- Michael