01-14-2012, 01:18 PM | #31 |
Sigil Developer
Posts: 8,156
Karma: 5450818
Join Date: Nov 2009
Device: many
|
Hi,
I ran "strings" on kindlegen and it appears to have the following option: -donotaddsource. Has anyone tried the latest kindlegen to see if this works? If it does, your file sizes should be a lot smaller. KevinH |
01-14-2012, 01:50 PM | #32 |
Grand Sorcerer
Posts: 28,038
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I think that might be a leftover from earlier versions. I still get an "unsupported argument" error when trying the -donotaddsource switch with the latest kindlegen.
|
01-17-2012, 01:41 PM | #33 |
Sigil Developer
Posts: 8,156
Karma: 5450818
Join Date: Nov 2009
Device: many
|
new version of a K8 aware kindlestrip program
Hi,
I modified the kindlestrip_v130.py provided above to properly update the EXTH 121 metadata value if need be, and now it appears to work just fine with the KindlePreviewer. So here is an experimental kindlestrip_v132.py.zip that hopefully supports K8 style mobis. KevinH |
01-17-2012, 02:21 PM | #34 | |
Grand Sorcerer
Posts: 28,038
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
|
|
01-30-2012, 01:42 PM | #35 |
Sigil Developer
Posts: 8,156
Karma: 5450818
Join Date: Nov 2009
Device: many
|
Hi DiapDealer,
Based on looking at Nick's mobi_split.py code, it seems that the Mobi Header actually has a pointer and count to the SRCS record:
srcs_index = 224 (or 0xe0)
srcs_count = 228 (or 0xe4)
So I think we need a new version of kindlestrip.py that, once it removes the SRCS section, modifies the mobi (section 0) header to set 0xe0 to 0xffffffff and 0xe4 to 0. We probably need to do that (or at least check) inside both the mobi7 header and the kf8 mobi header. We should also probably back-port this change to the original kindlestrip.py as well. KevinH |
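A minimal sketch of the header patch described above, written for Python 3 and not taken from any released kindlestrip.py. The function name clear_srcs_pointer is hypothetical, and the check against the header length field at 0x14 is an assumption added here so that older, shorter headers are left alone. It works on the bytes of one header record (section 0, or the kf8 header section) and would need to be called once per header.

```python
import struct

SRCS_INDEX_OFFSET = 0xE0   # 224: section number of the first SRCS record
SRCS_COUNT_OFFSET = 0xE4   # 228: number of SRCS records
NULL_INDEX = 0xFFFFFFFF    # value meaning "no such section"


def clear_srcs_pointer(header):
    """Return a copy of a MOBI header record with the SRCS pointer cleared.

    `header` is the raw bytes of the header record. Offsets are counted
    from the start of that record, so the "MOBI" magic sits at 0x10.
    """
    if header[0x10:0x14] != b"MOBI":
        raise ValueError("not a MOBI header record")
    # Assumption: the length field at 0x14 counts from the magic at 0x10.
    (mobi_length,) = struct.unpack_from(">L", header, 0x14)
    if 0x10 + mobi_length < SRCS_COUNT_OFFSET + 4:
        return bytes(header)  # header too short to carry the SRCS fields
    patched = bytearray(header)
    struct.pack_into(">LL", patched, SRCS_INDEX_OFFSET, NULL_INDEX, 0)
    return bytes(patched)
```

Returning a new bytes object instead of editing in place keeps the original record available in case the strip has to be abandoned part way through.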
01-31-2012, 05:48 PM | #36 | |
Sigil Developer
Posts: 8,156
Karma: 5450818
Join Date: Nov 2009
Device: many
|
Hi DiapDealer,
I took a shot at using the srcs index and count info in a hopefully appropriate manner. I called it kindlestrip_v133.py. I have only tested it in a limited fashion. KevinH Quote:
|
|
06-04-2012, 05:57 AM | #37 |
The Grand Mouse 高貴的老鼠
Posts: 72,470
Karma: 309060442
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
I've updated the first post in this thread to have the latest KindleStrip, as updated by KevinH (& updated by me to add him to the credits).
I've also updated the AppleScript Wrapper to include the latest version. |
06-19-2012, 05:15 AM | #38 |
Member
Posts: 21
Karma: 244219
Join Date: Jul 2011
Device: K3
|
tiny bug report
I just upgraded to the latest version. Here is a tiny bug report; it does not affect the function of this nice utility!
Line 142 of kindlestrip_v134.py:
print " beginning at offset %0x and ending at offset %0x" % (srcs_offset, srcs_length)
should be:
print " beginning at offset %0x and ending at offset %0x" % (srcs_offset, next_offset-1)
or this:
print " beginning at offset %0x for length %0x" % (srcs_offset, srcs_length) |
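To make the difference between the two suggested messages concrete, here is a small Python 3 sketch. The helper name describe_span is hypothetical, and it assumes that next_offset is simply srcs_offset + srcs_length (the start of the following section), which the post implies but does not state.

```python
def describe_span(srcs_offset, srcs_length):
    """Build both corrected messages for a section at a given offset."""
    # Assumption: the next section starts right after this one ends.
    next_offset = srcs_offset + srcs_length
    by_end = "beginning at offset %0x and ending at offset %0x" % (
        srcs_offset, next_offset - 1)
    by_length = "beginning at offset %0x for length %0x" % (
        srcs_offset, srcs_length)
    return by_end, by_length


for message in describe_span(0x1000, 0x20):
    print(message)
```

The original line printed the length where an end offset was promised, so for a section at 0x1000 of length 0x20 it would have claimed the section ended at offset 20.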
06-19-2012, 09:15 AM | #39 | |
The Grand Mouse 高貴的老鼠
Posts: 72,470
Karma: 309060442
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
|
|
08-12-2012, 06:55 PM | #40 |
Wizard
Posts: 1,669
Karma: 2300001
Join Date: Mar 2011
Location: Türkiye
Device: Kindle 5.3.7
|
Thanks
|
08-20-2012, 05:14 AM | #41 | |
Member
Posts: 21
Karma: 244219
Join Date: Jul 2011
Device: K3
|
mobi with srcs_count = 2
Quote:
kindlestrip.py displays:
KindleStrip v1.34. Written 2010-2012 by Paul Durrant and Kevin Hendricks.
Found SRCS section number 240, and count 2
Error: SRCS section num does not point to SRCS.
The 1st section (240) starts "PAGE". This appears to be a "pageMap" section generated from the "page-map.xml" file in the epub. The section contains these strings:
"fileRevisionId" : "1"
"description" : "PageMap from source by kindlegen"
The 2nd section (241) starts "SRCS".
I think the "pageMap" section should be retained, the "SRCS" section stripped, and srcs_count reduced by 1. I'll attempt to code a fix and test it ...
Last edited by dilo_sec; 08-20-2012 at 10:38 AM. Reason: update with kindlestrip.py actual output |
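A rough Python 3 sketch of the check this implies: look at the first four bytes of every section in the srcs range instead of assuming the first one is SRCS. This is not the kindlestrip_v135.py code. The function names are hypothetical, and the only format facts relied on are the standard Palm database layout (section count at byte 76, then one 8-byte entry per section whose first 4 bytes are the section's file offset).

```python
import struct


def read_section_offsets(data):
    """Return the file offset of every Palm DB section."""
    (count,) = struct.unpack_from(">H", data, 76)
    return [struct.unpack_from(">L", data, 78 + 8 * i)[0]
            for i in range(count)]


def classify_srcs_range(data, srcs_index, srcs_count):
    """Return (section number, 4-byte tag) for each section in the range."""
    offsets = read_section_offsets(data)
    found = []
    for number in range(srcs_index, srcs_index + srcs_count):
        start = offsets[number]
        found.append((number, data[start:start + 4]))
    return found


def sections_to_strip(data, srcs_index, srcs_count):
    """Keep PAGE (and anything else unknown); strip only real SRCS sections."""
    return [number
            for number, tag in classify_srcs_range(data, srcs_index, srcs_count)
            if tag == b"SRCS"]
```

For the file described above, sections_to_strip(data, 240, 2) would be expected to return [241], leaving the PAGE section at 240 in place.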
|
08-22-2012, 12:18 PM | #42 |
Sigil Developer
Posts: 8,156
Karma: 5450818
Join Date: Nov 2009
Device: many
|
Hi,
Having the page-map.xml stored inside the mobi and not in a separate file is new (and interesting). I am not sure if it is something that needs to be stripped or not. The issue is the sanity check for SRCS. Perhaps that should allow SRCS or PAGE, or maybe we need to be able to extract the PAGE information similar to how we extract the SRCS.
Would you please post a zip archive of a sample epub ebook that uses a page-map.xml file, so that we can run it through the latest kindlegen to see exactly what is being stored in the PAGE section of the mobi and whether it is used or referenced anyplace else in the header? Thanks, KevinH |
08-22-2012, 04:08 PM | #43 | |
Member
Posts: 21
Karma: 244219
Join Date: Jul 2011
Device: K3
|
kindlestrip_v135.py
Quote:
Incidentally, kindlegen v1.2, v2.4 and the latest v2.5 all create a mobi with srcs_cnt = 2 when given the files in the sample epub (LOREM2.epub), which is also attached below. I've only tested in the Kindle Previewer (I'm Kindle-less at the moment!) using the 2 de-SRCS'ed mobi files I have (the sample, and the ebook where I first came across srcs_cnt=2). |
|
08-24-2012, 12:05 AM | #44 |
Sigil Developer
Posts: 8,156
Karma: 5450818
Join Date: Nov 2009
Device: many
|
inclusion of pagemap
Hi,
Thanks for posting LOREM2.epub. I used it with Kindlegen 2.5 and found that the page map information from page-map.xml is somehow encoded (into position or byte offset info) and included in *both* the Mobi6 Header and the Mobi8 header inside the mobi. I had never actually seen that before. The SRCS offset and count were never typically set in the Mobi8 header. But that makes sense, as the formats are different enough that the Mobi8 version would need different page map information.
Here is what the latest version of DumpMobiHeader_v010.py shows for the kindlegen generated mobi (note the Section Map at the end as well):
kbhend$ python DumpMobiHeader_v010.py LOREM2.mobi
DumpMobiHeader v010
LOREM2.mobi .MOBI
First Header Dump from Section 0
Header Version is: 0x6
Header start position is: 0x0
Header Length is: 0xf8
Field: compression_type Offset: 0x000 Width: 2 Value: 0x02
Field: fill0 Offset: 0x002 Width: 2 Value: 0x00
Field: text_length Offset: 0x004 Width: 4 Value: 0x1796
Field: text_records Offset: 0x008 Width: 2 Value: 0x02
Field: max_section_size Offset: 0x00a Width: 2 Value: 0x1000
Field: crypto_type Offset: 0x00c Width: 2 Value: 0x00
Field: fill1 Offset: 0x00e Width: 2 Value: 0x00
Field: magic Offset: 0x010 Width: 4 Value: MOBI
Field: header_length Offset: 0x014 Width: 4 Value: 0x00f8
Field: type Offset: 0x018 Width: 4 Value: 0x0002
Field: codepage Offset: 0x01c Width: 4 Value: 0xfde9
Field: unique_id Offset: 0x020 Width: 4 Value: 0xaa53c38e
Field: version Offset: 0x024 Width: 4 Value: 0x0006
Field: metaorthindex Offset: 0x028 Width: 4 Value: 0xffffffff
Field: metainflindex Offset: 0x02c Width: 4 Value: 0xffffffff
Field: index_names Offset: 0x030 Width: 4 Value: 0xffffffff
Field: index_keys Offset: 0x034 Width: 4 Value: 0xffffffff
Field: extra_index0 Offset: 0x038 Width: 4 Value: 0xffffffff
Field: extra_index1 Offset: 0x03c Width: 4 Value: 0xffffffff
Field: extra_index2 Offset: 0x040 Width: 4 Value: 0xffffffff
Field: extra_index3 Offset: 0x044 Width: 4 Value: 0xffffffff
Field: extra_index4 Offset: 0x048 Width: 4 Value: 0xffffffff
Field: extra_index5 Offset: 0x04c Width: 4 Value: 0xffffffff
Field: first_nontext Offset: 0x050 Width: 4 Value: 0x0003
Field: title_offset Offset: 0x054 Width: 4 Value: 0x0238
Field: title_length Offset: 0x058 Width: 4 Value: 0x000a
Field: language_code Offset: 0x05c Width: 4 Value: 0x0009
Field: dict_in_lang Offset: 0x060 Width: 4 Value: 0x0000
Field: dict_out_lang Offset: 0x064 Width: 4 Value: 0x0000
Field: min_version Offset: 0x068 Width: 4 Value: 0x0006
Field: first_resc_offset Offset: 0x06c Width: 4 Value: 0x0006
Field: huff_offset Offset: 0x070 Width: 4 Value: 0x0000
Field: huff_num Offset: 0x074 Width: 4 Value: 0x0000
Field: huff_tbl_offset Offset: 0x078 Width: 4 Value: 0x0000
Field: huff_tbl_len Offset: 0x07c Width: 4 Value: 0x0000
Field: exth_flags Offset: 0x080 Width: 4 Value: 0x0858
Field: fill3_a Offset: 0x084 Width: 4 Value: 0x0000
Field: fill3_b Offset: 0x088 Width: 4 Value: 0x0000
Field: fill3_c Offset: 0x08c Width: 4 Value: 0x0000
Field: fill3_d Offset: 0x090 Width: 4 Value: 0x0000
Field: fill3_e Offset: 0x094 Width: 4 Value: 0x0000
Field: fill3_f Offset: 0x098 Width: 4 Value: 0x0000
Field: fill3_g Offset: 0x09c Width: 4 Value: 0x0000
Field: fill3_h Offset: 0x0a0 Width: 4 Value: 0x0000
Field: drm_offset Offset: 0x0a8 Width: 4 Value: 0xffffffff
Field: drm_count Offset: 0x0ac Width: 4 Value: 0x0000
Field: drm_size Offset: 0x0b0 Width: 4 Value: 0x0000
Field: drm_flags Offset: 0x0b4 Width: 4 Value: 0x0000
Field: fill4_a Offset: 0x0b8 Width: 4 Value: 0x0000
Field: fill4_b Offset: 0x0bc Width: 4 Value: 0x0000
Field: first_content Offset: 0x0c0 Width: 2 Value: 0x01
Field: last_content Offset: 0x0c2 Width: 2 Value: 0x06
Field: unknown0 Offset: 0x0c4 Width: 4 Value: 0x0001
Field: fcis_offset Offset: 0x0c8 Width: 4 Value: 0x0008
Field: fcis_count Offset: 0x0cc Width: 4 Value: 0x0001
Field: flis_offset Offset: 0x0d0 Width: 4 Value: 0x0007
Field: flis_count Offset: 0x0d4 Width: 4 Value: 0x0001
Field: unknown1 Offset: 0x0d8 Width: 4 Value: 0x0000
Field: unknown2 Offset: 0x0dc Width: 4 Value: 0x0000
Field: srcs_offset Offset: 0x0e0 Width: 4 Value: 0x0009
Field: srcs_count Offset: 0x0e4 Width: 4 Value: 0x0002
Field: unknown3 Offset: 0x0e8 Width: 4 Value: 0xffffffff
Field: unknown4 Offset: 0x0ec Width: 4 Value: 0xffffffff
Field: fill5 Offset: 0x0f0 Width: 2 Value: 0x00
Field: traildata_flags Offset: 0x0f2 Width: 2 Value: 0x03
Field: ncx_index Offset: 0x0f4 Width: 4 Value: 0x0003
Field: unknown5 Offset: 0x0f8 Width: 4 Value: 0xffffffff
Field: unknown6 Offset: 0x0fc Width: 4 Value: 0xffffffff
Field: datp_offset Offset: 0x100 Width: 4 Value: 0xffffffff
Field: unknown7 Offset: 0x104 Width: 4 Value: 0xffffffff
Extra Region Length: 0x0
EXTH Region Length: 0x2130
EXTH MetaData
Key: "Published" Value: "2012-08-20"
Key: "Creator" Value: "E X Ample"
Key: "Subject" Value: "Sample Text"
Key: "Description" Value: "Sample Text"
Key: "Language_(524)" Value: "en"
Key: "TextDirection" Value: "horizontal-lr"
Key: "K8(129)_Masthead/Cover_Image" Value: "kindle:embed:0001"
Key: "K8(131)_Unidentified_Count" Value: 0x0000
Key: "StartOffset" Value: 0x027b
Key: "Font Signature (hex)" Value: 0x010000000000000000000000000000800000000000000000 0000000000000000bef4edec
Key: "Creator Software" Value: 0x00ca
Key: "Creator Major Version" Value: 0x0002
Key: "Creator Minor Version" Value: 0x0005
Key: "Kindlegen_BuildRev_Number" Value: "0626-3a91e28"
Key: "Creator Build Number" Value: 0x0000
Key: "K8(125)_Count_of_Resources_Fonts_Images" Value: 0x0001
Key: "K8(121)_Boundary_Section" Value: 0x000c
Mobi Ebook uses the new dual mobi/KF8 file format
Second Header Dump from Section 12
Header Version is: 0x8
Header start position is: 0xc
Header Length is: 0xf8
Field: compression_type Offset: 0x000 Width: 2 Value: 0x02
Field: fill0 Offset: 0x002 Width: 2 Value: 0x00
Field: text_length Offset: 0x004 Width: 4 Value: 0x19df
Field: text_records Offset: 0x008 Width: 2 Value: 0x02
Field: max_section_size Offset: 0x00a Width: 2 Value: 0x1000
Field: crypto_type Offset: 0x00c Width: 2 Value: 0x00
Field: fill1 Offset: 0x00e Width: 2 Value: 0x00
Field: magic Offset: 0x010 Width: 4 Value: MOBI
Field: header_length Offset: 0x014 Width: 4 Value: 0x00f8
Field: type Offset: 0x018 Width: 4 Value: 0x0002
Field: codepage Offset: 0x01c Width: 4 Value: 0xfde9
Field: unique_id Offset: 0x020 Width: 4 Value: 0xaa53c38e
Field: version Offset: 0x024 Width: 4 Value: 0x0008
Field: metaorthindex Offset: 0x028 Width: 4 Value: 0x0004
Field: metainflindex Offset: 0x02c Width: 4 Value: 0xffffffff
Field: index_names Offset: 0x030 Width: 4 Value: 0xffffffff
Field: index_keys Offset: 0x034 Width: 4 Value: 0xffffffff
Field: extra_index0 Offset: 0x038 Width: 4 Value: 0xffffffff
Field: extra_index1 Offset: 0x03c Width: 4 Value: 0xffffffff
Field: extra_index2 Offset: 0x040 Width: 4 Value: 0xffffffff
Field: extra_index3 Offset: 0x044 Width: 4 Value: 0xffffffff
Field: extra_index4 Offset: 0x048 Width: 4 Value: 0xffffffff
Field: extra_index5 Offset: 0x04c Width: 4 Value: 0xffffffff
Field: first_nontext Offset: 0x050 Width: 4 Value: 0x0004
Field: title_offset Offset: 0x054 Width: 4 Value: 0x0238
Field: title_length Offset: 0x058 Width: 4 Value: 0x000a
Field: language_code Offset: 0x05c Width: 4 Value: 0x0009
Field: dict_in_lang Offset: 0x060 Width: 4 Value: 0x0000
Field: dict_out_lang Offset: 0x064 Width: 4 Value: 0x0000
Field: min_version Offset: 0x068 Width: 4 Value: 0x0008
Field: first_resc_offset Offset: 0x06c Width: 4 Value: 0x000f
Field: huff_offset Offset: 0x070 Width: 4 Value: 0x0000
Field: huff_num Offset: 0x074 Width: 4 Value: 0x0000
Field: huff_tbl_offset Offset: 0x078 Width: 4 Value: 0x0000
Field: huff_tbl_len Offset: 0x07c Width: 4 Value: 0x0000
Field: exth_flags Offset: 0x080 Width: 4 Value: 0x0058
Field: fill3_a Offset: 0x084 Width: 4 Value: 0x0000
Field: fill3_b Offset: 0x088 Width: 4 Value: 0x0000
Field: fill3_c Offset: 0x08c Width: 4 Value: 0x0000
Field: fill3_d Offset: 0x090 Width: 4 Value: 0x0000
Field: fill3_e Offset: 0x094 Width: 4 Value: 0x0000
Field: fill3_f Offset: 0x098 Width: 4 Value: 0x0000
Field: fill3_g Offset: 0x09c Width: 4 Value: 0x0000
Field: fill3_h Offset: 0x0a0 Width: 4 Value: 0x0000
Field: unknown0 Offset: 0x0a4 Width: 4 Value: 0xffffffff
Field: drm_offset Offset: 0x0a8 Width: 4 Value: 0xffffffff
Field: drm_count Offset: 0x0ac Width: 4 Value: 0x0000
Field: drm_size Offset: 0x0b0 Width: 4 Value: 0x0000
Field: drm_flags Offset: 0x0b4 Width: 4 Value: 0x0000
Field: fill4_a Offset: 0x0b8 Width: 4 Value: 0x0000
Field: fill4_b Offset: 0x0bc Width: 4 Value: 0x0000
Field: fdst_offset Offset: 0x0c0 Width: 4 Value: 0x1000e
Field: fdst_flow_count Offset: 0x0c4 Width: 4 Value: 0x0001
Field: fcis_offset Offset: 0x0c8 Width: 4 Value: 0x0010
Field: fcis_count Offset: 0x0cc Width: 4 Value: 0x0001
Field: flis_offset Offset: 0x0d0 Width: 4 Value: 0x000f
Field: flis_count Offset: 0x0d4 Width: 4 Value: 0x0001
Field: unknown1 Offset: 0x0d8 Width: 4 Value: 0x0000
Field: unknown2 Offset: 0x0dc Width: 4 Value: 0x0000
Field: srcs_offset Offset: 0x0e0 Width: 4 Value: 0x0011
Field: srcs_count Offset: 0x0e4 Width: 4 Value: 0x0001
Field: unknown3 Offset: 0x0e8 Width: 4 Value: 0xffffffff
Field: unknown4 Offset: 0x0ec Width: 4 Value: 0xffffffff
Field: fill5 Offset: 0x0f0 Width: 2 Value: 0x00
Field: traildata_flags Offset: 0x0f2 Width: 2 Value: 0x03
Field: ncx_index Offset: 0x0f4 Width: 4 Value: 0x000c
Field: fragment_index Offset: 0x0f8 Width: 4 Value: 0x0004
Field: skeleton_index Offset: 0x0fc Width: 4 Value: 0x0007
Field: datp_offset Offset: 0x100 Width: 4 Value: 0x0012
Field: guide_index Offset: 0x104 Width: 4 Value: 0x0009
Extra Region Length: 0x0
EXTH Region Length: 0x213c
EXTH MetaData
Key: "Published" Value: "2012-08-20"
Key: "Creator" Value: "E X Ample"
Key: "Subject" Value: "Sample Text"
Key: "Description" Value: "Sample Text"
Key: "Language_(524)" Value: "en"
Key: "TextDirection" Value: "horizontal-lr"
Key: "K8(129)_Masthead/Cover_Image" Value: "kindle:embed:0001"
Key: "K8(131)_Unidentified_Count" Value: 0x0000
Key: "StartOffset" Value: 0x027b
Key: "StartOffset" Value: 0x0314
Key: "Font Signature (hex)" Value: 0x010000000000000000000000000000800000000000000000 0000000000000000bebcaff0
Key: "Creator Software" Value: 0x00ca
Key: "Creator Major Version" Value: 0x0002
Key: "Creator Minor Version" Value: 0x0005
Key: "Kindlegen_BuildRev_Number" Value: "0626-3a91e28"
Key: "Creator Build Number" Value: 0x0000
Key: "K8(125)_Count_of_Resources_Fonts_Images" Value: 0x0000
Map of Palm DB Sections
Dec - Hex : Description
---- - ---- -----------
0000 - 0000: HEADER 6
0001 - 0001: Text Record 0
0002 - 0002: Text Record 1
0003 - 0003: NCX Index 0
0004 - 0004: NCX Index 1
0005 - 0005: NCX Index CNX
0006 - 0006: RESC
0007 - 0007: FLIS
0008 - 0008: FCIS
0009 - 0009: Source Archive 0
0010 - 000a: Source Archive 1
0011 - 000b: BOUNDARY
0012 - 000c: HEADER 8
0013 - 000d: Text Record 0
0014 - 000e: Text Record 1
0015 - 000f: 0000
0016 - 0010: Fragment Index 0
0017 - 0011: Fragment Index 1
0018 - 0012: Fragment Index CNX
0019 - 0013: Skeleton Index 0
0020 - 0014: Skeleton Index_Index 1
0021 - 0015: Guide Index 0
0022 - 0016: Guide Index 1
0023 - 0017: Guide Index CNX
0024 - 0018: NCX Index 0
0025 - 0019: NCX Index 1
0026 - 001a: NCX Index CNX
0027 - 001b: FLIS
0028 - 001c: FCIS
0029 - 001d: Source Archive 0
0030 - 001e: DATP
0031 - 001f: EOF_RECORD
So this is interesting indeed. We need to figure out how the page-map.xml entries are converted and stored in the new kindlegen page map that is stored in the PAGE sections in both the Mobi6 and Mobi8 parts of the .mobi file. Once we grok that, we should be able to add support for unpacking that information using Mobi_Unpack, and then figure out a way for Calibre to generate that page information as well for its joint kf8 and .azw3 files. Thanks for pointing this out. Kevin |
08-24-2012, 05:37 AM | #45 |
Member
Posts: 21
Karma: 244219
Join Date: Jul 2011
Device: K3
|
KevinH,
The pagemap section maps the page numbers to offsets into the raw (uncompressed) html in the mobi file; only the offsets differ between the Mobi6 and Mobi8 parts. I've decoded the pagemap section as follows (the use of some values is unknown):
-- start
0x0000: 50414745 PAGE
0x0004: 00000008 ?
0x0008: 00010001 ?
0x000C: 0000002A ?
0x0010: 0000 block 0?
0x0012: 001E size of block, 0x0032-0x0014 = 1E
0x0014: 7B0A {
0x0016: 2020 20226669 ... "fileRevisionId" : "1"
0x0030: 7D0A }
0x0032: 0001 block 1?
0x0034: 0054 size of block, 0x008E-0x003A = 54
0x0036: 000A pages in pagemap
0x0038: 0010 ?
0x003A: 7B0A {
0x003C: 2020 20226465 ... "description" : "PageMap from source by kindlegen",
0x0073: 2020 20227061 ... "pageMap" : "(1,a,1)"
0x008C: 7D0A }
0x008E: 0281 page_1
0x0090: 0500 page_2
0x0092: 074A page_3
0x0094: 0959 page_4
0x0096: 0B49 page_5
0x0098: 0DCA page_6
0x009A: 1014 page_7
0x009C: 1176 page_8
0x009E: 1413 page_9
0x00A0: 164E page_10
-- end pagemap
Apologies about the formatting - all the tabs and multiple spaces have been scrunched up to 1 space! Hope this is useful ... DS. |
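The layout decoded above can be turned into a small reader. This is only a sketch in Python 3 built from that one sample, not code from Mobi_Unpack or kindlestrip: the function name parse_page_section and the dictionary keys are made up here, the three 4-byte values after "PAGE" are skipped because their meaning is unknown, and it assumes the first block always starts at 0x10, that all values are big-endian, and that page offsets are 2 bytes wide.

```python
import struct


def parse_page_section(data):
    """Decode a PAGE section following the layout seen in one sample file."""
    if data[:4] != b"PAGE":
        raise ValueError("not a PAGE section")
    pos = 0x10  # assumption: 4-byte tag plus three unknown 4-byte values

    # Block 0: 2-byte id, 2-byte length, then the "fileRevisionId" text.
    block0_id, block0_len = struct.unpack_from(">HH", data, pos)
    pos += 4
    revision_text = data[pos:pos + block0_len].decode("utf-8")
    pos += block0_len

    # Block 1: id, text length, page count, one unknown value, then text.
    block1_id, block1_len, page_count, unknown = struct.unpack_from(
        ">HHHH", data, pos)
    pos += 8
    description_text = data[pos:pos + block1_len].decode("utf-8")
    pos += block1_len

    # One 2-byte offset into the raw html per page.
    page_offsets = list(struct.unpack_from(">%dH" % page_count, data, pos))
    return {
        "revision_text": revision_text,
        "description_text": description_text,
        "unknown": unknown,
        "page_offsets": page_offsets,
    }
```

On the sample above this should give ten offsets starting 0x0281, 0x0500, 0x074A. A book whose raw html is larger than 64K could not fit its offsets in 2 bytes, so the unknown 0x0010 value may turn out to matter; that is a guess and is not handled here.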
Tags |
k5 tools, mobi2mobi |
|