09-14-2011, 07:54 AM | #181 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
It seems to support only Subversion (*shudder*)...
As I've said before, I pushed my git repository to GitHub, and any fellow developer should feel free to fork it for their own development; forks can be merged back once a feature is ready: https://github.com/siebert/mobiunpack
Ciao, Steffen
09-14-2011, 09:27 AM | #182 |
Sigil Developer
Posts: 7,878
Karma: 5449552
Join Date: Nov 2009
Device: many
Hi fandrieu,
Great work! I will take a shot at combining your latest version with one that uses siebert's readTag routine to parse the TAGX, which can be found in the INDX0 section. The TAGX gives the field bitmaps for each tag, so we can use it to parse the entries. That way we can forget about all of the "if type == 0x1f" lines and just use the correct bitmaps to decipher which fields are present and then read them. Thanks! KevinH
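For reference, here is a minimal Python 3 sketch of reading a TAGX section into a table of field bitmaps. It assumes the commonly reverse-engineered layout (the magic b"TAGX", a 4-byte big-endian total length, a 4-byte control byte count, then 4-byte entries of tag, values per entry, bitmask and end flag); read_tagx is a hypothetical name and is not siebert's readTag routine itself.

```python
import struct


def read_tagx(data, start):
    """Parse a TAGX section that begins at data[start].

    Returns (control_byte_count, tags); each tag is a tuple of
    (tag, values_per_entry, bitmask, end_flag). An end_flag of 1 marks the
    end of the tags belonging to one control byte.
    """
    if data[start:start + 4] != b"TAGX":
        return 0, []
    # Total section length (including this 12-byte header), then the
    # number of control bytes that precede each entry's values.
    length, control_byte_count = struct.unpack_from(">LL", data, start + 4)
    tags = []
    for pos in range(start + 12, start + length, 4):
        tag, values_per_entry, bitmask, end_flag = data[pos:pos + 4]
        tags.append((tag, values_per_entry, bitmask, end_flag))
    return control_byte_count, tags


# Example: two single-bit tags sharing one control byte, then the end marker.
sample = (b"TAGX" + struct.pack(">LL", 24, 1)
          + bytes([1, 1, 0x01, 0, 2, 1, 0x02, 0, 0, 0, 0, 1]))
print(read_tagx(sample, 0))
# → (1, [(1, 1, 1, 0), (2, 1, 2, 0), (0, 0, 0, 1)])
```

With the table in hand, the parser can check a control byte against each bitmask instead of hard-coding rules per entry type.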
09-14-2011, 09:38 AM | #183 | |
Sigil Developer
Posts: 7,878
Karma: 5449552
Join Date: Nov 2009
Device: many
Hi DaleDe,
Quote:
Based on similar experiences from other small (couple of files only) dev projects, it appears to me that using development-specific hosting, with its own hurdle of a version control tool to learn (git vs svn vs mercurial vs cvs vs rcs, etc.) and the lack of visits by users who might have an "itch to scratch", simply lowers contributions. I think the same thing happens with users of both Sigil and Calibre: they are constantly pointed to other official sites, but most of the impetus for change is created or initiated via MR (MobileRead). So unless we are disrupting things with our posts, I would prefer to keep things here, just to maximize our exposure to new users (and hopefully potential developers) who might want to contribute a new feature or quick fix. My 2 cents... KevinH
09-14-2011, 10:07 AM | #184 |
Member
Posts: 11
Karma: 10
Join Date: Sep 2011
Device: kindle 3
KevinH, sorry to flood the thread with zips, but here's a new version.
I tried the NCX code on all the mobis I could lay my hands on... The only "real" error I got was with really fat ebooks (technical books with more than a thousand entries), where the INDX1 is split across more than one section! I first added a few checks to prevent exceptions, but more importantly I found out that the actual number of "data" INDX sections is stored in the INDX0. So I modified the code to take this into account and parse multiple INDX data sections (INDX1, INDX2, ...). In the zip file you'll find a test case for this: a dummy book with 4000 entries on 5 levels (that's a 600 KB ncx...). While I was at it, as suggested by siebert, I used his TAGX code to parse the rest of INDX0, but the code still doesn't do anything with that data. Please use this version instead of the previous one if you plan on integrating the changes. Thanks, fand.
PS: I also included the (simplistic) script I used to test the code on all my books, if anyone is interested...
Last edited by fandrieu; 09-14-2011 at 02:23 PM. Reason: reup: fixed an error in child reordering
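A hedged sketch of the multi-record idea: read the INDX0 header, then take as many following records as its count field says instead of assuming a single INDX1. The field names and their order are taken from the "parsed INDX header" debug line quoted later in this thread (each assumed to be a big-endian 32-bit integer after the INDX magic), and reading count as "number of data records" is an assumption based on the description above; parse_indx_header and data_indx_records are hypothetical names.

```python
import struct

# Field names in the order of the "parsed INDX header" debug output; each is
# assumed to be a big-endian 32-bit value following the 4-byte "INDX" magic.
INDX_FIELDS = ("len", "nul1", "type", "gen", "start", "count",
               "code", "lng", "total", "ordt", "ligt")


def parse_indx_header(record):
    """Return the INDX header fields of one PDB record as a dict."""
    if record[:4] != b"INDX":
        raise ValueError("not an INDX record")
    values = struct.unpack_from(">%dL" % len(INDX_FIELDS), record, 4)
    return dict(zip(INDX_FIELDS, values))


def data_indx_records(records, indx0_index):
    """Return every data record that follows INDX0, not just INDX1.

    Assumes INDX0's 'count' field holds the number of data records.
    """
    count = parse_indx_header(records[indx0_index])["count"]
    return records[indx0_index + 1:indx0_index + 1 + count]


# Example: INDX0 announcing two data records, followed by an unrelated record.
indx0 = b"INDX" + struct.pack(">11L", 192, 0, 0, 0, 192, 2, 0, 0, 0, 0, 0)
records = [indx0, b"INDX data 1", b"INDX data 2", b"something else"]
print(len(data_indx_records(records, 0)))  # → 2
```

Each returned record can then be parsed the same way INDX1 was, with the entries concatenated before the NCX is built.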
09-14-2011, 12:39 PM | #185 |
Grand Sorcerer
Posts: 27,699
Karma: 196509000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Hi, the above test script (mobiunpack_testncx2.zip) isn't recognizing the ncx in most of my mobis. The multi-level handling seems to be off by one. For any of my mobis that have a strictly flat (one-level) ncx, the script mistakenly reports "No ncx." And with a mobi that has a two-level ncx, the script builds a one-level (flat) ncx file... ignoring the parent level if an entry has a parent.
I may be wrong, but I seem to remember something about calibre flattening the ncx regardless. I'm not sure the Kindle properly handles a multi-level ncx file. Something about only the parent levels (and not the children) showing up on the progress bar as "jump points" (which is the only useful function the ncx provides on a Kindle). I could be completely mistaken about all that, though... I'll have to do some testing.
Last edited by DiapDealer; 09-14-2011 at 12:49 PM.
09-14-2011, 01:28 PM | #186 | ||
Member
Posts: 11
Karma: 10
Join Date: Sep 2011
Device: kindle 3
Quote:
For now, though, I couldn't find a book that reproduces the problem, which is pretty weird; I'll look into it further...
Quote:
But anyway, kindlegen does produce this kind of file, and my goal with this code was to extract as much from the mobi as possible, so that you can recompile the files from mobiunpack into a new mobi that is as close to identical as possible...
09-14-2011, 03:22 PM | #187 |
Sigil Developer
Posts: 7,878
Karma: 5449552
Join Date: Nov 2009
Device: many
Hi All,
Okay, I took fandrieu's latest version, modified it to pass the TAGX info to the readINDX1 routine, and fixed an off-by-one in the code that sorts the NCX. I think this should now be close.
PS: Actually, I still think sortINDX has an off-by-one issue, and my change may not be the correct one! My change fixed my problem but will probably fail for some other case. Recursion is so fun! Either way, it needs to be worked on and fixed. We should also refactor things into classes and maybe even split the code into files that encapsulate the various functions in some smarter way.
Last edited by KevinH; 09-15-2011 at 06:56 PM. Reason: add a PS
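One way to keep the recursion's bounds honest is to treat each entry's first/last child indices as an inclusive range, so the only place a +1 appears is the range() call. This is a simplified sketch, not the actual sortINDX/buildNCX code: it assumes the entries are already decoded into dicts with a label, an optional parent index and an optional child1/childn pair, and it emits bare navPoint/navLabel markup without the content src and playOrder a real NCX needs.

```python
from xml.sax.saxutils import escape


def top_level(entries):
    """Indices of entries without a parent (parent missing or negative)."""
    return [i for i, e in enumerate(entries) if e.get("parent", -1) < 0]


def build_nav_points(entries, indices, depth=0, lines=None):
    """Recursively render entries as nested, simplified navPoint lines.

    child1/childn are treated as an inclusive index range, so the +1 lives
    only in the range() call below.
    """
    if lines is None:
        lines = []
    for i in indices:
        entry = entries[i]
        pad = "  " * depth
        lines.append('%s<navPoint id="np%d"><navLabel><text>%s</text></navLabel>'
                     % (pad, i, escape(entry["label"])))
        if entry.get("child1", -1) >= 0:
            children = range(entry["child1"], entry["childn"] + 1)
            build_nav_points(entries, children, depth + 1, lines)
        lines.append("%s</navPoint>" % pad)
    return lines


entries = [
    {"label": "Part 1", "child1": 1, "childn": 2},
    {"label": "Chapter 1", "parent": 0},
    {"label": "Chapter 2", "parent": 0},
    {"label": "Part 2"},
]
print("\n".join(build_nav_points(entries, top_level(entries))))
```

Starting from the parentless entries and descending only through each entry's own child range also keeps a flat (one-level) index working, since every entry is then top-level.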
09-14-2011, 05:08 PM | #188 | |
Grand Sorcerer
Posts: 27,699
Karma: 196509000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I'm getting good results with these latest scripts. I'm still trying to find something in one of my books that breaks it, but I'm not having much luck.
Quote:
Last edited by DiapDealer; 09-14-2011 at 05:33 PM. |
09-14-2011, 05:28 PM | #189 | |
Member
Posts: 11
Karma: 10
Join Date: Sep 2011
Device: kindle 3
Quote:
calibre fetched a scheduled feed just while I was testing some files, so I tried the resulting "periodical" mobi and that was it. It seems the problem is with the INDX parsing; I got this output:
Code:
parsed INDX header: len 192 nul1 0 type 1 gen 0 start 1256 count 54 code 4294967295 lng 4294967295 total 0 ordt 0 ligt 0
contextual data @ xB DF 0 -1 1 6
contextual data @ x98 2 2 E2 -1 -1
contextual data @ x127 46 2 E2 -1 -1
There's actually an extra VWI in the first "DF" entry, so the rest is shifted. I guess the right way to fix it would be to use the TAGX data to know reliably what to expect in the entries. In this particular case our current "type-based" rules might work if we took into account the differences between book-style and periodical-style indexes... but I've yet to fiddle with that...
EDIT: I missed KevinH's last post... Thanks for the TAGX code, I'll look into it. And yes, there were some errors in the sortINDX code; I actually (silently, out of shame) re-uploaded the zip earlier with >= replaced by > in the first test, plus other fixes.
EDIT2:
tagx: pretty impressive, many thanks for quickly implementing the TAGX bit I had skipped altogether.
sortINDX: you got the second ">0" error but missed the one I mentioned above.
refactor: I was toying with the OOP approach before but held off to stay in sync with other versions; I do have a mobiunpack_ootest.py somewhere...
Last edited by fandrieu; 09-14-2011 at 06:05 PM.
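A rough sketch of what "use the TAGX data" could look like: the control byte(s) say which tags are present, and only those tags' values are read, so a VWI that the type-based rules didn't anticipate is consumed by the tag that owns it instead of shifting every later field. It assumes forward-encoded VWIs (7 bits per byte, most significant first, high bit set on the last byte), handles only single-bit masks, and expects pos to point at the first control byte (after the entry's label); read_vwi and decode_entry_tags are hypothetical names.

```python
def read_vwi(data, pos):
    """Decode a forward-encoded variable-width integer at data[pos].

    7 bits per byte, most significant first; the byte with the high bit
    (0x80) set is the last one. Returns (value, bytes_consumed).
    """
    value = 0
    consumed = 0
    while True:
        byte = data[pos + consumed]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, consumed


def decode_entry_tags(data, pos, control_byte_count, tag_table):
    """Decode one entry's tag values using a TAGX tag table.

    Only single-bit masks are handled: a set bit means the tag is present
    with values_per_entry VWIs. Returns ({tag: [values]}, end_position).
    """
    cb_index = 0
    present = []
    for tag, values_per_entry, mask, end_flag in tag_table:
        if end_flag == 1:
            cb_index += 1  # following tags use the next control byte
            continue
        if data[pos + cb_index] & mask:
            present.append((tag, values_per_entry))
    # Values follow all control bytes, in TAGX table order.
    cursor = pos + control_byte_count
    result = {}
    for tag, values_per_entry in present:
        vals = []
        for _ in range(values_per_entry):
            value, n = read_vwi(data, cursor)
            cursor += n
            vals.append(value)
        result[tag] = vals
    return result, cursor


tag_table = [(1, 1, 0x01, 0), (2, 1, 0x02, 0), (4, 1, 0x04, 0), (0, 0, 0, 1)]
entry = bytes([0x03, 0x81, 0x01, 0x90])  # control byte, then two VWIs
print(decode_entry_tags(entry, 0, 1, tag_table))
# → ({1: [1], 2: [144]}, 4)
```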
09-14-2011, 06:04 PM | #190 |
The Grand Mouse 高貴的老鼠
Posts: 71,889
Karma: 307105450
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Bear in mind that calibre-generated Mobipocket files might not be valid in all instances, since the code was written with reverse-engineered info, not with documentation of the format.
09-14-2011, 07:02 PM | #191 |
Sigil Developer
Posts: 7,878
Karma: 5449552
Join Date: Nov 2009
Device: many
Hi All,
Okay, I merged the fixes that fandrieu made to his version (fixes to sortINDX, other changes), added a few other typo fixes, and now I think we have a version we can use as the basis for public testing and for refactoring into classes while trying to keep to just one file. Very nice work, fandrieu! mobiunpack_fand_updated2.zip is attached. KevinH
Last edited by KevinH; 09-15-2011 at 08:32 PM.
09-14-2011, 07:58 PM | #192 |
Grand Sorcerer
Posts: 27,699
Karma: 196509000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
The above script is slightly broken for MOBIs that have no NCX (when DEBUG_NCX is set to False). In that case, the outncx variable is referenced before it's assigned in the unpackBook function. The <spine> element in the opf is also incorrect for a MOBI with no ncx file.
I made two small changes to the unpackBook function that make it work for MOBIs with no NCX; a quick diff will reveal them. I'm having quite a bit of success unpacking various books and rebuilding them with Kindlegen.
Last edited by DiapDealer; 09-16-2011 at 01:20 PM.
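The actual diff isn't shown here, but the general shape of such a fix is easy to sketch: bind outncx on every code path before it's used, and only put a toc attribute on the spine when an NCX was actually written (in an OPF, the spine's toc attribute names the NCX's manifest id). The helper names, the "ncx" id and the "toc.ncx" filename below are assumptions for illustration, not what DiapDealer's version uses.

```python
def choose_ncx_path(has_ncx_index, debug_ncx=False):
    """Bind outncx on every path so later code can't hit an UnboundLocalError."""
    outncx = None
    if has_ncx_index or debug_ncx:
        outncx = "toc.ncx"  # hypothetical output filename
    return outncx


def build_spine(idrefs, has_ncx):
    """Return OPF <spine> markup; toc="ncx" only when an NCX was written."""
    toc = ' toc="ncx"' if has_ncx else ""  # "ncx" is an assumed manifest id
    lines = ["<spine%s>" % toc]
    lines += ['  <itemref idref="%s"/>' % idref for idref in idrefs]
    lines.append("</spine>")
    return "\n".join(lines)


outncx = choose_ncx_path(has_ncx_index=False)
print(build_spine(["item1"], has_ncx=outncx is not None))
```

Driving the spine from the same outncx value that decides whether the file is written keeps the OPF and the unpacked files from disagreeing.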
09-14-2011, 08:06 PM | #193 |
Sigil Developer
Posts: 7,878
Karma: 5449552
Join Date: Nov 2009
Device: many
Hi DiapDealer,
Nice catch! I never actually tested it on a book without an NCX. If your version seems to work for everyone, then we have one to release before we attempt the refactoring/adding of classes. Thanks, KevinH
09-14-2011, 09:54 PM | #194 |
Member
Posts: 11
Karma: 10
Join Date: Sep 2011
Device: kindle 3
Hehe, I didn't take the time to check your latest fixes (it's pretty late here), but you seem to have spotted the misplaced outncx=False line.
I just wanted to add another bit that troubled me: I merged the (hopefully fixed) sortINDX and buildNCX functions, removing an "evolutionary" crutch, with the added bonus of correct indenting (though I didn't take much time to test it...)
09-15-2011, 10:42 AM | #195 |
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Hi,
I've looked into the latest source provided by fandrieu, and the handling seems to take some shortcuts. I assume that the ncx index also contains an IDXT section; why don't you use it to find the start and end position of each entry, so you can verify that you've decoded all bytes? The tag handling code will work only if all bitmasks are single bits. Is this always the case? I would at least add an assertion that fails for non-single-bit masks. Ciao, Steffen
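A minimal sketch of both suggestions, assuming the IDXT section is the magic b"IDXT" followed by one big-endian 16-bit offset per entry (the usual reverse-engineered layout): each entry ends where the next one begins, the last one ends at the IDXT itself, and after decoding an entry the decoder's cursor should land exactly on that end. The helper names are hypothetical, and the sketch assumes no padding between the last entry and the IDXT; if real files pad there, the check would need to tolerate it.

```python
import struct


def idxt_entry_bounds(record, idxt_start, count):
    """Return a (start, end) byte range for each entry in an INDX data record."""
    if record[idxt_start:idxt_start + 4] != b"IDXT":
        raise ValueError("IDXT magic not found at %d" % idxt_start)
    offsets = list(struct.unpack_from(">%dH" % count, record, idxt_start + 4))
    # Each entry ends where the next one starts; the last ends at the IDXT.
    ends = offsets[1:] + [idxt_start]
    return list(zip(offsets, ends))


def is_single_bit(mask):
    return mask != 0 and mask & (mask - 1) == 0


def check_entry(start, end, decoded_end, tag_table):
    """Assert the decoder consumed the whole entry and all masks are single-bit."""
    assert decoded_end == end, "entry %d-%d: decoding stopped at %d" % (start, end, decoded_end)
    for tag, _values_per_entry, mask, end_flag in tag_table:
        if end_flag != 1:
            assert is_single_bit(mask), "tag %d has multi-bit mask 0x%02x" % (tag, mask)


# Example: two entries (4 and 2 bytes) followed by their IDXT offsets.
record = b"\x00" * 8 + b"AAAA" + b"BB" + b"IDXT" + struct.pack(">2H", 8, 12)
print(idxt_entry_bounds(record, 14, 2))  # → [(8, 12), (12, 14)]
```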