09-01-2010, 02:01 PM | #1 |
The Grand Mouse 高貴的老鼠
Posts: 72,538
Karma: 309500000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Kindlestrip Python script and AppleScript wrapper
KindleGen, Kindle Comic Creator and Kindle Previewer add the source files used in compiling a Kindle ebook as one of the (invisible) records in that ebook.
So I wrote a Python script that strips out the sources record from Kindle format ebooks. For those on Macs I wrote an AppleScript wrapper and put the Python script in the AppleScript bundle to make things easy. Kevin Hendricks has since updated the code to handle files from KindleGen 2.x, and I've tweaked it a bit more to handle KindleGen 2.7.

If you're going to upload to the Amazon store, this script is usually unnecessary, as Amazon will strip the sources before delivery anyway. Do not use this script to make files to be uploaded to KDP unless you have to because of size constraints on uploaded files.

KindleGen now includes an option to not add the source files to the end of the generated book. So if you're using KindleGen and want a file without the sources added, don't use KindleStrip; specify that option in KindleGen to get guaranteed correctly formatted books.

If you're on a Mac you only need the AppleScript, as it includes the Python script. The AppleScript is a simple drag-and-drop operation: drag your KindleGen-generated file onto it, and it creates one named [oldname]_stripped.mobi.

As always, please comment with any bug reports or problems.
Last edited by pdurrant; 06-12-2014 at 06:07 PM. |
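For readers curious where the sources record sits: a Kindle/Mobipocket file is a Palm database, whose header stores a record count at byte 76 followed by a table of 8-byte entries (a 4-byte offset, then attributes and a unique ID). The following is a minimal sketch of locating a record that begins with the ASCII tag SRCS. It is not the code from kindlestrip.py; the function names and the synthetic demo file are hypothetical, and it only finds the record rather than rewriting the file.

```python
import struct


def record_offsets(data: bytes) -> list:
    """Return the start offset of every record in a Palm database image."""
    (count,) = struct.unpack_from(">H", data, 76)  # record count, big-endian
    return [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(count)]


def find_srcs_record(data: bytes):
    """Return (index, start, end) of the first record tagged SRCS, or None."""
    offsets = record_offsets(data)
    ends = offsets[1:] + [len(data)]  # each record ends where the next begins
    for index, (start, end) in enumerate(zip(offsets, ends)):
        if data[start:start + 4] == b"SRCS":
            return index, start, end
    return None


def build_demo_pdb(records):
    """Build a tiny synthetic database for demonstration only (not a real book)."""
    count = len(records)
    header = bytearray(78)
    struct.pack_into(">H", header, 76, count)
    table = b""
    position = 78 + 8 * count  # first record starts right after the table
    for unique_id, record in enumerate(records):
        # 4-byte offset, then 1 attribute byte (0) and a 3-byte unique ID
        table += struct.pack(">II", position, unique_id)
        position += len(record)
    return bytes(header) + table + b"".join(records)
```

With a real file you would read the bytes with open(path, "rb").read() and pass them to find_srcs_record; a result of None suggests there is no sources record to strip.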
09-03-2010, 06:23 PM | #2 |
The Grand Mouse 高貴的老鼠
Posts: 72,538
Karma: 309500000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Now at version 1.1. Writes out the stripped data as a zip file. The data in the Mobipocket file seems to have a 16-byte header that's written out as hexadecimal to the standard output. Those using the AppleScript won't see this at all. I have no idea what the 16 bytes mean, so this probably isn't a loss.
|
09-03-2010, 06:40 PM | #3 |
I'm Super Kindle-icious
Posts: 6,734
Karma: 2434103
Join Date: Apr 2008
Location: Long Drive, Calinadia Candafornia
Device: KDXG, KT, Oasis
|
Thanks pdurrant! I don't have any books to upload to Amazon but I always appreciate the efforts of those who push the Kindle limits to make it even more useful.
|
09-24-2010, 01:23 AM | #4 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
Just tried this on a couple of auto-generated mobis made via the new version of Kindle Previewer (1.5).
It now has "ePub support", by which it means that it automatically converts any ePubs dragged onto it to mobi and drops the file in the same folder, apparently on the lower -c1 compression setting. There is also a new simulation option for iPad, but no K3 mode yet. But the people trying to figure out Kindle Audio/Video now have a new testing tool for their efforts.

Anyway, the stripping works a treat and the extraction gives back almost exactly what went in, as far as I can tell. I did a few more tests with my lazily assembled Fictionwise cleanup conversions: html comes back as zipped html, and a zipped-up ePub in yields the exact same zipped-up ePub out. Interestingly enough, if you originally pointed KindleGen at an opf (either custom or via unpacked epub), then no matter what the source structure, the unzipped-from-stripped version yields up the css, html, image, and misc (ncx, etc.) files rearranged into separate subdirectories with exactly those names.

The stripped file has immense space savings, often near-halving; sometimes more if there are a fair number of graphics involved in the source. Even pure text with no pictures is over a third smaller.

I have absolutely no idea why Amazon would remove the entirely logical -donotaddsource option unless they actually want to serve up plenty of bloated files via 3G and cut down on the marketable "Kindle can hold #### books!" space (and deduct extra from royalties paid out, of course), which seems rather counter-productive to me.

While we're on the subject of inexplicable KindleGen design decisions, might as well mention some more things I found out while using it:
|
09-24-2010, 02:17 PM | #5 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
Also, I think I've figured out what the mysterious header bytes mean.
If your source was converted straight from a properly zipped ePub, then you get 53524353000000100000003000000001. If it came from any combination of un-prepackaged html/opf, it'll be 53524353000000100000002f00000001. If it's a no-source-files-added mobi to begin with, then the header bytes are 46434953000000140000001000000002.

And it seems that even the samples offered for the newer books at Amazon nowadays include the bloat (but only from the mobi conversion and cut off appropriately at the sample length), which looks like a useless expenditure to me. Ah well, if they want to waste their server bandwidth for no good reason, that's entirely up to them. As long as they don't go back to charging that extra $2 Whispernet surcharge that they finally got rid of for Canadians. |
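The three header values quoted above are easier to compare once split into fields. Here is a rough sketch that assumes the 16 bytes are a 4-character ASCII tag followed by three big-endian 32-bit integers; that layout is a guess from the look of the hex, and parse_header is a hypothetical helper name, not something from kindlestrip.py. The sketch does not claim to know what the three numbers mean.

```python
import struct


def parse_header(hex_string: str):
    """Split a 16-byte header (given as hex) into a tag and three integers."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 16:
        raise ValueError("expected exactly 16 header bytes")
    # Assumed layout: 4-byte ASCII tag, then three big-endian unsigned ints
    tag, first, second, third = struct.unpack(">4sIII", raw)
    return tag.decode("ascii"), first, second, third


for value in (
    "53524353000000100000003000000001",  # from a zipped ePub
    "53524353000000100000002f00000001",  # from loose html/opf
    "46434953000000140000001000000002",  # no sources added
):
    print(parse_header(value))
```

Read this way, the first two values carry the tag SRCS and differ only in the third field (0x30 versus 0x2f), while the last carries the tag FCIS, which suggests the tool printed the start of the FCIS record because there was no sources record in front of it.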
02-21-2011, 01:55 PM | #6 |
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2010
Device: Sony eReader
|
If anyone still takes a gander at this thread: I'm having some issues running the Kindlestrip tool on OS X 10.6.6. A simple drag & drop of a .mobi file onto the AppleScript file doesn't actually cause anything to occur. Taking a closer look, I'm wondering if the built-in Python files on my Mac are too outdated to run kindlestrip properly (I had no issue whatsoever using your ePub zip/unzip scripts, but I could be misled in that they don't use the Python language?). My version:
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) Noticed that there is a more current build of 3.2, wondering if maybe this could be the issue? I'm sure also there is a way to run from Terminal, but I am certainly not at that level of familiarity with Python to do so....thanks in advance if anyone spots this... |
02-21-2011, 02:08 PM | #7 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
I'm also on 10.6.6 and the AppleScript has been working for me for the past couple of months and again when I used it yesterday.
I used to have the standard Python 2.6-ish install, but then I went and got the 2.7.1 installer from Python.org (after the source failed to compile, grr). Maybe your unzip utility sets the permissions wrongly?
In any case, to use it on the command line, just do
    python PATH/TO/kindlestrip.py OriginalFile.mobi OutputFile.mobi OptionalStrippedData.zip
You can drag and drop the kindlestrip.py file onto the Terminal window and it will autofill its path, and the 3rd filename is optional if you don't care about looking at the stripped data.
You can also alias it in your .profile for convenience, e.g.:
    alias kstrip="python PATH/TO/kindlestrip.py"
and then string together a series of commands to batch process a folder:
    alias kstripbatch='for m in *.mobi; do kstrip "$m" "${m/.mobi/-stripped.mobi}"; done'
|
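The batch alias above can also be written as a small Python helper, which may be easier for people who are not comfortable with shell aliases. This is only a sketch: build_commands and run_all are hypothetical names, PATH/TO/kindlestrip.py is the same placeholder used in the post, and it assumes the interpreter running the helper can also run your copy of kindlestrip.py. Unlike the alias, it skips files that already end in -stripped.mobi so that a second run does not strip the outputs again.

```python
import subprocess
import sys
from pathlib import Path

# Placeholder, as in the post above: replace with the real location.
SCRIPT = "PATH/TO/kindlestrip.py"


def build_commands(folder, script=SCRIPT):
    """Return one kindlestrip command line per unstripped .mobi in folder."""
    commands = []
    for mobi in sorted(Path(folder).glob("*.mobi")):
        if mobi.stem.endswith("-stripped"):
            continue  # output of an earlier run, leave it alone
        output = mobi.with_name(mobi.stem + "-stripped.mobi")
        # Same argument order as the manual command: input file, output file
        commands.append([sys.executable, str(script), str(mobi), str(output)])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Example (uncomment once SCRIPT points at a real file):
# run_all(build_commands("."))
```

Printing the result of build_commands(".") first is a cheap way to check what would be run before anything is written.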
02-21-2011, 04:03 PM | #8 | |
The Grand Mouse 高貴的老鼠
Posts: 72,538
Karma: 309500000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
What happens if you just double-click the applescript? (It should ask you to locate kindlestrip.py - just click cancel if it does.) |
|
02-21-2011, 08:25 PM | #9 | |
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2010
Device: Sony eReader
|
Quote:
Once again, a tip of the hat...the help here is impressively reliable, and kudos on the tools.... |
|
02-21-2011, 08:27 PM | #10 | |
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2010
Device: Sony eReader
|
Quote:
....thanks again! |
|
02-22-2011, 04:29 AM | #11 |
The Grand Mouse 高貴的老鼠
Posts: 72,538
Karma: 309500000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
02-22-2011, 06:54 AM | #12 |
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2011
Device: Kindle 3
|
Thanks for your investigation, and the tool!
After I got v1.1, I added, at line 78 (just after calculating penoffset and lastoffset), the following:
    if datain[self.penoffset:self.penoffset+4] != 'SRCS':
        raise StripException("already stripped")
The intention here is to not delete the FCIS segment from an already-stripped file (or one that was generated with -donotaddsource). I'm enough of a doofus that I'm sure to mess up something by stripping it twice! (On the other hand, I've found one source that says the FLIS and FCIS segments aren't necessary for the Kindle, so at least I'd get a few second chances.)
Thanks again! |
02-22-2011, 07:19 AM | #13 | |
The Grand Mouse 高貴的老鼠
Posts: 72,538
Karma: 309500000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
|
|
03-03-2011, 08:14 AM | #14 |
The Grand Mouse 高貴的老鼠
Posts: 72,538
Karma: 309500000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
09-19-2011, 03:41 AM | #15 |
Carbon Reserve
Posts: 44
Karma: 10
Join Date: Jun 2010
Device: PC
|
Could someone write a step-by-step tutorial for using kindlestrip? I have tried to follow along but fell flat on my face despite being generally knowledgeable about computers. Step by step please: download this, drag that... Thanks.
Last edited by Xabache; 09-19-2011 at 03:45 AM. |