09-21-2010, 02:12 PM | #46 | |
The Grand Mouse 高貴的老鼠
Posts: 72,264
Karma: 309000000
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
I'd suggest forgetting the terminal and using the AppleScript application I've just uploaded here: https://www.mobileread.com/forums/sho...836#post774836 Just drag and drop the mobipocket file onto the script, and it'll decode into a folder in the same location as the file. The first time you run it, it will ask you to find the copy of the MobiUnpack.py script you have on your hard disk. If you have any further problems, just ask. |
|
09-22-2010, 10:39 AM | #47 |
Chocolate Grasshopper ...
Posts: 27,599
Karma: 20821184
Join Date: Mar 2008
Location: Scotland
Device: Muse HD , Cybook Gen3 , Pocketbook 302 (Black) , Nexus 10: wife has PW
|
Becky
Welcome to mobileread .... |
|
10-12-2010, 07:43 PM | #48 |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
I've finally gotten around to "discovering" mobiunpack, and now I have a few questions.
1) On both Linux and Windows, the output .html file seems to have Apple/Mac style end-of-line characters. Can this be fixed easily? I'm not a Python programmer by any means, but I did try changing things like "f = open(outsrc, 'wb')" to "f = open(outsrc, 'w')" without effect.
2) I'm guessing the .html file produced is not supposed to be valid HTML, e.g. it lacks a <!DOCTYPE..> header, and the <guide> section in the <head> shouldn't be there. The presence of the <mbp:pagebreak /> tags is trivial.
Anyhow, it's a great tool for seeing what is going on inside the mobipocket file! Thanks for your efforts, all of you, whoever you are! |
10-12-2010, 08:08 PM | #49 |
Sigil Developer
Posts: 8,109
Karma: 5450184
Join Date: Nov 2009
Device: many
|
Hi,
I think the line endings depend on which type of machine was used to generate the original Mobi file. The ones you tested must have been made on a Mac platform. Luckily HTML itself is immune to line ending differences. But encodings (specifically utf-8) may need the high bit set, so I would keep the 'wb'.
If you are on Linux or Mac OSX, simply use tr to remove or change them.
To replace carriage returns '\r' with new lines '\n':
    cat FILE.html | tr '\r' '\n' > temp.html
    mv temp.html FILE.html
To simply remove the carriage returns without replacing them:
    cat FILE.html | tr -d '\r' > temp.html
    mv temp.html FILE.html
BTW: There is another tool, mobiml2html.py, that will take the Mobi-specific html file created by mobiunpack.py and make it xhtml if you want to archive things or convert them to epub. It is available as python source code with a GUI front-end from the same site as a zip archive
http://code.google.com/p/ebook-conve...s.zip&can=2&q=
or you can check out the source tree itself
http://code.google.com/p/ebook-conve...ource/checkout
It is also available in the "tools" package mentioned on the ApprenticeAlf site.
Hope this helps,
KevinH
Last edited by KevinH; 10-12-2010 at 08:19 PM. Reason: fixed a typo, added a download archive |
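For anyone without tr (e.g. on Windows), the same clean-up can be sketched in Python. This is only a minimal sketch, not part of mobiunpack.py: the function names are made up for illustration, and unlike the first tr command it treats a CR+LF pair as a single line ending rather than converting each '\r' separately. It reads and writes in binary mode, in line with the advice above to keep 'wb'.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF, leaving other bytes untouched."""
    # Handle CRLF first so a Windows line ending does not turn into two newlines.
    data = data.replace(b"\r\n", b"\n")
    return data.replace(b"\r", b"\n")


def normalize_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings (hypothetical helper)."""
    with open(path, "rb") as f:   # binary mode: no encoding or newline translation
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_newlines(data))
```

Calling normalize_file("FILE.html") would then play the role of the cat/tr/mv pair. If the file contains no '\r' bytes at all, it is written back unchanged.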
10-13-2010, 11:22 AM | #50 |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
KevinH, Thanks for all the info. No, the files were not created on a mac. They were built on Linux and tested on Linux and Windows.
Actually it turns out that they seem to have no EOL characters at all. The "tr" command didn't change anything in the file. I had guessed Mac format because that's what Notepad++ guessed. In the end I used perl to add linebreaks between all tags (e.g. "s/></>\n</g"). That turns out to be overkill, but at least the file is readable and editable. The clean-up tools you linked to work very well indeed. |
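The perl substitution above has a direct Python equivalent, for anyone who would rather stay in one language. This is a rough sketch with a made-up function name; like the perl one-liner it is meant for making the file readable, and the added newlines are extra whitespace that could matter between inline tags or inside preformatted text.

```python
import re


def break_between_tags(markup: bytes) -> bytes:
    """Insert a newline wherever one tag is immediately followed by another."""
    # Same idea as the perl s/></>\n</g: every "><" becomes ">\n<".
    return re.sub(rb"><", b">\n<", markup)
```

It works on bytes, so the file can be read with open(name, 'rb') and written back with 'wb' without worrying about the encoding.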
|
10-15-2010, 02:07 AM | #51 |
Enthusiast
Posts: 42
Karma: 11050
Join Date: Nov 2009
Device: Kindle Paperwhite, Kindle Touch, Kindle 2
|
A few things:
* MobiPocket is an old format, derived from HTML2 with some extensions. In HTML2 times, there was no !DOCTYPE, and in any case there is no need in MobiPocket to differentiate between document languages (because there is only one), so you shouldn't expect it to be there. In fact, quite a bit of what mobigen/kindlegen does is to convert HTML4 and XHTML to HTML2 by rewriting tags and flattening CSS into old-style tags.
* <guide> is one of the extensions. Basically they took an entire chunk of the .opf file and stuck it in the <head> tag so that devices could generate menus to navigate to parts of the document. There are historical reasons for doing it this way, originating with MobiPocket's predecessor formats, which were basically just one big HTML document wrapped in a Palm database file. There are many other ways this could have been done, but creating multiple files/streams within the Palm database would get awkward for several reasons, not least of all because links are all flattened to absolute file positions.
* mobigen/kindlegen specifically removes line breaks to make the file smaller, so you shouldn't expect to see any.
Honestly, MobiPocket is such a crappy format that I would strongly advise avoiding it at all costs, with the sole exception of using it as an output format to display on a Kindle. For all other purposes, you should use ePub. I only wrote the original mobiunpack.py because I tried to decompress the dictionary with other tools, it took more than 30 minutes, and I wanted to demonstrate that it could be done much better (even in Python).
Last edited by adamselene; 10-15-2010 at 02:12 AM. |
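To make the "wrapped in a Palm database file" point concrete, here is a small sketch that reads the fixed 78-byte Palm database header found at the start of such a file. It is not taken from mobiunpack.py; the function name and the returned dictionary are illustrative only, and it assumes the usual layout (32-byte name, type and creator codes at offsets 60 and 64, record count at offset 76, all big-endian).

```python
import struct

PDB_HEADER_SIZE = 78  # size of the fixed Palm database header in bytes


def read_pdb_header(data: bytes) -> dict:
    """Pull the name, type/creator codes and record count out of a PDB header."""
    if len(data) < PDB_HEADER_SIZE:
        raise ValueError("too short to be a Palm database")
    # The database name is a NUL-padded 32-byte field at the very start.
    name = data[:32].split(b"\0", 1)[0]
    # Type and creator are two 4-byte codes; Mobipocket books use BOOK / MOBI.
    db_type, creator = struct.unpack_from(">4s4s", data, 60)
    # Number of records, stored as a big-endian unsigned 16-bit integer.
    (num_records,) = struct.unpack_from(">H", data, 76)
    return {
        "name": name,
        "type": db_type,
        "creator": creator,
        "num_records": num_records,
    }
```

Passing it the first 78 bytes of a file (read with 'rb') should be enough to tell whether it looks like a Mobipocket book before trying to unpack it.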
10-15-2010, 11:17 AM | #52 | |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
Quote:
Thanks for the background information. I find it fascinating. Mobiunpack is a great tool for looking at what's inside the mobi package, and thanks to it I can actually SEE what you're talking about. I've been dabbling in ebook format conversions since Aportis Doc and Peanut Reader on Palm Pilots, but it has only been recently that I've taken a more "professional" interest. So much to learn! |
|
11-14-2010, 04:43 PM | #53 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: KindleDX2, PocketBook InkPad Color
|
Having finally bought my Kindle just as the price of modern digital books went up, I naturally turned to the wonderful world of out-of-copyright material for the bulk of my reading pleasure. Of course the quality of digitizing does vary a lot, and I'm just grateful for all the work that people have done already to make it possible to read books I'd otherwise not be able to get. However, I have a surprising number of (non-DRM) ebooks which need only a small number of errors corrected, and I'm OCD enough to want to do that if I can. I know calibre would solve some of these problems, but for editing an ebook originally generated in PRC this script seems much more suitable. Unfortunately I don't have Python installed on my Windows XP computer and I don't really want to get involved with all the complications that would involve just to do some PRC proofreading....
Is there any possibility that some kind person might convert this script into a Windows executable, as has been done for the mobiperl scripts? I know it's an imposition and I feel guilty about not doing it for myself, but I'm getting older and doing something like installing Python doesn't seem as much fun as it used to. |
11-14-2010, 04:51 PM | #54 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: KindleDX2, PocketBook InkPad Color
|
Sorry...adding this post because I can't figure out how else to subscribe to this thread...had the wrong option set when I posted the first time... darn :newbie !
|
11-14-2010, 04:56 PM | #55 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
1) Installing Python on Windows is as easy as double-clicking the installer from ActiveState Python Community Edition. Actually using it is admittedly a bit trickier, but perhaps someone will make a widgetized version.
2) You can subscribe to any thread without posting in it by clicking the Thread Tools button in the bar above the top post and choosing Subscribe.
3) Welcome to MobileRead! |
11-14-2010, 05:10 PM | #56 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: KindleDX2, PocketBook InkPad Color
|
Duh...thank you for that, ATDrake. (Especially as I apparently didn't succeed the other way....)
I may just have to grit my teeth and take on Python as well as the prc format (and XML and all the other things I only vaguely sorta know about). Somehow I hadn't expected getting a Kindle to turn me back into any sort of computer geek after decades of just being a user! |
11-14-2010, 07:13 PM | #57 | |
Grand Sorcerer
Posts: 27,962
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
(Even though a Python to C port of MobiUnpack probably wouldn't be that difficult... there'd then be two separate versions to maintain) |
|
11-14-2010, 08:21 PM | #58 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: KindleDX2, PocketBook InkPad Color
|
Very humbly...what's Tk? I thought what was available was the original script and an applet for the Mac....
|
11-14-2010, 08:47 PM | #59 | |
Grand Sorcerer
Posts: 27,962
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
|
|
11-14-2010, 09:28 PM | #60 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: KindleDX2, PocketBook InkPad Color
|
Quote:
...No, never mind, I worked that bit out for myself. Thank you so much for all your help! Last edited by sklamb; 11-15-2010 at 01:06 AM. Reason: Brain clarification and general dawning of enlightenment.... |
|
|