07-20-2011, 08:40 AM | #1 |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
[Conversion Output Plugin] AZW output by kindlegen for periodicals
This plugin overrides the default mobi periodical generation routine with one that uses the kindlegen program available at Amazon.com here. The project is motivated by the fact that the section/article view introduced in Kindle firmware 3.1 does not work properly with mobi periodicals generated by the default Calibre routine: after reading an article and pressing the "back" button on the device to return to that view, the pointer does not land on the last article read (details). Using kindlegen is a viable solution the community has come up with so far (details). This plugin ports that solution into the Calibre plugin framework. After installing it, simply set the output format to azw, either on the command line or in the Calibre graphical interface, to generate periodicals without the problem described above.
History:
v1.0.5 [2014/12/07] - Now compatible with Calibre version 2.12. Note I can no longer generate periodicals using the latest Kindlegen (2.9).
v1.0.4 [2011/07/23] - Now compatible with Calibre version 0.8.11.
v1.0.3 [2011/07/22] - Remove sections that are empty (Support Calibre version up to 0.8.10)
v1.0.2 [2011/07/22] - Use kindlestrip (by Paul Durrant) to trim down the result file
v1.0.1 [2011/07/21] - Fix a few typesetting problems (using calibre's own routine) and make it work with one-feed recipe
v1.0.0 [2011/07/20] - Basic done
TODO:
* Increase its dependency on code in Calibre source. This allows the code to stay in sync with any updates in Calibre.
Latest news (2011/09/08): There is already a native solution to the problem. Please check here for details.
Last edited by tylau0; 12-07-2014 at 06:10 AM. |
07-20-2011, 12:29 PM | #2 |
creator of calibre
Posts: 44,559
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No need, I started working on MOBI indexing a few days ago. Hopefully I will be able to figure out the problem. I've written code that decompiles the MOBI, including all indexing information which should allow me to see what the differences between kindlegen generated periodicals and calibre ones are. You can run it with
calibre-debug --inspect-mobi filename.mobi
You will need to be running from the latest calibre source for this to work. |
07-20-2011, 04:11 PM | #3 |
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
kovid - I'm 99% sure the issue includes the trailing byte sequence following each HTML record. I have been unable to fully decode it because Kindlegen seems to insert (in an apparently inconsistent way) some arbitrary bytes in some of the sequences and I haven't been able to determine what the logic is. However, if you take a Kindlegen-generated document (which works properly on Kindle in Sections & Articles view) and zero out the trailing byte sequences, the document still displays properly on Kindle except now it exhibits the impaired 'back' function in the Sections & Articles view. So the trailing byte sequences are definitely part of the puzzle.
On the other hand, if you look at Amazon generated periodicals (e.g. the New York Times) the trailing byte sequences are consistent and reflect the changes I made to the MOBI code a few months ago (and forwarded to you). However, in those files the NCX entries have additional data bytes that I cannot decode, but may be associated with the issue. This suggests Amazon is NOT using Kindlegen to format periodicals (not surprising, actually because Kindlegen is a piece of cr*p). |
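For readers who want to look at these trailing byte sequences themselves, here is a minimal sketch of how the trailing entries at the end of a text record can be located. It assumes the layout described in community MOBI format notes and used by tools such as mobiunpack: each bit above bit 0 in the header's extra-data flags marks one trailing entry whose size is stored as a backward-encoded variable-width integer at the end of the entry, and bit 0 marks a multibyte-overlap entry whose size is in the low two bits of its last byte. The function names are made up for this illustration, and nothing here decodes what the bytes mean.

```python
def trailing_entry_size(record: bytes, end: int) -> int:
    """Decode the backward-encoded size that ends at record[end - 1].

    Bytes are read from the end towards the start, 7 bits each,
    stopping at a byte with the high bit set (assumed layout).
    """
    size, shift = 0, 0
    while end > 0:
        end -= 1
        byte = record[end]
        size |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 or shift >= 28:
            break
    return size


def split_trailing(record: bytes, flags: int):
    """Return (text, trailing_bytes) for one text record.

    `flags` is the extra-data flags value from the MOBI header.
    """
    trailing = 0
    bits = flags >> 1
    while bits:
        if bits & 1:
            trailing += trailing_entry_size(record, len(record) - trailing)
        bits >>= 1
    if flags & 1:
        if trailing >= len(record):
            raise ValueError("trailing entries exceed record length")
        # Multibyte-overlap entry: low two bits of its last byte, plus one.
        trailing += (record[len(record) - trailing - 1] & 0x03) + 1
    if trailing > len(record):
        raise ValueError("trailing entries exceed record length")
    cut = len(record) - trailing
    return record[:cut], record[cut:]
```

Comparing the second element of the returned tuple between a kindlegen file and a calibre file, record by record, is one way to see where the two writers differ.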
07-20-2011, 04:36 PM | #4 |
creator of calibre
Posts: 44,559
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, I've already figured out it's the trailing byte sequences. I'm working on decoding them now.
I'm currently working off kindlegen generated mobi files, and I've completely deciphered the index, cncx and tagx records for those. The trailing byte sequences (TBS) are still opaque to me, but I don't think they will prove very hard to decode. Hopefully, understanding and duplicating what kindlegen does with the TBS sequences will allow calibre periodicals to work properly. |
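As a rough illustration of where the index records sit in a file, the sketch below walks the PalmDB record table of a MOBI file and reports each record's offset, length and first four bytes. It assumes the usual PalmDB layout (type/creator "BOOKMOBI" at byte 60, a 16-bit record count at byte 76, then 8-byte table entries starting at byte 78) and that index records begin with the signature INDX. The function name is invented for this example, and this is not what calibre-debug --inspect-mobi does internally.

```python
import struct


def list_pdb_records(data: bytes):
    """Return a list of (offset, length, first_four_bytes) per record."""
    if data[60:68] != b"BOOKMOBI":
        raise ValueError("not a BOOKMOBI PalmDB container")
    (count,) = struct.unpack_from(">H", data, 76)
    # Each table entry is 8 bytes; the first 4 are the record's file offset.
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))  # the last record runs to end of file
    return [
        (start, end - start, data[start:start + 4])
        for start, end in zip(offsets, offsets[1:])
    ]


def index_record_numbers(data: bytes):
    """Record numbers whose payload starts with the INDX signature."""
    return [i for i, (_, _, magic) in enumerate(list_pdb_records(data)) if magic == b"INDX"]
```

Running index_record_numbers on a kindlegen file and on a calibre file gives a quick first comparison of how many index records each writer emits.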
07-20-2011, 09:45 PM | #5 | |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
Quote:
If you are continuing to work on this while waiting for an update of Calibre with mobi indexing as desired, or if the feedback is of any use for any other projects you may have under way, I have tested the plugin with six recipes for which Calibre generates correct epub and mobi output (although of course without proper back button behaviour), using both current and previous versions of kindlegen - although unfortunately the recipe I included to test the masthead was one of those which failed, so I could just have tested with the current version.
Four of the six recipes generated azw output files, the other two failed. Of the four which produced azw output, two had correct back button behaviour, the other two produced azw files which could be viewed with Kindle for PC, but opened on the Kindle itself showing a table of contents but with a message box which displayed "The selected item could not be opened. If you purchased this item from Amazon, delete the item and download it from Archived Items." More comments on this below.
I tested using the command line ebook-convert with "--test -vv --debug-pipeline" to generate small e-books, and generated epub, azw and mobi versions to compare. In one case, one of the four articles extracted showed a loss of some text in the first article in the azw output when compared to the epub or mobi versions: The recipe used was: Spoiler:
The second recipe which produced a useable azw file (loss of text not noticed in this case, but possible of course when more articles are extracted) was: Spoiler:
The third azw file producing recipe, with problems described above, was: Spoiler:
This recipe failed at first to produce an azw file, as it was an initial version returning the complete page. The faulty azw file was only generated when the keep_only_tags and remove_tags were added to restrict the text extracted. I found with nickredding's code that more azw files were generated, but the extra azw files (beyond the first two which worked here) also were faulty and showed the same message box. The fourth recipe which produced a faulty azw file was: Spoiler:
The two recipes which completely failed were: Spoiler:
which could have tested the masthead with kindlegen 1.1, if it had generated output, and: Spoiler:
As all six recipes produced epub and mobi versions, my suspicion is that the problem may be with the html extraction: either Calibre removes problematic content which is left in here (the lost text with the first recipe suggests that comparing the html extracted by Calibre and here could be useful - I will report if I find anything of interest in this respect), or kindlegen is simply more sensitive to unwanted or unsupported html than ebook-convert. Kindlegen seems to be based on MobiPocket mobigen, which I called without difficulty in my own extended version of the MobiPocket webcompanion. I continued to develop and use that after Amazon bought MobiPocket and dropped the webcompanion, until I bought a Kindle in January and started to use Calibre for News generation. I am therefore more inclined to suspect that it is something in the html passed to kindlegen which causes failure - five of these six recipes are for publications which extracted without difficulty when I used mobigen in my own software. |
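The test procedure described in this post (one recipe converted to epub, azw and mobi with "--test -vv --debug-pipeline") could be scripted roughly as follows. This is only a sketch: the recipe name, output stem and debug directory are placeholders, azw output is only available once the plugin is installed, and it assumes --debug-pipeline is given a directory to write the intermediate stages into.

```python
import shlex


def conversion_commands(recipe, stem, debug_root, formats=("epub", "azw", "mobi")):
    """Build one ebook-convert argument list per output format.

    Each format gets its own debug directory so the intermediate
    HTML of the different runs can be compared afterwards.
    """
    commands = []
    for fmt in formats:
        commands.append([
            "ebook-convert", recipe, f"{stem}.{fmt}",
            "--test", "-vv",
            "--debug-pipeline", f"{debug_root}/{fmt}",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder names; print the commands instead of running them.
    for argv in conversion_commands("example.recipe", "out", "debug"):
        print(shlex.join(argv))
```

Each list can be passed to subprocess.run as-is. Diffing the debug directories of the azw and mobi runs is one way to check whether the html handed to kindlegen differs from what the default routine sees.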
|
07-21-2011, 10:07 AM | #6 |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
Thanks, oneillpt, for the extensive testing.
The missing text was my fault - I was deleting certain <p> tags in the opf file that had content inside. All the recipes that were not working contain only one feed, a case my implementation did not handle before. Attached is the modified CalibreKindlegenHelper.py. Use it to replace the one in azwplugin.zip. I'll do some further extensive testing soon and pack it into the plugin. |
07-21-2011, 12:07 PM | #7 |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
Check the updated plugin at the top of this thread. It should have all the problems you mentioned fixed.
Thanks again. P.S. Thanks Kovid and nickredding for working on a Calibre self-contained solution. I am looking forward to that clean fix! Last edited by tylau0; 07-21-2011 at 02:45 PM. |
07-21-2011, 10:17 PM | #8 | |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
Quote:
One change which I would suggest, and which I will try out for myself tomorrow, is to add a compression setting for kindlegen in the plugin. The azw files from kindlegen weigh in at nearly twice the size of the mobi version generated from the same recipe. My Depeche du Midi azw file now comes in at 18 MB for example! Many thanks for this very useful plugin! |
|
07-22-2011, 01:07 PM | #9 |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
I have adopted the code from Kindlestrip, which trims the file size by half. Please check the top post for the updated plugin. Thanks.
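For background on where the saving comes from: kindlegen embeds a copy of the source files in the output, and kindlestrip removes those records. The sketch below estimates how many bytes that would free in a given file. It is not the plugin's or kindlestrip's code. The field positions are an assumption recalled from the kindlestrip source (two 32-bit values at offset 0xE0 of record 0 giving the first source record number and the record count, with 0xFFFFFFFF meaning "none"), so check them against kindlestrip before relying on the result.

```python
import struct

NO_SRCS = 0xFFFFFFFF  # assumed marker for "no embedded source"


def embedded_source_size(data: bytes) -> int:
    """Estimate the bytes occupied by kindlegen's embedded source records."""
    if data[60:68] != b"BOOKMOBI":
        raise ValueError("not a BOOKMOBI PalmDB container")
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))  # sentinel: end of the last record
    record0 = data[offsets[0]:offsets[1]]
    if len(record0) < 0xE8:
        return 0  # header too short to carry the assumed fields
    first, span = struct.unpack_from(">LL", record0, 0xE0)
    if first == NO_SRCS or span == 0 or first + span > count:
        return 0
    return offsets[first + span] - offsets[first]
```

Dividing the result by the file size gives the fraction that stripping could save, which can be set against the "by half" figure reported here.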
|
07-22-2011, 07:01 PM | #10 | |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
Quote:
I still find that the "The selected item could not be opened. If you purchased ..." message box can occur, although now in a way which does not prevent use of the ebook. It occurs with an extended version of one of the recipes I used earlier to test: Spoiler:
In this case only the first feed shows any articles (38 at the moment), but the Kindle table of contents includes all the remaining feeds, showing zero articles for each, and the ebook text shows the name of each feed followed by the single line "RSS de diariodelaltoaragon.es" (this seems to be correct as browsing the rss feeds in a web browser gets this single line for these feeds too). Moving down the left (sections) column of the Kindle toc past that first feed to the second which shows zero articles gives the message box, forcing closure of the ebook. The same thing happens when on the last (Calibre Table of Contents) page when attempting to open the Kindle table of contents, requiring paging back or skipping to previous article before the Kindle table of contents can be accessed. In this case I found this problem by accident - all sections in the Kindle table of contents were visible on screen at the same time, so there was no need to scroll down the sections. In another case, the list of sections required a second page to display, and while scrolling down through a series of sections with zero articles the right hand (articles) column displayed the first articles for the next section with articles, and no message box occurred. |
|
07-22-2011, 07:42 PM | #11 |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
Latest version even better
The latest version now produces an azw file about 10% smaller than the corresponding mobi version.
|
07-22-2011, 08:24 PM | #12 | |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
v1.0.3 (available at the top post) removes sections without any article. That should fix the issue you raised.
Quote:
|
|
07-22-2011, 08:35 PM | #13 |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
A fix in just over 40 minutes. Impressive!
|
07-23-2011, 05:51 PM | #14 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
|
I just tried to convert an epub file into AZW using your plugin but when the conversion gets to 67% "calibre-parallel.exe" crashes.
I'm not sure if it matters but I'm using the portable version of Calibre. Any advice? Spoiler:
|
07-23-2011, 10:57 PM | #15 |
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
v1.0.4 in the top post may have solved the issue you raised. Please check. Thanks.
|
Tags |
issue fix, kindle, kindlegen, periodical |
|