Correct Mobi location formula (maybe)

rodrigoccurvo · 12-01-2011, 03:42 PM

Hi, everyone.

For a while I've been trying to understand Kindle locations, and the consensus seems to be that for Mobi, 1 location = 128 bytes (it's even on https://wiki.mobileread.com/wiki/Page_numbers).

But today I was taking a look at the Amazon Cloud Reader (reader.amazon.com) source code and I found the following:

Code:

locationFromPosition: function (a) {
        return Math.floor(a * 2 / 300 + 1)
}

(The file is KindleReaderApp-min.js, but I won't post the full link since it looks odd and I don't know whether it contains session information. I guess you can find it on your own.)

The surrounding code is a bit longer, but in the end that seems to be the formula used to calculate the location for the Mobi format (there is another one for Topaz, which is Math.floor((a * d + 100) / 100)).
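A standalone version of the Mobi formula, as a rough sketch for anyone who wants to experiment with it outside the minified file. The function name mobiLocationFromPosition is made up; the only assumption is that the argument is a zero-based offset into the book's text, which is what "a" appears to be. Since a * 2 / 300 equals a / 150, each location covers 150 units of text. The Topaz variant is left out because the snippet doesn't say what d is.

```javascript
// Standalone sketch of the Mobi location formula quoted above.
// Assumption: `position` is a zero-based offset into the book's raw text.
function mobiLocationFromPosition(position) {
  if (!Number.isInteger(position) || position < 0) {
    throw new RangeError("position must be a non-negative integer");
  }
  // Math.floor(a * 2 / 300 + 1) is the same as Math.floor(a / 150) + 1:
  // every 150 units of text advance the location by one.
  return Math.floor(position / 150) + 1;
}

console.log(mobiLocationFromPosition(0));   // 1
console.log(mobiLocationFromPosition(149)); // 1
console.log(mobiLocationFromPosition(150)); // 2
```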

I've looked at the parameter "a", and for the text parts it seems to count characters, but I don't know whether that means bytes in every case. I've tested the formula a little and it seems correct, but not enough for me to be sure. I also don't know whether the relation holds for images and other non-text content.

I couldn't find the original source for the 128-byte figure, so I'm guessing it's an approximation. But as I said, in my initial tests the formula above seems to work.

What do you guys think? Does it make sense? Is the 128-byte figure an approximation, or is it in the format specs?

Cheers,

Rodrigo

DiapDealer · 12-01-2011, 05:03 PM

I've always heard 128 bytes of source html, but since Amazon doesn't release detailed specs, who knows?

pdurrant · 12-01-2011, 05:25 PM

Quote:

Originally Posted by rodrigoccurvo

But today I was taking a look at the Amazon Cloud Reader (reader.amazon.com) source code and I found the following:

Code:

locationFromPosition: function (a) {
        return Math.floor(a * 2 / 300 + 1)
}

Now, that's an interesting finding. And it can be checked fairly easily. My copy of The Lord of the Rings in Kindle for Mac has 24992 locations. The raw mobi-html is 3748684 bytes long. Plugging that into the formula you've found, you get 24992 (.22666...). Taking the 128 bytes estimate, you get 29286 (.59375).
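The same check as a quick script, plugging the total raw mobi-html size in as the position, the way the post does. The 128-byte figure is computed without the +1, matching the 29286 quoted above.

```javascript
// Lord of the Rings figures from the post above.
const rawHtmlBytes = 3748684;            // size of the raw mobi-html
const locationsInKindleForMac = 24992;   // location count reported by Kindle for Mac

// 150-byte rule from the Cloud Reader code: floor(a * 2 / 300 + 1).
const by150 = Math.floor(rawHtmlBytes * 2 / 300 + 1);
// Older 128-byte rule of thumb, computed the way the post does (no +1).
const by128 = Math.floor(rawHtmlBytes / 128);

console.log(by150, by150 === locationsInKindleForMac); // 24992 true
console.log(by128);                                    // 29286
```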

My LotR is encoded with Windows Latin-1. Let's see what happens with a UTF-8 encoded ebook. I suspect that we need to pass bytes, not characters.

My copy of Unfinished Tales is a (kindlegen-compiled) conversion from an ePub. It's UTF-8 encoded. It has 1590431 characters, but 1613151 bytes of mobi-html. In Kindle for Mac it has 10755 locations.

1590431*2/300+1 =10603 (.87333...)
1613151*2/300+1 =10755 (.34)
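The two lines above as a script: only the byte count reproduces the 10755 locations Kindle for Mac reports. The figures are copied from the post; locationFromOffset is just the Cloud Reader formula under a made-up name.

```javascript
// Cloud Reader formula under a hypothetical name.
const locationFromOffset = (a) => Math.floor(a * 2 / 300 + 1);

// Unfinished Tales: same UTF-8 mobi-html, counted two ways.
const chars = 1590431;
const bytes = 1613151;
const reported = 10755; // locations shown in Kindle for Mac

console.log(locationFromOffset(chars), locationFromOffset(chars) === reported); // 10603 false
console.log(locationFromOffset(bytes), locationFromOffset(bytes) === reported); // 10755 true
```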

Well. That seems definite. The number of bytes (not characters) through the unpacked mobi-html of the book, when divided by 150, adding 1 and truncating, is the kindle location in the book.
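To find the location of a particular passage, its character index has to be turned into a UTF-8 byte offset first. A minimal sketch, assuming the raw mobi-html has already been decoded into a JavaScript string and was UTF-8 in the file (for a Windows Latin-1 book like the LotR copy, one character is one byte and the conversion isn't needed). locationOfPassage is a hypothetical name. Note that a JavaScript string's length counts UTF-16 code units, not bytes, which is exactly the character-vs-byte trap above.

```javascript
// Hypothetical helper: the Kindle location of the first occurrence of
// `passage` in raw mobi-html, assuming the file's text is UTF-8 encoded.
function locationOfPassage(rawHtml, passage) {
  const charIndex = rawHtml.indexOf(passage);
  if (charIndex === -1) return null; // passage not found
  // Locations count bytes, so measure the UTF-8 size of everything before the match.
  const byteOffset = new TextEncoder().encode(rawHtml.slice(0, charIndex)).length;
  return Math.floor(byteOffset / 150) + 1;
}

// 148 ASCII characters plus one "é" (2 bytes in UTF-8) before the passage.
const sample = "x".repeat(148) + "\u00e9" + "target";
console.log(sample.indexOf("target"));            // 149 (characters)
console.log(locationOfPassage(sample, "target")); // 2 (byte offset 150)
```

Counting characters would put this passage at location 1; counting bytes moves it to location 2.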

Good find!

pdurrant · 12-01-2011, 05:30 PM

I've updated that page in the wiki to reflect this new finding.

blaenk · 08-07-2017, 06:25 PM

Sorry for the necropost if that's not allowed, but I've been investigating this and just now found this thread. Given the last post here I decided to search around the mobileread wiki, is this the page: https://wiki.mobileread.com/wiki/Page_numbers ?

If so my question is that it takes as input presumably a byte-offset from the "start" of the book, where the book is the actual .mobi file? So if I have a location (taken from the My Clippings.txt file) I can convert that into a byte-offset position into the mobi file by doing something like (location - 1) * 150? I guess the tricky part would be then mapping that byte-offset into the unpacked contents of the mobi.

Basically I'm trying to see if it would be possible to take highlight locations from the 'My Clippings.txt' file on the Kindle device and map them to the actual source file in an unpacked book, but it would probably be easier to just brute-force search for the actual highlighted contents than to try to do this location mapping, especially since I've only seen mention of mobi so I'm not sure if this works the same with azw3.

jhowell · 08-07-2017, 07:40 PM

Quote:

Originally Posted by blaenk

I can convert that into a byte-offset position into the mobi file by doing something like (location - 1) * 150?

Yes. But it is an offset into the unpacked raw HTML content of the MOBI.
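Going the other way, a location corresponds to a 150-byte window of the raw markup rather than a single offset, and (location - 1) * 150 is where that window starts. A small sketch (byteRangeForLocation is a hypothetical name) returning the zero-based, end-exclusive range:

```javascript
// Sketch: raw-markup byte offsets that map to a given location, i.e. the
// inverse of floor(offset / 150) + 1. Zero-based start, exclusive end.
function byteRangeForLocation(location) {
  if (!Number.isInteger(location) || location < 1) {
    throw new RangeError("location must be an integer >= 1");
  }
  const start = (location - 1) * 150;
  return { start, end: start + 150 };
}

console.log(byteRangeForLocation(1));     // start 0, end 150
console.log(byteRangeForLocation(24992)); // start 3748650, end 3748800
```

As a sanity check against the LotR figures earlier in the thread, the 3748684-byte raw file ends inside the range for location 24992.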

Quote:

Originally Posted by blaenk

I guess the tricky part would be then mapping that byte-offset into the unpacked contents of the mobi.

You can obtain the raw HTML contents of a MOBI file (what location numbers index into) using kindleunpack. You will also have to deal with DRM for many books.

Quote:

Originally Posted by blaenk

Basically I'm trying to see if it would be possible to take highlight locations from the 'My Clippings.txt' file on the Kindle device and map them to the actual source file in an unpacked book, but it would probably be easier to just brute-force search for the actual highlighted contents than to try to do this location mapping, especially since I've only seen mention of mobi so I'm not sure if this works the same with azw3.

Kindle locations are approximate indexes into the book, good enough to get you to the right screen of content. It is hard to tell whether or not that will be good enough for your purpose.

There is a recent thread in the Amazon Kindle forum, Extract notes from "My Clippings.txt", started by someone who wants to do something similar. See the post from me in that thread for information on how locations map to KF8/AZW3.

KFX format, which is used for most Amazon-purchased books on newer kindles, would take a lot more work to deal with.

Good luck.

blaenk · 08-07-2017, 08:46 PM

Thank you jhowell for responding, I really appreciate it.

Quote:

Originally Posted by jhowell

Yes. But it is an offset into the unpacked raw HTML content of the MOBI.

What I'm not sure about: I don't have much experience with MOBI, but when I've unpacked EPUBs I've noticed that they sometimes (often? always?) contain multiple HTML files. If that can also be true for MOBI, and the location is an offset into the unpacked raw HTML, then there must be some defined order, so that it's well-defined which file an offset lands in once it goes past the "first" file (and which file counts as first in the first place). Does that make sense? If so, what determines this order? Would it be some metadata file inside the MOBI that defines the reading order of the pages, which is then the order of the files the offset indexes into?

Put another way, if the offset is 700 but the "first file" (again, I'd need to know what the first file even is) only goes up to 500, then I'd need to know what second file to offset 200 into, correct? What defines that order?

Also, just to be sure, you're saying that a Kindle "Position" is just that "raw byte offset" right? So using the formula mentioned in this thread I could go from that raw byte offset to a Kindle "Location" and vice versa.

I suppose that if the location I arrive at is not exact, I can at least use it to determine what html file to search for the highlighted text within/around so as to drastically reduce the search space.
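A rough sketch of that idea: use the location only to pick a window of the raw markup, then look for the highlighted text's bytes inside it. findHighlightNear and its slackLocations parameter are hypothetical; it assumes the raw markup is available as UTF-8 bytes and that the highlight text appears verbatim in it, which won't hold if the clipping text differs from the markup (for example, across tags or HTML entities).

```javascript
// Hypothetical helper: search for a highlight's text near the byte offset
// implied by a "My Clippings.txt" location. `raw` is a Uint8Array of the
// UTF-8 raw markup. Returns the match's byte offset, or -1 if not found.
function findHighlightNear(raw, location, highlightText, slackLocations = 2) {
  const needle = new TextEncoder().encode(highlightText);
  if (needle.length === 0) return -1;
  // Search a window a few locations either side, since locations are approximate.
  const from = Math.max(0, (location - 1 - slackLocations) * 150);
  const to = Math.min(raw.length, (location + slackLocations) * 150 + needle.length);
  for (let i = from; i + needle.length <= to; i++) {
    let j = 0;
    while (j < needle.length && raw[i + j] === needle[j]) j++;
    if (j === needle.length) return i; // byte offset of the match
  }
  return -1;
}

const raw = new TextEncoder().encode("<p>" + "x".repeat(300) + "One ring to rule them all</p>");
console.log(findHighlightNear(raw, 3, "One ring to rule them all")); // 303
```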

Thanks for the link to that thread! It definitely seems useful.

jhowell · 08-07-2017, 09:40 PM

Once you unpack it, MOBI contains the equivalent of only a single HTML file. It is much more primitive than EPUB. The content isn't pure HTML. It has special markup for image references for example.

See the wiki for more details.

blaenk · 08-07-2017, 10:12 PM

Ah that makes sense then, thanks again!

DiapDealer · 08-09-2017, 08:49 PM

KindleUnpack has an option to output raw mobi markup. My guess is that this is the uncompressed markup the locations formula would be based on. Even the monolithic html file that KindleUnpack produces for MOBI books will have been tweaked a bit.

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Patch: Use real ASIN instead of UUID in mobi files to show correct cover in KindleApp	siebert	Calibre	4	02-24-2012 09:13 AM
Does Calibre automatically converty PDF to .mobi to correct dimension ?	bbs7772004	Devices	1	10-26-2011 08:59 PM
Calibre and mobi format - creating a paige or location specific table of contents	coaver	Conversion	2	01-25-2011 06:22 AM
Mobi to Kindle with correct metadata?	rex0810	Calibre	3	09-25-2009 06:36 PM
Correct missing author info in a Mobi file?	nekokami	Kindle Formats	5	12-15-2008 11:26 AM