09-15-2014, 09:48 PM | #1 |
Addict
Posts: 201
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
|
How does the calibre viewer calculate page number and total pages?
I'm working on a port of davidfor's Kobo Utilities to Sony, and trying to find a reasonable way to find my position in a book that I'm currently reading. Sony doesn't make it easy. If you downloaded a book from Sony's store, or since they sold out, from Kobo, they maintain a table that gives amongst other things "percent read", but if your book is sideloaded, Sony appears to calculate the number of pages and your current page number on the fly, and it's never saved in its database (and of course they don't tell US how they do it).
So, I'm trying to figure out how the calibre viewer calculates these numbers, and can't find the code anywhere. |
09-16-2014, 12:00 AM | #2 |
creator of calibre
Posts: 44,303
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
iterator/book.py
|
09-16-2014, 12:35 AM | #3 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
For epubs on the Sony devices, the number of pages will be calculated by the Adobe RMSDK, and the current page will be based on that. The description of the method is in the Wiki, but have a look at the Count Pages plugin for an implementation.
To calculate a percent read, what Kovid pointed to should work. You will also need the current position from the database on the Sony. From memory, this is stored in an Adobe-specific way. I assume it comes from the RMSDK, as the Kobos use it for epubs as well. The calibre viewer uses a different position method (the same as for epub3?). I don't know if there is already a way to translate between them, but it shouldn't be too hard*.
From memory, iterator/book.py has to unpack the book to work. That means calculating the percent read could take some time. For one book, it shouldn't be too bad, but if you are doing it for all the books on the device, it might take a while. I suppose that should only happen once, when the store positions function is first run.
* Imagine me laughing maniacally while I typed that. |
09-16-2014, 09:10 AM | #4 |
Addict
Posts: 201
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
|
As far as I can tell, the only thing that Sony stores for position of sideloaded books is the bookmark. Which is like an EPUB3 CFI, but not identical (nor is calibre's), but is easily (so much for maniacal laughter!) translated to the calibre format (they're closer to each other than to EPUB3). FWIW, EPUB3 counts nodes (text nodes + tags) while calibre/Sony seem to count only tags, with the significant difference that Sony's CFIs don't count the <HEAD> tag.
So, in any case, it's going to have to open the book to calculate the position. Thanks for the answers. Now, off to try some more stuff! |
09-16-2014, 09:55 AM | #5 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Hmm, you're right, it is easy. It's been a while since I compared the two methods. And not counting the head tag has always bugged me when I've looked at this.
With that, it would be easy to put the reading position or bookmarks into the epub for the viewer. |
09-16-2014, 11:31 AM | #6 |
Addict
Posts: 201
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
|
That's what I was thinking.
|
09-16-2014, 10:06 PM | #7 |
Addict
Posts: 201
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
|
iterator/book.py calculates the total number of pages. I'm still not seeing anything that translates a bookmark into a current page number.
|
09-16-2014, 11:21 PM | #8 |
creator of calibre
Posts: 44,303
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is nothing that translates a bookmark into a page number. A page number is simply defined as
(number of pages of current html file * frac of file scrolled)/(total number of pages of all current html files)
If you are asking how the viewer scrolls to a bookmark, look at cfi.coffee
And note that EPUB 3 CFI does not count text nodes. It makes no sense to count text nodes, since:
1) Text nodes can be normalized by the renderer
2) Offsets as numbers of characters in the terminal tag are recorded in the CFI in any case, making counting text nodes totally useless.
What the EPUB CFI spec does is assign odd numbered indices to represent the text between tags regardless of how many actual text nodes there are. So tags are always even numbered. |
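As a rough illustration of that definition, here is a minimal Python sketch. It is not calibre's code: `position_fraction` and its parameters are hypothetical names. It also assumes that the pages of the files before the current one are added to the numerator, which the formula as written does not state.

```python
def position_fraction(pages_per_file, current_index, frac_scrolled):
    """Estimate how far through the book the reader is, as a value in [0, 1].

    pages_per_file: page count of each html file, in spine order.
    current_index:  index of the html file currently open.
    frac_scrolled:  fraction (0..1) of the current file scrolled past.
    """
    # Assumption: files before the current one count as fully read.
    pages_before = sum(pages_per_file[:current_index])
    # The term from the post: pages of current file * frac of file scrolled.
    pages_into_current = pages_per_file[current_index] * frac_scrolled
    return (pages_before + pages_into_current) / sum(pages_per_file)


# Three files of 10, 20 and 10 pages; halfway through the second file.
fraction = position_fraction([10, 20, 10], 1, 0.5)
print(f"{fraction:.0%} read")
```

Multiplying the result by the total page count gives a page number in the same units as the page counts that were fed in.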
09-23-2014, 03:36 PM | #9 | ||
Addict
Posts: 201
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
|
I guess it's fortunate that I lucked into a poorly formatted page on my first test. The calibre viewer and the Sony bookmark had similar pointers into this structure (.../2[heading_id_2]/4@4.9:0 and .../2/4:1, respectively)
Code:
<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>
Code:
<h1 class="part" id="heading_id_2">
  <a id="page10">
    <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
  </a>
</h1>
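To see how much the parse matters for these pointers, here is a small Python sketch using the standard-library `xml.etree.ElementTree`. It is neither calibre's nor Sony's code, and `element_steps` is a hypothetical helper. It applies the even-numbered element indexing described earlier in the thread to both structures and ignores character offsets and the `@x:y` part.

```python
import xml.etree.ElementTree as ET


def element_steps(root, target):
    """Return CFI-style steps from root down to target.

    Only element children are counted; the n-th element child gets step 2*n.
    An id, when present, is appended in square brackets.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        index = 2 * (list(parent).index(node) + 1)
        ident = node.get("id")
        steps.append(f"/{index}[{ident}]" if ident else f"/{index}")
        node = parent
    return "".join(reversed(steps))


IMG = '<img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>'

# First structure: <a/> is empty, so <img> is the second child of <h1>.
flat = ET.fromstring(
    f'<h1 class="part" id="heading_id_2"><a id="page10"/>{IMG}</h1>'
)
# Second structure: <img> is nested inside <a>.
nested = ET.fromstring(
    f'<h1 class="part" id="heading_id_2"><a id="page10">{IMG}</a></h1>'
)

print(element_steps(flat, flat.find("img")))        # /4
print(element_steps(nested, nested.find("a/img")))  # /2[page10]/2
```

The trailing /4 in both bookmarks matches the first structure, so a parser that produces the second one would resolve the same bookmark to a different place.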
|
||
09-24-2014, 08:10 AM | #10 |
creator of calibre
Posts: 44,303
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You should not use BeautifulSoup to parse. The parsing strategy to follow would be:
1) Try to parse as XML, implementing various simple corrections so that only slightly invalid documents still parse.
2) If (1) fails, parse as HTML 5
3) If (2) fails, parse as HTML 4 and/or use BeautifulSoup
See parse_utils.py in the calibre source code. Of course, the correct solution is to use the exact parsing algorithm used by the software that generated the CFI. Since that is not practical, IMO the above cascade will likely give you the best results, with perhaps a few modifications to handle common cases. |
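The shape of that cascade can be sketched in a few lines of Python. This is not what parse_utils.py does: `parse_with_fallbacks`, `parse_xml` and `parse_lenient` are hypothetical names, the "simple corrections" of step 1 are left out, and the standard-library `html.parser` is only a stand-in for the HTML 5 / HTML 4 / BeautifulSoup stages so that the example runs without extra packages.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parse_with_fallbacks(markup, parsers):
    """Try each (name, parse) pair in order; return (name, result) of the first success."""
    errors = []
    for name, parse in parsers:
        try:
            return name, parse(markup)
        except Exception as err:  # a sketch: any failure moves on to the next parser
            errors.append(f"{name}: {err}")
    raise ValueError("all parsers failed: " + "; ".join(errors))


def parse_xml(markup):
    """Stage 1: strict XML parse (raises ET.ParseError on invalid input)."""
    return ET.fromstring(markup)


class _TagCollector(HTMLParser):
    """Stand-in lenient parser that just records start tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_lenient(markup):
    """Later stage: tolerant of unclosed tags; a placeholder for a real HTML parser."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


PARSERS = [("xml", parse_xml), ("lenient", parse_lenient)]

print(parse_with_fallbacks("<p>ok</p>", PARSERS)[0])        # xml
print(parse_with_fallbacks("<p>unclosed<br>", PARSERS)[0])  # lenient
```

Keeping the stages as a plain list makes it easy to slot in a real HTML 5 parser, or to reorder the stages to mimic whatever software generated the CFI.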
|