Old 01-06-2016, 04:24 PM   #811
PeterT
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Because who knows how many OTHER edge cases like that exist.
Old 01-06-2016, 04:41 PM   #812
rpgmaker
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
Quote:
Originally Posted by PeterT View Post
Because who knows how many OTHER edge cases like that exist.
I know, but it's good to encourage people to report them. Even if they can't be fixed (or the dev decides they aren't worth a fix), it's good to be aware of them.
Old 01-06-2016, 05:33 PM   #813
PeterT
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
You probably have to open this as a calibre bug; the plugin uses
Code:
from calibre.utils.wordcount import get_wordcount_obj

    book_text = _read_epub_contents(iterator, strip_html=True)
    wordcount = get_wordcount_obj(book_text)
to get the word count.
Old 01-06-2016, 05:37 PM   #814
JSWolf
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by PeterT View Post
Is it that critical that the word count be precise?
In the book I just read, it makes a difference of at least 112 words. Why bother to count the words if the count is possibly inaccurate?
Old 01-06-2016, 05:42 PM   #815
BetterRed
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by rpgmaker View Post
I know, but it's good to encourage people to report them. Even if they can't be fixed (or the dev decides they aren't worth a fix), it's good to be aware of them.
Are there any ISO endorsed standards for counting words?

If not, then it's not a bug, it's merely a difference of opinion on how composite words ought to be counted.

There are many users (possibly millions) who have run the plugin against their existing libraries (possibly billions of books), most of whom (probably the vast majority) are only interested in consistent relative counts.

Any change would have to be an 'opt-in' choice, and it would open up discussion of the vagaries of different word separators and different languages - e.g. I would not want a non-breaking hyphen to count as a word separator.

BR
Old 01-06-2016, 06:20 PM   #816
PeterT
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
The only use I've ever had is to get a rough estimate of the size of a book; I see no use in knowing the precise number of words. Besides, do you count the words in the index, or the table of contents?
Old 01-06-2016, 06:28 PM   #817
JSWolf
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by PeterT View Post
The only use I've ever had is to get a rough estimate of the size of a book; I see no use in knowing the precise number of words. Besides, do you count the words in the index, or the table of contents?

Well, I've reported the bug in the Calibre forum. As for the ToC, I usually delete it, since the NCX does just fine.

Anyway, thank you for finding the source of the bug. If it gets fixed, all well and good. If not, not that big a deal.
Old 01-06-2016, 06:44 PM   #818
jackie_w
Grand Sorcerer
Posts: 6,224
Karma: 16536676
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Just FYI for anyone interested ...

Based on a very quick look, the current algorithm for calculating Wordcount appears to be taken from main calibre code (calibre.utils.wordcount, author="Ryan Ginstrom") rather than being created specifically for the 'Count Pages' plugin.

I imagine getting it changed would involve either:
  1. Convincing Kovid/Ryan Ginstrom to change it calibre-wide (good luck with that), or
  2. Replacing a couple of lines in your own personal copy of 'Count Pages' statistics.py

    From:
    Code:
        wordcount = get_wordcount_obj(book_text)
        return wordcount.words
    To:
    Code:
        words = _my_wordcount_algorithm(book_text)
        return words
    
    def _my_wordcount_algorithm(text):
        wcount = <Your personal wordcount algorithm here>
        return wcount

ETA: Damn - not quick enough!

Last edited by jackie_w; 01-06-2016 at 06:47 PM.
Old 01-06-2016, 07:02 PM   #819
rpgmaker
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
Quote:
Originally Posted by BetterRed View Post
Are there any ISO endorsed standards for counting words?

If not, then it's not a bug, it's merely a difference of opinion on how composite words ought to be counted.

...

... I would not want a non-breaking hyphen to count as a word separator.
You're right, and you bring up a very interesting point. As far as I'm concerned this settles it, quite apart from the fact that the word count seems to come from calibre itself. This plugin has never been "accurate" in the traditional sense of the word anyway. Its value lies in the rough idea it gives you of how long a book is, especially when compared to others in your library, and its usefulness in that respect hasn't changed one bit because of this.
Old 01-06-2016, 09:32 PM   #820
theducks
Well trained by Cats
Posts: 30,451
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Whose rules?

IIRC in Jr High Typing (Royal Manuals), 7 (error free) characters was a word for your WPM count.
This Duck only needed 1 digit most of the time
Old 01-07-2016, 10:24 AM   #821
Divingduck
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Quote:
Originally Posted by JSWolf View Post
I have found some bugs in count pages that can cause it to count two words as one word.

except…if
except—if
except–if

All of those cases cause two words to be counted as one. I do hope someone is available to fix this. Thanks.
Is it so simple?

What will you do with a term like "3D printer", which in German is written 3-D-Drucker and is a single word? Counting it as 3 words is definitely wrong for that language, and exceptions like this happen a lot. In other languages too, I guess. To cover this you would need a dictionary for each language, and I am quite sure you still would not cover all exceptions; e.g. in German there is no rule preventing constructions with a "-" between words. This is often used to make long compound words easier to read.
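
To make the problem concrete, here is a small sketch comparing two naive rules on the German example. It is not taken from calibre or the plugin; `count_hyphen_joins` and `count_hyphen_splits` are made-up names for illustration only.

```python
import re


def count_hyphen_joins(text):
    # Rule A: only whitespace separates words, so hyphenated compounds stay whole.
    return len(text.split())


def count_hyphen_splits(text):
    # Rule B: hyphens are separators too; empty pieces are dropped.
    return len([w for w in re.split(r'[\s-]+', text) if w])


sample = u'Der 3-D-Drucker ist neu'
print(count_hyphen_joins(sample))   # 4
print(count_hyphen_splits(sample))  # 6
```

Neither rule is right for every language, which is why a dictionary or locale-aware approach comes up at all.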

Last edited by Divingduck; 01-07-2016 at 10:29 AM.
Old 01-07-2016, 10:10 PM   #822
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by Divingduck View Post
Is it so simple?

What will you do with a term like "3D printer", which in German is written 3-D-Drucker and is a single word? Counting it as 3 words is definitely wrong for that language, and exceptions like this happen a lot. In other languages too, I guess. To cover this you would need a dictionary for each language, and I am quite sure you still would not cover all exceptions; e.g. in German there is no rule preventing constructions with a "-" between words. This is often used to make long compound words easier to read.
It isn't obvious from JSWolf's post, but the last one is actually an "en-dash", not a hyphen. I have no idea whether that should be considered a word delimiter or word joiner.

The method Kovid has mentioned for word counting accepts a locale. That should sort out the differences between languages.
Old 01-07-2016, 10:54 PM   #823
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Beta - Change method used for word count

I wasn't going to do this, but Kovid added an extra method, and the count wasn't taking the language of the book into account, so...

Attached is a beta version of the plugin that uses the ICU Word Iterator to do the count. This does the word count using the language set in the book. If this is not set, it will default to English. For older versions of calibre that do not include the appropriate methods for the ICU Word Iterator, the word count will use the older method.
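
The "use the new method if it is there, otherwise fall back" logic can be sketched roughly as follows. This is not the plugin's actual code. It assumes the new calibre method is `count_words(text, lang)` in `calibre.spell.break_iterator` (check your calibre version), and `count_words_with_fallback` and `_simple_count` are hypothetical names. Outside calibre the import fails, and the sketch uses a plain whitespace split as a stand-in for the older method.

```python
try:
    # Assumption: only present in calibre versions that ship the ICU word-count helper.
    from calibre.spell.break_iterator import count_words as _icu_count_words
except ImportError:
    _icu_count_words = None


def _simple_count(text):
    # Stand-in for the older method: count whitespace-delimited tokens.
    return len(text.split())


def count_words_with_fallback(text, book_lang=None, default_lang='en'):
    # Use the book's language if set, otherwise the default (English here).
    lang = book_lang or default_lang
    if _icu_count_words is not None:
        return _icu_count_words(text, lang)
    return _simple_count(text)
```

Checking for the import once at module level keeps the per-book code path simple and means older calibre versions never touch the ICU call.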

But, for this beta, the count is actually being done twice. The old method is always run and its result printed to the job log. Then it attempts to use the new method. I have done this to get an idea of how much difference there is between the two methods. For English, the difference is small enough that it doesn't bother me. But, for other languages, it might be more. I don't have enough non-English test books to check. I would be interested to know if there is a significant difference for any language.

To view the two counts, you need to open the job list (click "Jobs" in the bottom right of the calibre window), select the count pages job and press the "Show job details" button.

If anyone finds a problem, please report it. If none are found, and there are no objections to changing the word count algorithm, I will arrange to release this sometime next week.
Attached Files
File Type: zip Count Pages-beta.zip (240.2 KB, 153 views)

Last edited by davidfor; 01-08-2016 at 01:10 AM. Reason: Fixed the file name. Same contents, but correctly named
Old 01-07-2016, 10:57 PM   #824
kovidgoyal
creator of calibre
Posts: 44,564
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@davidfor: I suggest defaulting to the calibre interface language rather than English when no language is available (from calibre.utils.localization import get_lang)
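
A minimal sketch of that suggestion, assuming the book's languages arrive as a list of codes and that 'und' marks an undefined language. `pick_wordcount_language` is a hypothetical helper name, and the stub `get_lang` exists only so the sketch runs outside calibre.

```python
try:
    from calibre.utils.localization import get_lang
except ImportError:
    def get_lang():
        # Stand-in used outside calibre so the sketch still runs.
        return 'en'


def pick_wordcount_language(book_languages):
    # book_languages: language codes from the book's metadata (may be empty or None).
    for lang in book_languages or ():
        if lang and lang != 'und':
            return lang
    # No usable book language: fall back to the calibre interface language.
    return get_lang()
```

For example, `pick_wordcount_language(['de'])` returns 'de', while an empty list falls through to whatever `get_lang()` reports.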
Old 01-07-2016, 11:33 PM   #825
BetterRed
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Would it be possible to retain the 'old' method, at least as a configurable option?

Added: I appreciate I don't have to install the new version, but I prefer to keep all my software up to date.

Also, what will it do with non-breaking hyphens in English? Maybe someone can point me at the relevant ICU doco for that level of detail. I looked for it, but failed to find it, but ...

BR

Last edited by BetterRed; 01-07-2016 at 11:45 PM.
All times are GMT -4.