01-06-2016, 04:24 PM | #811 |
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Because who knows how many OTHER edge cases like that exist.
|
01-06-2016, 04:41 PM | #812 |
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
|
|
01-06-2016, 05:33 PM | #813 |
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
You probably have to open this as a calibre bug; the plugin uses
Code:
from calibre.utils.wordcount import get_wordcount_obj
book_text = _read_epub_contents(iterator, strip_html=True)
wordcount = get_wordcount_obj(book_text) |
01-06-2016, 05:37 PM | #814 |
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
01-06-2016, 05:42 PM | #815 | |
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
If not, then it's not a bug; it's merely a difference of opinion on how composite words ought to be counted. There are many users (possibly millions) who have used the PI against their existing libraries (possibly billions of books), most (probably the vast majority) of whom are only interested in consistent relative counts. Any change would have to be an 'opt-in' choice, and it would open up discussion of the vagaries of different word separators and different languages - e.g. I would not want a non-breaking hyphen to count as a word separator. BR |
|
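To make the disagreement concrete, here is a minimal sketch of an opt-in switch between the two counting conventions. It is not the plugin's or calibre's actual code: the function name `count_words` and the `split_on_hyphens` flag are made up for illustration. The only character-level assumption is that U+2011 is the Unicode non-breaking hyphen, which is deliberately never treated as a separator here.

```python
import re

# Hypothetical helper for illustration only; not taken from calibre or the plugin.


def count_words(text, split_on_hyphens=False):
    """Count whitespace-separated words; optionally split on ASCII hyphens too."""
    if split_on_hyphens:
        # Only the ordinary hyphen-minus becomes a separator.
        # The non-breaking hyphen (U+2011) is left alone on purpose.
        tokens = re.split(r"[\s-]+", text)
    else:
        tokens = text.split()
    # Drop empty strings produced by leading/trailing separators.
    return len([t for t in tokens if t])


sample = "a well-known state\u2011of\u2011the\u2011art reader"
default_count = count_words(sample)                        # hyphens ignored
opt_in_count = count_words(sample, split_on_hyphens=True)  # "well-known" splits
```

With the flag off, existing relative counts stay unchanged; with it on, only words joined by an ordinary hyphen are split, so the two counts differ by exactly the number of such joins.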
01-06-2016, 06:20 PM | #816 |
Grand Sorcerer
Posts: 12,762
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
The only use I've ever had is to get a rough estimate of the size of a book; I see no use in knowing the precise number of words. Besides, do you count the words in the index, or the table of contents?
|
01-06-2016, 06:28 PM | #817 |
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
[QUOTE=PeterT;3236052]The only use I've ever had is to get a rough estimate of the size of a book; I see no use in knowing the precise number of words. Besides, do you count the words in the index, or the table of contents?[/QUOTE]
Well, I've reported the bug in the Calibre forum. But, as for ToC, I usually delete it as the NCX does just fine. Anyway, thank you for finding the source of the bug. If it gets fixed, all well and good. If not, not that big a deal. |
01-06-2016, 06:44 PM | #818 |
Grand Sorcerer
Posts: 6,224
Karma: 16536676
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Just FYI for anyone interested ...
Based on a very quick look, the current algorithm for calculating Wordcount appears to be taken from main calibre code (calibre.utils.wordcount, author="Ryan Ginstrom") rather than being created specifically for the 'Count Pages' plugin. I imagine getting it changed would involve either:
ETA: Damn - not quick enough! Last edited by jackie_w; 01-06-2016 at 06:47 PM. |
01-06-2016, 07:02 PM | #819 |
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
|
You're right and you bring up a very interesting point. As far as I'm concerned this settles it, let alone the fact that the word count seems to come from calibre itself. This plugin has never been "accurate" in the traditional sense of the word anyways... its value lies in the rough idea it gives you about how long a book is, especially when compared to others in your library, and its usefulness in that respect hasn't changed one bit because of this.
|
01-06-2016, 09:32 PM | #820 |
Well trained by Cats
Posts: 30,451
Karma: 58055868
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Whose rules?
IIRC in Jr High Typing (Royal Manuals), 7 (error free) characters was a word for your WPM count. This Duck only needed 1 digit most of the time |
01-07-2016, 10:24 AM | #821 | |
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Quote:
What will you do with words like "3D printer", which in German is written as the single word "3-D-Drucker"? Counting it as 3 words is definitely wrong for that language, and this kind of exception comes up a lot more often. I guess it does in other languages too. To cover this you would need a dictionary for each language, and I am quite sure you would not cover all exceptions, as e.g. in German there is no rule preventing constructions with a "-" between words. This is often used to make long compound words easier to read. Last edited by Divingduck; 01-07-2016 at 10:29 AM. |
|
01-07-2016, 10:10 PM | #822 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
The method Kovid has mentioned for word counting accepts a locale. That should sort the differences out between the languages. |
|
01-07-2016, 10:54 PM | #823 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Beta - Change method used for word count
I wasn't going to do this, but, Kovid added an extra method and the count wasn't taking into account the language of the book, so...
Attached is a beta version of the plugin that uses the ICU Word Iterator to do the count. This does the word count using the language set in the book. If no language is set, it defaults to English. For older versions of calibre that do not include the appropriate methods for the ICU Word Iterator, the word count will use the older method.
For this beta, however, the count is actually done twice. The old method is always run and its result printed to the job log; then the plugin attempts the new method. I have done this to get an idea of how much the two methods differ. For English, the difference is small enough that it doesn't bother me, but for other languages it might be larger. I don't have enough non-English test books to check, so I would be interested to know if there is a significant difference for any language.
To view the two counts, open the job list (click "Jobs" in the bottom right of the calibre window), select the Count Pages job and press "Show job details".
If anyone finds a problem, please report it. If none are found, and there are no objections to changing the word count algorithm, I will arrange to release this sometime next week.
Last edited by davidfor; 01-08-2016 at 01:10 AM. Reason: Fixed the file name. Same contents, but correctly named |
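The beta's behaviour described above (always run the old count and log it, then try the new one and fall back if it is unavailable) could be sketched roughly as follows. This is not the plugin's code: `old_count` and `count_with_fallback` are hypothetical stand-ins, and the new counter is passed in as a callable so the sketch runs without calibre or ICU.

```python
import logging

log = logging.getLogger("count_pages_sketch")


def old_count(text):
    # Stand-in for the older method: plain whitespace splitting.
    return len(text.split())


def count_with_fallback(text, lang, new_count=None):
    """Return (count, method_name); always compute and log the old count first."""
    old = old_count(text)
    log.info("Old method word count: %d", old)
    if new_count is None:
        # Mimics an older calibre that lacks the newer counting method.
        return old, "old"
    try:
        new = new_count(text, lang)
    except Exception:
        # Any failure in the new method falls back to the old count.
        log.exception("New method failed; using old count")
        return old, "old"
    log.info("New method word count (%s): %d", lang, new)
    return new, "new"
```

Logging both numbers side by side is what makes it possible to compare the two methods per book before committing to the change.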
01-07-2016, 10:57 PM | #824 |
creator of calibre
Posts: 44,564
Karma: 24495948
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@davidfor: I suggest defaulting to the calibre interface language rather than English when no language is available (from calibre.utils.localization import get_lang)
|
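A rough sketch of that fallback order, assuming the book's languages arrive as a list of language codes. `choose_count_language` is a hypothetical helper; in calibre the `interface_lang` argument would presumably come from `get_lang()` (the import mentioned above), but here it is passed in so the snippet runs on its own.

```python
def choose_count_language(book_languages, interface_lang, last_resort="en"):
    """Pick the language to use for word counting.

    Order: first non-empty language set on the book, then the interface
    language, then a hard-coded last resort (an assumption of this sketch).
    """
    for lang in book_languages or []:
        if lang:
            return lang
    return interface_lang or last_resort
```

For example, a book with no language metadata opened in a French calibre interface would be counted with French rules rather than English ones.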
01-07-2016, 11:33 PM | #825 |
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Would it be possible to retain the 'old' method, at least as a configurable option?
Added: I appreciate I don't have to install the new version, but I prefer to keep all my software up to date. Also, what will it do with non-breaking hyphens in English? Maybe someone can point me at the relevant ICU doco for that level of detail; I looked for it but failed to find it ... BR Last edited by BetterRed; 01-07-2016 at 11:45 PM. |
Tags |
count, count pages, page count, pages, plugin |