01-07-2016, 11:45 PM | #826 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
|
01-08-2016, 12:32 AM | #827 |
Grand Sorcerer
Posts: 12,763
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
@davidfor: Is this a beta of Kobo Utilities or of the Word Count plugin?
|
01-08-2016, 01:12 AM | #828 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
|
01-08-2016, 02:40 AM | #829 | ||
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
Code:
<p>this-that</p>
- hyphen-minus - both methods had one word
- en dash - old 1 word, ICU 2 words
- em dash - old 1 word, ICU 2 words
- soft hyphen (U+00AD) - both methods had one word
- hyphen (U+2010) - both methods had one word
- non-breaking hyphen (U+2011) - old 1 word, ICU 2 words
- No character - both methods had one word
- Space - both methods had two words
That makes it look like both methods have a problem. The em dash should be a word delimiter and the non-breaking hyphen probably shouldn't (I assume it is a hyphen designed to make sure a hyphenated word is not split over two lines). I think the en dash should be a word delimiter, but I could understand if it wasn't.
From that quick test, the issue is the frequency of the characters. I have never knowingly seen a non-breaking hyphen, but I have seen plenty of en and em dashes. That makes the ICU method more accurate for most of my books.
After a bit more reading... From looking at the code (in C), and the online documentation for one of the included header files, unicode/ubrk.h, the rules seem to come from http://www.unicode.org/reports/tr29/#Word_Boundaries and http://www.unicode.org/reports/tr14/. I had a quick look at them, and my head is hurting. But it does seem that the non-breaking hyphen should be treated as a word delimiter. That suggests the ICU method is correct in all my tests. |
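For anyone who wants to repeat this kind of test, here is a minimal Python sketch. It is not the plugin's code: `count_whitespace` is a plain whitespace split (which happens to reproduce the "old" column above, though whether the old method really works that way is an assumption), and `count_with_delimiters` uses a hypothetical delimiter set picked only to mirror the ICU results reported above. It does not call ICU, and it tests the bare text without the `<p>` tags.

```python
# Hypothetical test harness; neither counter is the plugin's actual code.
SEPARATORS = {
    "hyphen-minus": "-",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "soft hyphen": "\u00ad",
    "hyphen": "\u2010",
    "non-breaking hyphen": "\u2011",
    "no character": "",
    "space": " ",
}

# Assumption: the characters the second counter treats as word delimiters,
# chosen only to mirror the ICU column reported in the post.
ASSUMED_DELIMITERS = "\u2013\u2014\u2011"


def count_whitespace(text):
    """Count words by splitting on whitespace only."""
    return len(text.split())


def count_with_delimiters(text, delimiters=ASSUMED_DELIMITERS):
    """Count words after turning each assumed delimiter into a space."""
    for ch in delimiters:
        text = text.replace(ch, " ")
    return len(text.split())


def run_tests():
    """Return {separator name: (whitespace count, delimiter count)}."""
    results = {}
    for name, sep in SEPARATORS.items():
        sample = "this" + sep + "that"
        results[name] = (count_whitespace(sample), count_with_delimiters(sample))
    return results


if __name__ == "__main__":
    for name, (plain, delimited) in run_tests().items():
        print(f"{name}: whitespace={plain}, delimiters={delimited}")
```

Changing `ASSUMED_DELIMITERS` shows how much the count depends on which dash characters are treated as breaks.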
||
01-08-2016, 04:42 AM | #830 |
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@davidfor - the non-breaking hyphen issue was a point of interest. Non-breaking hyphens make sense in words like 'e-mail' (I've seen that broken across two lines on web pages). They also make sense in identifiers such as part numbers, telephone numbers, credit card numbers etc., especially if a user wants to select one for lookup. But can they be rendered in common-or-garden fonts on common-or-garden e-readers? Maybe not.
Re the other issue of providing an option for the 'legacy' algo: I use a change in word count as an indicator that something has changed. Sometimes it's expected, and other times not. If I'm expecting it to change and it doesn't, then it usually indicates I forgot to make a change (or save the change). If I'm expecting it to stay the same and it doesn't, then it usually means I made some other error.
I don't want to recount my entire library; it's nudging 100,000. So absent a legacy algo option I'll rejig the current Count Pages into a private Old Count Pages. I care more about consistency than I do so-called accuracy.
BR |
01-08-2016, 07:37 AM | #831 |
US Navy, Retired
Posts: 9,867
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Consistency is what I look for too, but a few words difference from one method to the other won't make no never-mind to me. Life is too short to worry about this difference. If the new method is more accurate and scratches an itch that @davidfor, as a developer, is willing to put the time in to scratch then I'll just say Thank You Davidfor and continue on as if nothing has changed.
|
01-08-2016, 08:35 AM | #832 | ||
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
And are all the books likely to change? Or just recently added ones? Or are you working on a batch of books at a time? Or maybe the updates are coming into your library without notice? |
||
01-08-2016, 08:37 AM | #833 | |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Actually, it's all JSWolf's fault. If he hadn't actually counted the words in his book, none of this would have started. I think next time he does something like this, I'll pick a random Discworld book and suggest that's the best place to start. |
|
01-08-2016, 09:13 AM | #834 |
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
|
The difference seems to be marginal... there should be no problem with keeping old counts and just count the pages on new books using the updated plugin (in case this change hits the official release).
|
01-08-2016, 03:31 PM | #835 |
Grand Sorcerer
Posts: 6,111
Karma: 34000001
Join Date: Mar 2008
Device: KPW1, KA1
|
Then on 100,000 words, your word count is possibly off by around 0.1%. If this problem is not fixed very easily and quickly (like, within 10 minutes), it's not worth it to spend time on it.
|
01-08-2016, 03:39 PM | #836 |
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
|
I think if the author can make it work in a way that is more accurate and at the same time takes into account @BetterRed's earlier observation I'm all for it being updated. Even if it means only 0.1% more "accuracy".
|
01-08-2016, 05:32 PM | #837 | |
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
I do the final stage of 'new book' processing 'a book at a time'. So I make a mental note of the word count I did as part of the initial stage, which is a bulk operation. If it doesn't change when I think it should have, then it's invariably my error. It happened to me yesterday: I forgot to save the final edits I'd done in Sigil.
Post-'final' changes are most often done to correct transcription errors (often they are made by one of my colleagues, all of whom are OS; they send them via email with a note as to the changes they made), so again I run CP one at a time and again I (or the colleague) sometimes make mistakes.
I guess I could automate it, but then I'd have to tell the automaton of my expectations. Anyway, I prefer to give my brain a bit of exercise, lest it atrophy even faster than it already is.
The changes in word count I anticipate are almost invariably small (e.g. up or down a bit) or no change at all. So within the differences between the old and new algos.
BR
Last edited by BetterRed; 01-08-2016 at 05:35 PM. |
|
01-08-2016, 05:49 PM | #838 |
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
01-08-2016, 05:58 PM | #839 |
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The only reason I knew of the errors was because I fixed a book I finished reading, did a count pages on it, and got fewer words than I had before. It has a lot of ellipses with a space after and not before, so I edited them to have no space after. Then I did a count pages and got fewer words.
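A small Python sketch of why that edit could lower the count, assuming the counter splits words on whitespace only (the thread does not confirm that this is what Count Pages did). The sample sentence is made up for illustration.

```python
def count_whitespace(text):
    """Count words by splitting on whitespace only (an assumed method)."""
    return len(text.split())


# Made-up sample: ellipses with a space after and none before.
before = "Wait... what did you say... nothing."
# The edit described above: remove the space after each ellipsis.
after = before.replace("... ", "...")

if __name__ == "__main__":
    print(before, "->", count_whitespace(before), "words")
    print(after, "->", count_whitespace(after), "words")
```

With the space removed, the words on either side of each ellipsis fuse into one token, so a whitespace-only counter reports fewer words even though no text was deleted.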
|
01-08-2016, 06:06 PM | #840 | |
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
Tags |
count, count pages, page count, pages, plugin |
|