Old 01-07-2016, 11:45 PM   #826
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by kovidgoyal View Post
@davidfor: I suggest defaulting to the calibre interface language rather than English when no language is available (from calibre.utils.localization import get_lang)
Yes, that makes a lot more sense.

I have updated the beta to do this.
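A minimal sketch of the fallback suggested above, assuming it runs inside calibre where the quoted import is available. The helper names `interface_language` and `language_for_counting`, and the "en" stand-in used outside calibre, are hypothetical and not the plugin's actual code.

```python
def interface_language():
    """Return calibre's interface language, or 'en' when run outside calibre."""
    try:
        # The import suggested in the quoted post; only available inside calibre.
        from calibre.utils.localization import get_lang
    except ImportError:
        return "en"  # assumption: stand-in so the sketch runs anywhere
    return get_lang() or "en"


def language_for_counting(book_languages):
    """Pick the book's first language, else fall back to the interface language.

    `book_languages` is a hypothetical list of language codes taken from the
    book's metadata; it may be empty or None.
    """
    for lang in book_languages or []:
        if lang:
            return lang
    return interface_language()
```

With this shape, a book that declares a language keeps it, and only books with no language at all pick up whatever language the calibre interface is set to.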
Old 01-08-2016, 12:32 AM   #827
PeterT
Grand Sorcerer
Posts: 12,763
Karma: 75000002
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
@davidfor: Is this a beta of Kobo Utilities or of the Word Count plugin?
Old 01-08-2016, 01:12 AM   #828
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by PeterT View Post
@davidfor: Is this a beta of Kobo Utilities or of the Word Count plugin?
Somehow I had the wrong thing in the clipboard when I added "beta" to the filename. Was the correct file, just the wrong name.
Old 01-08-2016, 02:40 AM   #829
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by BetterRed View Post
Would it be possible to retain the 'old' method, at least as a configurable option.
I really, really don't want to. Unless we can show that there is something drastically wrong with the ICU method, I just don't see the point in maintaining both. And if there is a reason not to use the ICU method, then not changing anything is the way to go.
Quote:
Added: I appreciate I don't have to install the new version, but I prefer to keep all my software up to date.

Also what will it do with non-breaking hyphens in English, maybe someone can point me at the relevant ICU doco for that level of detail - looked for it, but failed to find it, but ...
I just tried a very small book with the following in it:

Code:
<p>this-that</p>
I ran count pages on it several times replacing the hyphen with other possible characters. The results, with the name from the calibre editor, are:
- hyphen-minus - both methods had one word
- en dash - old 1 word, ICU 2 words
- em dash - old 1 word, ICU 2 words
- soft hyphen (U+00AD) - both methods had one word
- hyphen (U+2010) - both methods had one word
- non-breaking hyphen (U+2011) - old 1 word, ICU 2 words
- No character - both methods had one word
- Space - both methods had two words

That makes it look like both methods have a problem. The em dash should be a word delimiter and the non-breaking hyphen probably shouldn't (I assume it is a hyphen designed to make sure a hyphenated word is not split over two lines). I think the en dash should be a word delimiter, but I could understand if it wasn't.

From that quick test, the issue is the frequency of the characters. I have never knowingly seen a non-breaking hyphen, but I have seen plenty of en and em dashes. That makes the ICU method more accurate for most of my books.
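For anyone wanting to repeat the test above, here is a rough sketch that builds the same eight inputs and runs a counter over them. The counter shown only splits on whitespace; it is a stand-in and is neither the plugin's old method nor the ICU one, so its results will not match the list above. The function names are made up for the example.

```python
import unicodedata

# Separators tried in the test above; None means "no character at all".
SEPARATORS = ["-", "\u2013", "\u2014", "\u00ad", "\u2010", "\u2011", None, " "]


def build_cases(left="this", right="that"):
    """Return (label, text) pairs, one per separator."""
    cases = []
    for sep in SEPARATORS:
        if sep is None:
            cases.append(("no character", left + right))
        else:
            # Label each case with the character's official Unicode name.
            cases.append((unicodedata.name(sep).lower(), left + sep + right))
    return cases


def whitespace_count(text):
    """Stand-in counter: splits on whitespace only (NOT the plugin's method)."""
    return len(text.split())


if __name__ == "__main__":
    for label, text in build_cases():
        print(f"{label:20} {whitespace_count(text)}")
```

Swapping `whitespace_count` for whichever counter is being checked gives a quick side-by-side of how each character is treated.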

After a bit more reading...

From looking at the code (in C), and the online documentation for one of the included header files, unicode/ubrk.h, the rules seem to come from http://www.unicode.org/reports/tr29/#Word_Boundaries and http://www.unicode.org/reports/tr14/. I had a quick look at them, and my head is hurting. But it does seem that the non-breaking hyphen should be treated as a word delimiter. That suggests the ICU method is correct in all my tests.
Old 01-08-2016, 04:42 AM   #830
BetterRed
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@davidfor - non breaking hyphen issue was a point of interest. They make sense in words like 'e-mail', I've seen that broken across two lines on web pages, they also make sense in identifiers such as part numbers, telephone numbers, credit card numbers etc - especially if a user wants to select for lookup etc. But can they be rendered in common-or-garden fonts on common-or-garden e-readers - maybe not?

Re the other issue of providing an option for the 'legacy' algo: I use a change in word count as an indicator - that something has changed. Sometimes it's expected, and other times not. If I'm expecting it to change and it doesn't then it usually indicates I forgot to do a change (or save the change). If I'm expecting to stay the same and it doesn't then it usually means I made some other error.

I don't want to recount my entire library, it's nudging 100,000. So absent a legacy algo option I'll rejig current Count Pages into a private Old Count Pages - I care more about consistency than I do so-called accuracy

BR
Old 01-08-2016, 07:37 AM   #831
DoctorOhh
US Navy, Retired
Posts: 9,867
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by BetterRed View Post
I don't want to recount my entire library, it's nudging 100,000. So absent a legacy algo option I'll rejig current Count Pages into a private Old Count Pages - I care more about consistency than I do so-called accuracy
Consistency is what I look for too, but a few words difference from one method to the other won't make no never-mind to me. Life is too short to worry about this difference. If the new method is more accurate and scratches an itch that @davidfor, as a developer, is willing to put the time in to scratch then I'll just say Thank You Davidfor and continue on as if nothing has changed.
Old 01-08-2016, 08:35 AM   #832
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by BetterRed View Post
@davidfor - non breaking hyphen issue was a point of interest. They make sense in words like 'e-mail', I've seen that broken across two lines on web pages, they also make sense in identifiers such as part numbers, telephone numbers, credit card numbers etc - especially if a user wants to select for lookup etc. But can they be rendered in common-or-garden fonts on common-or-garden e-readers - maybe not?
Yes, they sound like reasonable uses for non-breaking hyphens. And I have no idea if any ereader renders them. I did notice that the font I use in the calibre editor doesn't. I don't know if that's just that font, or a lot of fonts.
Quote:
Re the other issue of providing an option for the 'legacy' algo: I use a change in word count as an indicator - that something has changed. Sometimes it's expected, and other times not. If I'm expecting it to change and it doesn't then it usually indicates I forgot to do a change (or save the change). If I'm expecting to stay the same and it doesn't then it usually means I made some other error.

I don't want to recount my entire library, it's nudging 100,000. So absent a legacy algo option I'll rejig current Count Pages into a private Old Count Pages - I care more about consistency than I do so-called accuracy
You have me intrigued about how you do this. How do you know the word count has changed or not changed? Are you keeping a copy of the count somewhere to compare the new count with and updating that when finished? Or are you running something from the command line and comparing the new calculation against the old?

And are all the books likely to change? Or just recently added ones? Or are you working on a batch of books at a time? Or maybe the updates are coming into your library without notice?
Old 01-08-2016, 08:37 AM   #833
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by DoctorOhh View Post
Consistency is what I look for too, but a few words difference from one method to the other won't make no never-mind to me. Life is too short to worry about this difference. If the new method is more accurate and scratches an itch that @davidfor, as a developer, is willing to put the time in to scratch then I'll just say Thank You Davidfor and continue on as if nothing has changed.
I didn't want to scratch it. I really, really didn't. But, I sat down at my laptop this morning, someone made another post, and I found myself fiddling with it. Then fixing it properly and adding the language and... Well, you can see where it has led.

Actually, it's all JSWolf's fault. If he hadn't actually counted the words in his book, none of this would have started. I think next time he does something like this, I'll pick a random Discworld book and suggest that's the best place to start.
Old 01-08-2016, 09:13 AM   #834
rpgmaker
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
Quote:
Originally Posted by BetterRed View Post
I don't want to recount my entire library, it's nudging 100,000. So absent a legacy algo option I'll rejig current Count Pages into a private Old Count Pages - I care more about consistency than I do so-called accuracy
The difference seems to be marginal... there should be no problem with keeping old counts and just count the pages on new books using the updated plugin (in case this change hits the official release).
Old 01-08-2016, 03:31 PM   #835
Katsunami
Grand Sorcerer
Posts: 6,111
Karma: 34000001
Join Date: Mar 2008
Device: KPW1, KA1
Quote:
Originally Posted by JSWolf View Post
In the book I just read, it makes a difference of at least 112 words. Why bother to count the words if it's possibly inaccurate?
Then on 100.000 words, your word count is possibly off by around 0.1%. If this problem is not fixed very easily and quickly (like, within 10 minutes), it's not worth it to spend time on it.
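The arithmetic behind that figure, as a quick check. The 112 comes from the quoted post and the 100,000 is the round book length used above, not a measured value.

```python
difference = 112        # words, from the quoted post
book_length = 100_000   # words, the round figure used above

# Share of the total count that the difference represents, as a percentage.
percent_off = difference / book_length * 100
print(f"{percent_off:.3f}%")
```

That works out to roughly a tenth of a percent, which is where the "around 0.1%" comes from.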
Old 01-08-2016, 03:39 PM   #836
rpgmaker
Connoisseur
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
Quote:
Originally Posted by Katsunami View Post
Then on 100.000 words, your word count is possibly off by around 0.1%. If this problem is not fixed very easily and quickly (like, within 10 minutes), it's not worth it to spend time on it.
I think if the author can make it work in a way that is more accurate and at the same time takes into account @BetterRed's earlier observation I'm all for it being updated. Even if it means only 0.1% more "accuracy".
Old 01-08-2016, 05:32 PM   #837
BetterRed
null operator (he/him)
Posts: 21,006
Karma: 27620706
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by davidfor View Post
You have me intrigued about how you do this. How do you know the word count has changed or not changed? Are you keeping a copy of the count somewhere to compare the new count with and updating that when finished? Or are you running something from the command line and comparing the new calculation against the old?
I must emphasise, the comparison is done to detect my mistakes.

I do the final stage of 'new book' processing 'a book at a time'. So I make a mental note of the word count I did as part of the initial stage - which is a bulk operation. If it doesn't change when I think it should've done, then it's invariably my error - happened to me yesterday, forgot to save the final edits I'd done in Sigil.

Post 'final' changes are most often done to correct transcription errors (often times they are made by one of my colleagues, all of whom are OS, they send them via email with a note as to the changes they made), so again I run CP one at a time and again I (or the colleague) sometimes make mistakes.

I guess I could automate it, but then I'd have to tell the automaton of my expectations, anyway I prefer to give my brain a bit of exercise, lest it atrophies even faster than it already is

The changes in word count I anticipate are almost invariably small (e.g. up or down a bit) or no change at all. So within the differences between the old and new algos.
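If the check described above were ever automated, it might look something like this rough sketch. The function name, arguments and messages are all invented for illustration; nothing here is part of Count Pages.

```python
def check_count(old_count, new_count, expect_change):
    """Return a warning when the count contradicts the expectation, else None."""
    changed = new_count != old_count
    if expect_change and not changed:
        # Edits were planned but the count is identical.
        return "count unchanged: was the edit saved?"
    if not expect_change and changed:
        # Nothing was meant to change, yet the count moved.
        return f"count moved by {new_count - old_count:+d}: unintended edit?"
    return None
```

The catch is the one already mentioned: the expectation still has to be supplied by hand for each book, and a switch of counting method would trip the "no change expected" case for every book recounted.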

BR

Last edited by BetterRed; 01-08-2016 at 05:35 PM.
Old 01-08-2016, 05:49 PM   #838
JSWolf
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by davidfor View Post
It isn't obvious from JSWolf's post, but the last one is actually an "en-dash", not a hyphen. I have no idea whether that should be considered a word delimiter or word joiner.
In the UK, they use an en-dash where in the US it would be an em-dash.
Old 01-08-2016, 05:58 PM   #839
JSWolf
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by davidfor View Post
Actually, it's all JSWolf's fault. If he hadn't actually counted the words in his book, none of this would have started. I think next time he does something like this, I'll pick a random Discworld book and suggest that's the best place to start.
The only reason I knew of the errors was because I fixed a book I finished reading and did a count pages on it and got less words than I had before. It has a lot of ellipses with a space after and not before so I edited them to have no space after. Then I did a count pages and got less words.
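A small illustration of why removing the space after an ellipsis can lower the count. It assumes a counter that only splits on whitespace, which is a guess used for the example and not necessarily how the old method works; the sample sentence is made up.

```python
def whitespace_count(text):
    """Count words by splitting on whitespace only (an assumed, simplified counter)."""
    return len(text.split())


before = "Wait\u2026 what was that?"   # ellipsis followed by a space
after = "Wait\u2026what was that?"     # space removed, as in the edit described

print(whitespace_count(before), whitespace_count(after))
```

With the space gone, "Wait" and "what" are glued together by the ellipsis and a whitespace-only counter sees one word where there were two.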
Old 01-08-2016, 06:06 PM   #840
JSWolf
Resident Curmudgeon
Posts: 76,482
Karma: 136564766
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by davidfor View Post
- em dash - old 1 word, ICU 2 words

That makes it look like both methods have a problem. The em dash should be a word delimiter and the non-breaking hyphen probably shouldn't (I assume it is a hyphen designed to make sure a hyphenated word is not split over two lines). I think the en dash should be a word delimiter, but I could understand if it wasn't.
For the em-dash, the ICU method is correct. It should be two words. An em-dash is similar to an ellipsis.