01-06-2020, 07:31 AM #92
kovidgoyal
creator of calibre
Posts: 44,017
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Well, I looked at the code and there was an O(n^2) algorithm in there, which is now gone. However, that algorithm was always there under Linux too, so I am surprised that Linux performance is fine. After my changes, counting the 450K words in the Project Gutenberg text file of The Count of Monte Cristo takes 0.1 seconds with Python 3 and 0.2 seconds with calibre 4.8 on Windows, with similar times on Linux. Tested with:

Code:
 calibre-debug -c "from calibre.spell.break_iterator import *; import sys; raw = open(sys.argv[-1], 'rb').read().decode('utf-8'); from calibre.utils.monotonic import *; st = monotonic(); print(count_words(raw)); print(monotonic() - st);" cmc.txt
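The post doesn't say what the removed O(n^2) code actually was, so what follows is only a hypothetical illustration of one common way word counting goes quadratic, not calibre's count_words(). It is a minimal Python 3 sketch that assumes a "word" is just a run of \w characters (no ICU word-boundary rules). The quadratic version re-slices the remaining text after every match, copying the rest of the string each time, while the linear version makes a single pass. The timing uses time.monotonic(), in the same spirit as the command above.

```python
import re
import time

# Hypothetical illustration only: NOT calibre's count_words() and no ICU
# break iterator. A "word" here is simply a run of \w characters.
WORD = re.compile(r"\w+")


def count_words_quadratic(text: str) -> int:
    """O(n^2) pattern: re-slice the remaining text after every match.

    Each slice copies the rest of the string, so total work grows with
    (number of words) x (length of text).
    """
    count = 0
    rest = text
    while True:
        m = WORD.search(rest)
        if m is None:
            return count
        count += 1
        rest = rest[m.end():]  # copies the remainder: the quadratic step


def count_words_linear(text: str) -> int:
    """O(n) pattern: one scan over the text, never copying it."""
    return sum(1 for _ in WORD.finditer(text))


def timed(fn, text):
    """Return (result, elapsed seconds) for fn(text)."""
    start = time.monotonic()
    result = fn(text)
    return result, time.monotonic() - start


if __name__ == "__main__":
    # Synthetic input: 5 words x 2000 repetitions = 10,000 words.
    sample = "the count of monte cristo " * 2000
    for fn in (count_words_quadratic, count_words_linear):
        n, secs = timed(fn, sample)
        print(f"{fn.__name__}: {n} words in {secs:.4f}s")
```

Both functions give the same count. Scale the repetition count up toward a novel-sized text and the quadratic version's time grows much faster than the linear one's.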