I looked at the code and found an O(n^2) algorithm in there, which is now gone. However, that algorithm was always used on Linux as well, so I am surprised that Linux performance is fine. After my changes, counting the 450K words in the Project Gutenberg text file of The Count of Monte Cristo takes 0.1 seconds with Python 3 and 0.2 seconds with calibre 4.8 on Windows, with similar times on Linux. Tested with:
Code:
calibre-debug -c "from calibre.spell.break_iterator import *; import sys; raw = open(sys.argv[-1], 'rb').read().decode('utf-8'); from calibre.utils.monotonic import *; st = monotonic(); print(count_words(raw)); print(monotonic() - st);" cmc.txt
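
For anyone wondering how a word counter can end up quadratic, here is a minimal sketch in plain Python. It is not the code that was in calibre, just one common way such a slowdown can arise: re-slicing the remaining text after every word. The names count_words_quadratic, count_words_linear and timed are made up for this illustration, and the regex is only a stand-in for a real word break iterator.

```python
import re
import time

# Stand-in for a real word break iterator (an assumption for this sketch).
WORD = re.compile(r"\w+")


def count_words_quadratic(text):
    """Hypothetical slow version: copies the rest of the text after each word."""
    count = 0
    rest = text
    while True:
        match = WORD.search(rest)
        if match is None:
            return count
        count += 1
        # Slicing copies everything after the word, so the total work
        # grows roughly with the square of the text length.
        rest = rest[match.end():]


def count_words_linear(text):
    """Single pass over the text with no re-copying."""
    return sum(1 for _ in WORD.finditer(text))


def timed(func, text):
    """Return (result, elapsed seconds) measured with a monotonic clock."""
    start = time.monotonic()
    result = func(text)
    return result, time.monotonic() - start


if __name__ == "__main__":
    sample = "the quick brown fox " * 5000
    for func in (count_words_quadratic, count_words_linear):
        words, elapsed = timed(func, sample)
        print(func.__name__, words, "%.3f s" % elapsed)
```

Both functions return the same count; only the running time differs, and the gap widens as the input grows.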