Old 06-27-2015, 02:26 PM   #40
GeoffR
Wizard
Posts: 3,821
Karma: 19162882
Join Date: Nov 2012
Location: Te Riu-a-Māui
Device: Kobo Glo
Patched Dutch hyphenation dictionary

The Dutch hyphenation dictionary hyph_nl.dic that comes with the firmware contains two faults:

1. Although marked as UTF-8, hyph_nl.dic is actually encoded as ISO-8859-1. I am not sure whether this causes a problem in practice, but my guess is that it could affect words containing non-ASCII characters. The hyphenation dictionaries for all other languages are correctly encoded as UTF-8.

2. The LEFTHYPHENMIN and RIGHTHYPHENMIN settings are missing from hyph_nl.dic. This probably means that these settings default to 2, which results in extremely aggressive hyphenation, and is more likely to expose any faults in the dictionary or hyphenation algorithms. Most other languages have them set to 5.
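Both faults can be checked for on a copy of the file. The following is only a rough Python sketch: the function name diagnose_hyph_dic is made up for illustration, and it assumes the first line of the file is the encoding declaration and that the settings, when present, sit on lines of their own.

```python
def diagnose_hyph_dic(raw: bytes) -> dict:
    """Report the two faults described above for a hyphenation dictionary.

    Hypothetical helper, not part of any Kobo or OpenTaal tooling.
    """
    # Fault 1: the bytes are not valid UTF-8 despite the declaration.
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    # ISO-8859-1 maps every byte to a character, so this decode never fails
    # and is safe to use just for scanning the ASCII setting names.
    lines = raw.decode("iso-8859-1").split("\n")

    return {
        # Assumption: the first line declares the encoding.
        "declared_encoding": lines[0].strip(),
        "valid_utf8": valid_utf8,
        # Fault 2: the minimum-length settings are missing.
        "has_lefthyphenmin": any(l.startswith("LEFTHYPHENMIN") for l in lines),
        "has_righthyphenmin": any(l.startswith("RIGHTHYPHENMIN") for l in lines),
    }
```

To use it, read the file in binary mode and pass the bytes in, for example diagnose_hyph_dic(open("hyph_nl.dic", "rb").read()).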

AFAIK these faults have been present in the hyph_nl.dic in all firmware versions, but with the changes to the KEPUB hyphenation in firmware 3.16.0 they may have become more noticeable.

I've attached a copy of a patched hyph_nl.dic which fixes these two problems. Because the hyph_nl.dic that comes with the Kobo firmware is identical (apart from being incorrectly marked as UTF-8) to the one available from OpenTaal.org, I think it is okay to distribute a modified version of the OpenTaal.org file here. I have included the licence in Dutch and English. The only modifications I have made to hyph_nl.dic are to re-encode it as UTF-8 using iconv, and to add LEFTHYPHENMIN 5 and RIGHTHYPHENMIN 5 lines.
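For anyone who would rather reproduce the patch than use the attachment, the two modifications can be sketched in Python. This is an illustration only, not the exact procedure used for the attached file (that was done with iconv). The function name patch_hyph_dic is hypothetical, and placing the two settings directly after the encoding line is an assumption about the dictionary layout.

```python
def patch_hyph_dic(raw: bytes, left_min: int = 5, right_min: int = 5) -> bytes:
    """Re-encode an ISO-8859-1 hyphenation dictionary as UTF-8 and add
    LEFTHYPHENMIN / RIGHTHYPHENMIN lines.

    Hypothetical helper; assumes line 1 is the encoding declaration.
    """
    # Interpret the bytes as what they really are (ISO-8859-1).
    text = raw.decode("iso-8859-1")

    # Split on newlines only; str.splitlines() would also split on some
    # Latin-1 control characters.
    lines = text.replace("\r\n", "\n").rstrip("\n").split("\n")
    body = lines[1:]  # everything after the encoding declaration

    # Drop any existing settings so they are not duplicated.
    body = [
        l for l in body
        if not l.startswith(("LEFTHYPHENMIN", "RIGHTHYPHENMIN"))
    ]

    patched = [
        "UTF-8",  # the declaration now matches the actual encoding
        f"LEFTHYPHENMIN {left_min}",
        f"RIGHTHYPHENMIN {right_min}",
        *body,
    ]
    return ("\n".join(patched) + "\n").encode("utf-8")
```

Run it on a copy of the file and write the result out in binary mode, keeping the original untouched in case the assumptions above do not hold for your firmware's copy.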

All you need to do to replace the hyph_nl.dic on your device with the patched one is extract the attached ZIP file, copy the KoboRoot.tgz it contains into the .kobo directory of your device, and safely eject. It is safe to use this file on older firmware versions too.

Edit: Note that updating the firmware replaces this dictionary with the original faulty one, so if you want to continue using this dictionary then you'll need to re-install it after each firmware update.
Attached Files
File Type: zip hyph_nl_utf8.zip (57.3 KB, 389 views)

Last edited by GeoffR; 08-26-2015 at 09:43 PM. Reason: ... need to re-install it after each firmware update.