01-31-2012, 04:08 AM | #1 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
A script to convert XML/stardict dicts to Odyssey format
Hi guys,
As I am working on the script to convert a stardict dictionary to the Cybook Odyssey format, I would like to test the conversion of several dictionaries. To kill two birds with one stone, please PM me a pointer to a freeware/GPL'ed/etc. (preferably stardict) dictionary you would like to have on your Cybook Odyssey. Once the conversion goes well, I will give the converted dictionary to you. You can also leave a reply here saying "EN -> IT requested" and the like. Thanks. Last edited by Dr. Drib; 02-19-2012 at 06:16 PM. |
01-31-2012, 12:48 PM | #2 |
Wizard
Posts: 4,756
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
EN -> EN requested. And you know I want something more modern than the 1913 Webster, maybe something with computer words, locations of the world, military acronyms ....
But then it's going to be hard to find that as freeware or otherwise freely distributable |
01-31-2012, 03:07 PM | #3 | |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Quote:
|
|
02-02-2012, 04:01 AM | #4 |
a pthread?? where? where?
Posts: 1,763
Karma: 30462
Join Date: Mar 2009
Location: Somewhere in EU
Device: Newton MessagePad 2100, and only this
|
Wiktionary is known to be a beast to parse...
|
02-02-2012, 05:27 AM | #5 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
I worked on the Italian version (which is way worse than the English one, because formatting rules are not really enforced and it lacks many word definitions) and I think it can be done with decent results.
I cannot directly use my code for the English one because the format tags are different, etc. But I can spend some time on it in the (not-so-distant) future. |
02-06-2012, 04:28 PM | #6 |
Member
Posts: 10
Karma: 10
Join Date: Oct 2011
Device: PRS650
|
Hi,
I've tried this for the English-to-French one. Have you succeeded in modifying the database (the idx file)? It is a SQLite database, but I ran into a problem: the F_Word field in the T_DictIndex table is declared with an ICUNoCase option that I don't understand. The program refuses to modify this field. The other fields (F_Offset, F_Size and F_NumChunk) can be modified. Thank you. |
02-06-2012, 05:07 PM | #7 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Unfortunately I have had very little time to work on cleaning up the conversion script in the past few days. Hopefully I will publish a first version of the script by the end of this week.
Pascal: IcuNoCase is a SQLite collation. You should first define a collation function myfunc, and then install it using Code:
create_collation("IcuNoCase", myfunc) |
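A minimal sketch of that suggestion, assuming Python's standard sqlite3 module, where create_collation is a method of the connection object. The comparison function below (case-insensitive via str.casefold) is only a stand-in, since the thread does not say how the device's ICU collation really orders words, and the column types in the CREATE TABLE statement are guesses built from the field names mentioned above.

```python
import sqlite3


def icu_no_case(a: str, b: str) -> int:
    """Stand-in comparison: case-insensitive, returns -1, 0 or 1.

    Assumption: the real IcuNoCase collation is ICU-based;
    casefold() only approximates it.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def open_with_collation(path: str) -> sqlite3.Connection:
    """Open a database and register the collation under the expected name."""
    conn = sqlite3.connect(path)
    conn.create_collation("IcuNoCase", icu_no_case)
    return conn


# Demo on an in-memory database; the schema is hypothetical.
conn = open_with_collation(":memory:")
conn.execute(
    "CREATE TABLE T_DictIndex ("
    " F_Word TEXT COLLATE IcuNoCase,"
    " F_Offset INTEGER, F_Size INTEGER, F_NumChunk INTEGER)"
)
conn.executemany(
    "INSERT INTO T_DictIndex VALUES (?, ?, ?, ?)",
    [("banana", 0, 10, 0), ("Apple", 10, 12, 0), ("apple pie", 22, 8, 0)],
)
words = [row[0] for row in
         conn.execute("SELECT F_Word FROM T_DictIndex ORDER BY F_Word")]
print(words)
```

The collation has to be registered on every connection that touches F_Word, because SQLite does not store the function in the database file, only its name.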
02-07-2012, 01:21 AM | #8 |
Member
Posts: 10
Karma: 10
Join Date: Oct 2011
Device: PRS650
|
Hi,
Thank you for your answer. It's a very nice idea to publish this script. This collation function doesn't seem easy for me to write (I've tried looking on the SQLite site), so I'll wait until you publish your script. Best regards. |
02-07-2012, 07:59 AM | #9 |
Member
Posts: 10
Karma: 10
Join Date: Oct 2011
Device: PRS650
|
Hi,
Thank you very much. In fact, I've compiled libSqliteIcu.so and, from sqlite3, run ".load libSqliteIcu.so" and "SELECT icu_load_collation('en_US', 'IcuNoCase');", and then inserted a csv file containing the entries (F_Word, etc). I tried 'en_US' because it is an English-French dictionary. It seems to work. Best regards. |
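The same steps can be sketched from Python instead of the sqlite3 shell. This is only an illustration under several assumptions: the helper names load_icu_collation and import_entries are hypothetical, the CSV is assumed to hold one entry per row in the order F_Word, F_Offset, F_Size, F_NumChunk with no header, and whether libSqliteIcu.so loads at all depends on how it and your Python were built. The demo at the bottom uses a plain in-memory table without the collation so it runs without the library.

```python
import csv
import io
import sqlite3


def load_icu_collation(conn, lib_path, locale="en_US"):
    """Load the ICU extension and register the collation as IcuNoCase.

    Mirrors the ".load" and icu_load_collation steps from the post above.
    Needs a Python build that allows loading SQLite extensions.
    """
    conn.enable_load_extension(True)
    conn.load_extension(lib_path)
    conn.execute("SELECT icu_load_collation(?, 'IcuNoCase')", (locale,))
    conn.enable_load_extension(False)


def import_entries(conn, csv_file):
    """Insert (F_Word, F_Offset, F_Size, F_NumChunk) rows from a CSV file object."""
    rows = [
        (word, int(offset), int(size), int(chunk))
        for word, offset, size, chunk in csv.reader(csv_file)
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO T_DictIndex (F_Word, F_Offset, F_Size, F_NumChunk)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


# Demo with made-up entries; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_DictIndex ("
    " F_Word TEXT, F_Offset INTEGER, F_Size INTEGER, F_NumChunk INTEGER)"
)
sample = io.StringIO("hello,0,120,0\nworld,120,95,0\n")
inserted = import_entries(conn, sample)
print(inserted)
```

On a real idx file you would call load_icu_collation(conn, "libSqliteIcu.so") right after connecting and before running import_entries.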
02-08-2012, 04:52 PM | #10 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Ok, nice to know.
I'm nearly done with my script to convert from stardict to the Cybook Odyssey format. I plan to release it on Friday or Saturday, stay tuned |
02-08-2012, 05:48 PM | #11 |
Wizard
Posts: 4,756
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
|
02-11-2012, 08:56 AM | #12 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
You can find the much (!?) awaited script to convert stardict dictionaries to the Odyssey format at:
NEW PAGE: this one
There I put a description of the Odyssey format and the files required to convert your stardict dictionary to the Odyssey format, documenting the script usage and how you can write your own parser. Enjoy!
EDIT: updated with new page URL. Last edited by AlPe; 02-13-2012 at 03:20 PM. |
02-11-2012, 10:27 AM | #13 | |
Wizard
Posts: 4,756
Karma: 246906703
Join Date: Dec 2011
Location: USA
Device: Oasis 3, Oasis 2, PW3, PW1, KT
|
Quote:
|
|
02-11-2012, 10:32 AM | #14 |
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
|
Nope, I haven't had time to polish that code as well, but I agree it would be a nice feature to add to the current script. I will try to work on that mid-next week.
|
02-12-2012, 06:39 PM | #15 |
Member
Posts: 10
Karma: 10
Join Date: Oct 2011
Device: PRS650
|
Hi,
A very nice and awesome job. The documentation is also welcome. Regarding the conversion from mobi, I'm not sure it is possible in general. I've done this for the Harrap's Shorter English French that I had bought for my Kindle. I had to remove some html tags using the sed command and create the entries using perl scripts. But it seems you have to adapt the script to each dictionary because there are some particular entries. For example, plural or preterit forms are not directly included (they work through html links, which are not possible on the Odyssey). I've tried this for the Cambridge Klett as well and had to adapt the script. Best regards. |
|