Index of Custom Dictionaries for Kobo eReader - Page 21

oren64 · 09-05-2016, 10:53 AM

Quote:

Originally Posted by frenshprince

Thanks Oren

I tried to find something related to dictionaries in the Kindle forum, but I guess there is no need over there to convert a mobi to a text file.

It seems that Penelope can convert a mobi dict to a Kobo dict easily, but if you don't get how Python works, there is no way to do it yourself.

See here, it can only write mobi files.

Contact Alberto by email; maybe he has a solution for you.

GERGE · 09-05-2016, 11:05 AM

The Collins dictionary is also sold as mobi. It is one of the biggest and best dictionaries, with 200,000 headwords (or 722,000 words, as the ads would have it). If there is a way to convert it, that would be great.

Doitsu · 09-05-2016, 11:05 AM

Quote:

Originally Posted by frenshprince

I bought an expensive French dictionary for Kindle, and I would like to convert it for my Kobo.

You can unpack most DRM-free Kindle dictionaries with the KindleUnpack Calibre plugin.
However, you'd need to manually reformat the generated HTML file(s) to one of the input formats supported by Penelope.

helour · 09-07-2016, 01:09 PM

The English-Slovak (with pronunciation) and Slovak-English dictionaries contain 50453 and 50420 translated terms, respectively.
The KoboRoot.tgz package auto-installs the English-Slovak and Slovak-English dictionaries and Slovak hyphenation (you don't need to use any SQL tool).

frenshprince · 09-07-2016, 01:18 PM

Quote:

Originally Posted by oren64

See here, it can only write mobi files.

Contact Alberto by email; maybe he has a solution for you.

You're right. I didn't see that.

Quote:

Originally Posted by Doitsu

You can unpack most DRM-free Kindle dictionaries with the KindleUnpack Calibre plugin.
However, you'd need to manually reformat the generated HTML file(s) to one of the input formats supported by Penelope.

Thanks, Doitsu.
Is there any tutorial for that?
I'm not afraid to spend time on it, but my knowledge in this area is quite limited.

mietek81 · 09-09-2016, 03:47 AM

Hi
I have done something like that with a Kindle Polish-to-English dictionary using Linux shell scripting, and I must say that it is not a simple task. First of all, when you unpack the mobi file, you will probably get a single, large HTML file that is hard to open in any viewer or notepad app. Then you have to clean this file of all unwanted tags, HTML bits and other stuff. After that, you convert it to a tab-separated file, which you can then feed to Penelope.
But I think that each mobi dictionary may have a different structure, so even if I shared my script, it would not work with your dictionary. Still, I can send it to you, so if you have any Linux scripting skills you can try to make it work.
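As a rough illustration of the cleanup steps described above, here is a minimal Python sketch rather than mietek81's actual shell script. It assumes the unpacked HTML wraps each entry in `<idx:entry>` and puts the headword as text inside `<idx:orth>`; many Kindle dictionaries are structured differently, so the patterns would likely need adjusting. The function names are hypothetical.

```python
import html
import re

# ASSUMPTION: each entry is wrapped in <idx:entry>...</idx:entry> and the
# headword is the text inside <idx:orth>...</idx:orth>. Real dictionaries
# often differ, so these patterns are only a starting point.
ENTRY_RE = re.compile(r"<idx:entry\b[^>]*>(.*?)</idx:entry>", re.S | re.I)
ORTH_RE = re.compile(r"<idx:orth\b[^>]*>(.*?)</idx:orth>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Drop tags, decode HTML entities, and collapse all whitespace."""
    text = html.unescape(TAG_RE.sub(" ", fragment))
    return " ".join(text.split())


def html_to_rows(source):
    """Return (headword, definition) pairs found in the HTML source."""
    rows = []
    for entry in ENTRY_RE.findall(source):
        orth = ORTH_RE.search(entry)
        if orth is None:
            continue  # no recognizable headword, skip this entry
        headword = strip_tags(orth.group(1))
        # The definition is whatever remains once the headword is removed.
        definition = strip_tags(ORTH_RE.sub(" ", entry, count=1))
        if headword and definition:
            rows.append((headword, definition))
    return rows


def write_tsv(rows, path):
    """Write one 'headword<TAB>definition' line per row."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for headword, definition in rows:
            out.write(headword + "\t" + definition + "\n")
```

To try it, read the unpacked HTML file into a string, pass it to `html_to_rows`, and write the result with `write_tsv`; the resulting tab-separated file is the kind of input the thread suggests feeding to Penelope.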

AlPe · 09-23-2016, 08:34 AM

@Oren64, helour: I added to the first post the dictionaries you created since my last visit, thank you.

AlPe · 09-23-2016, 08:38 AM

Quote:

Originally Posted by mietek81

But I think that each mobi dictionary may have a different structure, so even if I shared my script, it would not work with your dictionary. Still, I can send it to you, so if you have any Linux scripting skills you can try to make it work.

Yes, that is the main problem. Actually, there are several variants of the MOBIPOCKET format for dictionaries, so that is another variable in the equation.

Unfortunately I have no better answer than what was suggested above: unpack the MOBI and clean the source with some script, producing a format that Penelope can read, and use the latter to convert to the Kobo format.

Since I have maxed out on my FLOSS contributions (see next post), I do not plan to implement a generic read_mobi() function in Penelope, sorry.

AlPe · 09-23-2016, 08:52 AM

Quote:

Originally Posted by satelman

Hi, everyone!

I can confirm surquizu's and tropoy's issues trying to install a Kobo dictionary converted with penelope 3.1.2. I had this very same problem (I use Python 2.7 + Windows 10). There was no way for my Aura HD to recognize or accept my new Kobo dictionary.

However, when I converted again by using penelope 2.0.2 this time, everything went OK and the ereader accepted my just converted dictionary.

So, apparently, I will have to stick to version 2.0.2 (for the time being, at least).

Hi,

unfortunately I no longer have a Kobo to actually test the files output by Penelope, so I might have introduced a bug in Penelope v3.x; however, I have not received complaints about that from Linux users, so it might be a problem with Windows.

Also note that today (2016-09-23) I published Penelope v3.1.3 on GitHub and PyPI. This version fixes a bug with the Bookeen output and cleans the code a bit, so I doubt it will solve your issue anyway.

Unfortunately I am no longer able to maintain the Penelope project, as I am too absorbed by other FLOSS projects. I am actively looking for someone to take Penelope over. If someone is interested, please let me know.

Meanwhile, I am still committed to trying to fix bugs. So, it would be helpful to know: (in what follows, by "working" I mean that the output dictionary works on Kobo devices)

1. Is Penelope v3.1.3 working for Linux/OS X users? With Python 2.7.x and/or Python 3.x?
2. Is Penelope v3.1.3 working for Windows users when using Python 3.x?

If the answer to 1. is Yes, then the problem is with Windows only.

If the answers to 1. and 2. are both Yes, then the problem is with Python 2.7.x on Windows only.

If the answer to 2. is No, then it might be a problem with Windows or a bug. In this case, to diagnose it, I would need:

A. the file(s) of the input dictionary
B. the Kobo file output by Penelope v3.1.3
C. the Kobo file output by Penelope v2.0.2

Finally, a general note on helping FLOSS maintainers: if you use a FLOSS tool, and you find a problem, please report it to the bug tracker of the project! I do not usually stop by MobileRead, while I get an email from GitHub when someone opens an issue: https://github.com/pettarin/penelope/issues . If a user does not have a GitHub account, she can send me an email.

oren64 · 09-23-2016, 12:37 PM

Hi Alberto

I installed 3.1.3 with Python 3.5.1 on Windows 7, and it works okay. I tried 2 dictionaries, en-en and en-he, and my Kobo recognizes them.

Until now I used penelope-2.0.2 with Python 2.7.11 and penelope.py. It works faster on my PC than Python 3.5.1 with penelope3.py.
I also use a bat file, so it's easier to use, with no messing about with Command Prompt.

Code:

cmd /k penelope.py -p from_to -f from -t to --output-kobo %*

___________________________________

My problem is when there are two or more words for one translation.
For example:

Code:

word1|word2	translation

Kobo needs to separate the words, but it shows them as "word1|word2" on my Kobo.
It worked for me before on my old PC with penelope-2.0.2; I don't know if the problem is with Kobo, Penelope or Windows.

AlPe · 09-23-2016, 03:26 PM

Hi Oren,

thanks for confirming Penelope v3.1.3 works on Windows with Python 3.x; at least now we can suggest that people use Python 3.x.

With respect to your question:

Quote:

My problem is when there are two or more words for one translation.
...

I am not sure I understood it correctly. I understood that you have a CSV file (I use "<TAB>" to indicate the field separator for the example):

Code:

word1|word2<TAB>definition

where "definition" is the definition associated with both "word1" and "word2". So, you expect the output Kobo to have two entries, one with index word "word1", the other with index word "word2", both associated with "definition".

If this is the case, unfortunately Penelope cannot understand that "word1|word2" is composed of two strings, each to be associated with "definition". It sees it as a single string, exactly as if it were "word3".

However, you can write a simple input parser and use the "--input-parser" switch (I can code it for you, if you confirm I understood it correctly). Alternatively, use a text editor and some regex to duplicate the row:

Code:

word1<TAB>definition
word2<TAB>definition

and feed Penelope with this derived file.

If I did not understand correctly, let me know.
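The row-duplication alternative mentioned above can also be scripted. The following is a minimal standalone sketch, not the input parser AlPe attached later in the thread and not part of Penelope; the function name `expand_rows` and its defaults (`|` between index words, a tab between fields) are assumptions taken from the example rows.

```python
def expand_rows(lines, word_sep="|", field_sep="\t"):
    """Turn 'word1|word2<TAB>definition' into one row per index word.

    Hypothetical helper: it only pre-processes the text file before it is
    handed to Penelope.
    """
    expanded = []
    for line in lines:
        line = line.rstrip("\r\n")
        if field_sep not in line:
            # Lines without a field separator are kept untouched.
            expanded.append(line)
            continue
        # Split only on the first separator so the definition stays whole.
        headwords, definition = line.split(field_sep, 1)
        for word in headwords.split(word_sep):
            word = word.strip()
            if word:
                expanded.append(word + field_sep + definition)
    return expanded
```

Reading the original file line by line, passing the lines to `expand_rows`, and writing the result (one row per line) to a new file gives the derived file to feed to Penelope.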

oren64 · 09-23-2016, 03:42 PM

Quote:

Originally Posted by AlPe

I am not sure I understood it correctly. I understood that you have a CSV file (I use "<TAB>" to indicate the field separator for the example):

Code:

word1|word2<TAB>definition

You understood me correctly.

Quote:

However, you can write a simple input parser and use the "--input-parser" switch (I can code it for you, if you confirm I understood it correctly). Alternatively, use a text editor and some regex to duplicate the row:

Code:

word1<TAB>definition
word2<TAB>definition

Yes, I would like that.

AlPe · 09-23-2016, 04:07 PM

Attached are the input parser (the .py file), a simple .csv input file, and the resulting output files (I used XML so you can easily inspect them). Of course, you can then use the input parser on your real input file and with Kobo output. You might want to use the sort/merge switches as well.

Code:

$ python -m penelope -i input.csv -j csv --csv-fs "\\t" -f en -t en -p xml -o output_no_input_parser.xml

$ python -m penelope -i input.csv -j csv --csv-fs "\\t" -f en -t en -p xml -o output_with_input_parser.xml --input-parser multiple_index_words.py

oren64 · 09-23-2016, 04:31 PM

I get an error.

AlPe · 09-23-2016, 04:40 PM

In the second invocation, you have an extra "t" after "C:\ ... \python.exe".

I think your command should be (assuming the input.csv and multiple_index_words.py are also in "D:\Penelope\penelope-3.1.3"):

Code:

C:\Users\Oren\AppData\Local\Programs\Python\Python35-32\python.exe -m penelope -i input.csv -j csv --csv-fs "\\t" -f en -t en -p xml -o output_with_input_parser.xml --input-parser multiple_index_words.py

(I am not sure "\\t" is the right escape for Windows. Maybe it is just "\t". Use what works for you.)
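One way to settle the escaping question is to check what string the shell actually hands to Python. The sketch below is a hypothetical helper, unrelated to Penelope itself: it reports the length and `repr` of each command-line argument. As far as I know, a POSIX shell turns a double-quoted "\\t" into a backslash followed by "t", while cmd.exe usually passes backslashes through unchanged, but that is worth verifying on your own machine. What Penelope expects for `--csv-fs` is not stated in this thread.

```python
import sys


def show(arg):
    """Describe an argument: its length and its escaped representation."""
    return "{} chars: {!r}".format(len(arg), arg)


# Print one line per command-line argument (nothing if none are given).
for argument in sys.argv[1:]:
    print(show(argument))
```

Saved under a name of your choice (for example `show_args.py`) and run with the same quoted argument used for `--csv-fs`, it shows whether the program receives a backslash plus "t", two backslashes plus "t", or a real tab character.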



