Old 04-16-2021, 07:37 AM   #106
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@ichnilatis PyGlossary generated a syn-file that handles the inflections of words by pointing each one to its main entry. I've just copied it along, so it should work in Goldendict. However, Koreader doesn't handle syn-files, so it is questionable whether that is a good solution. Then again, Koreader uses fuzzy search, so it will probably hit the right entry anyway.

Would you be willing to check the performance? Are inflections found in Goldendict; are they found in Koreader?
Old 04-16-2021, 08:53 AM   #107
ichnilatis
Posts: 172
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
@Markismus Unfortunately, the text of some entries is not displayed in Koreader, mainly for entries that start with the first few letters of the alphabet. The titles of all entries are displayed, though.
However, Koreader doesn't seem to have a problem with the inflections of words, thanks to fuzzy search.

Also, when I unzipped the file "e-Δομή (El-El)", 7z reported: "Data error:".

Do you have any idea what's going on?

In Pocketbook reader, everything looks OK, but dictionaries there are not very easy to use.

Do you think that I have to try it in Goldendict? I have it already installed there.

Thank you for your efforts!
Old 04-16-2021, 04:24 PM   #108
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@ichnilatis I didn't get any error with the Gnome Archive. However, I've recompressed the source files and uploaded an April16th2021 version.

I've attached a screenshot of Koreader of a dictionary lookup of one of the first entries in the dictionary. (I wrote the word with my keyboard after changing the language to Greek in my system.)
Attached Thumbnails: Screenshot from 2021-04-16 21-22-15.png (101.5 KB)

Last edited by Markismus; 04-16-2021 at 04:26 PM.
Old 04-17-2021, 10:18 AM   #109
ichnilatis
Posts: 172
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
@Markismus Everything looks fine now!

Thank you very much! I sincerely appreciate your help.
Old 11-14-2021, 02:55 PM   #110
Getkey
Junior Member
Posts: 8
Karma: 848
Join Date: Aug 2014
Location: Netherlands
Device: PB631
@Markismus amazing work, thank you so much!

I'm using your VanDale Groot Woordenboek Nederlands-Frans (2010)_reconstructed_Feb19th2021.dic, but it has a lot of XML entities instead of Unicode characters. They appear as entities in the Pocketbook dictionary application, and don't appear at all in the quick dictionary popup of the normal ePub reader. I haven't tried in Koreader.

I looked at your script but I'm not super familiar with Perl. Perhaps could be replaced with
Attached Images
File Type: bmp scr0003.bmp (758.1 KB, 208 views)
Old 11-14-2021, 04:32 PM   #111
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@Getkey Nice screenshot! If you're ready to test, I am willing to program a fix. I haven't had a Pocketbook for two years, so most newer features for Pocketbook are as yet untested.

This is easily seen from the tail end of the script. "Create Stardict Dictionary" runs from line 1696 to 1745, while "Create Pocketbook Dictionary" runs from line 1749 to 1756. That is roughly six times as many lines (50 versus 8) for the tested conversion.

Line 375 actually replaces '&' symbols in the text that are _not_ followed by a html-codepoint with the escape sequence '&amp;'. This is to prevent errors in parsers that do look at escape sequences.
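
To make the idea concrete, here is a minimal sketch of such a replacement using a negative lookahead. It is not the script's actual line 375; the sub name escape_bare_ampersands and the exact pattern for what counts as an entity are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not the code at line 375 of the script.
# Escapes every '&' that does not already start an entity such as
# &amp; (named), &#233; (decimal) or &#x2666; (hexadecimal).
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

# The bare '&' in "R&D" and the trailing '&' are escaped,
# the existing entities are left untouched.
print escape_bare_ampersands('R&D caf&#233; &lt;b&gt; &'), "\n";
```

The lookahead only checks the shape of what follows the '&'; it does not verify that a named entity actually exists.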

These entries seem to use a decimal escape sequence. So 160->nbsp, 233->é, 232->è, etc. However, 9830->♦, which seems odd.
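
As a quick check of these numbers, the following sketch prints each decimal value with its Unicode code point and character. The sub name describe_codepoint is made up for this example. Decimal 9830 is U+2666, the black diamond suit, so the mapping to '♦' is what Unicode prescribes.

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: show a decimal sequence next to its code point
# in hexadecimal notation and the character itself.
sub describe_codepoint {
    my ($code) = @_;
    return sprintf('&#%d; = U+%04X = "%s"', $code, $code, chr($code));
}

# The values mentioned in the post above.
print describe_codepoint($_), "\n" for (160, 233, 232, 9830);
```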

Non-breakable spaces (&nbsp;) are in fact converted to numbered sequences (&#160;) at line 971 and further:
sub convertNonBreakableSpacetoNumberedSequence{
	my $UnConverted = join('',@_);
	debugV("Entered sub convertNonBreakableSpacetoNumberedSequence");
	# Replace the named entity with its decimal numbered sequence.
	$UnConverted =~ s~\&nbsp;~&#160;~sg ;
	my @Converted = split(/$/, $UnConverted);
	return( @Converted );}
Which are converted to characters in the next subroutine at l.977:
sub convertNumberedSequencesToChar{
my $UnConverted = join('',@_);
debugV("Entered sub convertNumberedSequencesToChar");
# Hexadecimal sequences, e.g. &#x2666;. hex() is needed here: the string "0x2666" would numify to 0.
$UnConverted =~ s~\&\#x([0-9A-Fa-f]{1,6});~chr(hex($1))~seg ;
# Decimal sequences, e.g. &#9830;.
$UnConverted =~ s~\&\#([0-9]{1,6});~chr(int($1))~seg ;
return( split(/(\n)/, $UnConverted) );}
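
For anyone who wants to try the conversion outside the script, here is a stand-alone sketch of the same two substitutions. The name decode_numbered_sequences is hypothetical, and debugV and the line splitting are left out. It uses hex() for the hexadecimal case, because in Perl a string such as "0x41" evaluates to 0 when used as a number, so passing it straight to chr would yield a NUL character.

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical stand-alone variant of convertNumberedSequencesToChar.
sub decode_numbered_sequences {
    my ($text) = @_;
    # Hexadecimal sequences first: &#xE8; -> chr(0xE8)
    $text =~ s/&#x([0-9A-Fa-f]{1,6});/chr(hex($1))/ge;
    # Then decimal sequences: &#233; -> chr(233)
    $text =~ s/&#([0-9]{1,6});/chr($1)/ge;
    return $text;
}

print decode_numbered_sequences('&#233;l&#xE8;ve &#9830;'), "\n";
```

Named entities such as &nbsp; or &apos; are not touched by this sketch; they need a separate step.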

So the question is not whether we need another dependency, but why the subroutine is not used or fails for Van Dale FR-NL 2010.

From line 1621 onward, sub removeInvalidChars is defined, and it also replaces some character codepoints. It is also odd that those remain if the subroutine convertNumberedSequencesToChar is called.

So when is it called? Apparently, it is called if the SameTypeSequence is not "h". In line 1682 and further:
# If SameTypeSequence is not "h", remove numbered sequences (&#...;) and replace them with characters.
if ( $SameTypeSequence ne "h" ){
	@xdxf_reconstructed = convertNumberedSequencesToChar(
							convertNonBreakableSpacetoNumberedSequence( @xdxf_reconstructed )
								) ;
So if you introduce an extra toggle, e.g.
my $ForceConvertNumberedSequencesToChar = 1;
if ( $SameTypeSequence ne "h" or $ForceConvertNumberedSequencesToChar or $isCreatePocketbookDictionary){
you could test whether the results are nicer on the Pocketbook.
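
The effect of the extra toggle can be checked in isolation with a small sketch like the one below. The sub should_convert_sequences is a hypothetical wrapper around the condition; its three arguments stand in for $SameTypeSequence, $ForceConvertNumberedSequencesToChar and $isCreatePocketbookDictionary.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the condition shown above.
sub should_convert_sequences {
    my ($same_type_sequence, $force, $is_pocketbook) = @_;
    my $convert = ( $same_type_sequence ne 'h' or $force or $is_pocketbook );
    return $convert ? 1 : 0;
}

# Each case: SameTypeSequence, force toggle, Pocketbook flag.
for my $case ( ['h', 0, 0], ['h', 1, 0], ['h', 0, 1], ['x', 0, 0] ) {
    printf "sts=%s force=%d pocketbook=%d -> convert=%d\n",
        @$case, should_convert_sequences(@$case);
}
```

Only the first case (SameTypeSequence "h" with both flags off) skips the conversion.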

If you aren't able to run the script yourself, I am willing to give it a whirl and send you the result if you're willing to test.
Old 11-14-2021, 04:34 PM   #112
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
I've just looked at the xdxf-article. It is definitely not filtered.
<head><k>leerling</k></head><def><p><span style="font-weight:bold;">leerling</span> <span style="font-style:italic;">(</span><span style="font-style:italic;">de</span><sup>m</sup><span style="font-style:italic;">)</span><span style="font-weight:bold;">, </span><span style="font-weight:bold;">leerlinge</span>*<span style="font-style:italic;">(</span><span style="font-style:italic;">de</span><sup>v</sup><span style="font-style:italic;">)</span><br/>1. <span style="font-style:italic;">scholier(e)</span><br/><span style="font-weight:bold;">élève</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">lagere school</span><span style="font-style:italic;">)</span>*<span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span><span style="font-weight:bold;">écolier</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) </span><span style="font-weight:bold;">écolière</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">middelbare school</span><span style="font-style:italic;">)</span>*<span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span><span style="font-weight:bold;">collégien</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) 
</span><span style="font-weight:bold;">collégienne</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span>lycéen<span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) </span>lycéenne<span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span>*<br/>♦ voorbeelden<br/>een briljante leerling<br/><span style="color:#472565;">un brillant sujet, un sujet d'élite</span>*<br/>een externe, niet-inwonende leerling<br/><span style="color:#472565;">un(e) externe</span>*<br/>een interne, inwonende leerling<br/><span style="color:#472565;">un(e) interne</span>*<br/>de nieuwe leerlingen<br/><span style="color:#472565;">les élèves qui entrent, de 1ère année</span>*<br/>een zwakke, trage, middelmatige leerling<br/><span style="color:#472565;">un élève médiocre, lent, moyen</span>*<br/>de zwakste leerling van de klas<br/><span style="color:#472565;">le dernier de la classe</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">figuurlijk</span><span style="font-style:italic;">)</span>*<span style="color:#472565;">le mauvais élève</span>*<br/>2. <span style="font-style:italic;">volgeling</span><br/><span style="font-weight:bold;">élève</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span><br/>disciple<span style="font-style:italic;"> (</span><span style="font-style:italic;">m. 
of v.</span><span style="font-style:italic;">)</span>*<br/>♦ voorbeelden<br/><span style="font-style:italic;">(</span><span style="font-style:italic;">Bijbel</span><span style="font-style:italic;">)</span>*de leerlingen van Jezus<br/><span style="color:#472565;">les disciples de Jésus</span>*<br/>3. <span style="font-style:italic;">aspirant(-)</span><br/><span style="font-weight:bold;">élève</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span>apprenti<span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) </span>apprentie<span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span>*<br/>♦ voorbeelden<br/>een leerling-metselaar<br/><span style="color:#472565;">un apprenti maçon</span>*</p></def>
After forcing the conversion it looks like this:
<head><k>leerling</k></head><def><p><span style="font-weight:bold;">leerling</span> <span style="font-style:italic;">(</span><span style="font-style:italic;">de</span><sup>m</sup><span style="font-style:italic;">)</span><span style="font-weight:bold;">, </span><span style="font-weight:bold;">leerlinge</span>*<span style="font-style:italic;">(</span><span style="font-style:italic;">de</span><sup>v</sup><span style="font-style:italic;">)</span><br/>1. <span style="font-style:italic;">scholier(e)</span><br/><span style="font-weight:bold;">élève</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">lagere school</span><span style="font-style:italic;">)</span>*<span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span><span style="font-weight:bold;">écolier</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) </span><span style="font-weight:bold;">écolière</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">middelbare school</span><span style="font-style:italic;">)</span>*<span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span><span style="font-weight:bold;">collégien</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) 
</span><span style="font-weight:bold;">collégienne</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span>lycéen<span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) </span>lycéenne<span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span>*<br/>♦ voorbeelden<br/>een briljante leerling<br/><span style="color:#472565;">un brillant sujet, un sujet d'élite</span>*<br/>een externe, niet-inwonende leerling<br/><span style="color:#472565;">un(e) externe</span>*<br/>een interne, inwonende leerling<br/><span style="color:#472565;">un(e) interne</span>*<br/>de nieuwe leerlingen<br/><span style="color:#472565;">les élèves qui entrent, de 1ère année</span>*<br/>een zwakke, trage, middelmatige leerling<br/><span style="color:#472565;">un élève médiocre, lent, moyen</span>*<br/>de zwakste leerling van de klas<br/>

<span style="color:#472565;">le dernier de la classe</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">figuurlijk</span><span style="font-style:italic;">)</span>*<span style="color:#472565;">le mauvais élève</span>*<br/>2. <span style="font-style:italic;">volgeling</span><br/><span style="font-weight:bold;">élève</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span><br/>disciple<span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span>*<br/>♦ voorbeelden<br/><span style="font-style:italic;">(</span><span style="font-style:italic;">Bijbel</span><span style="font-style:italic;">)</span>*de leerlingen van Jezus<br/><span style="color:#472565;">les disciples de Jésus</span>*<br/>3. <span style="font-style:italic;">aspirant(-)</span><br/><span style="font-weight:bold;">élève</span><span style="font-style:italic;"> (</span><span style="font-style:italic;">m. of v.</span><span style="font-style:italic;">)</span><br/><span style="font-style:italic;">(</span><span style="font-style:italic;">man</span><span style="font-style:italic;">) </span>apprenti<span style="font-style:italic;"> (</span><span style="font-style:italic;">m.</span><span style="font-style:italic;">)</span>, <span style="font-style:italic;">(</span><span style="font-style:italic;">vrouw</span><span style="font-style:italic;">) </span>apprentie<span style="font-style:italic;"> (</span><span style="font-style:italic;">v.</span><span style="font-style:italic;">)</span>*<br/>♦ voorbeelden<br/>een leerling-metselaar<br/><span style="color:#472565;">un apprenti maçon</span>*</p></def>
And the decoding confirms that '♦' is a diamond. (So does this forum, which converts '&# 9830;' to '♦' as soon as this post is saved.)

Last edited by Markismus; 11-14-2021 at 05:00 PM.
Old 11-14-2021, 04:56 PM   #113
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
I've uploaded the new Dutch-French dictionary to the pCloud. Its filename ends with ..Nov14th2021.dic

The html-escape sequences for symbols should be gone. Do you have a problem with the remaining color-styling or does the conversion rip that right out?

Last edited by Markismus; 11-14-2021 at 05:12 PM.
Old 11-14-2021, 05:36 PM   #114
inhabitant
Junior Member
Posts: 1
Karma: 854
Join Date: Nov 2021
Device: PocketBook
What you are doing here is pretty awesome. I tried to make a DIC file out of Wikiferheng, which is a Kurdish dictionary, but I was not successful. Maybe one of you is able to help.

The Dump-Files of are here:

Is someone able to help with this issue? I would also pay for it, because I am in desperate need of it: I work as a medic in crisis areas, and the language is a big problem.

Kurmancî / German and German / Kurmancî

Last edited by inhabitant; 11-14-2021 at 05:40 PM.
Old 11-14-2021, 07:44 PM   #115
Getkey
Junior Member
Posts: 8
Karma: 848
Join Date: Aug 2014
Location: Netherlands
Device: PB631
@Markismus, this is great! Works very well, see the scr0004.png and scr0008.png. And thanks for the explanation!

I think the ♦ are normal and are meant as a sort of separator.

Quote:
Do you have a problem with the remaining color-styling or does the conversion rip that right out?
Nope, it's perfect. The only "issue" is that in the PocketBook dictionary app, the ♦ are displayed as □ (see scr0004.png). It works in the quick dictionary popup, however. So it seems the issue is on PocketBook's side and can be ignored (it's minor anyway).

I've tried the other dictionaries you put on pCloud. Here are those with issues (see screenshots), could you regenerate them?
  • All the Van Dale dictionaries except GWHN 2019 have the same issue as the NL-FR
  • Babylon EL-EN has HTML tags (likely a different problem, right?)
  • Duden has hexadecimal HTML entities
  • KD has hexadecimal HTML entities
  • Longman has some HTML entities (notice the &apos; in the screenshot)
  • Littré 2011 had an issue with the conversion of the HTML entities, apparently? (might also be a different problem?)
Attached Thumbnails: scr0004.png, scr0008.png, scr0009.png, scr0010.png, scr0012.png, scr0013.png, scr0015.png
Old 11-15-2021, 03:18 AM   #116
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
The square box is generated if the font in use doesn't contain the symbol. Try using another font. Normally, this is prevented by defining a fallback font that is known to have all or almost all symbols defined.
Old 11-15-2021, 05:25 AM   #117
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@inhabitant You could start by taking a look at this thread and searching the MobileRead forums further for ways to turn the wiki dump into a dictionary. Once you've generated it, if you can't convert it to Pocketbook format, I am willing to help.

If you write a howto or at least give the relevant links about how to convert the wiki dump to a dictionary, I am willing to look into whether it can be included in the script.

Last edited by Markismus; 11-15-2021 at 05:51 AM.
Old 11-15-2021, 05:50 AM   #118
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
@Getkey Could you test the Babylon_English_Greek dictionary of Nov15th2021?

I've included the <br>-tag in $PossibleTags and expanded the code for eliminating colors-tags:
	# Removes all color from lemma description.
	# <c c="darkslategray"><c>Derived:</c></c> <c c="darkmagenta">
	# Does not remove for example <span style="color:#472565;"> and corresponding </span>!!!
	# Does not remove for example <font color="#007000">noun</font>!!!
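	# Remove bare <c> and </c> tags ('<\?c>' would only match a literal '<?c>').
	$def =~ s~</?c>~~gs;
	$def =~ s~<c c=[^>]+>~~gs;
	# Does not remove span-blocks with nested html-blocks.
	# Colors are matched as hexadecimal digits, so values such as #AABBCC are covered too.
	$def =~ s~<span style="color:#[0-9A-Fa-f]+;">(?<colored_text>[^<]*)</span>~$+{colored_text}~gs;
	# Does not remove font-blocks with nested html-blocks.
	$def =~ s~<font color="#[0-9A-Fa-f]+">(?<colored_text>[^<]*)</font>~$+{colored_text}~gs;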
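
To see what these substitutions do to a single definition, here is a stand-alone sketch. The sub strip_colors is a hypothetical name, and matching the color value as hexadecimal digits is an assumption of this example. It shows that plain colored text is unwrapped, while a span with nested tags is left as it is.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the color stripping.
sub strip_colors {
    my ($def) = @_;
    $def =~ s~</?c>~~g;        # bare <c> and </c>
    $def =~ s~<c c=[^>]+>~~g;  # <c c="darkmagenta">
    # Unwrap colored spans and fonts that contain text only.
    $def =~ s~<span style="color:#[0-9A-Fa-f]+;">([^<]*)</span>~$1~g;
    $def =~ s~<font color="#[0-9A-Fa-f]+">([^<]*)</font>~$1~g;
    return $def;
}

print strip_colors('<c c="darkmagenta">noun</c> <span style="color:#472565;">un(e) externe</span>'), "\n";
# A span with a nested tag is not touched:
print strip_colors('<span style="color:#472565;"><b>x</b></span>'), "\n";
```

Handling nested blocks properly would need a real HTML parser rather than regular expressions.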

Last edited by Markismus; 11-15-2021 at 06:14 AM.
Old 11-15-2021, 06:13 AM   #119
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
After adding <0x00> to the subroutine removeInvalidChars and rerunning the script for Duden the entry generated is:
<head><k>a</k></head><def><sup>i.</sup><blockquote><span><span><b>a, </b><b>A </b>das; - (UGS.: -s), - (UGS.: -s) [mhd., ahd. a]: <b>1.</b> erster Buchstabe des Alphabets: <i>ein kleines a, ein großes A; </i> <i>eine Broschüre mit praktischen Hinweisen von A bis Z (unter alphabetisch angeordneten Stichwörtern); </i> <b>R </b>wer A sagt, muss auch B sagen (wer etwas beginnt, muss es fortsetzen u. auch unangenehme Folgen auf sich nehmen); <sup>*</sup><b>das A und O, </b>(SELTENER:) <b>das A und das O </b>(die Hauptsache, Quintessenz, das Wesentliche, Wichtigste, der Kernpunkt; urspr. = der Anfang und das Ende, nach dem ersten [Alpha] und dem letzten [Omega] Buchstaben des griech. Alphabets); <sup>*</sup><b>von A bis Z </b>(UGS.; von Anfang bis Ende, ganz und gar, ohne Ausnahme; nach dem ersten u. dem letzten Buchstaben des dt. Alphabets). <b>2.</b> ‹das; -, -› (MUSIK) sechster Ton der C-Dur-Tonleiter: <i>der Kammerton a, A.</i> </span></span><span></span></blockquote>
<sup>ii.</sup><blockquote><span><span><small><sup>1</sup></small><b>a</b>= a-Moll; Ar.</span></span><span></span></blockquote>
<blockquote><span><span><small><sup>2</sup></small><b>a</b> ‹Präp.› [ital. a &lt; lat. ad = zu]: auf, mit, zu (in ital. Fügungen, z.*B. a*conto, a*tempo).</span></span><span></span></blockquote></def>
Note that the '<' symbol is still encoded as the html-codepoint '&lt;'. Is this also displayed incorrectly, just like '&apos;'?

@Getkey Could you test Duden of Nov15th2021?
Old 11-15-2021, 06:38 AM   #120
Markismus
Posts: 924
Karma: 149883
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
I've created a subroutine to remove the html-escape characters:
sub unEscapeHTMLString{
    my $String = shift;
    $String =~ s~\&lt;~<~sg;
    $String =~ s~\&gt;~>~sg;
    $String =~ s~\&apos;~'~sg;
    $String =~ s~\&quot;~"~sg;
    # Unescape '&amp;' last, so that e.g. '&amp;quot;' ends up as '&quot;' and is not unescaped twice.
    $String =~ s~\&amp;~&~sg;
    return $String;}
The entry for rectum is now generated as:
<head><k>rectum</k></head><def><b>rec</b>‧<b>tum</b></c> /ˈrektəm/  <i><c> noun</c></i> (<i>plural </i><b>rectums</c></b><i> or</i> <b>recta</c></b> /-tə/) [countable]</c><i> medical</c></i>
<blockquote>[Date: </c>1400-1500</c>; Language: </c>Modern Latin</c>; Origin: </c>rectum intestinum</c> </c><i>'straight intestine'</c></i>]</blockquote>
<blockquote> the lowest part of your ↑<kref>bowel</kref>s</c> ⇨ <b>rectal</b></blockquote></def>
@Getkey Could you check whether the Longman Dictionary of Nov15th displays without html-codes?
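
One detail worth checking in a subroutine like unEscapeHTMLString is the order of the substitutions. The sketch below is a hypothetical variant (unescape_html_amp_last) that handles '&amp;' as the very last step, so that a doubly escaped sequence is unescaped exactly once.

```perl
use strict;
use warnings;

# Hypothetical variant of unEscapeHTMLString with '&amp;' handled last.
sub unescape_html_amp_last {
    my ($string) = @_;
    $string =~ s/&lt;/</g;
    $string =~ s/&gt;/>/g;
    $string =~ s/&apos;/'/g;
    $string =~ s/&quot;/"/g;
    # Last step: anything that was '&amp;quot;' becomes '&quot;' and stays that way.
    $string =~ s/&amp;/&/g;
    return $string;
}

print unescape_html_amp_last('ad = zu &lt; lat. &amp;quot;'), "\n";
```

If '&amp;' were replaced before '&quot;', the input '&amp;quot;' would end up as a plain double quote, which is one level of unescaping too many.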

Last edited by Markismus; 11-15-2021 at 06:45 AM.
All times are GMT -4.