Old 10-22-2022, 01:23 PM   #496
shamanNS
Guru
Posts: 945
Karma: 10500004
Join Date: Feb 2010
Location: Serbia
Device: Kindle PW5, Kobo Libra 2, Kindle PW1
Quote:
Originally Posted by nem_mil View Post
Right, but the dictionaries for two alphabets are already in two distinct files, recognized by Calibre as two separate languages. I got them here: https://devbase.net/dict-sr/

So, the Latin one is tagged as "Serbo-Croatian" (the file is titled hyph_sh.dic), and I run the Latin book only through that dictionary, never through the other one. The Cyrillic one is hyph_sr and I use it only for the Cyrillic books, but it doesn't work. From what I understand, Serbo-Croatian and Serbian are seen by Calibre and Hyphenate This as two separate languages, which would mean that I never use one dictionary for two alphabets, right?

Here is how it looks on my Calibre: https://ibb.co/BCmCBR4



Why do you say that it doesn't work properly?
Sorry, I didn't pay enough attention while reading your previous comment. I wrongly assumed that you had edited the dictionary file (to switch it from representing "Serbo-Croatian" to "Serbian Latin"), not the ebook's language metadata.


Anyway, I just ran a couple of tests in Calibre and figured out what the problem is: this plugin doesn't like the character encoding used in that "hyph_sr.dic" (for Serbian Cyrillic). Converting the file to UTF-8 (which gets you the exact same file as the one attached by @BeckyEbook) fixes the problem.
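For anyone who wants to do the conversion themselves, here is a minimal Python sketch. It assumes the file follows the usual convention for OpenOffice/libhyphen-style hyphenation files, where the first line names the character set; the function name `hyph_dic_to_utf8` and the optional `source_encoding` override are made up for this example, and the original encoding of the Serbian file is not something I am stating here.

```python
import codecs
from typing import Optional


def hyph_dic_to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Re-encode a hyphenation .dic file as UTF-8 (illustrative sketch).

    Assumption: the first line of the file declares its charset.
    Pass source_encoding if that header uses a name Python does not know.
    """
    header, _, body = raw.partition(b"\n")
    declared = header.decode("ascii", errors="replace").strip()
    encoding = source_encoding or declared
    codecs.lookup(encoding)  # raises LookupError for an unknown charset name
    # Decode the patterns with the old charset, then write them back as UTF-8
    # under a header line that matches the new encoding.
    return b"UTF-8\n" + body.decode(encoding).encode("utf-8")
```

Read the original file in binary mode, pass the bytes through the function, and write the result to a new file so the original stays untouched.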

While testing things I realised that Calibre's built-in "Add soft hyphens" option in the "Polish books" tool actually uses the same file that you downloaded (with the same character encoding), and hyphenation works correctly when using the "Polish books" option.

Long story short: download the file attached by @BeckyEbook, add it to the HyphenateThis plugin, and it will work correctly for Serbian Cyrillic books:

fixed Serbian Cyrillic hyphenation dictionary


What I wrote previously still stands: you can't "install" two hyphenation dictionaries at the same time if both get imported as "Serbian". That issue has been known to me ever since this HyphenateThis plugin first appeared.
You've gotten around it by changing the book's language metadata to "Serbo-Croatian" and using the dictionary for that. IMHO that workaround (mangling each Serbian Latin book's language metadata) is not acceptable. If I read Serbian-language books I would probably prefer repeatedly installing the correct Serbian hyphenation dictionary, or even using Calibre's "Polish books" tool, since that seems to be able to figure out when to use which of the two hyphenation dictionaries.

Hyphenation dictionaries that Polish Books uses can be found at:
Code:
C:\Program Files\Calibre2\app\resources\hyphenation\dictionaries.tar.xz
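
To look inside that archive without unpacking it, something like the following Python sketch should do. It only assumes that the file is an ordinary xz-compressed tar archive, as the extension suggests; the function name `list_archive_members` is made up for this example, and I am not listing here what the archive actually contains.

```python
import tarfile


def list_archive_members(fileobj):
    """Return the sorted names of all regular files in a .tar.xz archive."""
    with tarfile.open(fileobj=fileobj, mode="r:xz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Open the path above in binary mode (`open(path, "rb")`) and pass the file object to the function.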

Last edited by shamanNS; 10-22-2022 at 01:33 PM.
Old 10-22-2022, 01:34 PM   #497
JSWolf
Resident Curmudgeon
Posts: 75,021
Karma: 131375774
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Don't use soft-hyphens for ePub. They do not work well enough. Use a hyphenation dictionary that does work properly. But until I get the language codes, I am not going to try to build an install file for the hyphenation dictionaries.

Last edited by JSWolf; 10-22-2022 at 01:38 PM.
Old 10-22-2022, 01:47 PM   #498
shamanNS
Guru
Nobody cares what you think should or shouldn't be used and done

As for language codes: if you had been more specific about what form of language codes ("sr", "srp", "sr-RS", "sr-Latn-RS", "sr-Cyrl-RS"...) you're expecting, maybe someone would have already provided them to you. Also, you could've Googled that information 9999 times already since yesterday.

Last edited by shamanNS; 10-22-2022 at 01:50 PM.
Old 10-22-2022, 01:57 PM   #499
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by shamanNS View Post
Nobody cares what you think should or shouldn't be used and done

As for language codes: if you had been more specific about what form of language codes ("sr", "srp", "sr-RS", "sr-Latn-RS", "sr-Cyrl-RS"...) you're expecting, maybe someone would have already provided them to you. Also, you could've Googled that information 9999 times already since yesterday.
But I need to know what it is these eBooks are using. Sure I could have looked it up. But then I could have been wrong.

I'm trying to do what should work instead of what may not work.
Old 10-22-2022, 03:54 PM   #500
shamanNS
Guru
Used where / for what?

1) dc:language in .opf
2) xml:lang attribute on XHTML elements
3) stuff like Hunspell and OpenOffice / LibreOffice dictionaries

"dc:language" will most of the time hold either a two-letter language code ("sr") or a three-letter ISO 639-2 variant ("srp"). The same value is used for both Latin-script and Cyrillic-script EPUBs.
There are no two- or three-letter codes that indicate the script used.

Things like Windows locales and .NET locales that support "extended language codes" use the form "2-letter language code + 4-letter script code + 2-letter country code" (so "sr-Latn-RS" and "sr-Cyrl-RS").

No idea how Kobo's hyphenation dictionaries "encode" that type of info. I've noticed that for example KOReader has hyphenation rules only for Serbian Cyrillic and not for Serbian Latin.
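
To make the difference between those forms concrete, here is a small Python sketch that splits such a code into its language, script and country parts. It is only an illustration of the "language + script + country" layout described above, not a full language-tag parser, and the function name `split_language_tag` is made up for this example.

```python
def split_language_tag(tag):
    """Split a code such as 'sr-Cyrl-RS' into (language, script, region).

    Parts that are missing come back as None. Only the simple
    language[-script][-region] layout is handled.
    """
    language, script, region = None, None, None
    for index, part in enumerate(tag.replace("_", "-").split("-")):
        if index == 0:
            language = part.lower()      # "sr" or "srp"
        elif len(part) == 4 and part.isalpha() and script is None:
            script = part.title()        # "Latn" or "Cyrl"
        elif len(part) == 2 and part.isalpha() and region is None:
            region = part.upper()        # "RS"
    return language, script, region
```

A plain "sr" or "srp" from dc:language yields no script at all, which is exactly why a tool cannot tell a Latin book from a Cyrillic one by that value alone.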
Old 10-22-2022, 04:21 PM   #501
nem_mil
Member
Posts: 15
Karma: 10
Join Date: Oct 2022
Device: Kobo Libra 2, Onyx Boox Note, Onyx Boox Nova Air
I just tried the file uploaded by BeckyEbook and it works perfectly!!! Thank you all for your help

I was even thinking about using some condensed fonts to solve the issue since justified text leaves so much space between words in Serbian Cyrillic, but this is a much better solution.
Old 12-20-2022, 04:58 AM   #502
FelixKrull
Enthusiast
Posts: 29
Karma: 10738
Join Date: Aug 2018
Device: none
I just discovered a bug or something:

I have three dictionaries installed, and hyphenating works perfectly.

Now I added a fourth dictionary (Spanish, from here: https://extensions.libreoffice.org/e...h-dictionaries)

and the plugin stops working with an error:



After removing the dictionary everything works fine again.
Old 12-20-2022, 05:29 AM   #503
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by FelixKrull View Post
I just discovered a bug or something:

I have three dictionaries installed, and hyphenating works perfectly.

Now I added a fourth dictionary (Spanish, from here: https://extensions.libreoffice.org/e...h-dictionaries)

and the plugin stops working with an error:



After removing the dictionary everything works fine again.
You could try removing one of the three working dictionaries and then adding the Spanish dictionary, in case the problem is having four dictionaries installed rather than the Spanish file itself.
Old 12-21-2022, 01:25 AM   #504
FelixKrull
Enthusiast
Thanks, I'll try this (though it would be awkward as I need all of the other dictionaries...)
Old 12-21-2022, 05:12 AM   #505
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by FelixKrull View Post
Thanks, I'll try this (though it would be awkward as I need all of the other dictionaries...)
You can always delete the Spanish dictionary and add back the one you've removed.
Old 01-01-2023, 08:41 PM   #506
Miq 528
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2023
Device: Kindle 7th
Setting exceptions to the hyphenation.

Hello, I am going to self publish a book and this plugin will improve it a lot, thank you.

I can edit some things in the HTML, but I am relatively inexperienced, so I have to ask: how can I tag a phrase or word that I do not want hyphenated?

In the configuration of your plugin I have seen that of:

"Comma separated list of tags to:
Ignore
h1, h2, h3"

I don't know how to put this into the HTML, but I think one example would be enough. Here is the (shortened) Calibre HTML of the title page. Could you add the tag there so that the title and the "Author's name" are not hyphenated?
Thank you.
Miqbenn.


<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es">
<head>
<title>Unknown</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<link rel="stylesheet" type="text/css" href="stylesheet.css"/>
<link rel="stylesheet" type="text/css" href="page_styles.css"/>
</head>
<body class="calibre">
<p class="block_">*</p>
<p class="block_1">The Daughter of the Artisan</p>
<p class="block_1">Author's name</p>
<p class="block_">*</p>
</body></html>
Old 01-30-2023, 11:58 PM   #507
chancerlane
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2023
Device: Kindle
Can any of you guys direct me to a good English hyphenation dictionary? The OP's post is slightly confusing to me, and I couldn't find a standard English hyphenation dictionary. Is there a good one that anyone can give me a direct link to?
Old 08-08-2023, 10:52 AM   #508
waka
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2023
Device: Kindle Oasis 2
This works great! Thank you very much!
It improves the reading experience a lot on the Kindle's default reader.
I might not even have jailbroken my device if I had known this trick earlier!
Old 08-31-2023, 01:23 PM   #509
Ialou
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2023
Device: Likebook Ares Note
Hyphenation help in ereader

I have a Likebook Ares Note e-reader. It is basically a modified Android device. It has a hyphenation system, but it doesn't work well in my language. Is there any way to install the dictionary on the device? Can anyone tell me?

Quote:
Originally Posted by SauliusP. View Post
Hyphenate This! will add soft hyphens to your ebook and add even better feel of a real book!

Supports EPUB and AZW3 formats (no MOBI, even with KF8 inside, convert instead).

If you have a Kindle (with AZW3/KF8 support) this plugin will explode the book, add soft-hyphens and rebuild it back.

Use hyphenation dictionaries from Apache OpenOffice Extensions. Download one and add to the plugin via its settings.

This plugin is primarily targeted for Kindle users reading AZW3/KF8 format books, as Kindle does not support hyphenation itself. However, recent firmwares added support of soft-hyphenation. So if the book is pre-hyphenated, Kindle will display it correctly. Text search and other features remain. Note, that if you hyphenate EPUB or AZW3 and convert to "old" MOBI, hyphenation won't work.

Some EPUB readers have native hyphenation, but if you read some exotic language (like me), hyphenation support might be poor or not present at all. Luckily, Libre/Open Office dictionaries are implemented for quite a lot of languages.

Note. Not all EPUB readers support soft-hyphens in the way expected. Some hyphenate, but do not show dashes. Some display correctly, but lack search feature. So try it yourself and decide if it is any good. As per discussion further in this thread:
  • Sony devices seem to split text on soft-hyphens, but do not display dashes. Not acceptable for Sony users.
  • Kobo seems to display hyphenation correctly, but text search is ruined.

CAUTION! In versions before (and including) 0.9.26 of Calibre there is a flaw with AZW3 explosion/rebuilding. You might lose picture content. So please back up your AZW3 if it is the only and original version of the book you have!

CAUTION! In versions before 0.9.24 of Calibre there is a flaw in support of AZW3 explode and rebuild workflow. TOC might be corrupted as well as quick jump through chapters! Might not be the case for you, but be warned!

Illustrations. I have added screenshots from my Kindle with an English book. However, English is quite compact and hyphenation does not show all its beauty. So I've also added two screenshots with Lithuanian text, where hyphenation is more obvious. Of course, the text will look like wingdings to most of you, but just try to see the difference :-)
Spoiler:

English text, original:


English text, soft-hyphenated:


Lithuanian text, original:


Lithuanian text, soft-hyphenated:




User Guide
Spoiler:

Install the plugin and download "OXT" dictionaries from the link above. Open the plugin's settings via the menu and add those dictionaries. After the dictionaries are added to the plugin, the downloaded files can be removed; the plugin stores hyphenation information inside its settings directory.

NOTE. You may also add a hyphenation dictionary directly, i.e. the appropriate "DIC" file extracted from the "OXT" (an OXT is simply a ZIP file). The "DIC" file must be named "hyph_<language code>.dic", e.g. "hyph_en_US.dic" or "hyph_ru.dic".

NOTE 2. I have tested lots of "OXT" dictionaries. Surprisingly, some of them include a hyphenation file that is not listed in the descriptor (the plugin uses the descriptor to find the hyphenation dictionary inside the "OXT" archive). So if you add an "OXT" dictionary but no new dictionary appears in the list, try opening the "OXT" file with an archive manager and search for a "hyph*.dic" file there. If it is present, extract it and add it directly. If not, you're out of luck.
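
The manual search described in NOTE 2 can also be scripted. The Python sketch below relies only on the fact stated above that an OXT is a ZIP file; the function name `find_hyph_dics` is made up for this example, and this is not how the plugin itself locates the dictionary (it reads the descriptor).

```python
import fnmatch
import zipfile


def find_hyph_dics(oxt_file):
    """Return the archive paths of all 'hyph*.dic' files inside an OXT."""
    with zipfile.ZipFile(oxt_file) as archive:
        return [
            name
            for name in archive.namelist()
            # Compare only the file name, ignoring folders and letter case.
            if fnmatch.fnmatchcase(name.rsplit("/", 1)[-1].lower(), "hyph*.dic")
        ]
```

An empty list means the archive ships no hyphenation file at all, so there is nothing to extract and add by hand.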

Settings window:



Simple part:
Install or remove dictionaries here and specify the minimum length of the word to be hyphenated.

Advanced part:

Hyphenation limits

Some of the hyphenation dictionaries contain special directives: LEFTHYPHENMIN and RIGHTHYPHENMIN. They set the minimum number of characters that must remain on the left or right side of a hyphenation point in a word. The example in the picture is 2 characters on the left (overridden with 3) and 3 characters on the right for the English dictionary. Some dictionaries do not contain these directives; in that case the default limit is 2. If you don't like the default or included limits, you can edit the limits for each dictionary separately by ticking the "Override" check box.
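
As an illustration of how such limits behave, here is a small Python sketch. It assumes each directive sits on its own line as "NAME number" and falls back to the default of 2 mentioned above; the function names and the list of candidate break positions are hypothetical and are not taken from the plugin's code.

```python
def read_hyphen_limits(dic_text, default=2):
    """Return (left_min, right_min) from the text of a hyphenation .dic file."""
    limits = {"LEFTHYPHENMIN": default, "RIGHTHYPHENMIN": default}
    for line in dic_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in limits and parts[1].isdigit():
            limits[parts[0]] = int(parts[1])
    return limits["LEFTHYPHENMIN"], limits["RIGHTHYPHENMIN"]


def allowed_breaks(word, candidates, left_min, right_min):
    """Keep only break positions that leave enough characters on both sides.

    A position p means "break between word[:p] and word[p:]".
    """
    return [p for p in candidates
            if p >= left_min and len(word) - p >= right_min]
```

With limits of 3 and 3, a break after the second letter or before the last two letters of a word is dropped, which is the effect of raising the left limit via "Override".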

Tags to ignore/parse

Some people pointed out that there is no real (or aesthetic) need to hyphenate chapter names. Those are usually enclosed in heading tags: h1, h2 etc. I have added the possibility to ignore any tags. The defaults are the three headings.
You might also want to hyphenate only the content of particular tags. In the example these are p and td (paragraphs and table cells).
Special note. If you enter p in the "parse" tags, all paragraphs will be parsed and hyphenated, including their child tags, like span, em, strong etc. If you want some special tags to be ignored inside p, add them to the "ignore" list. That way you can configure particular tags inside p, such as em, to be skipped.
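
The interplay of the two lists can be sketched in Python as follows. This is not the plugin's actual code: the function names, the `min_length` parameter and the stand-in `split_in_half` "hyphenator" are made up for the illustration, and the rule that the nearest listed ancestor decides is an assumption about the behaviour described above.

```python
import re
import xml.etree.ElementTree as ET

SOFT_HYPHEN = "\u00ad"
WORD = re.compile(r"\w+")


def hyphenate_tree(root, hyphenate_word, parse_tags=("p", "td"),
                   ignore_tags=("h1", "h2", "h3"), min_length=5):
    """Hyphenate text inside parse_tags, skipping anything in ignore_tags."""
    def process(text):
        if not text:
            return text
        return WORD.sub(
            lambda m: hyphenate_word(m.group())
            if len(m.group()) >= min_length else m.group(),
            text,
        )

    def walk(element, active):
        tag = element.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag in ignore_tags:
            active = False
        elif tag in parse_tags:
            active = True
        if active:
            element.text = process(element.text)
        for child in element:
            walk(child, active)
            if active:
                # Text after a child still belongs to the current element.
                child.tail = process(child.tail)

    walk(root, False)
    return root


def split_in_half(word):
    """Stand-in for a real dictionary lookup: one soft hyphen mid-word."""
    middle = len(word) // 2
    return word[:middle] + SOFT_HYPHEN + word[middle:]
```

With em added to the ignore list, the words inside an em element stay intact while the rest of the surrounding paragraph is still hyphenated.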

Custom column

Hyphenation status can be saved to a custom column of type "Text, column shown in the tag browser". The user can also define what to write to that custom column when hyphenation was performed and when hyphens were removed. If the column name is empty, the status is not written anywhere.

After that, everything is simple. Choose a book with EPUB and/or AZW3 formats, click the plugin's icon, choose one of the formats and click OK. The book will be hyphenated. There is also a handy menu action to remove soft hyphens from a book.


Version history:
Spoiler:

Version 0.1.3 2020-10-01
Compatibility upgrade for Python 3 and Calibre 5

NB: Versions 0.1.0 to 0.1.2 were defunct betas.

Version 0.0.9 2019-11-14
Fix for uppercase dictionary description
(prevented to add new Russian and Swedish dictionaries).

Version 0.0.8 2014-08-08
Get ready for Calibre 2 with Qt5!

Version 0.0.7 2013-04-22
Fix for unicode support of custom text in hyphenated custom column.

Version 0.0.6 2013-04-09
Added custom column to save hyphenation status.

Version 0.0.5 2013-03-29
Community requests and other enhancements
  • Added limits of syllable splits on left and right sides.
  • Added override of the syllable limits via settings.
  • Added "tags to ignore" and "tags to parse" lists via settings.
  • Added nice icon and generally beautified settings dialogue (gets complex).
Inspired by active interest and donations (of course).

Version 0.0.4 2013-03-26
Shortened toolbar button label as per community requests.

Version 0.0.3 2013-03-19
Fixed some issues on user feedback.
Added internal Calibre's HTML parser to avoid encoding problems.
Changed text parsing to XML parsing, much faster and efficient.

Version 0.0.2 2013-03-18
Fixes of the FAIL of first release.

Version 0.0.1 2013-03-18
The very first version of the plugin.
Soft-hyphenation of EPUB and AZW3.
Old 08-31-2023, 07:53 PM   #510
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by Ialou View Post
I have a Likebook Ares Note e-reader. It is basically a modified Android device. It has a hyphenation system, but it doesn't work well in my language. Is there any way to install the dictionary on the device? Can anyone tell me?
The problem is that a lot of reading programs do not like soft hyphens. But you can try to see if they work with the program you are using to read eBooks.

The other thing is, if you are going to ask for help, you need to give enough information for someone to actually help. In what language do you read eBooks?