04-11-2019, 07:48 AM | #1 |
Member
Posts: 22
Karma: 10
Join Date: Apr 2019
Device: edge
|
xml:lang
Hello, I am converting an EPUB file to DOCX using calibre. The text contains several description lists (glossaries) with the English term and its Norwegian translation:
<dl><dt>mind: </dt><dd lang="no" xml:lang="no">sinn</dd>
My question is: will this information (xml:lang) be transferred to the DOCX file, so that the synthetic speech changes language accordingly? /Tage |
04-11-2019, 11:47 AM | #2 |
creator of calibre
Posts: 44,509
Karma: 24495778
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try it and see. You can convert to docx and convert back to see if the lang information was preserved.
|
04-25-2019, 05:46 AM | #3 |
Member
Posts: 22
Karma: 10
Join Date: Apr 2019
Device: edge
|
Hi, I converted the EPUB to DOCX and then back to EPUB to check whether the language metadata is preserved.
My original EPUB file has the following language metadata:
<dc:language xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dtb="http://www.daisy.org/z3986/2005/dtbook/" xmlns:d="http://www.daisy.org/ns/pipeline/data" id="language_1">en</dc:language>
After the round trip it has changed from "en" to "nb":
<dc:language>nb</dc:language>
This is an English learning book, so it is essential that the spoken language in the DOCX is indeed English. In the text glossary the language is preserved, e.g.:
<p>Page 221:</p><dl><dt>completely: </dt><dd lang="no" xml:lang="no">fullstendig</dd><dt>out of fashion: </dt><dd lang="no" xml:lang="no">ikke på moten</dd>
Converted back to EPUB from DOCX:
<p class="block_6"><span class="text_">Page 221:</span></p> <ul class="list_"> <li class="block_9"><span class="text_">completely: </span><span lang="no" class="text_">fullstendig</span></li>
So it seems that the main language "en" is lost when converting to DOCX. |
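The same check can be scripted instead of done by eye. Below is a minimal sketch, assuming the OPF and XHTML content have already been read out of the EPUB as strings and are well-formed XML; `opf_languages` and `inline_languages` are hypothetical helper names, not part of calibre.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by <dc:language> in an OPF file.
DC_NS = "http://purl.org/dc/elements/1.1/"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def opf_languages(opf_text):
    """Return the text of every dc:language element in an OPF document."""
    root = ET.fromstring(opf_text)
    return [el.text for el in root.iter("{%s}language" % DC_NS)]


def inline_languages(xhtml_text):
    """Return (tag, language) pairs for elements carrying xml:lang or lang."""
    root = ET.fromstring(xhtml_text)
    found = []
    for el in root.iter():
        # Prefer xml:lang, fall back to the plain HTML lang attribute.
        lang = el.get(XML_LANG) or el.get("lang")
        if lang:
            tag = el.tag.split("}")[-1]  # drop the namespace part, if any
            found.append((tag, lang))
    return found
```

Running both helpers on the files before and after the round trip makes it easy to compare the book-level language with the per-element ones. Real XHTML that uses named entities such as `&nbsp;` may need a more forgiving parser than `xml.etree`.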
04-25-2019, 05:53 AM | #4 |
creator of calibre
Posts: 44,509
Karma: 24495778
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Set the language in the book metadata and it will be used during conversion.
|
04-25-2019, 06:47 AM | #5 |
Member
Posts: 22
Karma: 10
Join Date: Apr 2019
Device: edge
|
OK, how can that be achieved programmatically? Currently my programmatic interface is:
try:
    self.utils.report.info("Converting from XHTML to DOCX...")
    process = self.utils.filesystem.run(["/usr/bin/ebook-convert",
                                         html_file,
                                         os.path.join(temp_docxdir, epub.identifier() + ".docx"),
                                         "--no-chapters-in-toc",
                                         "--toc-threshold=0",
                                         "--docx-page-size=a4",
                                         # "--linearize-tables",
                                         "--extra-css=/home/statped/Dokumenter/produksjonssystem/produksjonssystem/extra.css",
                                         "--embed-font-family=Verdana",  # microsoft fonts must be installed (sudo apt-get install ttf-mscorefonts-installer)
                                         "--docx-page-margin-top=42",
                                         "--docx-page-margin-bottom=42",
                                         "--docx-page-margin-left=70",
                                         "--docx-page-margin-right=56",
                                         "--base-font-size=13",
                                         "--font-size-mapping=13,13,13,13,13,13,13,13"])
|
04-25-2019, 07:45 AM | #6 |
creator of calibre
Posts: 44,509
Karma: 24495778
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
--language
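In the script from post #5 this amounts to one more entry in the argument list passed to ebook-convert. A minimal sketch, assuming the main language of the book is known up front ("en" here); `build_convert_command` and `convert` are hypothetical helpers for illustration, and only the `--language` option itself comes from this thread.

```python
import subprocess


def build_convert_command(html_file, docx_file, language="en"):
    """Return an ebook-convert argument list with the book language set."""
    return [
        "/usr/bin/ebook-convert",
        html_file,
        docx_file,
        "--language=" + language,  # book-level language metadata
        "--docx-page-size=a4",
    ]


def convert(html_file, docx_file, language="en"):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    command = build_convert_command(html_file, docx_file, language)
    return subprocess.run(command, check=True)
```

Calling `convert("book.xhtml", "book.docx", "en")` would then run the conversion with English as the book language. Whether the per-element `lang="no"` attributes still override it in the resulting DOCX is something to verify with the round trip described in post #3.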
|