10-13-2010, 09:37 AM | #1 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
Proper Unicode Declaration
Kaṅkhā-vitaraṇī-purāṇa-ṭīkā and the Kaṅkhā-vitaraṇī-abhinava-ṭīkā
What is the proper Unicode declaration in Sigil to render this line properly?

Neither <?xml version="1.0 encoding=ISO-8859-1"?> nor <?xml version="1.0 UTF-8"?> works. Sigil deletes the ISO and UTF designations.

<head> <link rel="stylesheet" href="../Styles/Stylesheet.css" type="text/css; charset=UTF-8" /> </head>

This seems to have no discernible effect.

An expert gave me the following advice, but I do not know how to implement it properly: "Ideally you should author them with the original diacritics using the appropriate Unicode character and UTF-8 encoding. If there are problems viewing the diacritics in a modern web browser then ensure that you are including the encoding declaration (in XML) and in the XHTML <meta> element."

As always, I am thankful for everyone's advice. - Fabe |
10-13-2010, 09:50 AM | #2 |
Sigil Developer
Posts: 8,110
Karma: 5450184
Join Date: Nov 2009
Device: many
|
Hi,
The "typical" UTF-8 header for an XHTML file inside an EPUB (one of many possible) looks like this:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

In fact, the above was generated by Sigil itself, since it uses UTF-8 as its XHTML encoding.

This line in your question

> <link rel="stylesheet" href="../Styles/Stylesheet.css" type="text/css; charset=UTF-8" />

just says that the encoding of the separate stylesheet is UTF-8, not that the .xhtml file itself is.

All of that said, the declaration only describes the file. Unless the bytes in the .xhtml file are actually UTF-8 encoded, the text still will not display properly. Many good text editors are smart enough to detect the encoding or let you set it before you edit the text.

Hope this helps, KevinH |
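To illustrate the point about declared versus actual encoding, here is a minimal Python sketch. It uses the test line from this thread; the helper name `is_valid_utf8` is made up for the example and is not part of Sigil or any EPUB tool.

```python
# The sample string is the test line from this thread.
text = "Kaṅkhā-vitaraṇī-purāṇa-ṭīkā"

# What a UTF-8 editor writes to disk.
utf8_bytes = text.encode("utf-8")

# Read back with the right codec, the text survives.
round_trip = utf8_bytes.decode("utf-8")

# Read back as ISO-8859-1, every diacritic turns into two or three junk characters.
garbled = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 cannot represent these letters at all, so saving in it fails outright.
try:
    text.encode("iso-8859-1")
    latin1_can_store_it = True
except UnicodeEncodeError:
    latin1_can_store_it = False


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A check like `is_valid_utf8(open(path, "rb").read())` only tells you whether the bytes are well-formed UTF-8. It cannot tell you whether a reader has a font for the characters.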
10-13-2010, 10:26 AM | #3 | |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
Quote:
----- UPDATE: Opening the epub file in Text Wrangler shows the line as: <?xml version="1.0" encoding="utf-8"?> Last edited by Fabe; 10-13-2010 at 10:38 AM. Reason: New Info |
|
10-13-2010, 10:38 AM | #4 |
Sigil Developer
Posts: 8,110
Karma: 5450184
Join Date: Nov 2009
Device: many
|
Hi,
Sigil reads the declaration on import to determine the file's encoding, but it does not display it in Code View. I don't think it wants you to edit the encoding in Code View, since Sigil determines it itself. When Sigil saves the file in an EPUB, it does add the appropriate encoding info.

To see this, create a new file in Sigil, save it as an epub, rename the file from .epub to .zip, and open the zip archive. Look inside the extracted archive to find the actual .xhtml files and examine them with any text editor. You should see the encoding declaration properly included.

Hope this makes things clearer. KevinH |
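The same check can be done without renaming anything, since an EPUB is a zip archive. The sketch below uses only Python's standard `zipfile` module. The function name `first_lines` and the member paths in the demo archive are assumptions made for illustration; a real EPUB may lay out its files differently.

```python
import io
import zipfile


def first_lines(epub):
    """Map each (X)HTML member of an EPUB to its first line.

    `epub` may be a path to an .epub file or an open binary file object.
    """
    result = {}
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                data = archive.read(name)
                lines = data.splitlines()
                first = lines[0] if lines else b""
                # errors="replace" keeps the sketch from crashing on non-UTF-8 files.
                result[name] = first.decode("utf-8", errors="replace")
    return result


# Demo: a tiny in-memory archive standing in for a real .epub (paths are hypothetical).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "OEBPS/Text/Section0001.xhtml",
        '<?xml version="1.0" encoding="utf-8" standalone="no"?>\n<html/>\n',
    )
    archive.writestr("OEBPS/Styles/Stylesheet.css", "body { margin: 0; }\n")
buffer.seek(0)
demo = first_lines(buffer)
```

With a real book you would call `first_lines("mybook.epub")` and look for the `encoding="utf-8"` attribute in each value.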
10-13-2010, 10:45 AM | #5 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
Yes. I discovered this a moment ago by opening the file in Text Wrangler.
Just so I have the clearest understanding (as much as this sawdust-laden cranium can have): my original documents were displaying garbled foreign characters and diacritics because I did NOT specify UTF-8 in my HTML file? Thanks - Fabe |
10-13-2010, 11:32 AM | #6 |
Sigil Developer
Posts: 8,110
Karma: 5450184
Join Date: Nov 2009
Device: many
|
FYI
Be careful: some readers, and even ADE, do not seem to be able to show characters that their built-in fonts lack, because they ship with limited fonts that cover only part of Unicode. The file can be valid UTF-8 and the glyphs can still be missing.

For example, I simply copied and pasted your test line into a new Sigil document, saved it as an epub, opened it in Sigil, and saw everything correctly. But when I opened it in my Sony Reader software, the string looked like garbage with ? marks everywhere. So just because the epub is technically correct does not mean it will display properly on every reader.

Take care, KevinH |
10-13-2010, 11:52 AM | #7 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
Yes. This has been my point to print publishers all along. I cannot guarantee readability on all (or nearly all) ereaders unless the text is rendered as generically as possible. Hence, changing Kaṅkhā-vitaraṇī-purāṇa-ṭīkā to Kankha-vitarani-purana-tika.
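For anyone who does go the plain-ASCII route, the conversion above can be automated. This is a rough Python sketch using the standard `unicodedata` module; the function name `strip_diacritics` is invented for the example. It decomposes each letter into base letter plus combining marks and then drops the marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (lossy by design)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = strip_diacritics("Kaṅkhā-vitaraṇī-purāṇa-ṭīkā")
```

Note that this only handles letters that decompose into a base letter plus marks. Letters such as ø or ł have no such decomposition and pass through unchanged, and the romanization loses information either way.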
|
10-13-2010, 12:05 PM | #8 |
Sigil Developer
Posts: 8,110
Karma: 5450184
Join Date: Nov 2009
Device: many
|
Hi,
The only solution for epubs is to embed your own font that does support the characters you have encoded. There are good free TTF fonts that support a huge range of characters and can be embedded. There are a few other places in the Sigil forum that talk about how to embed fonts. Embedding fonts should work on all epub readers that properly support the format (although I think iBooks might be an exception to that). Take care, KevinH |
10-13-2010, 12:12 PM | #9 |
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
|
Yes, iBooks does not accept embedded fonts. But the UTF-8 will come out fine. Thanks for your input Kevin. - Fabe
|
10-13-2010, 01:42 PM | #10 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
For reference:
The limited set of default glyphs available in ADE is defined in Appendices D.1 and D.3 of the PDF Reference, 5th Edition. iBooks has a far wider range of characters available in its built-in fonts, though there are significant holes in its support of Cyrillic as well as a variety of other scripts. AFAIK Apple hasn't deigned to release a proper specification of the supported characters, but from reports earlier this year it has good coverage of Latin Extended-A and -B, as well as Latin Extended Additional, which should cover your needs for this text without having to use alternate spellings.

So in this case you should be safe relying on iBooks' built-in coverage and embedding a full Unicode font to cover ADE and other properly compliant ereaders (iBooks will simply ignore the embedded font). Be sure that you're using fully-formed (precomposed) characters and not base letters followed by combining diacritics. |
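The last point, fully-formed characters versus combining diacritics, can be checked mechanically. A small Python sketch, assuming the standard `unicodedata` module; the function names are made up for illustration. NFC normalization replaces a base letter plus combining mark with the single precomposed character wherever Unicode defines one.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Normalize to NFC so letter + combining mark becomes one character where possible."""
    return unicodedata.normalize("NFC", text)


def leftover_combining_marks(text: str) -> list:
    """List code points of combining marks that survive NFC (no precomposed form exists)."""
    return [f"U+{ord(ch):04X}" for ch in to_precomposed(text) if unicodedata.combining(ch)]


# "Kaṅkhā" typed with combining marks: n + U+0307 (dot above), a + U+0304 (macron).
decomposed = "Kan\u0307kha\u0304"
composed = to_precomposed(decomposed)
```

If `leftover_combining_marks` returns an empty list for your text, every accented letter has a single-character form. Anything it does return is a letter that still depends on the reader positioning a combining mark correctly.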
|