Old 05-04-2024, 05:36 PM   #1
KevinH
Sigil Developer
Posts: 8,160
Karma: 5450818
Join Date: Nov 2009
Device: many
Help - need epub sample that uses non-ascii file names and paths

Sigil has received a bug report saying it could not open an epub made by InDesign that used non-ascii file names.

I have tried the official non-ascii file name testcase from the epub3 samples page and Sigil had no problems with it.

But that testcase used Japanese, and I cannot tell whether Japanese has the same issues with combining characters and unicode normalization (NFC vs NFD, etc.) that European non-ascii characters often have.

So can anyone either provide or point me to a good test case that uses non-ascii filenames in an epub that may use combined forms (where an accent is added to a base character in decomposed form), or alternatively characters that can be decomposed into two or more separate ones (typically a base character plus one or more accents or other diacritic marks)?
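To make the two forms concrete, here is a minimal sketch using Python's standard `unicodedata` module. The file name is only an illustration, not taken from the bug report.

```python
import unicodedata

# "résumé.xhtml" with each é as one precomposed code point (U+00E9): NFC form
composed = "r\u00e9sum\u00e9.xhtml"

# The same visible name with each é as "e" + combining acute (U+0301): NFD form
decomposed = unicodedata.normalize("NFD", composed)

def code_points(name):
    """List the code points of a name so the two forms can be told apart."""
    return [f"U+{ord(ch):04X}" for ch in name]

print(len(composed), len(decomposed))   # the decomposed form is two code points longer
print(composed == decomposed)           # a plain string comparison treats them as different
print(unicodedata.normalize("NFC", decomposed) == composed)
print(code_points(decomposed)[:3])
```

The two strings render identically but do not compare equal until both are normalized to the same form.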

If anyone has one, I would love to have a copy of it even if just a single filename that is non-ascii in a sample epub.

In addition, is this a known problem with InDesign, either not properly unicode normalizing its non-ascii filenames to NFC as the spec calls for, or not properly url encoding them in the manifest? Or does the epub zip archive produced by InDesign not properly set the flag that tells readers the names are utf-8?

Also, inside a zip archive that uses the utf-8 flag, it would matter whether characters are stored decomposed or composed. Does anyone know what the normalization rule is for file names inside a zip archive? On macOS, filesystem paths are typically stored decomposed (NFD), but the web and most other places seem to use NFC (composed). So could a zip archive built on a Mac by InDesign store its file names/paths in NFD form, when in fact they should probably be in NFC form?
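One way to check both questions on a given epub is to list each entry's utf-8 flag and whether its name is already NFC. This is a rough sketch using Python's standard `zipfile` module; `inspect_names` and `make_sample` are hypothetical helper names, and the sample archive is built in memory rather than taken from InDesign.

```python
import io
import unicodedata
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as UTF-8

def inspect_names(zip_source):
    """Return (name, has_utf8_flag, is_nfc) for every entry in the archive."""
    report = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            has_flag = bool(info.flag_bits & UTF8_FLAG)
            is_nfc = unicodedata.normalize("NFC", info.filename) == info.filename
            report.append((info.filename, has_flag, is_nfc))
    return report

def make_sample():
    """Build a tiny in-memory zip with one NFD name and one ascii name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("OEBPS/Text/re\u0301sume\u0301.xhtml", "x")
        zf.writestr("mimetype", "application/epub+zip")
    buf.seek(0)
    return buf

for name, has_flag, is_nfc in inspect_names(make_sample()):
    print(repr(name), "utf8-flag:", has_flag, "NFC:", is_nfc)
```

Passing the path of a real epub to `inspect_names` instead of the sample would show whether its names are flagged as utf-8 and which normalization form they use.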


I just do not use InDesign, so I have no idea if this is an InDesign issue or a hidden Sigil bug.

Thanks,

KevinH

Last edited by KevinH; 05-04-2024 at 05:50 PM.
Old 05-05-2024, 07:54 AM   #2
Doitsu
Grand Sorcerer
Posts: 5,640
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
I created a simple test file with Sigil that contains Latin, Hebrew and Arabic file names with combining diacritics. It also contains one file name without combining diacritics. Sigil had no problems with any of them.

However, EPUBCheck reported that it couldn't find the second résumé.xhtml file, which contained the letter e followed by a combining acute diacritic.
Code:
Col: -1: ERROR(RSC-001): File "OEBPS/Text/re%CC%81sume%CC%81.xhtml" could not be found.
But since the path seems to be properly URL encoded this might actually be an EPUBCheck bug.
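As a quick check of what that reported path contains, the sketch below decodes it with Python's `urllib.parse` and compares it with the NFC spelling. It only inspects the string from the error message and makes no claim about what EPUBCheck does internally.

```python
import unicodedata
from urllib.parse import quote, unquote

# Path exactly as it appears in the EPUBCheck message above
reported = "OEBPS/Text/re%CC%81sume%CC%81.xhtml"

decoded = unquote(reported)                     # percent-decoding, UTF-8 by default
nfc = unicodedata.normalize("NFC", decoded)     # compose e + U+0301 into U+00E9
nfc_encoded = quote(nfc)                        # re-encode the composed form

print("\u0301" in decoded)     # the decoded name contains combining acute accents
print(quote(decoded) == reported)
print(nfc_encoded)
```

So the percent-encoding itself round-trips cleanly; the reported path is simply the NFD spelling, which differs byte for byte from the NFC spelling of the same name.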

Maybe this is the problem that the user is referring to.

IMHO, using combining Unicode characters instead of precomposed characters isn't a good idea anyway, because many fonts don't contain glyphs for them.
Attached Files
File Type: epub accented_chars.epub (3.6 KB, 145 views)

Last edited by Doitsu; 05-05-2024 at 07:58 AM.
Old 05-05-2024, 11:11 AM   #3
KevinH
Thank you Doitsu!

Good to know that even EPUBCheck gets things wrong sometimes.
Old 05-05-2024, 01:58 PM   #4
KevinH
Ah, it seems Sigil on a Mac can create a .zip (.epub) whose file names are in Unicode Normalization Form D and so will not be in sync with manifest urls that are in Unicode Normalization Form C (and vice versa).

So at the boundary, Sigil will have to convert the file names inside the zip, and all file paths, to Unicode Normalization Form C.
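A minimal sketch of that kind of boundary normalization, in Python rather than Sigil's own code: `build_lookup` and `resolve_href` are hypothetical names, and the hrefs are assumed to be already resolved to full paths inside the archive.

```python
import unicodedata
from urllib.parse import unquote

def nfc(text):
    """Normalize a name or path to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)

def build_lookup(zip_names):
    """Map the NFC form of each name to the name as actually stored in the zip."""
    return {nfc(name): name for name in zip_names}

def resolve_href(href, lookup):
    """Percent-decode a manifest href, normalize it to NFC, and return the
    stored zip name it refers to, or None if nothing matches."""
    return lookup.get(nfc(unquote(href)))

# A zip entry stored in NFD (e + U+0301), as a Mac-built archive might contain
stored = ["OEBPS/Text/re\u0301sume\u0301.xhtml"]
lookup = build_lookup(stored)

# Both the NFC and the NFD spelling of the href resolve to the same entry
print(resolve_href("OEBPS/Text/r%C3%A9sum%C3%A9.xhtml", lookup) == stored[0])
print(resolve_href("OEBPS/Text/re%CC%81sume%CC%81.xhtml", lookup) == stored[0])
```

If an archive held two entries that differ only in normalization form, this dictionary would keep just one of them, so a real importer would need to detect and handle that collision.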

The worst part is that macOS HFS+ file systems forced a modified Form D on all paths and filenames, but the newer APFS filesystem does not, meaning that file names and paths with mixed normalization can coexist on a Mac using APFS.

Just what I needed.