07-29-2014, 04:04 PM | #1 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
SQLite: Case-insensitive matching of Unicode characters
According to SQLite Frequently Asked Question #18 at http://sqlite.org/faq.html#q18 :
(18) Case-insensitive matching of Unicode characters does not work.

The default configuration of SQLite only supports case-insensitive comparisons of ASCII characters. The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability.

Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an "ICU" extension that does these overloads.

So, COLLATE NOCASE in a SQLite table definition or in a SELECT is only good for pure ASCII comparisons, unless the overloads described above have been implemented.

Does anyone know whether Calibre's SQLite already implements Unicode (UTF-8) case-insensitive matching as described above?

For example, Calibre would need this capability when searching for Tags that contain non-ASCII characters, such as the German word sachbüch, the Hindi word NAHĪMṀ, the Spanish word noficción, the Turkish word gerçek, and so forth. Ditto for Authors, Title, and Series.

Thanks in advance.

Last edited by DaltonST; 07-29-2014 at 08:51 PM.
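To make the FAQ's point concrete, here is a minimal sketch using Python's standard sqlite3 module. It registers a custom collation rather than overloading NOCASE itself. The collation name UNICODE_NOCASE, the table name tags, and the use of str.casefold() as the comparison rule are all assumptions for illustration; this is not the ICU extension and not necessarily what Calibre does.

```python
import sqlite3


def unicode_nocase(a: str, b: str) -> int:
    """Compare two strings after Unicode case folding.

    Returns a negative, zero, or positive number, as SQLite collations require.
    str.casefold() is a stand-in here; it is not a full ICU comparison.
    """
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


conn = sqlite3.connect(":memory:")
# "UNICODE_NOCASE" is a hypothetical name chosen for this example.
conn.create_collation("UNICODE_NOCASE", unicode_nocase)

conn.execute("CREATE TABLE tags (name TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?)",
    [("Gerçek",), ("SACHBÜCH",), ("noficción",)],
)

# The built-in NOCASE only folds ASCII letters, so Ü and ü do not match.
builtin = conn.execute(
    "SELECT name FROM tags WHERE name = ? COLLATE NOCASE", ("sachbüch",)
).fetchall()

# The custom collation folds non-ASCII letters as well.
custom = conn.execute(
    "SELECT name FROM tags WHERE name = ? COLLATE UNICODE_NOCASE", ("sachbüch",)
).fetchall()

print(builtin)
print(custom)
```

The same idea extends to like(), upper(), and lower() via conn.create_function(), which is the Python counterpart of sqlite3_create_function().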
07-29-2014, 11:11 PM | #2 |
creator of calibre
Posts: 44,377
Karma: 23764838
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
calibre uses SQLite only as an on-disk storage format, not as a database. All sorting and searching is performed using ICU on an in-memory, normalized view of the data from the database.
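As a rough illustration of that approach (not calibre's actual code), the sketch below searches an in-memory view whose keys have been normalized up front. It assumes Python's unicodedata module and str.casefold() as stand-ins for ICU; the names normalize_key, tags_from_db, and find_tag are made up for the example.

```python
import unicodedata


def normalize_key(text: str) -> str:
    """Build a lookup key: NFC-normalize, then case-fold.

    A simplified stand-in for ICU-based normalization.
    """
    return unicodedata.normalize("NFC", text).casefold()


# Pretend these values were read from the database once at startup.
tags_from_db = ["Gerçek", "SACHBÜCH", "noficción"]

# In-memory view: normalized key -> original stored value.
view = {normalize_key(tag): tag for tag in tags_from_db}


def find_tag(query: str):
    """Return the stored tag matching the query, or None if there is none."""
    return view.get(normalize_key(query))


print(find_tag("GERÇEK"))
# A decomposed query (O followed by a combining acute accent) also matches.
print(find_tag("NOFICCIO\u0301N"))
```

Because matching happens in application code, the SQLite collation used by the file on disk never affects search results.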
|
09-15-2014, 03:48 AM | #3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2014
Device: Kindle Touch
|
When I update cc.db, I get this error:
no such collation sequence: icu
How can I get rid of it?
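One possible explanation, offered as an assumption rather than a diagnosis: the database schema refers to a collation named icu that the application registers on its own connection, and a connection opened by another tool has no collation by that name. The sketch below reproduces the error with Python's sqlite3 module and then clears it by registering a collation under the same name. The file demo.db, the books table, and the casefold-based comparison are hypothetical stand-ins, not the real cc.db schema or a real ICU collator.

```python
import os
import sqlite3
import tempfile


def icu_stand_in(a: str, b: str) -> int:
    """Case-folding comparison used as a stand-in for a real ICU collator."""
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First connection: registers "icu" and creates a schema that depends on it.
creator = sqlite3.connect(path)
creator.create_collation("icu", icu_stand_in)
creator.execute("CREATE TABLE books (title TEXT COLLATE icu)")
creator.execute("INSERT INTO books VALUES (?)", ("Gerçek",))
creator.commit()
creator.close()

# Second connection: no "icu" collation registered, so the update fails.
plain = sqlite3.connect(path)
try:
    plain.execute("UPDATE books SET title = ? WHERE title = ?", ("x", "GERÇEK"))
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Registering a collation under the same name lets the statement run.
plain.create_collation("icu", icu_stand_in)
plain.execute(
    "UPDATE books SET title = ? WHERE title = ?", ("Gerçek 2", "GERÇEK")
)
plain.commit()
rows = plain.execute("SELECT title FROM books").fetchall()
plain.close()
print(rows)
```

If the stand-in orders strings differently from the collation the database was built with, any index on that column can end up inconsistent, so it is safer to try this on a copy of the file first.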
Tags |
lowercase, nocase, sqlite, unicode, utf-8 |