Sqlite has case insensitive only to ASCII natively, because one of the goals of the database is precisely to be Lite :) and there is the understanding that any application that relies on specific language comparison will already have a function for this, since the database is just one of the points where it will be used.
To have an order in the form you want, using your own criteria, just implement a new collation, using the function family sqlite3_create_collation( )
, which allows you to point out a function of your own in C, which makes the comparison between strings in the way you find most convenient.
After having this collation implemented, just include it in the desired query:
SELECT nome, endereco FROM cadastro ORDER BY nome COLLATE CollationExemploPtBr
Details of the function to be implemented range from the need of each, the important thing is to know that its function will receive the two strings to be compared, and must return 0 for strings considered equal (action and action, in English, should return 0), some positive for string1 greater than string2, and one negative for string1 smaller than string2.
One of the advantages of having your own collation is that you can not only compare characters by accentuation, but also make & and E have the same "weight" when ordering a string, or anything else that is convenient for the desired result.
Since you call the create_collation function before you start using the database functions themselves, you can use your collation (or several different ones at the same time, as you wish) normally as index. This is especially important to maintain DB performance without losing the freedom to sort as you wish. These functions work as efficiently or even better than native Sqlite functions, depending only on the quality of the code implemented.
Remember that the collation is only a part of the process, because when locating a string with LIKE
, for example, you should also implement a compatible function, and you can use the function family create_function()
that uses the same principle, pointing out a function of your code that processes the result the way you want.
Here’s a simplified example, adapted from a function I use in some apps:
// Atencao: esta implementacao esta muito simplificada,
// e foi postada como mero exemplo. Por ter sido rapidamente
// adaptada e simplificada de um codigo de uso interno para
// ser postada no SO, pode conter erros de tipagem e alguma
// condicao nao tratada (como utf mal formado) ou bugs.
//
// Esta implementacao somente esta considerando os caracteres
// acentuados entre u+0000 e u+00ff, para fins de exemplo, e
// considerando as equivalencias basicas em pt_BR
//
// -- Bacco
sqlite3 *db;
if (SQLITE_OK==sqlite3_open( "databasename.db", &db)
{
sqlite3_create_collation( db,
"CollationExemploPtBr",
SQLITE_UTF8, /* Nota [1] */
&example_table_ptbr, /* Nota [2] */
&example_collation );
// [1] A pergunta original menciona 8859-1. Notar que este codigo
// trata de UTF, mas basta ajustar os "ifs" pra ignorar UTF e
// usar a tabela com 256 caracteres "puros".
//
// [2] Notar que estou usando um cargo pra nao referenciar o
// example_table_ptbr direto no example_collation.
// O SQLite repassa esse pointer ao chamar a funcao indicada.
// Este e um otimo jeito de usar a mesma funcao com tabelas diferentes.
// Se sua funcao ja tiver a tabela embutida,
// basta usar null neste parametro.
// ... seu codigo aqui ...
}
static const char example_table_ptbr[] = {
/* u+0000 .. U+007F */
0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07, 0x08,0x09,0x0A,0x0B,0x0C,0x0D,0x0E,0x0F,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17, 0x18,0x19,0x1A,0x1B,0x1C,0x1D,0x1E,0x1F,
0x20,0x21,0x22,0x23,0x24,0x25,'E' ,0x27, 0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37, 0x38,0x39,0x3A,0x3B,0x3C,0x3D,0x3E,0x3F,
0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47, 0x48,0x49,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,
0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57, 0x58,0x59,0x5A,0x5B,0x5C,0x5D,0x5E,0x5F,
0x60,'A' ,'B' ,'C' ,'D' ,'E' ,'F' ,'G' , 'H' ,'I' ,'J' ,'K' ,'L' ,'M' ,'N' ,'O' ,
'P' ,'Q' ,'R' ,'S' ,'T' ,'U' ,'V' ,'W' , 'X' ,'Y' ,'Z' ,0x7B,0x7C,0x7D,0x7E,0x7F,
/* u+0080 .. U+00FF */
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20, 0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20, 0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x20,'!' ,'C' ,'L' ,0x20,'Y' ,0x20,0x20, 0x20,'C' ,'A' ,'<' ,0x20,0x20,'R' ,0x20,
'O' ,0x20,'2' ,'3' ,0x20,'U' ,0x20,0x20, 0x20,'1' ,'O' ,'>' ,0x20,0x20,0x20,'?' ,
'A' ,'A' ,'A' ,'A' ,'A' ,'A' ,0x20,'C' , 'E' ,'E' ,'E' ,'E' ,'I' ,'I' ,'I' ,'I' ,
'D' ,'N' ,'O' ,'O' ,'O' ,'O' ,'O' ,'X' , 'O' ,'U' ,'U' ,'U' ,'U' ,'Y' ,0x20,'S' ,
'A' ,'A' ,'A' ,'A' ,'A' ,'A' ,0x20,'C' , 'E' ,'E' ,'E' ,'E' ,'I' ,'I' ,'I' ,'I' ,
'D' ,'N' ,'O' ,'O' ,'O' ,'O' ,'O' ,'/' , 'O' ,'U' ,'U' ,'U' ,'U' ,'Y' ,0x20,'Y' };
int example_collation(void* pArg,int n1,const void* s1,int n2,const void* s2)
{
int v = 0;
unsigned char b1;
unsigned char b2;
unsigned char *t = (unsigned char *)pArg;
unsigned char *p1 = (unsigned char *)s1;
unsigned char *p2 = (unsigned char *)s2;
unsigned char *c1 = p1 + n1;
unsigned char *c2 = p2 + n2;
while( ( p1 < c1 || p2 < c2 ) && v == 0) {
b1 = *(p1++);
b2 = *(p2++);
if ( p1 > c1 )
b1 = 0x20;
else if ( b1 < 0x80 )
b1 = t[ b1 ];
else if( ( b1 & 0xFE ) == 0xC2 )
b1 = t[ 0x80 | ( ( b1 << 6 ) & 0x40 ) | ( *(p1++) & 0x3F ) ];
if ( p2 > c2 )
b2 = 0x20;
else if ( b2 < 0x80 )
b2 = t[ b2 ];
else if( ( b2 & 0xFE ) == 0xC2 )
b2 = t[ 0x80 | ( ( b2 << 6 ) & 0x40 ) | ( *(p2++) & 0x3F ) ];
v = b1 - b2 ;
}
return v;
}
Have you considered creating an additional standard column?
– utluiz
@utluiz Already! It’s over overhead yet. In some specific situation may even be an (inelegant) solution, but in most cases, it would be absurd to do so.
– Maniero
Create an additional column is a overhead which may be valid in some cases where search efficiency is crucial and comparison is not solved only with encoding. An application is to allow the search of words with small variations of writing with the algorithm [Soundex] (http://en.wikipedia.org/wiki/Soundex).
– utluiz
Ah, and whatever the solution, for it to be efficient, it is not enough to apply a comparison clause in the query, it must act on the bank’s indices to not cause a Full Table Scan.
– utluiz
@utluiz I agree with his last statement and that the additional column can valid in some cases.
– Maniero
I don’t know if it answers the question but on Android if you put
collate localized
in the search query. And in the DB create you putdb.setLocale(Locale.getDefault());
works perfectly in terms of case insensitive and Accent insensitive.– Jorge B.
I don’t know the details on Android but probably the Android Sqlite is customized with extra functions. But what I wanted is a solution for other platforms. Bacco has already given the solution that works. Thank you.
– Maniero