Save without accents and uppercase SQL Server

Question

Asked 7 years, 9 months ago

3

I’m creating a database in which I need to save all fields in uppercase and without accents. Searching the internet, I found the following collation, which was supposed to do this, but the data is still saved with accents and in mixed case. Here is the example:

CREATE TABLE Cidades(
  CodCidade int identity(1,1) not null,
  Descricao varchar(80) COLLATE SQL_Latin1_General_Cp1251_CS_AS not null,
  CodEstado int not null,
  CodIBGE varchar(30) not null,
  Excluido bit not null default 0,
  CONSTRAINT PK_Cidades PRIMARY KEY(CodCidade)
)

Does anyone know why this collation is not working? Is there another one I should use?

1

Do you really want to save the data without accents and in uppercase? Wouldn’t it be better to store everything as entered and just ignore accents and casing when querying? I mean, I don’t think it’s a good idea to discard the accents and the casing that the user chose, do you? If I had to do this, I think it would be much better to handle it in the presentation layer, showing the user input in uppercase and ignoring the accents...

– Jéf Bueno

2016/10/03 at 17:17

2 answers

1

A SQL collation cannot help in this case. At most, it determines whether accents and case (uppercase/lowercase) are ignored in comparisons and searches.
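To illustrate the point, here is a minimal sketch. It assumes a database whose default code page can store 'ã', and the table variable @Cidades is hypothetical, used only for this example:

```sql
-- Hypothetical table variable, used only for this illustration.
DECLARE @Cidades TABLE (
    Descricao varchar(80) COLLATE Latin1_General_CS_AS NOT NULL
);

INSERT INTO @Cidades (Descricao) VALUES ('São Paulo');

-- The CI_AI collation only changes how the comparison is evaluated
-- (case-insensitive, accent-insensitive); it does not rewrite the stored text.
SELECT Descricao
FROM @Cidades
WHERE Descricao COLLATE Latin1_General_CI_AI = 'SAO PAULO';
```

The query should match the row even though the search term has no accent and is all uppercase, while the value returned is still the one that was stored.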

To actually convert the text and persist it in uppercase without accents, you need to transform it before saving, using a function written for that purpose.

MS SQL Server has no native function to remove accents. You would have to write this function yourself in SQL, or implement it in the application layer if you have one (in C# or Java, for example).

Algorithm

One way to do it is to use two arrays as a "from-to" mapping: you walk through the word character by character, look up the position of the original character in the first array, and take the substitute character at the same position in the second array.

Since the arrays hold only the accented characters and their substitutes, a character that is not found is simply returned unchanged.

In addition to replacing accented characters with their unaccented equivalents, you should also convert everything to uppercase (most languages, including SQL Server, already have a native function for this).
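The steps above could be sketched in T-SQL roughly as follows. This is only an illustration: the function name dbo.RemoverAcentos, the variable names, and the character lists are assumptions and are not part of the original answer, and the lists cover only a handful of common accented letters.

```sql
-- Hypothetical helper; name and character lists are illustrative only.
CREATE FUNCTION dbo.RemoverAcentos (@texto nvarchar(4000))
RETURNS nvarchar(4000)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    -- "From" and "to" lists: the character at position N in @de
    -- is replaced by the character at position N in @para.
    DECLARE @de   nvarchar(60) = N'ÁÀÂÃÄÉÈÊËÍÌÎÏÓÒÔÕÖÚÙÛÜÇÑ';
    DECLARE @para nvarchar(60) = N'AAAAAEEEEIIIIOOOOOUUUUCN';

    -- Uppercase first, so only uppercase letters need to be mapped.
    DECLARE @entrada   nvarchar(4000) = UPPER(@texto);
    DECLARE @resultado nvarchar(4000) = N'';
    DECLARE @i int = 1, @pos int, @c nchar(1);

    -- DATALENGTH / 2 counts characters without ignoring trailing spaces.
    WHILE @i <= DATALENGTH(@entrada) / 2
    BEGIN
        SET @c = SUBSTRING(@entrada, @i, 1);

        -- Binary collation so that 'A' is not treated as equal to 'Á'.
        SET @pos = CHARINDEX(@c COLLATE Latin1_General_BIN2, @de);

        -- Found: take the substitute; not found: keep the character itself.
        SET @resultado = @resultado +
            CASE WHEN @pos > 0 THEN SUBSTRING(@para, @pos, 1) ELSE @c END;

        SET @i = @i + 1;
    END;

    RETURN @resultado;
END;
GO

SELECT dbo.RemoverAcentos(N'São Paulo');  -- SAO PAULO
```

A character-by-character loop like this is simple to follow but not fast, so for large volumes it may be preferable to do the conversion in the application layer.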

Built-in language functions

The language you use may already have some built-in functions that do part of the work.

For example, C# has String.Normalize, which helps a lot. See this example:

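using System.Globalization; // CharUnicodeInfo, UnicodeCategory
using System.Text;          // StringBuilder, NormalizationForm

static string RemoveDiacritics(string text)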
{
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder();

    foreach (var c in normalizedString)
    {
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}

The code above first uses String.Normalize to separate the letters from their accents or cedillas, then walks through the result and drops the accent marks that were split off, leaving only the base letters.

You still need to apply ToUpper() to the result of the function above, something like this:

static void Main(string[] args)
{
    var textoOriginal = "É é ç";
    var textoConvertido = RemoveDiacritics(textoOriginal).ToUpper();

    Console.WriteLine(textoConvertido);
}

The output of the code above is E E C.

Okay, thank you so much for your help!

– Davi GN

2016/10/05 at 14:51

by Andre Figueiredo • **5,030** points · Answer 1 · 2016-10-03T17:59:40+00:00

Solution 1:

Combine the collation with the UPPER function when inserting records into the column.

INSERT INTO Cidades(Descricao, CodEstado, CodIBGE)
    values(UPPER('São Paulo'), 1, 1)

output:
CodCidade   Descricao   CodEstado   CodIBGE Excluido
1           SAO PAULO   1           1       0

Your collation should already remove the accents, but using SQL_Latin1_General_CP1253_CI_AI also works.

To list the other available collations, run:

select name, COLLATIONPROPERTY(name, 'CodePage') as Code_Page, description
from   sys.fn_HelpCollations()

Solution 2:

Create a separate field, such as DescricaoNormalizada, so that the original value is not lost:

Descricao varchar(80) not null,
DescricaoNormalizada varchar(80) COLLATE SQL_Latin1_General_Cp1251_CS_AS null,

Fill in the normalized value with a trigger or explicitly in the INSERT, resulting in:

CodCidade   Descricao   DescricaoNormalizada    CodEstado   CodIBGE Excluido
1           São Paulo   SAO PAULO               1           1       0
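A trigger for this could look roughly like the sketch below. The trigger name is hypothetical, and the sketch assumes the Cidades table from the question with the extra DescricaoNormalizada column. It also relies on this answer's claim that the Cp1251 collation drops the accents when the value is stored; if that does not hold in your environment, replace the UPPER(...) expression with an explicit accent-removal function.

```sql
-- Sketch only: hypothetical trigger name; assumes Cidades has the
-- DescricaoNormalizada column declared with the Cp1251 collation.
CREATE TRIGGER TR_Cidades_Normalizar
ON Cidades
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip statements that did not touch Descricao. This also stops the
    -- UPDATE below from doing the work again if recursive triggers are enabled.
    IF NOT UPDATE(Descricao)
        RETURN;

    -- Copy the uppercased description into the normalized column
    -- for every row affected by the statement.
    UPDATE c
    SET DescricaoNormalizada = UPPER(i.Descricao)
    FROM Cidades AS c
    INNER JOIN inserted AS i
        ON i.CodCidade = c.CodCidade;
END;
```

With the trigger in place, the application inserts only Descricao and the normalized column is kept in sync on inserts and updates.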

Solution 3:

Do nothing in the database and leave the transformation to the backend, or to the frontend if the goal is only formatting.

Descricao varchar(80) not null,