SQL collation cannot help in this case. At most, it defines that accents and case (uppercase/lowercase) should be ignored in searches.
To actually persist everything in uppercase and without accents, you need to convert the text before persisting it, using a function written for that purpose.
MS SQL Server has no native function for removing accents. You would have to create this function yourself in SQL, or create it in the application layer if you have one (in C# or Java, for example).
Algorithm
One way to do it is to use two arrays as a "from-to" table. You traverse the word character by character, look up the position of the original character in the first array, and take the substitute character at that same position in the second array.
Since the arrays hold only the accented characters and their substitutes, a character that is not found is simply returned unchanged.
In addition to replacing accented characters with unaccented ones, you should also convert everything to uppercase (most languages already include a native function for this, SQL Server included).
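The steps above could be sketched in C# roughly as follows. This is only an illustration: the class name AccentMap, the method name ToUpperWithoutAccents, and the two lookup tables are assumptions made for the example, and the tables cover only a handful of Latin letters. Two strings are used here in place of two arrays, since a string can be indexed like a character array.

```csharp
using System;
using System.Text;

static class AccentMap
{
    // Hypothetical, deliberately incomplete tables:
    // the character at position i in From is replaced by the one at position i in To.
    const string From = "ÁÀÂÃÄÉÈÊËÍÌÎÏÓÒÔÕÖÚÙÛÜÇÑáàâãäéèêëíìîïóòôõöúùûüçñ";
    const string To   = "AAAAAEEEEIIIIOOOOOUUUUCNaaaaaeeeeiiiiooooouuuucn";

    public static string ToUpperWithoutAccents(string text)
    {
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Look up the character in the "from" table (ordinal search).
            int index = From.IndexOf(c);
            // Found: take the substitute at the same position. Not found: keep the character.
            result.Append(index >= 0 ? To[index] : c);
        }
        // Finally, convert everything to uppercase.
        return result.ToString().ToUpperInvariant();
    }
}
```

Calling AccentMap.ToUpperWithoutAccents("É é ç") returns "E E C". Note that this sketch only matches precomposed characters; text that stores the accent as a separate combining mark would pass through with the mark intact.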
Built-in language functions
The language you use may already have functions that do part of the work.
For example, C# has String.Normalize, which already helps a lot. See this example:
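using System.Globalization;
using System.Text;

static string RemoveDiacritics(string text)
{
    // FormD splits each accented letter into a base letter plus combining marks.
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder();
    foreach (var c in normalizedString)
    {
        // Keep everything except the combining marks (accents, cedilla).
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }
    // Recompose what is left into the usual composed form.
    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}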
The code above first separates the letters from their accents or cedilla using String.Normalize, and then goes through the result removing the accent marks that were separated from the letters, leaving only the letters.
You still need to apply ToUpper() to the result of this function. Something like this:
static void Main(string[] args)
{
    var textoOriginal = "É é ç";
    var textoConvertido = RemoveDiacritics(textoOriginal).ToUpper();
    Console.WriteLine(textoConvertido);
}
The output of the code above is E E C.
Do you really want to save the text without accents and in uppercase? Wouldn't it be better to save everything as entered and just ignore accents and casing when running the queries? I mean, I don't think it's a good idea to remove the accents and the casing that the user chose, do you? If I had to do this, I think it would be much better to handle it in the presentation layer, showing the user input in uppercase and ignoring the accents...
– Jéf Bueno