You can create a helper class for these treatments: add extension methods to the string type, one per treatment, each removing the characters you consider irrelevant to the comparison. For example:
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class StringHelper
{
    // Removes diacritics: decomposes each character (FormD) and drops the combining marks.
    public static string RemoverAcentos(this string texto)
    {
        StringBuilder retorno = new StringBuilder();
        var arrTexto = texto.Normalize(NormalizationForm.FormD).ToCharArray();
        foreach (var letra in arrTexto)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(letra) != UnicodeCategory.NonSpacingMark)
                retorno.Append(letra);
        }
        return retorno.ToString();
    }

    // Removes tabs and spaces.
    public static string RemoverEspacamentos(this string texto)
    {
        return texto.Replace("\t", "").Replace(" ", "");
    }

    // Removes accents, lowercases the text and keeps only letters, digits and whitespace.
    public static string RemoverCaracteresEspeciais(this string texto)
    {
        string retorno = texto.RemoverAcentos();
        return Regex.Replace(retorno.ToLower(), @"[^a-z0-9\s]", "");
    }
}
Then use it as follows:
string entrada = "São Paulo SP";
string entradaNormalizada = entrada.RemoverCaracteresEspeciais()
                                   .RemoverEspacamentos()
                                   .ToLower();
string cadastro = "Cidade de São Paulo - SP";
string cadastroNormalizado = cadastro.RemoverCaracteresEspeciais()
                                     .RemoverEspacamentos()
                                     .ToLower();
bool comparacao = cadastroNormalizado.Contains(entradaNormalizada); // true
This is only the first step, though. Even after these basic treatments, you only get a match when the input is shorter than the registered value it is compared against and its terms appear in the same order. If the input is, for example, "I live in the city of São Paulo" or "SP - São Paulo", the comparison will be false.
From this point on, you need to enrich the mechanism with a hit score: count how many terms of A appear in B and use that count to decide whether the comparison is valid.
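A minimal sketch of such a hit score, assuming the score is simply the fraction of distinct terms of A that also occur in B. The class name TermMatcher, its method names and the 0.6 threshold are hypothetical choices for illustration, not part of any library:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: names and the default threshold are illustrative assumptions.
public static class TermMatcher
{
    // Strips accents, lowercases, and returns the distinct alphanumeric terms.
    public static HashSet<string> Tokenize(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        string lowered = sb.ToString().ToLowerInvariant();
        return new HashSet<string>(
            Regex.Matches(lowered, "[a-z0-9]+").Cast<Match>().Select(m => m.Value));
    }

    // Fraction of the terms of 'a' that also appear in 'b', from 0.0 to 1.0.
    public static double Score(string a, string b)
    {
        HashSet<string> termsA = Tokenize(a);
        if (termsA.Count == 0) return 0.0;
        HashSet<string> termsB = Tokenize(b);
        int hits = termsA.Count(t => termsB.Contains(t));
        return (double)hits / termsA.Count;
    }

    // Accepts the comparison when the score reaches the (assumed) threshold.
    public static bool IsMatch(string a, string b, double threshold = 0.6)
    {
        return Score(a, b) >= threshold;
    }
}
```

With this sketch, TermMatcher.Score("São Paulo SP", "SP - São Paulo") is 1.0 because term order no longer matters, and TermMatcher.IsMatch("São Paulo SP", "I live in the city of São Paulo") is true because two of the three input terms are found. The right threshold depends on your data, and very short terms such as "de" may need to be ignored or weighted lower.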
If you need something more sophisticated than that, you will have to adopt a search API that meets your needs, such as Lucene or Reddog.Search.
Wouldn't it work to remove double spaces, convert everything to lowercase, remove accents, and only then do the check?
– David Dias
David, yes, but that's the simplest approach... what I asked is whether there is a much more practical way (e.g. Regex) rather than creating an array of possibilities.
– aa_sp
Without a mechanical search (using an array of possibilities for each element), the road is long: you would need to build a search engine like Google's. The variations are numerous: "users register", "users' database", "são paulo - Cad. user", "users' register" and so on...
– Diego Rafael Souza
I believe the way forward is to determine the limits of the variations (whether or not they can be expressed in Regex, for example) and look for a more restricted solution.
– Diego Rafael Souza