Exception in Split()

My system keeps an index of terms so I can find out which words are used most in users' messages. For this, I created a Termos class and a loop that creates a new object for each word. That way I know in which Fluxo (the user's page) and in which Informacao (a message within the Fluxo) each term was used.

My question is: if the user types something like "Hello! All well?", it will save "Hello!" and "well?", along with characters I don't want, such as commas, exclamation marks and other special characters. Is it possible to define exceptions for this, i.e., to "tell" Split() not to pick up these characters?

Here is the excerpt of the code I'm using:

Termos.cs

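using System.ComponentModel.DataAnnotations; // for the [Key] attribute

namespace ProjetoASPNETMVC.Models
{
    // One Termos row is created for each word occurrence in a message
    public class Termos
    {
        [Key]
        public int TermoID { get; set; }
        public string Palavra { get; set; }          // the word itself
        public Fluxo Fluxo { get; set; }              // Flow (user's page) where the word was used
        public Informacao Informacao { get; set; }    // message within the Flow
    }
}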

InformacaoController.cs

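// Split the message once instead of calling Split() on every iteration
string[] palavras = i.Mensagem.Split(' ');
// The Fluxo containing this Informacao is the same for every word, so look it up once
Fluxo fluxo = db.Fluxo.ToList().Where(j => j.Informacoes.Contains(i)).FirstOrDefault();

foreach (string palavra in palavras)
{
    Termos termo = new Termos();
    termo.Palavra = palavra.ToUpper();
    termo.Fluxo = fluxo;
    termo.Informacao = i;
    db.Termos.Add(termo);
}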

I imagine this could be done by checking the message character by character, but I would like an alternative, since I suspect that checking letter by letter would kill the system's performance.
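For comparison, a character-by-character filter is a single linear pass over the message, and the regex in the answer also has to look at every character, so for short chat messages it is unlikely to be a bottleneck. A minimal sketch, where the class and method names (LimpezaManual, RemoverPontuacao) are hypothetical:

```csharp
using System.Text;

// Hypothetical helper (not part of the original project): keeps only letters,
// digits and whitespace, visiting each character exactly once.
public static class LimpezaManual
{
    public static string RemoverPontuacao(string texto)
    {
        var sb = new StringBuilder(texto.Length);
        foreach (char c in texto)
        {
            // char.IsLetter also accepts accented letters such as 'á'
            if (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, RemoverPontuacao("Olá! Tudo bem?") returns "Olá Tudo bem". Unlike the regex class \w, this filter also removes the underscore.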

  • I don't quite understand what you want to do, but I can already say that Split() will not solve it. What you are doing may make sense to you, but for someone who doesn't know the context there is no way to tell what the goal is. Something tells me that even the project's organization is wrong, but that is another matter.

  • I need to split a message and create an object for each word in it, so I know how many times that word was used in the system. But if one user types "Hello" and another types "Hello!", they will be counted as different words. I'd like to know how to ignore the special characters.
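The goal described in the last comment (count each word regardless of case and punctuation, so "Hello" and "Hello!" are the same word) can be sketched without the database. This assumes a "word" is any run of \w characters (letters, digits, underscore); the names ContagemDePalavras and Contar are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: counts word occurrences across several messages,
// ignoring case and punctuation.
public static class ContagemDePalavras
{
    // Matches runs of word characters; punctuation and spaces are never included
    private static readonly Regex Palavra = new Regex(@"\w+");

    public static Dictionary<string, int> Contar(IEnumerable<string> mensagens)
    {
        var contagem = new Dictionary<string, int>();
        foreach (string mensagem in mensagens)
        {
            foreach (Match m in Palavra.Matches(mensagem))
            {
                // Invariant culture avoids culture-specific casing surprises
                string chave = m.Value.ToUpperInvariant();
                contagem.TryGetValue(chave, out int atual); // atual is 0 if the key is new
                contagem[chave] = atual + 1;
            }
        }
        return contagem;
    }
}
```

With the messages "Hello", "Hello!" and "hello, world", the result holds HELLO = 3 and WORLD = 1.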

1 answer

You can try a different approach: before performing the split, "clean" the string by removing the unwanted characters.

Following your example, it would look like this:

var padrao = @"[^\w\s]";
var regex = new System.Text.RegularExpressions.Regex(padrao);

// "Clean" the string: remove every character that is neither a word character nor whitespace.
// A local copy is used so the stored message (i.Mensagem) keeps its original text.
string mensagem = regex.Replace(i.Mensagem, string.Empty);
string[] palavras = mensagem.Split(' ');

for (int x = 0; x < palavras.Length; x++)
{
    Termos termo = new Termos();
    termo.Palavra = palavras[x].ToUpper();
    termo.Fluxo = db.Fluxo.ToList().Where(j => j.Informacoes.Contains(i)).FirstOrDefault();
    termo.Informacao = i;
    db.Termos.Add(termo);
}

Note that the cleaning is applied to the message before it is split in the for loop.

Regex details:

[    start of the character class
^    negation: match any character NOT listed in the class
\w   word characters (letters, digits and underscore)
\s   whitespace characters
]    end of the character class
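The cleaning step can be tried on its own, outside the controller. A minimal sketch (the class and method names ExemploLimpeza and Palavras are hypothetical) that also passes StringSplitOptions.RemoveEmptyEntries, so that consecutive spaces left behind by the cleaning do not produce empty "words":

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical self-contained version of the answer's cleaning step.
public static class ExemploLimpeza
{
    // Same pattern as the answer: anything that is not \w or \s
    private static readonly Regex NaoPalavra = new Regex(@"[^\w\s]");

    public static string[] Palavras(string mensagem)
    {
        string limpa = NaoPalavra.Replace(mensagem, string.Empty);
        // "Hello - world" becomes "Hello  world" (two spaces) after cleaning;
        // RemoveEmptyEntries drops the empty string between those spaces.
        return limpa.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

One caveat of removing (rather than splitting on) punctuation: a message written without spaces, such as "Hello,world", becomes the single word "Helloworld".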

The regex used in this answer was based on this SOEN answer.
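If regex is not wanted, String.Split itself accepts several separator characters, so punctuation can be treated as a separator instead of being removed. A sketch under the assumption that the separator list below covers the characters users actually type (ExemploSplit and Separadores are hypothetical names):

```csharp
using System;

// Hypothetical alternative without regex: split on spaces AND punctuation.
public static class ExemploSplit
{
    // Assumed separator list; extend it as needed
    private static readonly char[] Separadores =
        { ' ', ',', '.', '!', '?', ';', ':', '"', '(', ')' };

    public static string[] Palavras(string mensagem)
    {
        // RemoveEmptyEntries discards the empty strings produced between
        // adjacent separators, e.g. the "! " in "Hello! All"
        return mensagem.Split(Separadores, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The trade-off compared with the regex: only the listed characters are handled, so an unlisted symbol such as '#' would still end up inside a word; on the other hand, "Hello,world" is split into two words instead of being merged.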

  • VS is flagging an error in "\w\s": "Unrecognized escape sequence".

  • I've edited the answer. You have to put the @ before the pattern string (a verbatim string literal); otherwise C# tries to interpret \w and \s as string escape sequences.
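To illustrate the fix from this exchange: the verbatim literal and a regular literal with doubled backslashes produce exactly the same pattern. Writing "[^\w\s]" without the @ is what causes the "Unrecognized escape sequence" compiler error. The names below are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class ExemploVerbatim
{
    // Verbatim string: backslashes are passed to the regex engine unchanged
    public const string PadraoVerbatim = @"[^\w\s]";

    // Regular string: each backslash must be escaped as \\ to get the same pattern
    public const string PadraoEscapado = "[^\\w\\s]";

    public static string Limpar(string texto) =>
        Regex.Replace(texto, PadraoVerbatim, string.Empty);
}
```

The verbatim form is usually preferred for regex patterns because it keeps the pattern readable.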
