How to find a reserved word in a sentence?

5

I am writing a routine to search for reserved words in a text (people's names). I have tested Contains() and IndexOf(), and they work well most of the time. However, for some words the result is not what I want. Example: for the name CORALLINE it returns the reserved word ORAL.

Here is the code I used:

    public JsonResult GetPalavrasReservada1(string frase)
    {
        var palavras = db.TBL_PALAVRA_RESERVADA.Where(pal => pal.PLR_ATIVO == 0).Select(i => new {i.PLR_DESC}).ToList();

        var palavrareservada = "";

        for (int i = 0; i < palavras.Count; i++)
        {
            if (frase.ToUpper().Contains(palavras[i].PLR_DESC.ToString()))
            {
                return Json(palavras[i].PLR_DESC.ToString());
            }
        }

        return Json(palavrareservada);
    }

Am I using the wrong logic or the wrong methods?

  • 2

    And isn’t that right? In fact the name "Coralina" contains the word "oral".

  • @LINQ, that does not agree with the definition of "word" adopted here. You are thinking of a substring; here a word is a properly delimited token. Since "coralline" forms a single token, it does not contain "oral".

  • Wouldn't it be a matter of checking whether there is a blank space before and after the word?

2 answers

1

As soon as it finds one reserved word, the algorithm stops and cannot find another. Is that what you want?

If you only want to match whole words, you have to break the text into words and check each one. That is not so simple unless you control the content of the text beforehand and can guarantee that it does not contain certain patterns. If you can, breaking the phrase only takes a Split() on spaces and/or other characters that separate words. But it is unlikely that there will be no exceptions to this rule, so you would have to handle them yourself. Obviously this requires an extra loop to test every reserved word against every word of the phrase.
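
The Split() idea could be sketched roughly as below. This is only an illustration under assumptions: the class name `WholeWordSearch`, the method `FindReservedWords` and the separator list are hypothetical, and the reserved words are passed in as plain strings instead of being read from `TBL_PALAVRA_RESERVADA`. A `HashSet` lookup stands in for the inner loop over the tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WholeWordSearch
{
    // Hypothetical separator list; extend it to match the text you actually receive.
    private static readonly char[] Separators = { ' ', '\t', ',', '.', ';', ':', '!', '?', '(', ')' };

    // Returns every reserved word that appears as a whole token in the phrase.
    public static List<string> FindReservedWords(string frase, IEnumerable<string> reservadas)
    {
        // Break the phrase into tokens, dropping the empty entries left by repeated separators.
        var tokens = new HashSet<string>(
            frase.Split(Separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

        // The set lookup replaces the inner loop over all tokens.
        return reservadas.Where(r => tokens.Contains(r)).ToList();
    }
}
```

With this sketch, "CORALLINE" yields no match for "ORAL", while "Discurso oral" does. A reserved term made of two words can never match, because each token is a single word.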

A specialized parser would perform better, but it complicates things even more.

A Regex may help, but I do not like the idea; it is easy to get wrong. There are examples on SO (another one, and one with LINQ).

A clear mistake is using ToUpper() to compare text while ignoring case; that is not the correct approach. See the right way.
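
One common alternative to ToUpper() is to pass a `StringComparison` value to the search method. The sketch below is an assumption about what a "right way" could look like, not necessarily what the linked post recommends; the class and method names are hypothetical.

```csharp
using System;

public static class CaseInsensitiveSearch
{
    // True when 'texto' contains 'termo', ignoring case,
    // without allocating an upper-cased copy of the text.
    public static bool ContainsIgnoreCase(string texto, string termo)
    {
        return texto.IndexOf(termo, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

ToUpper() without arguments uses the current culture, so in a Turkish culture, for example, "i" is upper-cased to "İ" rather than "I". Note that this is still a substring search: it fixes the case handling only, not the whole-word problem.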

The variable palavrareservada serves no purpose in this code and can be deleted.

Fetching the reserved words straight from the database may not be the best strategy.

  • Regarding this problem, I ran tests using Split and also a regex that separates the text into words and, in this case, it ends up solving it. 1. Regarding ToUpper(), which you said is incorrect: what exactly is wrong with it? 2. If a reserved word consists of two words (NOT IN, for example), the treatment would have to be different from the one used with Split or the regex.

  • It is inefficient, and there are cases (not in Portuguese, admittedly) where it does not give the expected result.

0


I would use Regex with the \b pattern (word boundary) to check for whole words in the sentence.

    Regex.IsMatch("CORALLINE", @"\bORAL\b", RegexOptions.IgnoreCase); // false
    Regex.IsMatch("Discurso oral", @"\bORAL\b", RegexOptions.IgnoreCase); // true
    Regex.IsMatch("Nada consta contra", @"\bNada Consta\b", RegexOptions.IgnoreCase); // true

In your code, it would look like this:

    // Regex.Escape keeps any special characters in the reserved word literal
    if (Regex.IsMatch(frase, @"\b" + Regex.Escape(palavras[i].PLR_DESC.ToString()) + @"\b", RegexOptions.IgnoreCase))

  • Hello @Fernando, using Regex.IsMatch this way ended up solving the problem. Thank you very much.
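
For completeness, here is a minimal sketch of the word-boundary approach that returns every reserved term found instead of stopping at the first one, and that also accepts terms made of two words. The names `ReservedWordMatcher` and `FindAll` are hypothetical, and the reserved terms are assumed to arrive as plain strings rather than from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReservedWordMatcher
{
    // Returns every reserved term (single or multi-word) that occurs in 'frase'
    // delimited by word boundaries, ignoring case.
    public static List<string> FindAll(string frase, IEnumerable<string> reservadas)
    {
        return reservadas
            .Where(r => Regex.IsMatch(
                frase,
                // Escape keeps characters such as '.' or '+' literal
                @"\b" + Regex.Escape(r) + @"\b",
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
            .ToList();
    }
}
```

One caveat: \b only matches between a word character and a non-word character, so a term that starts or ends with punctuation (for example "C#") may fail to match and would need different delimiters.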
