Regex to filter out words only when they are not part of other words

I am trying to create a regex that filters out certain words, to validate the name a user enters in a conversation with a chatbot. Since the goal is to filter out swear words, I replaced them with palavraoX ("palavrão" is Portuguese for "swear word") so that this question would not be offensive.

What I’ve been able to do so far is:

/^((?!palavrao1|palavrao2|palavrao3|Palavrao1|Palavrao2|Palavrao3|PALAVRAO1|PALAVRAO2|PALAVRAO3).)*$/

The problem is that if a person's name contains any of these bad words, the name is rejected. For example, the name "Cuca" would be invalid for exactly this reason.
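A minimal sketch of the problem, assuming the chatbot's regex engine behaves like JavaScript's `RegExp` (the thread does not say which engine it uses). The test names are hypothetical; "Xpalavrao1x" stands in for a name like "Cuca" that merely contains a listed word:

```javascript
// The question's pattern: at each position, the negative lookahead checks that
// none of the listed words starts there, then "." consumes one character.
const questionRegex =
  /^((?!palavrao1|palavrao2|palavrao3|Palavrao1|Palavrao2|Palavrao3|PALAVRAO1|PALAVRAO2|PALAVRAO3).)*$/;

// Hypothetical names: "Xpalavrao1x" only contains a listed word inside a longer word.
const questionNames = ["Maria", "Xpalavrao1x", "palavrao1"];

for (const name of questionNames) {
  console.log(name, questionRegex.test(name) ? "accepted" : "rejected");
}
// Maria accepted
// Xpalavrao1x rejected   <- the problem: the listed word is only part of the name
// palavrao1 rejected
```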

So how can I match a swear word only when it appears as a whole word, rather than rejecting any name that merely contains it?

Note: I know the /i flag would make the regex case-insensitive, but unfortunately the chatbot does not accept flags.
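One possible workaround for the missing /i flag, not taken from the thread, is to generate a two-character class for each letter. The `caseInsensitive` helper below is hypothetical and assumes the listed words are plain ASCII letters and digits. Unlike the three hand-written variants, it also covers mixed-case spellings such as "PaLaVrAo1":

```javascript
// Hypothetical helper: "palavrao1" -> "[pP][aA][lL][aA][vV][rR][aA][oO]1".
// Each letter becomes a class with both cases, so no /i flag is needed.
function caseInsensitive(word) {
  return Array.from(word, (ch) => {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    if (lower === upper) {
      // No case (digits, punctuation): escape regex metacharacters and keep it.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
    return `[${lower}${upper}]`;
  }).join("");
}

const badWords = ["palavrao1", "palavrao2", "palavrao3"];
const alternatives = badWords.map(caseInsensitive).join("|");

// Prints the alternatives to paste into the pattern instead of the case variants.
console.log(alternatives);

// Quick check with a mixed-case spelling.
console.log(new RegExp(`^(?:${alternatives})$`).test("PaLaVrAo1")); // true
```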

  • Use the word boundary, the well-known \b(palavra)\b. It ensures the word is not part of another word.

1 answer



You can change the regex to:

/^((?!\b(palavrao1|palavrao2|palavrao3)\b).)+$/

I grouped all the alternatives in parentheses and put \b around them. \b is the shortcut for "word boundary" (roughly, the boundary between words): it marks a position that has a word character (letter, digit or underscore) on one side and a non-word character, or the start or end of the string, on the other.
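To make the definition concrete, this sketch lists every position where `\b` matches, assuming JavaScript `RegExp` semantics; `boundaryPositions` is a hypothetical helper. There is no boundary anywhere inside "Cuca", so a pattern wrapped in `\b...\b` cannot match just a piece of it:

```javascript
// Hypothetical helper: returns every index where \b matches in `text`.
// In JavaScript, word characters are [A-Za-z0-9_]; \b sits between a word
// character and a non-word character, or at the start/end next to a word character.
function boundaryPositions(text) {
  const positions = [];
  // matchAll advances past zero-width matches, so this does not loop forever.
  for (const match of text.matchAll(/\b/g)) {
    positions.push(match.index);
  }
  return positions;
}

console.log(boundaryPositions("Ana Cuca")); // [0, 3, 4, 8]
```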

This way, the regex only considers the swear word when it appears as a whole word. If it is only part of a word (as in "Cuca"), the regex ignores it.
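A quick check of the answer's pattern, again assuming JavaScript `RegExp` semantics and using hypothetical test names. Note also that `+` (instead of the question's `*`) rejects an empty name:

```javascript
// The answer's pattern: the lookahead only fires when a listed word is a whole
// word, i.e. it has a word boundary (\b) on both sides.
const answerRegex = /^((?!\b(palavrao1|palavrao2|palavrao3)\b).)+$/;

// Hypothetical names to try.
const answerNames = ["Maria", "Xpalavrao1x", "Maria palavrao2", ""];

for (const name of answerNames) {
  console.log(JSON.stringify(name), answerRegex.test(name) ? "accepted" : "rejected");
}
// "Maria" accepted
// "Xpalavrao1x" accepted       <- listed word is only part of a longer word
// "Maria palavrao2" rejected   <- listed word appears as a whole word
// "" rejected                  <- "+" requires at least one character
```

Because this pattern lists only the lowercase forms, a capitalized "Palavrao1" would still be accepted; the case variants from the question would need to be added to the group.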

Of course, this still won’t catch every case: someone could be named João Pinto, for example, and your system would reject the name (if "Pinto" is on the list of bad words). There will always be some case where the filter fails.
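Two such failures can be sketched as follows, assuming JavaScript `RegExp` semantics and a hypothetical list that contains "Pinto". The second is an extra caveat not raised in the thread: JavaScript's `\b` only treats `[A-Za-z0-9_]` as word characters, so an accented letter such as "ã" right next to a listed word still counts as a boundary. Whether the chatbot's engine behaves the same way is an assumption worth testing:

```javascript
// Hypothetical list: "Pinto" stands in for a bad word that is also a common surname.
const filter = /^((?!\b(Pinto|palavrao1)\b).)+$/;

// False positive: the surname is a whole word, so the name is rejected.
console.log(filter.test("João Pinto")); // false

// "ã" is not a word character for JavaScript's \b, so the listed word right
// after it still sits on a boundary and the name is rejected...
console.log(filter.test("ãpalavrao1")); // false
// ...while an ASCII letter in the same place makes it part of a longer word.
console.log(filter.test("apalavrao1")); // true
```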

See the regex running here (with actual swear words).

  • I don’t think the last word in your bad word example is really a bad word. ¯\_(ツ)_/¯

  • @Cypherpotato Even though it was on an external link, I didn’t want to go too heavy on the examples :-)

  • Put "Ancap" in the examples. I feel offended when people call me that.
