How to make a regex that ignores non-alphanumeric characters?


Viewed 4,863 times

2

For example: a regex that finds a match in the string "moeda" but also finds a match in the string "m,.o,e.d...a". That is, it ignores non-alphanumeric characters regardless of position or quantity.

Note: I can do it like this, but the idea is to type it only once.

 #=\W*

m#o#e#d#a

  • Is it a single string, or do you want to capture multiple words in a string and separate them?

5 answers

3

If by "capture" you mean removing the non-alphanumeric characters, use replace with a negated character class like this:

[^a-z0-9]

The ^ sign at the start of [...] negates the class, so it matches any character that is not listed, and replace removes every character that is not inside [^...].

In JavaScript, use it with the global modifier /.../g, and with /.../i if you need case-insensitive matching, for example:

var str = "m,.o,e.d...a";
var resposta = str.replace(/[^a-z0-9]/gi, "");
console.log(resposta);

In PHP it would be like this, with preg_replace:

$str = "m,.o,e.d...a";
$resposta = preg_replace('#[^a-z0-9]#i', '', $str); // "i" keeps uppercase letters too, as in the JavaScript example

var_dump($resposta);

Online example on ideone

Note:

If you want to keep more characters, such as spaces, just add them inside [^...]. This example keeps alphanumerics and spaces:

var str = "m,.o,e.d...a ,.n,.a,. ,.,.c,.a,.r,.t,.e,.i,.r,.a";
var resposta = str.replace(/[^a-z0-9\s]/gi, "");
console.log(resposta);


Capture in an array

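If you really want to capture, use .match in JavaScript and preg_match_all in PHP (preg_match returns only the first match). The regex changes as well: since the string holds several words and you want all of them, match each run of non-whitespace characters and then strip the non-alphanumerics from each match. A pattern that also consumes the surrounding whitespace, such as (^|\s)([^\s]+?)(\s|$), skips every other word under the global flag, because the space that ends one match can no longer start the next. So it has to be something like this:

\S+

Example in JavaScript:

var str = "m,.o,e.d...a ,.n,.a,. ,.,.c,.a,.r,.t,.e,.i,.r,.a";
// Each match is one whitespace-delimited word; match() returns null when nothing matches.
var respostas = str.match(/\S+/g) || [];
// Everything that is not a letter or a digit.
var notAlphanumeric = /[^a-z0-9]/gi;

for (var i = 0, j = respostas.length; i < j; i++) {
    respostas[i] = respostas[i].replace(notAlphanumeric, "");
}

console.log(respostas);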

1

Just use \w+. It matches one or more characters in the ranges A-Za-z0-9 (plus the underscore).

console.log('m#o#e#d#a'.match(/\w+/g));
console.log('m,.o,e.d...a'.match(/\w+/g));
  • The point is that I want to capture a specific, non-generic string; that is, I want it to capture the m-o-e-d-a sequence regardless of the alphanumeric characters between the letters.

  • @Felipepadilha are you using a specific language or text editor?

0

Use this regular expression:

[0-9a-zA-Z]

Test here

  • 2

    The point is that I want to capture a specific, non-generic string; that is, I want it to capture the m-o-e-d-a sequence regardless of the alphanumeric characters between the letters.

  • @Felipepadilla you mean non-alphanumeric characters, right?

  • @Felipepadilla for example: the string "mKoKeKdKa" should not match, but a string like "$m! #e@d(a" should match, right?

  • Yes, non-alphanumeric characters, sorry.

0

I’m not sure I understand the request, but

var s="m.o...e,!d--a";
console.log(s.replace(/\W/g,"").match(/moeda/) ? "y":"n")

(that is, remove the non-letters first and then search for "moeda") can be used to detect whether a string contains "moeda".
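One detail of the strip-then-search idea is that \W does not remove the underscore, since \w includes it. A minimal sketch of a reusable version, assuming only a-z and 0-9 count as alphanumeric and that matching should be case-insensitive; the helper name containsIgnoringSymbols is hypothetical:

```javascript
// Hypothetical helper: true when `word` appears in `text` after every
// character other than a-z / 0-9 has been dropped (case-insensitive).
function containsIgnoringSymbols(text, word) {
  var strip = /[^a-z0-9]/gi;
  var cleanText = text.replace(strip, "").toLowerCase();
  var cleanWord = word.replace(strip, "").toLowerCase();
  // An empty search word would match everything, so reject it.
  return cleanWord.length > 0 && cleanText.indexOf(cleanWord) !== -1;
}

console.log(containsIgnoringSymbols("m.o...e,!d--a", "moeda")); // true
console.log(containsIgnoringSymbols("m_o_e_d_a", "moeda"));     // true
console.log(containsIgnoringSymbols("mKoKeKdKa", "moeda"));     // false
```

This only answers yes or no; the position and the original text of the match are lost once the string has been stripped.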

0

It is not possible to do this the way you want (typing it only once) using regex alone.
In a regular expression you must describe what you want to find through a logical sequence of tokens and quantifiers.
In your case you want to find a specific sequence, but you do not want to use tokens or quantifiers to skip what lies between its characters (as mentioned in the comment on that answer).
That is, the only way to do this would be a global regex flag, similar to x (which ignores whitespace in the pattern). No such flag exists for your case, because the result you want can already be achieved with tokens and quantifiers, which removes the need to create one.

The result you want can only be obtained with a regex like this:

(m\W*o\W*e\W*d\W*a)

You can test it here

Explanation

  • () is the capture group the regex returns if the condition is satisfied
  • m is the character that must be found first to start the regex match
  • \W* means there can be zero or more non-alphanumeric characters
  • o is another character that must come after the "m" for the condition to be satisfied
  • \W* has the same effect as the previous one
  • and so the regex continues until it finds e, d and a, whether or not there are non-alphanumeric characters between them
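Regex itself offers no shortcut here, but when the pattern is built in code the separator only has to be written once. A minimal sketch in JavaScript, assuming the search word is plain text rather than a pattern; the helper name buildLoosePattern is hypothetical and not part of any library:

```javascript
// Hypothetical helper: turns "moeda" into the regex m\W*o\W*e\W*d\W*a.
function buildLoosePattern(word, flags) {
  var escaped = word.split("").map(function (ch) {
    // Escape regex metacharacters so every character is matched literally.
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  // The separator \W* is written once and inserted between all characters.
  return new RegExp(escaped.join("\\W*"), flags || "");
}

var moeda = buildLoosePattern("moeda", "i");
console.log(moeda.source);                    // m\W*o\W*e\W*d\W*a
console.log(moeda.test("m,.o,e.d...a"));      // true
console.log(moeda.test("mKoKeKdKa"));         // false
console.log("x $m!o#e@d(a y".match(moeda)[0]); // m!o#e@d(a
```

Unlike stripping the string first, this keeps the original text of the match, symbols included. Note that split("") works on UTF-16 code units, so the sketch assumes the word has no characters outside the Basic Multilingual Plane.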
  • Yes, this way works, but I wanted a way where I didn't have to type it between every alphanumeric character; the idea was to type it only once.

  • @Felipepadilha I don't understand, what is it that you don't want to type?

  • I don't want to have to type \W*, or this named group, between every non-alphanumeric character

  • *between every alphanumeric character

  • 1

    Unfortunately there is no other way to get this. No regex flavor has a global flag that changes the search behavior the way you want. You can read my explanation in the answer to get a better sense of why, @Felipepadilha
