Regex to capture words between two characters

5

My problem is this:

let texto = "teste :1: e também teste :2:"

What I need to do is basically get the positions where the sequences :1: and :2: appear, using regex, since what appears between the two colons is dynamic.

I tried texto.match('^[:.:]$'), but it didn’t work (I’m just starting with regex).

2 answers

11

The answer by @Leandrade explains very well how to solve the problem. I would just like to add an explanation of your attempt (why it didn’t work), and propose some alternatives.

Breaking down the expression you used (^[:.:]$):

  • ^ and $ mean, respectively, the beginning and end of the string
  • the square brackets ([]) delimit a character class (or set), which makes the expression match one occurrence of any character listed inside the brackets.

For example, [ab] means: the letter "a" or the letter "b". If I want the letter "a" followed by the letter "b," I have to take it out of the brackets.

So, [:.:] means: the character ":" or the character "." or the character ":" (that is, the repeated colon is redundant). Depending on the regex engine being used, this may give an error (some do not allow repeated characters inside the brackets, since this is redundant).

Another detail is that the dot (.) inside the brackets loses its special meaning of "any character" and matches only a literal dot.

In short, the expression ^[:.:]$ means "a string that consists of exactly one character, either : or .". Since you used ^ and $ to mark the start and end of the string, the expression only matches single-character strings.
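To see this in practice, here is a minimal sketch (the sample strings and variable names are made up for illustration) that contrasts a character class with a plain sequence and then runs the original attempt against a few inputs:

```javascript
// Character class: [ab] matches ONE character, either "a" or "b".
const classAB = /^[ab]$/;
// Sequence: ab matches "a" followed by "b".
const sequenceAB = /^ab$/;
// The original attempt: exactly one character, either ":" or a literal ".".
const attempt = /^[:.:]$/;

// Illustrative inputs, including the text from the question.
const attemptSamples = [":", ".", "x", "::", ":1:", "teste :1: e também teste :2:"];
const attemptResults = attemptSamples.map((s) => attempt.test(s));

console.log(classAB.test("a"), classAB.test("ab"));       // true false
console.log(sequenceAB.test("a"), sequenceAB.test("ab")); // false true
console.log(attemptResults); // [ true, true, false, false, false, false ]
```

Note that "x" is rejected: inside the brackets the dot only matches a literal dot.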

Alternatives to the solution

The solution by @Leandrade seems to be what you need. But if you want to refine the expression a little more, you can change it according to your use cases.

If only digits can appear between the colons, I suggest replacing the . with the shorthand \d or with [0-9] (both match any digit from 0 to 9). Both expressions below do this:

:(\d+):
:([0-9]+):

I also changed the quantifier * (zero or more occurrences) to + (one or more occurrences). With *, there is a risk of accidentally matching ::. Using + ensures that there is at least one digit.
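The difference between the two quantifiers can be checked with a small sketch like the one below (the "a :: b" string is an invented example of an empty pair of colons):

```javascript
const digitsStar = /:(\d*):/; // zero or more digits between the colons
const digitsPlus = /:(\d+):/; // one or more digits between the colons

// Invented input with nothing between the colons.
const emptyPair = "a :: b";
const starMatchesEmpty = digitsStar.test(emptyPair);
const plusMatchesEmpty = digitsPlus.test(emptyPair);
console.log(starMatchesEmpty, plusMatchesEmpty); // true false

// On the text from the question, the + version finds both occurrences.
const digitMatches = "teste :1: e também teste :2:".match(/:(\d+):/g);
console.log(digitMatches); // [ ':1:', ':2:' ]
```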


If there can be digits or letters, an alternative is to use brackets with those characters inside. As we saw in the previous example, character ranges can be defined with a hyphen, so the expression would be:

// uppercase letters, lowercase letters, or digits 0 to 9
:([A-Za-z0-9]+):
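As a rough illustration (the input string is invented), this expression accepts letters and digits but skips a value containing an underscore, since that character is not in the class:

```javascript
// Invented input: two valid values and one containing "_".
const alnumText = "ids :abc123: and :x_y: and :42:";

// matchAll requires the g flag; m[1] is the content of the capture group.
const alnumValues = Array.from(
  alnumText.matchAll(/:([A-Za-z0-9]+):/g),
  (m) => m[1]
);

console.log(alnumValues); // [ 'abc123', '42' ]
```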

If anything may appear between the colons, an alternative is to use a negated character class: put a ^ right after the opening bracket, and the class then matches any character that is not listed inside it.

In that case, I would use [^:], which means "any character that is not :". The expression would look like this:

:([^:]+):

That is: "a colon, followed by one or more occurrences of any character other than a colon, followed by a colon".
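A short sketch of the negated class in use (the input is made up to include spaces, an accented letter, a hyphen, and an empty pair of colons):

```javascript
// Invented input: values with spaces and punctuation, plus an empty "::".
const mixedText = "a :olá mundo: b :x-1: c ::";

const mixedValues = Array.from(
  mixedText.matchAll(/:([^:]+):/g),
  (m) => m[1]
);

// The empty "::" is skipped because + requires at least one character.
console.log(mixedValues); // [ 'olá mundo', 'x-1' ]
```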


Of course you can use .*? as already suggested. I would only change it to .+? to ensure that there is at least one character between the colons.

But if you already know what the possible values are (only digits, or letters and digits, or any other rule), I suggest using a more restrictive expression than .*, to avoid false positives.

And since you mentioned you need to get the positions where the characters appear, an alternative is to use the index property of the object returned by exec, which indicates the position at which the match was found:
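To illustrate the kind of false positive meant here, the sketch below compares a greedy .+, the lazy .+?, and the negated class on the text from the question (the helper name capture is just for this example):

```javascript
const sampleText = "teste :1: e também teste :2:";

// Illustrative helper: collects the first capture group of every match.
const capture = (regex) =>
  Array.from(sampleText.matchAll(regex), (m) => m[1]);

const greedyValues = capture(/:(.+):/g);    // greedy: runs to the LAST colon
const lazyValues = capture(/:(.+?):/g);     // lazy: stops at the next colon
const negatedValues = capture(/:([^:]+):/g); // cannot cross a colon at all

console.log(greedyValues);  // [ '1: e também teste :2' ]
console.log(lazyValues);    // [ '1', '2' ]
console.log(negatedValues); // [ '1', '2' ]
```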

let texto = "teste :1: e também teste :2:";
let regex = new RegExp(':([^:]+):', 'g');
let result;

while ((result = regex.exec(texto))) {
    console.log("Found between the colons -> " + result[1]);
    console.log("Start position of the match -> " + result.index);
}
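If you also need the end position, one possible variation is sketched below using String.prototype.matchAll (available since ES2020). The function name findDelimited and the shape of the returned objects are assumptions made for this example, not part of the original solution:

```javascript
// Hypothetical helper: returns the captured value plus where the whole
// match (including both colons) starts and ends in the text.
function findDelimited(text) {
  const positions = [];
  for (const m of text.matchAll(/:([^:]+):/g)) {
    positions.push({
      value: m[1],                  // what is between the colons
      start: m.index,               // index of the opening colon
      end: m.index + m[0].length,   // index right after the closing colon
    });
  }
  return positions;
}

const found = findDelimited("teste :1: e também teste :2:");
console.log(found[0]); // { value: '1', start: 6, end: 9 }
console.log(found.length); // 2
```

The end index is exclusive, so text.slice(start, end) gives back the full match such as ":1:".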

  • 2

    Great explanation, man.

  • 2

    He taught a class. I learned some interesting details. Top!

  • 2

    Excellent explanation, very good indeed. I need to do a HackerRank challenge and this shed some light on it.

10


new RegExp(':(.*?):', 'g') - This expression basically means:

  • . matches any character except a line break, * repeats it zero or more times, and the ? after the quantifier makes it lazy, so it stops at the first : that follows instead of the last one.

  • 'g' (global) makes the regex find all matches, not only the first.
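The effect of the 'g' flag can be seen with String.prototype.match, as in this small sketch (variable names are illustrative):

```javascript
const flagText = "teste :1: e também teste :2:";

// Without g: only the first match, with capture group and index.
const withoutG = flagText.match(/:(.*?):/);
console.log(withoutG[0], withoutG[1], withoutG.index); // :1: 1 6

// With g: every full match, but no capture groups.
const withG = flagText.match(/:(.*?):/g);
console.log(withG); // [ ':1:', ':2:' ]

// Because * allows zero characters, an empty pair of colons also matches.
const emptyCapture = "a :: b".match(/:(.*?):/)[1];
console.log(emptyCapture === ""); // true
```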

let texto = "teste :1: e também teste :2:";
let regex = new RegExp(':(.*?):', 'g');
let resul;

while ((resul = regex.exec(texto))) {
  console.log(resul[1]); // shows what is between the colons
}

Note: a very good reference can be seen here.

  • 2

    The RegExp constructor is most useful when part of the regex is dynamic text; otherwise it is best to use a regex literal. In your case it’s more direct to write let regex = /:(.*?):/g. And the 'g' stands for global.

  • That’s it. Thanks man!
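As an illustration of the dynamic case mentioned in the comment above, here is a rough sketch where the delimiter is only known at runtime. Both helper names (escapeRegex, buildDelimitedRegex) are hypothetical, and the sketch assumes a single-character delimiter:

```javascript
// Hypothetical helper: escapes characters that are special in a regex.
function escapeRegex(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds "delimiter, one or more non-delimiter
// characters, delimiter". Assumes the delimiter is a single character.
function buildDelimitedRegex(delimiter) {
  const d = escapeRegex(delimiter);
  return new RegExp(`${d}([^${d}]+)${d}`, "g");
}

const pipeValues = Array.from(
  "a |x| b |y|".matchAll(buildDelimitedRegex("|")),
  (m) => m[1]
);
const colonValues = Array.from(
  "teste :1: e também teste :2:".matchAll(buildDelimitedRegex(":")),
  (m) => m[1]
);

console.log(pipeValues);  // [ 'x', 'y' ]
console.log(colonValues); // [ '1', '2' ]
```

When the delimiter is fixed, as in the question, the literal form remains the simpler choice.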
