How to verify the occurrence of a given string and return part of it?

I have a small solution built on Node.js. I need to check whether a given string occurs and, if it does, capture the integer value it contains.

String example:

56 rows returned.

The numeric (integer) value will vary, but the text that follows it will always be the same: rows returned.

Problem: how do I check via regex if the occurrence xxx rows returned. is there? And if there is, return only the integer value.

Note: xxx will always be an integer of variable value.

The code snippet above is part of a loop, hence the need to validate the string as described.

  • The answers below solve the question of the numerical value, but checking whether the string exists would require an extra if. It is also necessary to know how and where this string comes from.

2 answers

2


how do I check via regex if the occurrence xxx rows returned. is there? And if there is, return only the integer value.

In this case, you should first check whether the string is in the given format. Doing the replace, as suggested by another answer, only works if the string is already in this format; that is, it is something that could be done after you have already validated the format of the string.
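To illustrate why the format should be validated first, here is a minimal sketch of what happens when replace is applied to strings that are not in the expected format. The helper name viaReplace and the sample inputs are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical helper reproducing the replace-only approach, with no validation
function viaReplace(str) {
    return parseInt(str.replace('rows returned', ''), 10);
}

console.log(viaReplace('56 rows returned'));    // 56
console.log(viaReplace('no rows returned'));    // NaN
console.log(viaReplace('56abc rows returned')); // 56 (the "abc" is silently ignored)
```

In the last case the malformed input goes unnoticed, which is the situation a prior format check is meant to catch.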

To validate whether the string is in this exact format ("number, space, a fixed string"), you can work with or without regex. Both approaches follow below:

Regex

Using regular expressions (regex), something like this:

let s = '56 rows returned.';
let matches = s.match(/^(\d+) rows returned\.$/);
if (matches) {
    // matches[1] holds the digits captured by (\d+)
    let valor = parseInt(matches[1], 10);
    console.log(`valor=${valor}`);
} else {
    console.log('string is not in the expected format');
}

The regex uses the markers ^ and $, which indicate the beginning and the end of the string, respectively. This guarantees that the string contains only what is specified in the regex (not one character more, not one less).

Next we have \d+. The shortcut \d means "a digit from 0 to 9", and the quantifier + means "one or more occurrences". That is, \d+ matches one or more digits. This part is in parentheses to form a capture group, so I can retrieve it later (as we will see below).

Then there is the part rows returned (notice the space before rows). That is, the string must have this exact text after the number. Finally we also have \. for the period. Since the period has a special meaning in regex (it means "any character, except line breaks"), I have to escape it with \ so that the regex treats it as a literal period.

Next I use the method match to check whether the string matches the regex. If the string is in a different format, the return value is null. Otherwise, it returns an array containing the full match and the capture groups.

Since I used the capture group (\d+), I can retrieve that value with matches[1]: it is the first pair of parentheses in the regex, so it is the first capture group, which is available at index 1 of the array matches.

Since the expression \d+ guarantees that there will only be digits, I can safely use parseInt to convert the string to a number. At the end we have the desired value.

The use of parseInt may be optional, depending on what you will do with the value. If it will only be shown on the screen and nothing else, for example, you do not need to convert it to a number. But if you are going to do calculations or compare it with other numbers, then it is necessary to use parseInt.
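Since the question mentions that the check runs inside a loop, the steps above could be wrapped in a small helper. This is only a sketch: the function name extractRowCount and the sample lines array are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: returns the integer from "N rows returned." or null
function extractRowCount(line) {
    const matches = line.match(/^(\d+) rows returned\.$/);
    // match() returns null when the string is not in the expected format
    return matches ? parseInt(matches[1], 10) : null;
}

// Hypothetical sample input, standing in for the strings seen by the loop
const lines = ['56 rows returned.', 'something else', '0 rows returned.'];
for (const line of lines) {
    const count = extractRowCount(line);
    if (count !== null) {
        console.log(`valor=${count}`);
    }
}
```

Returning null instead of a falsy number lets the caller tell "0 rows returned." apart from a string that did not match at all.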


Another detail is that \d+ matches an unlimited number of digits, as the + means "one or more occurrences" (at least 1 digit, no maximum limit). If you want to limit the number of digits to more specific values, you can use one of the options below:

  • \d{2,7}: not less than 2, not more than 7 digits
  • \d{2,}: at least 2 digits, with no upper limit
  • \d{2}: exactly 2 digits

Use whatever suits your use cases best.
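As a quick illustration of the three quantifier options, the sketch below plugs each one into the same pattern. The variable names and sample strings are made up for this example.

```javascript
// Each regex differs only in the quantifier applied to \d
const upTo7 = /^(\d{2,7}) rows returned\.$/;   // 2 to 7 digits
const atLeast2 = /^(\d{2,}) rows returned\.$/; // 2 or more digits
const exactly2 = /^(\d{2}) rows returned\.$/;  // exactly 2 digits

console.log(upTo7.test('5 rows returned.'));           // false (only 1 digit)
console.log(upTo7.test('1234567 rows returned.'));     // true (7 digits)
console.log(upTo7.test('12345678 rows returned.'));    // false (8 digits)
console.log(atLeast2.test('12345678 rows returned.')); // true
console.log(exactly2.test('56 rows returned.'));       // true
console.log(exactly2.test('560 rows returned.'));      // false
```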


Another detail is that this expression also accepts numbers such as 001 or 0000. If you want to prevent the first digit from being zero, you can change the expression to [1-9]\d*:

  • [1-9] is a character class that matches a digit from 1 to 9
  • \d* means "zero or more occurrences of digits from 0 to 9" (the quantifier * indicates "zero or more occurrences").

This ensures that the first digit cannot be zero. You can also combine this solution with the quantifiers in braces if you want to limit the number of digits.

But this regex has another problem: if the string is 0 rows returned., it does not match. So we have to make another modification:

^([1-9]\d*|0) rows returned\.$

Now it uses alternation (the character |, which means "or"), with two options: a number that does not start with zero, or a single digit 0. So it accepts 0 rows returned. and 10 rows returned., but accepts neither 01 rows returned. nor 00 rows returned.
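A short sketch to check these four cases against the modified regex. The variable name noLeadingZeros is arbitrary.

```javascript
const noLeadingZeros = /^([1-9]\d*|0) rows returned\.$/;

const samples = [
    '0 rows returned.',  // accepted: single digit 0
    '10 rows returned.', // accepted: starts with 1-9
    '01 rows returned.', // rejected: leading zero
    '00 rows returned.', // rejected: leading zero
];
for (const s of samples) {
    console.log(`${s} -> ${noLeadingZeros.test(s)}`);
}
```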


Regex-free

Without using regular expressions, you can split the string into parts and check each of them.

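let s = '56 rows returned.';
let partes = s.split(' ');
// true only if str is non-empty and every character is a digit from 0 to 9
let soDigitos = (str) => str.length > 0 && [...str].every((c) => c >= '0' && c <= '9');
if (partes.length === 3 && partes[1] === 'rows' && partes[2] === 'returned.') {
    if (!soDigitos(partes[0])) {
        console.log(`valor '${partes[0]}' is not a number`);
    } else {
        let valor = parseInt(partes[0], 10);
        console.log(`valor=${valor}`);
    }
} else {
    console.log('string is not in the expected format');
}

The split separates the string by spaces and returns an array with the parts. Then, just check that the array has 3 parts (the number, the string rows and the string returned.).

Finally, I check that the first part consists only of digits, and if it does, I can convert it to a number with parseInt. The function isNaN alone is not enough for this check, because it treats strings such as '', '1.5' and '1e3' as numbers, so the code above tests each character instead.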


When the string occurs multiple times in the text

The solutions above assume that you are handling a single string that contains exactly one occurrence of xxx rows returned. and nothing else. But if it is a larger string and you want to find several occurrences of this text, just change the code to:

let texto = `text blablabla 56 rows returned.
another line
one more line with 385 rows returned.
blablabla etc`;
let regex = /(\d+) rows returned\./g;
let match;
// exec returns the next match on each call, or null when there are no more
while ((match = regex.exec(texto)) !== null) {
    let valor = parseInt(match[1], 10);
    console.log(`valor=${valor}`);
}

I removed the markers ^ and $, because now the text in question could be anywhere in the string. I also use the flag g, which allows finding all occurrences of the regex in the string (without this flag, only the first occurrence is found).

Then I use the method exec, which searches for the next occurrence of the pattern, and extract the number from it. This code obtains the two values contained in the text (56 and 385).
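As an alternative to the exec loop, the same values can be collected with String.prototype.matchAll. This is a sketch that assumes a runtime where matchAll is available (Node.js 12 or later); the sample text and the variable name valores are only illustrative.

```javascript
const texto = `text blablabla 56 rows returned.
another line
one more line with 385 rows returned.
blablabla etc`;

// matchAll requires the g flag and yields one match array per occurrence;
// Array.from maps each match to the integer in its first capture group
const valores = Array.from(
    texto.matchAll(/(\d+) rows returned\./g),
    (m) => parseInt(m[1], 10)
);
console.log(valores); // [ 56, 385 ]
```

This form avoids the mutable lastIndex state that exec relies on, at the cost of requiring a newer runtime.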

1

Buddy, if the text rows returned is fixed, you can use the method replace to remove it, like this:

var str = '56 rows returned';
var vNumero = parseInt(str.replace('rows returned', ''), 10);

The value of vNumero will be the integer 56.
