How to find, retrieve and delete a certain value in a multiline string in JavaScript?

I need to locate a value (which follows a defined pattern) in a text, retrieve it, and then delete the record.

const content = `Example text:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an.

2 rows returned.

Lorem!`

In the above text the value I want to recover is: 2 rows returned.

Conditions for identifying the value:

  • The line always starts with an integer of variable length
  • Followed by a space
  • Followed by 2 to 3 words
  • Ending with a period or an exclamation mark

Note: the line break character is \r\n

1 answer

One detail was not clear: you say "delete the record". Does that mean the snippet in question should be removed from the string? Anyway, let's look at some alternatives.


If you only want to retrieve the information from the string (the snippets that correspond to the given pattern), an alternative is:

const content = `Example text:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an.

2 rows returned.

Lorem! Another record below

42 the answer!

invalid records below
another 2 rows returned
2 rows returned?`;

// Each registro is a match object; registro[0] holds the full matched text
for (const registro of content.matchAll(/^\d+ \w+ \w+[.!]$/gm)) {
    console.log(`Found: ${registro[0]}`);
}

I use the anchors ^ and $, which usually indicate the beginning and end of the string. Because I used the m flag, multiline mode is enabled, so ^ and $ match the beginning and end of each line instead. This ensures that the line contains only what the expression specifies.

Then I use \d+ for "one or more digits", \w+ for "one or more alphanumeric characters" (further down we will see how this can be improved, if you want), and at the end the character class [.!], which means "a period or an exclamation mark".

The regex also uses the g flag so that it finds all occurrences in the string. Without it, the regex would find only the first occurrence (and matchAll, specifically, throws a TypeError when given a regex without the g flag).

The code above finds the snippets 2 rows returned. and 42 the answer!, but not 2 rows returned?, for example, because that line does not end with a period or an exclamation mark.
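
If you need the values themselves rather than just printing them, the same regex can feed an array. This is only a minimal sketch: the function name findRecords and the sample string are made up for illustration, and the sample uses \r\n line breaks as described in the question.

```javascript
// Hypothetical helper: returns every line shaped like "<integer> <word> <word>" ending in . or !
function findRecords(text) {
    const pattern = /^\d+ \w+ \w+[.!]$/gm;
    // matchAll yields one match object per occurrence; index 0 is the full matched text
    return Array.from(text.matchAll(pattern), (match) => match[0]);
}

const sampleText = 'Intro line\r\n2 rows returned.\r\nLorem!\r\n42 the answer!\r\n2 rows returned?';
console.log(findRecords(sampleText)); // [ '2 rows returned.', '42 the answer!' ]
```

In multiline mode $ also matches right before a \r, so the \r\n line breaks do not get in the way here.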


Note: The matchAll method is not supported by older browsers. An alternative with better compatibility is exec:

const content = `Example text:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an.

2 rows returned.

Lorem! Another record below

42 the answer!

invalid records below
another 2 rows returned
2 rows returned?`;

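const regex = /^\d+ \w+ \w+[.!]$/gm;
let registro;
// With the g flag, each exec call resumes where the previous match ended
// and returns null when there are no more matches
while ((registro = regex.exec(content)) !== null) {
    console.log(`Found: ${registro[0]}`);
}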
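
Since the question also asks to locate the value, it may help that each result returned by exec carries an index property with the position where the match starts. A small sketch of that idea, where locateRecords and the sample string are hypothetical names chosen for this example:

```javascript
// Hypothetical helper: lists each record together with the offset where it starts
function locateRecords(text) {
    const regex = /^\d+ \w+ \w+[.!]$/gm;
    const found = [];
    let match;
    // The g flag makes exec advance through the string via regex.lastIndex
    while ((match = regex.exec(text)) !== null) {
        found.push({ text: match[0], index: match.index });
    }
    return found;
}

console.log(locateRecords('Header\r\n2 rows returned.\r\nLorem!'));
// [ { text: '2 rows returned.', index: 8 } ]
```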


Improving the definition of "word"

The shortcut \w matches letters, digits and the character _, so \w+ treats things like _123_, 1a2b, 123 and ___ as "words".

If you want to limit it to just letters, you can replace \w+ with [a-zA-Z]+, for example, so that the regex only considers the letters a to z, in upper and lower case. You could also use the i flag to make the regex case-insensitive, in which case [a-z] is enough:

let regex = /^\d+ [a-z]+ [a-z]+[.!]$/gmi;

Which one to use depends on what your data looks like. If \w might produce false positives, prefer to be as specific as possible and match only what you actually want to capture.
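
The question mentions "2 to 3 words", while the expressions in this answer match exactly two. If three-word records can also occur, a quantifier on a group is one way to cover both cases. The sketch below assumes that words contain only letters and are separated by a single space; findFlexibleRecords and the sample lines are invented for illustration.

```javascript
// Hypothetical helper: accepts an integer followed by two or three letter-only words
function findFlexibleRecords(text) {
    // (?: [a-z]+){2,3} means "a space plus a word", repeated 2 or 3 times
    const pattern = /^\d+(?: [a-z]+){2,3}[.!]$/gim;
    // With the g flag, match returns all matches, or null if there are none
    return text.match(pattern) || [];
}

const flexibleSample = [
    '5 alone.',              // only one word: rejected
    '2 rows returned.',      // two words: accepted
    '42 the final answer!',  // three words: accepted
    '7 one two three four.', // four words: rejected
].join('\r\n');

console.log(findFlexibleRecords(flexibleSample));
// [ '2 rows returned.', '42 the final answer!' ]
```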

As for the numbers, I have already given other options in my answer to another question of yours, in case you want to be more specific.


About "delete the record"

If you meant to delete these snippets from the string, one way to do it would be:

const content = `Example text:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an.

2 rows returned.

Lorem! Another record below

42 the answer!

invalid records below
another 2 rows returned
2 rows returned?`;

console.log(content.replace(/^\d+ \w+ \w+[.!]$/gm, ''));

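The replace above uses the same regex to replace each matched snippet with an empty string. Note that ^ and $ only mark positions (the beginning and end of a line); they do not consume the \r and \n characters, so those stay in the string and leave an empty line behind. To remove them too, include them in the regex:

console.log(content.replace(/^\d+ \w+ \w+[.!]\r?\n/gm, ''));

This removes the entire line containing the number and the two words, including the line break. The \r is optional (\r?) because line breaks written inside a template literal, as in the example above, are always normalized to \n, whereas your real data uses \r\n. A matching record on the very last line, with no line break after it, would not be removed by this version.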
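
If you want to both retrieve the values and delete them, replace also accepts a function as its second argument, which makes it possible to do both in a single pass. The following is just a sketch under a few assumptions: extractRecords is a hypothetical name, line breaks may be \r\n or \n, and a record on the last line has no line break after it.

```javascript
// Hypothetical helper: collects the matching records and returns the text without them
function extractRecords(text) {
    const removed = [];
    // Group 1 is the record itself; (?:\r?\n|$) consumes the line break when there is one
    const remaining = text.replace(/^(\d+ \w+ \w+[.!])(?:\r?\n|$)/gm, (whole, record) => {
        removed.push(record);
        return ''; // drop the whole line, line break included
    });
    return { removed, remaining };
}

const result = extractRecords('Intro\r\n2 rows returned.\r\nLorem!\r\n42 the answer!\r\nEnd');
console.log(result.removed);   // [ '2 rows returned.', '42 the answer!' ]
console.log(result.remaining); // 'Intro\r\nLorem!\r\nEnd'
```

When the record is the last line of the text, the line break before it is kept, so the result may end with a trailing line break.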
