Making words in my HTML bold using regex


2

Basically, I need to select every occurrence of the word "pattern" in my HTML with a regex and then replace each one with a bold version using JavaScript.

My code works, but only for the first occurrence; for some reason it doesn’t make the others bold. (Note: this same code was used to turn a few words uppercase, and it worked perfectly on all of them.)

    function patternEmNegrito() {
        const regex = /pattern/ig
        const texto = document.querySelector('.texto')
        const resultado = texto.innerHTML.match(regex)
    
        let textoFinal = texto.innerHTML
        for(let i = 0; i < resultado.length; i++){
            textoFinal = textoFinal.replace(resultado[i], `<b>${resultado[i]}</b>`)
        }
        return texto.innerHTML = textoFinal
    }
    patternEmNegrito()
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <title>selecionando e modificando textos com regEx</title>
    </head>
    <body>
        <h1>texto</h1>
    
        <div class="resultado">
    
        </div>
        
        <div class="texto">
            <h1>What Is a Regular Expression?</h1>
            <p> regular expression is a sequence of characters that forms a search pattern.</p>
            
            <p>When you search for data in a text, you can use this search pattern to describe what you are searching for.</p>
            
            <p>A regular expression can be a single character, or a more complicated pattern.</p>
            
            <p>Regular expressions can be used to perform all types of text search and text replace operations.</p>
        </div>
        <script src="./regex.js"></script>
        
    </body>
    </html>

2 answers

2


The uppercase version works because JS is case sensitive: after replace changes the first occurrence to uppercase, that occurrence is no longer found, so each iteration of the for loop picks up the next occurrence that is still in its original case.

The bold version fails because replace, when given a string as its first argument, replaces only the first occurrence. In your case, you are creating three nested <b> tags around the first occurrence of the string "pattern", because every replace inside the loop finds that same first occurrence again.
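To see both behaviours side by side without a DOM, here is a minimal sketch on a plain string. The sample string `html` and the helper name `replaceEach` are made up for illustration; the loop itself mirrors the one in the question.

```javascript
// Hypothetical sample text; it stands in for texto.innerHTML.
const html = 'a search pattern, this pattern, a complicated pattern';

// Same loop as in the question: one string-based replace per match.
// `wrap` decides what each match is replaced with.
function replaceEach(text, regex, wrap) {
  const matches = text.match(regex) || [];
  let result = text;
  for (const m of matches) {
    // With a string as first argument, replace changes only the FIRST occurrence.
    result = result.replace(m, wrap(m));
  }
  return result;
}

// Bold: the replacement still contains "pattern", so every pass hits the same spot.
const bold = replaceEach(html, /pattern/gi, (m) => `<b>${m}</b>`);
console.log(bold);
// a search <b><b><b>pattern</b></b></b>, this pattern, a complicated pattern

// Uppercase: "PATTERN" no longer equals the string "pattern", so each pass moves on.
const upper = replaceEach(html, /pattern/gi, (m) => m.toUpperCase());
console.log(upper);
// a search PATTERN, this PATTERN, a complicated PATTERN
```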

You don’t need a for loop for this, and it wouldn’t work anyway: iterating over the array returned by .match() does not change how replace works. replace also accepts a regex, and with the g flag it replaces every occurrence at once.
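As a quick illustration of that difference, the sketch below runs the three variants of replace on a made-up sample string (the variable names are only for this example):

```javascript
const text = 'pattern, pattern, pattern'; // hypothetical sample string

const withString = text.replace('pattern', 'X'); // string argument: first occurrence only
const withoutG = text.replace(/pattern/, 'X');   // regex without g: first occurrence only
const withG = text.replace(/pattern/g, 'X');     // regex with g: every occurrence

console.log(withString); // X, pattern, pattern
console.log(withoutG);   // X, pattern, pattern
console.log(withG);      // X, X, X
```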

There’s no need for .match either; I kept it in the example below for illustration only:

function patternEmNegrito() {
    const regex = /pattern/ig;
    const texto = document.querySelector('.texto')
    const resultado = texto.innerHTML.match(regex)

    let textoFinal = texto.innerHTML
    const re_replace = new RegExp(resultado[0], 'g');
    textoFinal = textoFinal.replace(re_replace, `<b>${resultado[0]}</b>`)
    return texto.innerHTML = textoFinal
}
patternEmNegrito()
<h1>texto</h1>

<div class="resultado">

</div>

<div class="texto">
  <h1>What Is a Regular Expression?</h1>
  <p> regular expression is a sequence of characters that forms a search pattern.</p>

  <p>When you search for data in a text, you can use this search pattern to describe what you are searching for.</p>

  <p>A regular expression can be a single character, or a more complicated pattern.</p>

  <p>Regular expressions can be used to perform all types of text search and text replace operations.</p>
</div>

Using only .replace()

Using just replace, you could do this:

function patternEmNegrito() {
    const padrao = 'pattern' // pattern to search for
    const regex = new RegExp(padrao, 'g') // creates the RegExp object with the pattern and the "g" flag
    const texto = document.querySelector('.texto') // selects the div
    const textoFinal = texto.innerHTML.replace(regex, `<b>${padrao}</b>`) // replaces every occurrence

    return texto.innerHTML = textoFinal // updates the div's innerHTML
}
patternEmNegrito()
<h1>texto</h1>

<div class="resultado">

</div>

<div class="texto">
  <h1>What Is a Regular Expression?</h1>
  <p> regular expression is a sequence of characters that forms a search pattern.</p>

  <p>When you search for data in a text, you can use this search pattern to describe what you are searching for.</p>

  <p>A regular expression can be a single character, or a more complicated pattern.</p>

  <p>Regular expressions can be used to perform all types of text search and text replace operations.</p>
</div>

  • Thank you very much, you helped me a lot! But I still have a question: you said that replace would only change the first occurrence and that the for loop would not work, but I had written a function with that same code that makes the words uppercase, and it worked. I’ll leave the code below:

  • function deixarMaiusculo() { const regex = /Expression/ig; const texto = document.querySelector('.texto'); const resultado = texto.innerHTML.match(regex); console.log(resultado); let textoFinal = texto.innerHTML; for (let i = 0; i < resultado.length; i++) { textoFinal = textoFinal.replace(resultado[i], resultado[i].toUpperCase()) } return texto.innerHTML = textoFinal }

  • 1

    The uppercase version works because JS is case sensitive: once the first occurrence has been changed, replace no longer finds it and moves on to the next one, and so on.

1

Just to complement the other answer: there is a situation where this regex can cause problems, namely when the string "pattern" appears inside a tag, for example as an attribute of an input:

function patternEmNegrito() {
    const regex = /pattern/ig;
    const texto = document.querySelector('.texto');
    const resultado = texto.innerHTML.match(regex);

    let textoFinal = texto.innerHTML;
    const re_replace = new RegExp(resultado[0], 'g');
    textoFinal = textoFinal.replace(re_replace, `<b>${resultado[0]}</b>`);
    console.log(textoFinal);
    return texto.innerHTML = textoFinal;
}
patternEmNegrito()
<div class="texto">
  <form>
    <input type="text" pattern="[a-z]+">
  </form>
</div>

The result is that the innerHTML will be:

<form>
  <input type="text" <b>pattern</b>="[a-z]+">
</form>

Note that the attribute pattern was replaced by <b>pattern</b>, but since this happens inside the tag, the markup becomes invalid and the browser no longer displays the input correctly.

If you are sure that your HTML never has the string "pattern" inside a tag, there is no problem. But if you want to handle these cases as well, the regex gets a little more complicated:

function patternEmNegrito() {
    const texto = document.querySelector('.texto');
    let re = /(\bpattern\b)(?![^<]*>)/gi;
    texto.innerHTML = texto.innerHTML.replace(re, "<b>$1</b>");
}
patternEmNegrito()
<div class="texto">
  <form>
    <input type="text" pattern="[a-z]+">
  </form>
  <h1>What Is a Regular Expression?</h1>
  <p> regular expression is a sequence of characters that forms a search pattern.</p>

  <p>When you search for data in a text, you can use this search pattern to describe what you are searching for.</p>

  <p>A regular expression can be a single character, or a more complicated pattern.</p>

  <p>Regular expressions can be used to perform all types of text search and text replace operations.</p>

  <p>Pattern, PATTERN and pattERN are replaced, keeping their original case.</p>

</div>

The regex is (\bpattern\b)(?![^<]*>) (the slashes are not part of the regex itself; they only delimit it).

First we have the word "pattern" surrounded by \b, the shortcut for "word boundary". This ensures that "pattern" is matched only when there is no other word character (letter, digit or underscore) immediately before or after it. This avoids cases such as "antipattern": since there is a letter before the p, the regex does not match and nothing is replaced.
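A small sketch of the word boundary in isolation, using made-up sample strings (the lookahead is left out here so that only \b is being tested):

```javascript
// No g flag, so test() keeps no state between calls.
const re = /\bpattern\b/i;

// Hypothetical inputs chosen to exercise the boundary on each side.
const samples = [
  'a search pattern.', // standalone word
  'an antipattern',    // letter before the "p"
  'design patterns',   // letter after the "n"
  'pattern_1',         // underscore counts as a word character
];

const results = samples.map((s) => re.test(s));
console.log(results); // [ true, false, false, false ]
```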

I put this in parentheses to form a capturing group. This will be important later, at replacement time.

Then we have a negative lookahead (the expression inside (?!...)), which checks that something does not appear ahead. In this case, the lookahead contains [^<]*>, which is "zero or more characters that are not <", followed by the character >.

That is, the regex matches the word "pattern" as long as it is not followed by a run of characters other than < and then a >, which is what the rest of an open tag looks like. In short, the word "pattern" cannot be inside a tag (between the < and the >, either as an attribute name or as an attribute value).
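The effect of the lookahead can be checked on plain strings, without a DOM. The sample markup below is an assumption for illustration. The last call shows a limit of this heuristic that the answer does not mention: an unescaped > in ordinary text after the word looks like the end of a tag, so that occurrence is skipped.

```javascript
const re = /(\bpattern\b)(?![^<]*>)/gi;

// Hypothetical markup: one "pattern" as an attribute, one in text.
const html = '<input type="text" pattern="[a-z]+"><p>a search pattern.</p>';
const out = html.replace(re, '<b>$1</b>');
console.log(out);
// <input type="text" pattern="[a-z]+"><p>a search <b>pattern</b>.</p>

// Limitation: the ">" in the text makes the lookahead treat the word as inside a tag.
const skipped = '<p>pattern > x</p>'.replace(re, '<b>$1</b>');
console.log(skipped);
// <p>pattern > x</p>
```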

In the replacement I use $1, a special variable that corresponds to the first capturing group (the first pair of parentheses, which in this case holds the word "pattern" as it was matched).

I could use the word "pattern" itself in the replacement, but because the i flag was used, the regex is case insensitive, which means it will match "pattern", "PATTERN", "Pattern" and any other combination of upper and lower case. Using $1 ensures that exactly the string that was captured is reused, keeping its upper and lower case (I mention this because your original code used this flag, indicating that the text may vary in case).
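To make the difference concrete, this sketch compares a literal replacement with $1 on a sample sentence (the variable names are only for this example):

```javascript
const text = 'Pattern, PATTERN and pattERN';
const re = /(\bpattern\b)/gi;

// Literal replacement: every match is overwritten with lowercase "pattern".
const literal = text.replace(re, '<b>pattern</b>');
console.log(literal);
// <b>pattern</b>, <b>pattern</b> and <b>pattern</b>

// $1: each match is reinserted exactly as it was captured.
const captured = text.replace(re, '<b>$1</b>');
console.log(captured);
// <b>Pattern</b>, <b>PATTERN</b> and <b>pattERN</b>
```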

I also used the g flag, which causes all occurrences to be replaced. Without it, only the first occurrence is replaced.

Finally, note that you can use replace directly. There is no need to call match first and then feed its result into the replacement. replace already searches for occurrences of the regex and only substitutes when something is found (i.e., it tries to find a match and, if there is one, replaces it).
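There is also a practical reason to skip match, shown in this sketch with a made-up string that contains no match: match returns null in that case, so code that reads .length or [0] from its result throws, while replace simply returns the string unchanged.

```javascript
const text = 'no such word here'; // hypothetical input without the word
const re = /(\bpattern\b)(?![^<]*>)/gi;

// match with the g flag returns null when nothing is found,
// so resultado.length or resultado[0] would throw a TypeError here.
const matches = text.match(re);
console.log(matches); // null

// replace needs no guard: with no match it returns the original string.
const out = text.replace(re, '<b>$1</b>');
console.log(out === text); // true
```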

  • 1

    sensational! thank you for sharing your knowledge with us
