First, your regex does not match only words that end in "mente"; it matches any sequence that has at least one character before "mente", even in the middle of a word:
let s = 'seus dementes, a mente engana frequentemente, plante sementes';
console.log(s.match(/\w+mente/g)); // [ "demente", "frequentemente", "semente" ]
Note in the example above that, although the text contains the words "dementes" and "sementes", the result contains "demente" and "semente". That is, your replacement would produce <mark>demente</mark>s and <mark>semente</mark>s, which is not quite what you need.
This happens because the regex matches \w+ (one or more letters, digits or _) followed by "mente", but by itself it does not guarantee that no other letter comes after.
To avoid this and match only the words that actually end in "mente", use the shortcut \b, which indicates a word boundary: a position that has an alphanumeric character before it and a non-alphanumeric character after it, or vice versa:
let s = 'seus dementes, a mente engana frequentemente, plante sementes';
console.log(s.match(/\b\w+mente\b/g)); // [ "frequentemente" ]
The word "mente" by itself is also not matched, because \w+ requires at least one character before "mente". If you also want to match the word "mente", change it to \w*.
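To illustrate the \w* variant, here is a minimal sketch that reuses the same sample string with only that one change (the variable name matches is just for illustration):

```javascript
const s = 'seus dementes, a mente engana frequentemente, plante sementes';
// \w* allows zero characters before "mente", so the bare word also matches
const matches = s.match(/\b\w*mente\b/g);
console.log(matches); // [ "mente", "frequentemente" ]
```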
Another detail is that you are replacing all of the element's HTML. Although this works in many cases, it will not always do what you expect, because HTML is far more complex than a regex can handle (see more about this here).
I took the liberty of adapting the other answer to illustrate some problems that may occur:
function marcarTexto_adverbio(target) {
    // show the resulting HTML in the console
    $("#content").html(function (_, html) {
        let novoHTML = html.replace(new RegExp(target, "g"), '<mark>' + target + '</mark>');
        console.log(novoHTML);
        return novoHTML;
    });
}
function teste() {
    let target = $("#content").text();
    let exp = /\b\w+mente\b/g;
    let resultado = null;
    let palavrasReplace = new Map();
    while ((resultado = exp.exec(target)) !== null) {
        const palavra = resultado[0];
        if (!palavrasReplace.has(palavra)) {
            marcarTexto_adverbio(palavra); // function that wraps the word in a <mark> tag
            palavrasReplace.set(palavra, true);
        }
    }
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p id="content">novamente
<a href="www.novamente.com">link</a>
<img src="novamente.gif" alt="mostra novamente a imagem">
<span>tem comentário aqui<!-- novamente --></span></p>
<button onclick="teste()">Mark</button>
I changed the function marcarTexto_adverbio to display the final HTML in the console. Note that the output was:
<mark>novamente</mark>
<a href="www.<mark>novamente</mark>.com">link</a>
<img src="<mark>novamente</mark>.gif" alt="mostra <mark>novamente</mark> a imagem">
<span>tem comentário aqui<!-- <mark>novamente</mark> --></span>
That is, the href of the link, the src and the alt of the image, and even the text inside the comment were all unduly altered.
Using regex this way, without regard for the element's HTML structure, can lead to catastrophic results. The regex will only work if the element contains only plain text (or if a word that occurs in the text does not also occur within HTML attributes, or anywhere other than a textContent).
The solution is a little more complicated, because every textNode has to be split into several nodes, some of which will be mark elements while the others remain textNodes. For example, the text "Aconteceu novamente hoje", which in HTML is a single textNode, has to be split into 3 nodes: two textNodes for "Aconteceu " and " hoje", and a mark element for "novamente". And if there are other tags inside the element, the same function must be called recursively, so that it also handles the innermost elements.
It would be something like this:
function markWords(element) {
    let e = document.createElement('div');
    // iterate over a static copy: appendChild below moves nodes out of
    // element, and the live childNodes list would otherwise skip siblings
    for (let child of Array.from(element.childNodes)) {
        if (child.nodeType == Node.TEXT_NODE) {
            child.nodeValue.split(/(\b\w+mente\b)/g).forEach(s => {
                if (! /^\w+mente$/.test(s)) {
                    e.appendChild(document.createTextNode(s));
                } else {
                    let novo = document.createElement('mark');
                    novo.appendChild(document.createTextNode(s));
                    e.appendChild(novo);
                }
            });
        } else e.appendChild(markWords(child));
    }
    element.innerHTML = e.innerHTML;
    return element;
}
function teste() {
    markWords(document.querySelector('#content'));
    // only to show the generated HTML; you can delete this when using it on your page
    console.log(document.querySelector('#content').innerHTML);
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p id="content">novamente pois novamente o
<a href="www.novamente.com">link</a>
<img src="novamente.gif" alt="mostra novamente a imagem">
<span>tem comentário aqui<!-- novamente --></span>
<span>antigamente, sementes, demente, novamente</span> <span>e novamente</span> fim.</p>
<button onclick="teste()">Mark</button>
The result is the correct HTML, with only the matching words modified (HTML comments and attributes are preserved correctly):
<mark>novamente</mark> pois <mark>novamente</mark> o
<a href="www.novamente.com">link</a>
<img src="novamente.gif" alt="mostra novamente a imagem">
<span>tem comentário aqui<!-- novamente --></span>
<span><mark>antigamente</mark>, sementes, <mark>demente</mark>, <mark>novamente</mark></span> <span>e <mark>novamente</mark></span> fim.
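The splitting step inside markWords relies on the fact that split keeps the text matched by a capture group in the resulting array. As a rough sketch of just that step, without any DOM involved (splitWords is a hypothetical helper, not part of the code above):

```javascript
// Hypothetical helper: splits plain text into pieces and flags the ones
// that would be wrapped in a <mark> element.
function splitWords(text) {
    // the capture group makes split() keep the matched words in the result
    return text.split(/(\b\w+mente\b)/)
        .filter(s => s !== '') // drop empty strings produced at the edges
        .map(s => ({ text: s, mark: /^\w+mente$/.test(s) }));
}

const pieces = splitWords('antigamente, sementes, demente, novamente');
console.log(pieces.map(p => p.text));
// [ "antigamente", ", sementes, ", "demente", ", ", "novamente" ]
```

Each piece flagged with mark: true corresponds to a mark element, and every other piece to a plain textNode.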
One last detail is that the shortcut \w matches letters, digits and the character _. If you want to consider only letters (including accented ones), see some options here.
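As one possible option (not necessarily one of those linked above), Unicode property escapes can restrict the match to letters. The sketch below assumes an engine with ES2018 support for \p{...} and lookbehind; the sample string and variable names are made up for illustration:

```javascript
// \u00e1 is "á"; the sample string is made up for this sketch
const s = 'felizmente, r\u00e1pidamente, 123mente, foo_mente, sementes';

// ASCII-only \w and \b treat "á" as a non-word character, so the word is
// cut, while digits and _ are accepted as part of a word
const asciiMatches = s.match(/\b\w+mente\b/g);

// \p{L} (with the u flag) matches any Unicode letter; the lookarounds play
// the role of \b, but only with respect to letters
const letterMatches = s.match(/(?<!\p{L})\p{L}+mente(?!\p{L})/gu);

console.log(asciiMatches);  // [ "felizmente", "pidamente", "123mente", "foo_mente" ]
console.log(letterMatches); // [ "felizmente", "rápidamente" ]
```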
I created a text editor in a div with contentEditable = True, so I won't have tags inside it, only plain text. I even took care to strip characters and tags when the user pastes with 'Ctrl+v', using clipboardData.getData('text/plain'). And this div is the only place where I run the regex. Of course, the mark tags are inserted/removed with JS, followed by normalize to correct some textNode behaviors that came up.
– Luke Negreiros
@Lukenegreiros All right. In any case, answers on Stack Overflow em Português should also be useful to anyone who visits the site in the future, so I thought it was worth showing a more general case, because regex + HTML can be a dangerous combination when the scenario is not restricted (as it is in your case).
– hkotsubo