Depending on how complex the HTML tags in your string are, doing this with regex may stop being trivial. There is also near-consensus that regular expressions should not be used to parse strings containing HTML.
It may sound absurd, but if you are working with HTML, using an HTML parser might not be a bad idea. Here is an example using the DOMParser API, available in browsers:
const htmlStr = 'texto inicial <div>Texto dentro da DIV</div> Texto fora da DIV <p>Texto dentro do P</p> texto final';

// Parse the string as an HTML document.
const parser = new DOMParser();
const doc = parser.parseFromString(htmlStr, 'text/html');

// Walk the top-level nodes of <body>: keep the text of text nodes
// and the full markup (outerHTML) of element nodes.
const arr = Array.from(doc.body.childNodes).map((node) => {
  const text = node.nodeType === Node.TEXT_NODE
    ? node.textContent
    : node.outerHTML;
  return text.trim();
});

console.log(arr);
If you are in an environment that does not natively support DOMParser (such as Node.js), you can use a package that provides it, such as jsdom.
Using a parser like this will, in most cases (especially the more complex ones), be better than dealing with regular expressions, which may not be fully suited to the task. The advantage is that you have a much more robust API to build on as the complexity of the HTML in the string grows.
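To illustrate why complexity is the problem, here is a minimal sketch of a naive regex breaking as soon as tags are nested. The pattern and the sample strings are hypothetical, made up for this illustration, and are not the ones from the question:

```javascript
// Hypothetical naive pattern: an opening tag, anything (lazy), then the
// matching closing tag. Assumes tags without attributes.
const naive = /<(\w+)>.*?<\/\1>/g;

const flat = 'a <div>x</div> b';
const nested = 'a <div>x <div>y</div> z</div> b';

// Flat markup: the element is captured whole.
const flatMatches = flat.match(naive);
console.log(flatMatches); // [ '<div>x</div>' ]

// Nested markup: the lazy quantifier stops at the FIRST </div>,
// so the outer element is cut short and " z</div>" is left behind.
const nestedMatches = nested.match(naive);
console.log(nestedMatches); // [ '<div>x <div>y</div>' ]
```

A parser keeps track of nesting for you, whereas a regex would need to be reworked for every new shape of input.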
Do not use regex to parse HTML. Please read Parsing Html The Cthulhu Way. If you cannot read English, click with the left mouse button on the page and translate it to Portuguese (the same goes for the links suggested by that article).
– Augusto Vasques
By placing something between square brackets, you are defining a character class, so [(<.*>.*<\/.*>)] means "the character (, or <, or ., or *, etc." (only one of them) - see here. Anyway, regex is not the best way, as already said. It may even "work" for simple cases, but as soon as the HTML gets a little more complicated, the regex starts turning into a "monster".
– hkotsubo
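The character-class behavior described in this comment can be checked with a short sketch. The sample string below is an assumption chosen for illustration; only the bracketed pattern comes from the discussion:

```javascript
// Everything sits inside [...], so this is a character class:
// it matches ONE character out of ( ) < > . * /
const charClass = /[(<.*>.*<\/.*>)]/g;

const sample = 'a<b>c</b>'; // hypothetical input

// Each match is a single character, never a whole tag.
const hits = sample.match(charClass);
console.log(hits); // [ '<', '>', '<', '/', '>' ]

// Without the g flag, only the first single character is returned.
const first = sample.match(/[(<.*>.*<\/.*>)]/)[0];
console.log(first); // '<'
```

Inside the brackets, the dot, the asterisk and the parentheses lose their special meaning and are treated as literal characters.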