An HTML entity is perfectly valid content in an HTML document, and there is usually no reason to remove it.
What you can do is decode it, using HttpUtility.HtmlDecode (available in the System.Web namespace) or WebUtility.HtmlDecode (available in the System.Net namespace):
string texto = "<td>Para maiores informações consulte: &#43; informações</td>";
Console.WriteLine(HttpUtility.HtmlDecode(texto));
Console.WriteLine(WebUtility.HtmlDecode(texto));
Both produce the same result:
<td>Para maiores informações consulte: + informações</td>
But if you really want to remove the HTML entities (rather than replace them with the equivalent characters), just use:
Console.WriteLine(Regex.Replace(texto, "&[^;]+;", string.Empty));
The regex has the character & at the beginning and the character ; at the end. Between them, there is:
- [^;]: the [^ creates a negated character class, which matches any character that is not listed inside the brackets. So this part means "any character that is not ;"
- the quantifier + means "one or more occurrences"
So the regex means: the character &, followed by one or more characters other than ;, followed by ;. With that, all the HTML entities are removed. The output is:
<td>Para maiores informações consulte: informações</td>
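As a minimal, self-contained sketch of the removal described above, the regex can be wrapped in a small helper. The class and method names (EntityStripper, RemoveEntities) are hypothetical and chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityStripper
{
    // Hypothetical helper: deletes every substring shaped like "&...;"
    // (an ampersand, one or more characters other than ';', then ';').
    public static string RemoveEntities(string html)
    {
        return Regex.Replace(html, "&[^;]+;", string.Empty);
    }
}
```

Calling EntityStripper.RemoveEntities("a&#43;b") returns "ab". Keep in mind that the pattern removes everything between a & and the next ;, so a stray & in ordinary text followed later by a ; is removed together with the text in between: "a & b; c" becomes "a  c".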
Just to explain why your regex didn't work:
[;\\/:*?\"<>|&']: the brackets define a character class, which matches any one of the characters listed between the brackets. Therefore, this regex means "the character ;, or the character \, or the character /, or the character :, etc.". The detail is that the whole expression matches only a single character (which may be any of those listed).
Therefore, this regex only removes those individual characters. In the case of an HTML entity, only the & and the ; are removed, but the digits and the # are not.
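To see this behavior in isolation, here is a small sketch that applies the same character class to a string containing an entity. The class and method names are hypothetical, and the pattern is written as a verbatim string on the assumption that it is meant to be read as regex text (so \\ matches a literal backslash):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: removes each listed character individually.
    // In a verbatim string, "" stands for a single double quote.
    public static string StripListedChars(string input)
    {
        return Regex.Replace(input, @"[;\\/:*?""<>|&']", string.Empty);
    }
}
```

Calling CharClassDemo.StripListedChars("&#43;") returns "#43": the & and the ; are gone, but the # and the digits remain, which is why the entity is left half-removed.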
Why remove them? Wouldn't it be better to simply decode them? https://stackoverflow.com/q/19692654
– hkotsubo
Oh, thanks hkotsubo, I had forgotten about that possibility! It worked here for me. If you want to post it as an answer, I'll mark it as solved
– aa_sp
It took me a while, but I posted an answer :-)
– hkotsubo
Thanks, and thank you for taking the time to respond :)
– aa_sp