In terms of efficiency and performance, which of these code samples is the best option for removing HTML tags from a string?
Option 1:
string ss = "<b><i>The tag is about to be removed</i></b>";
Regex regex = new Regex("\\<[^\\>]*\\>");
Response.Write(String.Format("<b>Before:</b>{0}", ss)); // HTML Text
Response.Write("<br/>");
ss = regex.Replace(ss, String.Empty);
Response.Write(String.Format("<b>After:</b>{0}", ss)); // Plain text as the output
Option 2:
using System;
using System.Text.RegularExpressions;

/// <summary>
/// Methods to remove HTML from strings.
/// </summary>
public static class HtmlRemoval
{
    /// <summary>
    /// Remove HTML from string with Regex.
    /// </summary>
    public static string StripTagsRegex(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }

    /// <summary>
    /// Compiled regular expression for performance.
    /// </summary>
    static Regex _htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

    /// <summary>
    /// Remove HTML from string with compiled Regex.
    /// </summary>
    public static string StripTagsRegexCompiled(string source)
    {
        return _htmlRegex.Replace(source, string.Empty);
    }

    /// <summary>
    /// Remove HTML tags from string using char array.
    /// </summary>
    public static string StripTagsCharArray(string source)
    {
        char[] array = new char[source.Length];
        int arrayIndex = 0;
        bool inside = false;
        for (int i = 0; i < source.Length; i++)
        {
            char let = source[i];
            if (let == '<')
            {
                inside = true;
                continue;
            }
            if (let == '>')
            {
                inside = false;
                continue;
            }
            if (!inside)
            {
                array[arrayIndex] = let;
                arrayIndex++;
            }
        }
        return new string(array, 0, arrayIndex);
    }
}
Best option considering which aspects?
– Leonel Sanches da Silva
I prefer the first option. The code is leaner and I trust the regex there more than the non-greedy match :P .
– William Okano
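One concrete difference between the two patterns, for anyone weighing the comment above: in .NET the dot does not match a newline unless RegexOptions.Singleline is set, so the lazy pattern from Option 2 skips a tag that is split across lines, while the negated character class from Option 1 still removes it. The sketch below is only an illustration; the class name PatternComparison and its method names are made up for this example, and the Option 1 pattern is written without its redundant backslashes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the question's code.
public static class PatternComparison
{
    // Option 1 pattern: '<', then any characters except '>' (newlines included), then '>'.
    public static string StripNegatedClass(string source)
    {
        return Regex.Replace(source, "<[^>]*>", string.Empty);
    }

    // Option 2 pattern: lazy dot. Without RegexOptions.Singleline, '.' stops at '\n'.
    public static string StripLazyDot(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }
}
```

For the input "<a\nhref=\"#\">link</a>", StripNegatedClass returns "link", whereas StripLazyDot removes only the closing tag and leaves the opening tag in place.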
@Gypsy Rhyrrisonmendez Effectiveness and Performance...
– Jedaias Rodrigues
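Since the criterion is performance, the most reliable answer comes from timing the candidates on input that resembles the real data. A minimal sketch of such a measurement using System.Diagnostics.Stopwatch follows; the class StripTiming and its Measure method are hypothetical names invented for this example, and the sketch makes no claim about which option wins.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper, not part of the question's code.
public static class StripTiming
{
    // Calls `strip` on `input` `iterations` times and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<string, string> strip, string input, int iterations)
    {
        // One untimed warm-up call, so JIT compilation and regex construction are not measured.
        strip(input);

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

It could be called as StripTiming.Measure(HtmlRemoval.StripTagsCharArray, html, 100000) for each method in turn. Results will vary with input size, runtime version, and machine, so the figures are only meaningful relative to each other on the same setup.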
Well, then state that clearly in the question, or else it becomes opinion-based.
– Leonel Sanches da Silva
Okay, I am editing.
– Jedaias Rodrigues
Stepping slightly outside the question: if you are after performance, have you considered doing this without regular expressions?
– Leonardo Bosquett
@Leonardobosquett I could not think of a good solution without regular expressions...
– Jedaias Rodrigues
https://dotnetfiddle.net/UZzVJj - It is not exactly equivalent to your regular expression, which requires a match between "<" and ">", but this code uses 0.016s of CPU time according to Fiddle, so it may help you.
– Leonardo Bosquett
Very good, @Leonardobosquett. If you can, post it as an answer; it's a good solution too...
– Jedaias Rodrigues
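For reference, a regex-free version can also keep the requirement that every "<" be paired with a later ">". The sketch below is not the code from the linked Fiddle; it is an illustration written under that assumption, and the names HtmlTagStripper and StripMatchedTags are hypothetical. It should behave like the Option 1 pattern on typical input: a "<" with no closing ">" is kept as text, whereas StripTagsCharArray from Option 2 would drop everything after it.

```csharp
using System.Text;

// Hypothetical helper class, not the code from the linked Fiddle.
public static class HtmlTagStripper
{
    // Removes every span that starts at '<' and ends at the next '>'.
    // Text containing an unmatched '<' or a stray '>' is copied through unchanged.
    public static string StripMatchedTags(string source)
    {
        StringBuilder result = new StringBuilder(source.Length);
        int position = 0;

        while (position < source.Length)
        {
            int open = source.IndexOf('<', position);
            if (open < 0) break;   // no more candidate tags

            int close = source.IndexOf('>', open + 1);
            if (close < 0) break;  // '<' without a matching '>': not a tag

            // Copy the text before the tag, then skip past the tag.
            result.Append(source, position, open - position);
            position = close + 1;
        }

        // Copy whatever remains after the last removed tag.
        result.Append(source, position, source.Length - position);
        return result.ToString();
    }
}
```

With the string from Option 1, StripMatchedTags("<b><i>The tag is about to be removed</i></b>") returns "The tag is about to be removed", and an input such as "1 < 2" is returned unchanged.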