Remove HTML tags




In terms of efficiency and performance, which of these code snippets is the best option for removing HTML tags from a string?

Option 1:

// Requires: using System.Text.RegularExpressions; (Response is the ASP.NET page's response object)
string ss = "<b><i>The tag is about to be removed</i></b>";
Regex regex = new Regex("\\<[^\\>]*\\>");
Response.Write(String.Format("<b>Before:</b>{0}", ss)); // HTML text
ss = regex.Replace(ss, String.Empty);
Response.Write(String.Format("<b>After:</b>{0}", ss)); // Plain text as output


Option 2:

using System;
using System.Text.RegularExpressions;

/// <summary>
/// Methods to remove HTML from strings.
/// </summary>
public static class HtmlRemoval
{
    /// <summary>
    /// Remove HTML from string with Regex.
    /// </summary>
    public static string StripTagsRegex(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }

    /// <summary>
    /// Compiled regular expression for performance.
    /// </summary>
    static Regex _htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

    /// <summary>
    /// Remove HTML from string with compiled Regex.
    /// </summary>
    public static string StripTagsRegexCompiled(string source)
    {
        return _htmlRegex.Replace(source, string.Empty);
    }

    /// <summary>
    /// Remove HTML tags from string using char array.
    /// </summary>
    public static string StripTagsCharArray(string source)
    {
        char[] array = new char[source.Length];
        int arrayIndex = 0;
        bool inside = false;

        for (int i = 0; i < source.Length; i++)
        {
            char let = source[i];
            if (let == '<')
            {
                inside = true;  // entering a tag
                continue;
            }
            if (let == '>')
            {
                inside = false; // leaving a tag; skip the '>' itself
                continue;
            }
            if (!inside)
            {
                array[arrayIndex] = let; // keep characters outside tags
                arrayIndex++;
            }
        }
        return new string(array, 0, arrayIndex);
    }
}


  • Best option considering which aspects?

  • I prefer the first option. The code is leaner, and I trust the regex there more than the non-greedy match :P

  • @Gypsy Rhyrrisonmendez Effectiveness and Performance...

  • Well, then state that clearly in the question; otherwise it becomes opinion-based.

  • Okay, I am editing.

  • Stepping slightly outside the question: if you are after performance, have you ever thought of doing this without regular expressions?

  • @Leonardobosquett I could not think of a good solution without regular expressions...

  • It does not behave exactly like your regular expression, which requires a "match" between "<" and ">", but this code uses 0.016s of CPU according to the Fiddle. It may help you.

  • Very good, @Leonardobosquett. If you can, post it as an answer; it's a good solution too...


2 answers


I made a Fiddle for the first case. Times were:

Compile:    0.062s
Execute:    0s
Memory :    8kb
CPU    :    0.047s

I made a Fiddle for the second case. For the method HtmlRemoval.StripTagsRegex(), times were:

Compile:    0.109s
Execute:    0s
Memory :    16kb
CPU    :    0.094s

For the method HtmlRemoval.StripTagsRegexCompiled(), times were:

Compile:    0.063s
Execute:    0.031s
Memory :    16kb
CPU    :    0.109s

For the method HtmlRemoval.StripTagsCharArray(), times were:

Compile:    1.969s
Execute:    0.016s
Memory :    16kb
CPU    :    0.703s


For the test string, all of them produce the same result.
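One caveat is worth a small sketch. The two patterns in the question are not interchangeable on every input: without RegexOptions.Singleline, the `.` in `<.*?>` does not match a line break, while the negated class in `<[^>]*>` does. The class and method names below are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that puts the two patterns from the question side by side.
public static class TagPatternComparison
{
    // Pattern from Option 1: "[^>]" matches any character except '>', including '\n'.
    public static string StripWithNegatedClass(string source)
    {
        return Regex.Replace(source, "<[^>]*>", string.Empty);
    }

    // Pattern from Option 2: with default options, '.' does not match '\n'.
    public static string StripWithLazyDot(string source)
    {
        return Regex.Replace(source, "<.*?>", string.Empty);
    }
}
```

For a tag written across two lines, such as `"<a\nhref=\"#\">link</a>"`, StripWithNegatedClass returns `link`, while StripWithLazyDot only removes the closing tag and leaves the opening one in place. Passing RegexOptions.Singleline to the second pattern should bring the two in line for this case.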

The first is the fastest in these measurements, but it is not as well organized as the second.

The tests I ran do not cover very large strings. For small strings, the test serves well. For larger strings, it would be interesting to establish other criteria and other tests.
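As a starting point for such a test, here is a minimal sketch that times each approach over a larger string with System.Diagnostics.Stopwatch. The sample fragment, the repetition counts, and all names (StripTagsBenchmark, BuildSample, Measure, StripByScan) are assumptions chosen for illustration; the numbers it prints will vary by machine and runtime and are not comparable to the Fiddle figures above.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical benchmark harness; not part of the original Fiddles.
public static class StripTagsBenchmark
{
    // Builds a large HTML-like string by repeating a small fragment.
    public static string BuildSample(int repetitions)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < repetitions; i++)
        {
            builder.Append("<p><b>The tag</b> is about to be <i>removed</i></p>");
        }
        return builder.ToString();
    }

    // Character scan equivalent to the question's StripTagsCharArray idea.
    public static string StripByScan(string source)
    {
        var result = new StringBuilder(source.Length);
        bool inside = false;
        foreach (char c in source)
        {
            if (c == '<') { inside = true; }
            else if (c == '>') { inside = false; }
            else if (!inside) { result.Append(c); }
        }
        return result.ToString();
    }

    // Calls strip once as a warm-up, then times `iterations` calls in milliseconds.
    public static long Measure(Func<string, string> strip, string input, int iterations)
    {
        strip(input); // warm-up so JIT and regex construction are not timed
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Prints one timing line per approach.
    public static void Run()
    {
        string input = BuildSample(10000);
        var compiled = new Regex("<.*?>", RegexOptions.Compiled);

        Console.WriteLine("Regex (static):   {0} ms",
            Measure(s => Regex.Replace(s, "<.*?>", string.Empty), input, 20));
        Console.WriteLine("Regex (compiled): {0} ms",
            Measure(s => compiled.Replace(s, string.Empty), input, 20));
        Console.WriteLine("Char scan:        {0} ms",
            Measure(StripByScan, input, 20));
    }
}
```

Calling StripTagsBenchmark.Run() prints the three timings. Repeating the run several times and varying the sample size gives a fairer picture than a single measurement.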

  • I can only vote 4 hours from now, haha... Thank you very much for your reply. It also opened my eyes to the Fiddle; I had not realized it could provide this information. But why is the first one not organized?

  • Because in the second I can swap one function for another and the approach changes completely, since everything is encapsulated and ready to use. In the first, I have to write the regular expression by hand, instantiate it, run it, and put the result in another string.
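To illustrate the point about organization, the first option's pattern can be given the same ready-to-use shape by wrapping it in a static helper. This is only a sketch; the names TagStripper and Strip are hypothetical and do not appear in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern used in Option 1.
public static class TagStripper
{
    // Built once and reused, so callers never touch the pattern directly.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    // Returns the input with every "<...>" sequence removed.
    public static string Strip(string source)
    {
        return TagPattern.Replace(source, string.Empty);
    }
}
```

With this in place, the calling code shrinks to a single call such as TagStripper.Strip(ss).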


With performance in mind, the tags can also be removed without regular expressions, which greatly improves performance. Here is some initial (simple) code.

Test results:

 Compile:   0.189s 
 Execute:   0s 
 Memory:    0b 
 CPU:       0.016s

It does not follow exactly the same rule as the regular expression \<[^\>]*\>, since the regex removes text only when both delimiters, < and >, are present.
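The linked code is not reproduced here, so the following is only a sketch of the difference described, assuming a plain character scan similar to StripTagsCharArray from the question. The names UnmatchedBracketDemo, StripByScan and StripByRegex are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical comparison; StripByScan is not the answer's original code.
public static class UnmatchedBracketDemo
{
    // Drops everything from a '<' up to the next '>', even if no '>' ever comes.
    public static string StripByScan(string source)
    {
        var result = new StringBuilder(source.Length);
        bool inside = false;
        foreach (char c in source)
        {
            if (c == '<') { inside = true; }
            else if (c == '>') { inside = false; }
            else if (!inside) { result.Append(c); }
        }
        return result.ToString();
    }

    // Removes text only where a '<' is followed by a closing '>'.
    public static string StripByRegex(string source)
    {
        return Regex.Replace(source, "<[^>]*>", string.Empty);
    }
}
```

On well-formed input such as `<b>bold</b>` both return `bold`. On `1 < 2`, StripByRegex leaves the text untouched, while StripByScan returns `1 ` because it treats the lone `<` as the start of a tag that never closes.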
