Is replacing strings with Regex slower than Replace()?

Let’s say I want to do something like this question:

How to convert Camelcase to snake_case in C#?

It had several answers, but I want to highlight these two.

Answer 1

string stringSnake = string.Concat(
                     stringCamel.Select((x, i) => i > 0 && char.IsUpper(x) ? "_" + x.ToString().ToLower() : x.ToString().ToLower())
                 ); 

Answer 2

string stringSnake = Regex.Replace(input, "(?<=.)([A-Z])", "_$0", RegexOptions.Compiled);

Some Stack Overflow contributors said that answer 2 is much slower. I would like to know why.

  • Is it because it uses regex?
  • Is this particular regex "done wrong"?
  • Is regex slow in general?

1 answer

Always measure

To know which is faster you always have to measure the real case. The result can change depending on several factors: the platform it runs on, the data, what else the code is used together with, and above all the version of the language or library you are using. What holds today may not hold tomorrow.
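As a rough illustration of what measuring could look like, here is a minimal sketch that times the two answers from the question with Stopwatch. The class name SnakeCaseTiming and the helper Time are made up for this example, and ToLower() was added to the regex version so that both return the same text for ASCII letters. A dedicated tool such as BenchmarkDotNet would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class SnakeCaseTiming
{
    // Answer 1 from the question: LINQ over the characters.
    public static string WithLinq(string s) =>
        string.Concat(s.Select((x, i) =>
            i > 0 && char.IsUpper(x) ? "_" + x.ToString().ToLower() : x.ToString().ToLower()));

    // Answer 2 from the question. ToLower() is an addition for this sketch,
    // so that both methods produce the same result for ASCII letters.
    public static string WithRegex(string s) =>
        Regex.Replace(s, "(?<=.)([A-Z])", "_$0", RegexOptions.Compiled).ToLower();

    // Hypothetical helper: calls convert(input) `iterations` times
    // and returns the elapsed time in milliseconds.
    public static long Time(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up, so JIT and regex construction are not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling SnakeCaseTiming.Time(SnakeCaseTiming.WithLinq, "someCamelCaseText", 100000) and the same with WithRegex gives two numbers to compare. They are only valid for that input, that machine and that runtime version.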

Which is faster

You can’t say a Regex will always be worse than Replace(), because the replacement function can be misused and not applied in the best possible way. If you keep applying Replace() several times to the same string, it can get slow.
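To make the "several times in the same string" case concrete, here is a small sketch. The class ReplaceChain and the HTML-escaping task are only an illustration, not something from the question.

```csharp
using System;

public static class ReplaceChain
{
    // Each Replace() call scans the whole string again and can allocate
    // a new string, so three calls mean three passes over the text and
    // up to three intermediate strings for the garbage collector.
    public static string Escape(string s) =>
        s.Replace("&", "&amp;")   // must come first, or the '&' added below would be escaped again
         .Replace("<", "&lt;")
         .Replace(">", "&gt;");
}
```

With three replacements on short strings this is rarely a problem. It starts to matter when the chain grows or the input is large.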

But under normal conditions Replace() will be faster, because it goes straight to the point. The Regex algorithm is well done, but it is a general solution that tries to solve any problem, so it has to check each character against every situation the pattern is meant to handle.

Regex performance is not intuitive enough to say it will always be slower. But in my experience it is slower in most cases, and in some cases the difference is brutal.

Regex can win if there are several modifications to make, because it can do everything in one pass. This is especially true if the pattern is already compiled (in C#, at least, the pattern text is compiled and cached). If the same pattern is used many times, it can become interesting. Of course it also depends on the quality of the expression compiler. If it can do some optimizations, that helps a lot.

Compiling helps but does not work miracles.
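A minimal sketch of that idea, assuming an HTML-escaping task chosen only for illustration (the class HtmlEscaper is hypothetical): the Regex is built once, with RegexOptions.Compiled, and a single Replace call handles several different substitutions in one pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlEscaper
{
    // Built once and reused on every call. RegexOptions.Compiled makes
    // construction more expensive in exchange for faster matching,
    // which only pays off when the pattern is used many times.
    private static readonly Regex Special = new Regex("[&<>]", RegexOptions.Compiled);

    // One pass over the input: the callback decides the replacement
    // for each character the pattern finds.
    public static string Escape(string s) =>
        Special.Replace(s, m =>
        {
            switch (m.Value)
            {
                case "&": return "&amp;";
                case "<": return "&lt;";
                default: return "&gt;";
            }
        });
}
```

Whether this beats the alternatives still has to be measured for the real data.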

Other techniques

It is possible to use techniques other than these two that can give an even better result, for example analyzing the string character by character and making a decision for each one. This can be simpler or more complicated depending on the case. This technique can "replace" several things in one pass.
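For the camelCase to snake_case conversion from the question, a character-by-character version could look like the sketch below. The names SnakeCase and FromCamel are made up here, and char.ToLowerInvariant is a choice of this sketch (the answers in the question use the culture-sensitive ToLower()).

```csharp
using System;
using System.Text;

public static class SnakeCase
{
    public static string FromCamel(string camel)
    {
        if (string.IsNullOrEmpty(camel)) return camel;

        // A little extra capacity for the underscores that will be inserted.
        var sb = new StringBuilder(camel.Length + 8);
        for (int i = 0; i < camel.Length; i++)
        {
            char c = camel[i];
            if (char.IsUpper(c))
            {
                // An uppercase letter starts a new word, except at position 0.
                if (i > 0) sb.Append('_');
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

It does one pass and one final allocation for the result, but that alone is not proof that it is faster. It has to be measured like any other option.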

I’ve seen a lot of code perform badly because its author didn’t understand how the garbage collector works. The problem is not always the string manipulation itself but the memory management. It can become tragic with large volumes.

I can guarantee that a programmer will always get a better result with hand-written code than with a Regex, provided they use the appropriate technique (Replace() or not) implemented correctly. Whether it will be more work, uglier, more confusing, or have other problems is another question. That may or may not happen. Regex is not that simple either, is not pretty, and is confusing. The question is how much.

It is always possible to write a Regex that is not tragic, but that can take as much work as writing the code by hand, or more.

For this specific example there is an answer with identical code, with a performance comparison, on SO.

See a comparison by Microsoft. Note that StringBuilder, which everyone thinks is best for these things, turned out worse. What seems to be the best is not always so.

There is a real example posted by one of the founders of this site, citing a problem that Microsoft faced.

There is a website that helps you understand Regex and shows the traps you can fall into, among them backtracking.
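One way to protect code against the backtracking trap is to give the Regex a match timeout, which the .NET Regex constructor accepts. The sketch below is only an illustration. The class BacktrackingGuard and the method FinishesInTime are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingGuard
{
    // Returns true if matching finished within `timeout`,
    // false if the engine gave up with a RegexMatchTimeoutException.
    public static bool FinishesInTime(string pattern, string input, TimeSpan timeout)
    {
        var regex = new Regex(pattern, RegexOptions.None, timeout);
        try
        {
            regex.IsMatch(input); // the result itself is not needed here
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }
}
```

Nested quantifiers such as "^(a+)+$" applied to a long run of "a" followed by a character that cannot match are the usual textbook case to try with it. Whether a given pattern actually blows up depends on the engine and its version, so this too has to be tested rather than assumed.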

I find it strange that people who talk about legibility defend Regex.

    I had read this backtracking article. Thank you very much for answering, I learned a lot. I will think twice before using regex again.
