Whenever I am going to scan a string in C I must use "strlen()", right?

Question

Asked 8 years, 8 months ago

Viewed 1,949 times

17

It is common in C exercises to need to analyze and/or manipulate the contents of a string, so we need to make sure we do not go past its end. It is very common to write a for from 0 to strlen() - 1; after all, this is the function that returns the number of characters in the string. It is like that in every language (unless it has a foreach).

Is there a problem with doing so? Is there a better way?

1 answer

by Maniero • **444,682** points · Answer 1 · 2016-11-24T11:53:42+00:00

The linked question has the definition of strlen(). This function counts how many characters a string has. If that is all you want, use it; that is what it was made for. But use it only when that number is what you actually need. When the goal is to scan the string, potentially up to its end, it should not be used: you do not need the character count to know how far to go.

In fact, it hurts the application's performance. In an exercise it is fine, but it should not be used this way in real applications. You have to be creative and find another way with better performance.

C does not store the size of a string; doing so would complicate things a little or waste memory. Remember that decades ago memory was very scarce. The solution was simply to put a character at the end of the string (\0) to indicate that it has ended. This ensures that a string of any length can be represented, with only one byte of overhead to mark its end.

What is wrong with this? Languages that store the size next to the text can report the string's size in O(1), since it is enough to read the string's header, which already holds that information. In C it is O(n), i.e., the whole string has to be read and counted to get its size.
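To make the O(1) versus O(n) contrast concrete, here is a minimal sketch of what "storing the size next to the text" could look like. The type sized_string and both helper functions are hypothetical names invented for this illustration; nothing like them exists in standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string, NOT part of standard C:
   the length is kept next to the pointer to the text. */
typedef struct {
    size_t len;        /* length computed once and cached */
    const char *data;  /* the text itself (still null-terminated here) */
} sized_string;

/* Pays the O(n) scan a single time, when the value is built. */
static sized_string sized_string_from(const char *text) {
    sized_string s = { strlen(text), text };
    return s;
}

/* O(1): reads the stored field, no matter how long the text is. */
static size_t sized_string_length(sized_string s) {
    return s.len;
}
```

The price is the extra size_t per string and the need to keep len in sync whenever the text changes, which is the kind of overhead the null terminator avoids.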

A naive implementation of the function would be:

size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') { // stop at the terminator
        str++;
        len++;
    }
    return len;
}

The real implementation can be more complicated.

Using strlen() as the limit of a for is almost certainly a mistake: it reads the whole string to find the size every time that number is needed, and then you potentially read the whole string again to do the actual work. That is duplicated work.

Someone might think it is enough to move the strlen() call out of the for, so the string is measured only once. That improves things a little, but it is not the solution.

It is possible that some compiler performs this optimization on its own, but I doubt it, for a basic reason: C strings are mutable, so the size can change inside the loop. Taking the size beforehand can be a huge mistake if the loop changes the size of the string. And even where hoisting the call is correct, the string is still scanned twice: once to find the size and once to do what you want.

The solution is to do what the strlen() function itself does: walk the string starting from position 0 and keep going until you find the \0. Do not decide in advance where to stop; let that simple condition end the loop. You are just bringing into your own loop what is done inside strlen().

This even handles the case where the size of the string changes inside the loop (it will probably get smaller; making it larger almost always corrupts memory).
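The duplicated work can be made visible with an instrumented copy of the naive length loop. This is only a sketch: counting_strlen, chars_scanned and count_spaces_with_strlen_limit are hypothetical names made up for the demonstration, and the counter reflects this hand-written function, not what an optimizing compiler might do with the real strlen().

```c
#include <stddef.h>

/* Illustrative counter: characters inspected by the length scans only. */
static size_t chars_scanned = 0;

/* Naive length loop, instrumented (hypothetical helper, not the libc strlen). */
static size_t counting_strlen(const char *str) {
    size_t len = 0;
    while (str[len] != '\0') {
        len++;
        chars_scanned++;
    }
    return len;
}

/* The "obvious" loop: the length is recomputed on every condition check. */
static size_t count_spaces_with_strlen_limit(const char *string) {
    size_t count = 0;
    for (size_t i = 0; i < counting_strlen(string); i++)
        if (string[i] == ' ') count++;
    return count;
}
```

For the 7-character string "a b c d" the condition is checked 8 times and each check scans 7 characters, so chars_scanned ends at 56 just to find 3 spaces. The cost grows roughly with the square of the length.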
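A small sketch of a string that shrinks inside the loop shows the difference. The rule "cut the string at the first '#'" and both function names are assumptions chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Length taken before the loop: it is stale once the string is cut. */
static size_t count_spaces_cached_length(char *string) {
    size_t size = strlen(string);
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        if (string[i] == '#') string[i] = '\0';  /* string shrinks here */
        else if (string[i] == ' ') count++;      /* may count past the new end */
    }
    return count;
}

/* Terminator checked on every step: the loop sees the new end immediately. */
static size_t count_spaces_sentinel(char *string) {
    size_t count = 0;
    char *p = string;
    while (*p != '\0') {
        if (*p == '#') {
            *p = '\0';       /* cut; the condition above now ends the loop */
        } else {
            if (*p == ' ') count++;
            p++;
        }
    }
    return count;
}
```

Given a writable buffer holding "a b#c d", the cached-length version reports 2 spaces, because it keeps reading the leftover bytes after the cut, while the sentinel version reports 1, which matches the resulting string "a b".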

So, if you want to know how many spaces there are in a string, you can do it the more "obvious" way:

for (int i = 0; i < strlen(string); i++) if (string[i] == ' ') count++;

You can improve it:

size_t size = strlen(string);
for (size_t i = 0; i < size; i++) if (string[i] == ' ') count++;

Or you can do the right thing:

while (*string != '\0') if (*string++ == ' ') count++;
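The one-liner above assumes that string and count already exist. Wrapped in a function it could look like the sketch below; the name count_spaces and the const char * parameter are choices made for this example, not something taken from the original snippet.

```c
#include <stddef.h>

/* Counts the ' ' characters in a null-terminated string in a single pass. */
static size_t count_spaces(const char *string) {
    size_t count = 0;
    /* The parameter is a local copy of the pointer, so advancing it
       does not affect the caller's pointer. */
    while (*string != '\0')
        if (*string++ == ' ') count++;
    return count;
}
```

Calling count_spaces("a b c") returns 2, and the string is read exactly once, up to and including its terminator.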

I put it on GitHub for future reference.

Not using pointers in C is a serious mistake. C does not work well with abstractions; it is a raw language. Abstractions often carry costs that are hard to evaluate, and they are not part of the language's philosophy. Where abstractions do exist, they need to be intuitive and must not impose excessive cost. Eliminate the abstraction, eliminate the intermediaries.

That is why C's strlen() is one of the most misused functions in programming.