Whenever I am going to scan a string in C I must use "strlen()", right?


In C exercises it is common to need to analyze and/or manipulate the contents of a string, so we have to make sure we do not go past its end. It is very common to write a for loop from 0 to strlen() - 1; after all, this is the function that returns the number of characters in the string. It is like this in every language (unless the language has a foreach).

Is there a problem with doing it this way? Is there a better way?

1 answer


The linked question has the definition of strlen(). This function counts how many characters a string has. If that is all you want, use it; that is what it was made for. But use it only when that number is what you actually need. When the goal is to walk the string, potentially to its end, it should not be used: you do not need the character count to limit how far to go.

In fact it hurts the application's performance. In an exercise it is fine, but it should not be used this way in real applications. Find another approach that performs better.

C does not store the size of a string; doing so would complicate things a little or waste memory. Remember that decades ago memory was very scarce. The solution was simply to put a character at the end of the string (\0) to indicate that it has ended. This way a string of any length can be represented with only one byte of overhead to mark its end.

What is wrong with this? Languages that store the size alongside the text can report the size of a string in O(1), since it is enough to read the string's header. In C the complexity is O(n), i.e., the whole string has to be read and counted to get its size.
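To make the contrast concrete, here is a minimal sketch. The struct sized_string and both function names are hypothetical, invented only for this illustration; they are not part of the C library.

```c
#include <stddef.h>

/* Hypothetical length-prefixed string: the size travels with the text. */
struct sized_string {
    size_t len;        /* number of characters, stored up front */
    const char *data;  /* the characters themselves */
};

/* O(1): the answer is already stored, just read the field. */
size_t sized_length(const struct sized_string *s) {
    return s->len;
}

/* O(n): a C string has no header, so we walk until the terminator. */
size_t c_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

The first function costs the same for a 5-character string and a 5-million-character one; the second grows with the length of the text.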

A naive implementation of the function would be:

size_t strlen(const char *str) {
    size_t len = 0;
    while (*str != '\0') {
        str++;
        len++;
    }
    return len;
}

A real implementation can be more sophisticated.

Using strlen() as the limit of a for is almost certainly a mistake: it reads the whole string to find the size every time that number is needed, and then you potentially read the whole string again to do the actual work. It is duplicated work.
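One way to see the duplicated work is to instrument a length function. This is only a sketch: counted_strlen, chars_examined and count_spaces_naive are hypothetical names standing in for strlen() and a typical loop, and it assumes the compiler does not hoist the call (here it cannot, because the function updates a global counter).

```c
#include <stddef.h>

/* Hypothetical counter: characters examined by the length function. */
size_t chars_examined = 0;

/* Same logic as a naive strlen(), but it records how much it reads. */
size_t counted_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
        chars_examined++;
    }
    return n;
}

/* The length is recomputed every time the loop condition is tested. */
size_t count_spaces_naive(const char *s) {
    size_t count = 0;
    for (size_t i = 0; i < counted_strlen(s); i++) {
        if (s[i] == ' ') {
            count++;
        }
    }
    return count;
}
```

For the 5-character string "a b c" the condition is tested 6 times, so the length function alone examines 30 characters, on top of the 5 reads done by the loop body. The cost of measuring grows with the square of the length.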

Someone might think it is enough to take the strlen() out of the for so that the string is measured only once. That helps a little, but it is not the solution.

It is possible that some compiler optimizes this and hoists the call by itself. But I would not count on it, for a basic reason: C strings are mutable, so the size can change inside the loop, and the compiler can only hoist the call if it can prove the loop never modifies the string. Capturing the size beforehand can be a serious mistake if the loop changes the length of the string. And even with the call hoisted we still sweep the string twice, once to find the size and once to do the work.

The solution is to do what strlen() itself does: walk the string from position 0 until you find the \0. Do not predetermine where to stop; let that simple condition decide. You are just bringing into your own loop what is done inside strlen().

This even solves the problem of the string's size changing inside the loop (it will probably get smaller; growing it will almost always corrupt memory).
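A small sketch of a loop that shortens the string while walking it. The function upper_until_hash is hypothetical, made up for this illustration: it uppercases the text and cuts the string at the first '#'. Because the stop condition is the terminator itself, the loop notices the new end immediately; a length captured before the loop would keep going into the discarded tail.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase the text and truncate at the first '#'.
   Returns how many characters were kept. */
size_t upper_until_hash(char *s) {
    size_t kept = 0;
    while (*s != '\0') {
        if (*s == '#') {
            *s = '\0';  /* shorten the string; the loop test sees it next */
        } else {
            *s = (char)toupper((unsigned char)*s);
            s++;
            kept++;
        }
    }
    return kept;
}
```

With a buffer holding "abc#def", the function returns 3, the buffer now reads "ABC", and the bytes after the cut are never touched.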

So if you want to know how many spaces there are in a string, you can do it the more "obvious" way:

for (int i = 0; i < strlen(string); i++) if (string[i] == ' ') count++;

It can be improved:

size_t size = strlen(string);
for (int i = 0; i < size; i++) if (string[i] == ' ') count++;

Or you can do the right thing:

while (*string != '\0') if (*string++ == ' ') count++;
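Note that the one-liner advances string itself, so the original pointer is lost afterwards. A minimal sketch of the same single-pass idea wrapped in a function (count_spaces is a hypothetical name) keeps the caller's pointer intact, since the function only moves its own copy:

```c
#include <stddef.h>

/* Hypothetical helper: count spaces in one pass, stopping at the terminator. */
size_t count_spaces(const char *s) {
    size_t count = 0;
    while (*s != '\0') {
        if (*s++ == ' ') {
            count++;
        }
    }
    return count;
}
```

Calling count_spaces("a b  c") returns 3, and each character is read exactly once.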

I put it on GitHub for future reference.

Not using pointers in C is a serious mistake. C does not work well with abstractions; it is a raw language. Abstractions often carry costs that are hard to evaluate, and that is not part of the language's philosophy. Where abstractions exist, they need to be intuitive and not add excessive cost. Eliminate the abstraction, eliminate the intermediaries.

That is why C's strlen() is one of the most misused functions in programming.

  • Why stop using the for in the last case? for (char *chr = string; *chr; chr++) {...}

  • (The comment above was half humorous, meant to add a creative use of for.) More seriously, it is worth adding that some texts that tell you to always check the size of a string before using it are concerned with security: if the string's end marker is missing, the program wanders into memory regions outside the string. This can lead to a segfault or to serious security breaches. The recommendation is to use functions that let you specify a maximum string size, such as strnlen instead of strlen.

  • @jsbueno It could, of course; the important thing there is not to use strlen(). This business of always checking the size depends a lot on the case. How do you check the size? By doing exactly that. The question is precisely about this. If there is no end marker in the string, how do you know what size it is? You are already lost. Can you store the size? Yes, of course, but that is not how standard C strings work. strnlen makes sense in very specific contexts. Most of the time, if it can be applied, you do not need it. Like I said, it was a bad idea for C to do strings that way.
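The two points in these comments can be combined. When a buffer's capacity is known but its terminator is not guaranteed, a single pass can stop at whichever comes first. This is only a sketch under that assumption; count_spaces_bounded and its max parameter are hypothetical, and the caller must pass the real capacity of the buffer.

```c
#include <stddef.h>

/* Hypothetical helper: count spaces, reading at most max bytes and
   stopping early if a terminator is found. */
size_t count_spaces_bounded(const char *buf, size_t max) {
    size_t count = 0;
    for (size_t i = 0; i < max && buf[i] != '\0'; i++) {
        if (buf[i] == ' ') {
            count++;
        }
    }
    return count;
}
```

This never calls a length function first, so the data is still read only once, and it never reads past max even when the end marker is missing.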
