Why do strings get special treatment in C?


I know arrays are fixed-size elements, used when the size is known in advance. But speaking of initialization, where the size is already determined by the initializer, I would like to know why other types of arrays cannot be used the way strings are in C (with pointer notation). For example, I can write:

#include <stdio.h> 

int main(void)
{
   char *sstring = "Olá, Mundo!";

   char schars[] = {'O', 'l', 'a', '\0'};

   int mnumbers[] = {1, 2, 3, 4, 5};

   printf("Sstring : %s\n", sstring); 
   printf("Schars : %s\n", schars);
   printf("Mnumber : %d\n", *mnumbers);

   return 0;
}

But I cannot write:

char *sstring = "Olá, Mundo!";

char *schars = {'O', 'l', 'a', '\0'};

int *mnumbers = {1, 2, 3, 4, 5};

Even though the size is known, since I am initializing the arrays. Why does this happen? Why can’t even an array of chars initialized with braces be treated as a pointer?

The same thing happens with multidimensional arrays and pointers to pointers (obviously):

#include <stdio.h> 
#include <stdlib.h>

int main(void)
{
   int arrayInt[][3] = {{1, 2, 3}, {4, 5, 6}};

   char *arrayChar[] = {"PALAVRA", "teste", "HEY"};

   char **names = malloc(3 * sizeof(char *));   

   *names = "Teste";
   *(names + 1) = "de";
   *(names + 2) = "Arrays";


   printf("%d\n", arrayInt[1][2]); 
   printf("%c\n", arrayChar[0][4]); 

   printf("Nome: %s %s %s\n", *names, *(names + 1), *(names + 2)); 

   return 0;
}

It is not possible to write int **arrayInt = {{1, 2, 3}, {4, 5, 6}};, and one dimension of the array still has to be given even though the initializer states it explicitly. A mixed form such as char *arrayChar[] = {"PALAVRA", "teste", "HEY"}; is still possible, but char *arrayChar[] = {{'O', 'l', 'a', '\0'}, {'M', 'u', 'n', 'd', 'o', '\0'}}; is not. It seems that the use of brackets [] is tied to initialization with braces {}. I’d like to know why.

  • { } are braces. Parentheses are ( ). But it was probably just a slip at the time of writing, almost a typo.

  • @jsbueno Thanks. Fixed.

1 answer


The quick answer is yes, C has special treatment for strings, "just because".

The long answer is that you are assuming that arrays and pointers are completely interchangeable in C, which is not true! What happens is that in many contexts C automatically converts an array to a pointer to its first element. In your question, this difference shows up in two places:

1) You need an array to allocate the memory.

C strings are a special case. When you use a string literal, the C compiler allocates space for it in the executable’s data area (typically read-only). It can also apply optimizations such as storing two string literals with the same content in a single place.

For types other than strings, and even for mutable arrays of characters, you need to allocate an array somewhere to store your data. The C compiler won’t put them anywhere special for you.

2) A pointer to a pointer and a multidimensional array are not the same thing.

Take as an example the 3x3 matrix

int mat[3][3] = {
  00, 10, 20,
  30, 40, 50,
  60, 70, 80,
};

The memory representation is an array of 9 elements, with one row after another.

mat --> [ 00 10 20 30 40 50 60 70 80 ]

And when you access mat[i][j], the compiler fetches element 3*i + j for you. Note that it needs to know the number of columns in each row in order to do this.

A matrix using int **, on the other hand, has to be stored differently.

// I think you need C99 to compile this.
// But if it doesn't run, you can still get the idea...

int linhaA[] = {00, 10, 20};
int linhaB[] = {30, 40, 50};
int linhaC[] = {60, 70, 80};

int *linhas[3] = {linhaA, linhaB, linhaC};

int **mat = linhas; // An automatic array-to-pointer conversion happens here

In memory, this looks like

mat --> [linhaA, linhaB, linhaC]
           |       |       |
           |       |       +-----> [60, 70, 80]
           |       +-------------> [30, 40, 50]
           +---------------------> [00, 10, 20]

Note that we have an array of pointers, and that the data does not need to be in a single array, or even have the rows in order. Accessing mat[i][j] simply performs two dereferences, one after the other.

At the end of the day, what all this means is that an int [3][3] (3x3 two-dimensional array of integers) can be automatically converted to an int (*)[3] (pointer to array of 3 integers) but cannot be converted to int ** (pointer to pointer to integer). The root of all this is that when we use a two-dimensional array the compiler needs to know the size of every dimension (except the leftmost one, the number of rows) to be able to access an element.

  • So basically it’s "Strings in C are a special case. When you use a string literal the C compiler will allocate a memory space in the executable’s data (read-only) area. It can also perform optimizations such as allocating two string literals of the same content in a single place" because Assembly code does so?

  • I think it has more to do with the C rules than with Assembly. But yes, strings are a special case.

  • "compiler needs to know the size of all dimensions + 'except the last' "

  • Remember that "strings" do not even exist in C at runtime: they are just an array of bytes (or of characters, if you ignore any question of text encoding, which is not correct nowadays). All language support for strings is limited to these special cases at compile time, and to standard library functions that interpret the byte value 0 in a sequence as the end of the sequence. (I’m writing this because sometimes people don’t realize that this "ends in '\x00'" is a library convention, not something the language determines.)

  • In fact int *linhas[] = {linhaA, linhaB, linhaC}; without the column width worked in C99. Why then can’t I assign directly to **?

  • Try declaring a two-dimensional array without specifying any dimension, int foo[][], and see the error message you get. As jsbueno said, when it comes to multidimensional arrays the compiler needs to know all dimensions except the leftmost one. (If you look at the arithmetic needed to fetch an element, it becomes clear.)
