How to read information from a .csv file into a C struct

I need to open a .csv file and store its contents in an array of structs.

The output I expect is:
Code - Region - UF - Date

But what I get is:
Code - Region - UFDate - Date

I would like to know what is making the output wrong. The code follows below:

typedef struct{
    char Codg[10];
    char Regiao[10];
    char UF[2];
    char Data[10];
    }dados_cov;

int main(void){

    FILE *file;
    dados_cov D[10];

    file = fopen("COV.csv", "r");

    for(int i = 0;i < 3;i++) // For some reason, when I read the .csv file it starts with 3 random characters
        fgetc(file);

    if(file)
        for(int i = 0;i < 10;i++){
            fscanf(file,"%10[^;];%10[^;];%2[^;];%10[^\n]\n",  D[i].Codg, D[i].Regiao, D[i].UF, D[i].Data);
            printf("%s - %s - %s - %s\n", D[i].Codg, D[i].Regiao, D[i].UF, D[i].Data);
        }
    fclose(file);

    return 0;
}

Can anyone tell me why I'm not getting the expected output?
[Screenshots from the original post, not reproduced here: output obtained; .csv file]

1 answer


Program error

The error in your program is that it treats the data in the struct as strings without leaving room for the \0 character.

Remember that in C a string is a one-dimensional array terminated by the \0 character. Example:

That eh string\0

This is an array of characters that ends with \0, so it is a string. A character array without the \0 is not a string.

The \0 character marks where a string ends. If an array is treated as a string but does not contain a \0, the program cannot tell where it ends.

Observe:

char Codg[10];

Here we have an array that stores exactly 10 characters, and in your .csv file the code has exactly 10 characters, so there is no room for the \0. Without the \0 you cannot use this array as a string (as you do in the printf), because the program has no way of knowing where it ends.

Another problem appears when you call fscanf with the conversion %10[^;]; to read the code: here you are using the array as a string. fscanf reads up to 10 characters and then writes the \0 after them, at position Codg[10]. Notice:

Codg[0] = 'R'
Codg[1] = 'O'
Codg[2] = '2'
Codg[3] = '1'
Codg[4] = '/'
Codg[5] = '0'
Codg[6] = '1'
Codg[7] = '/'
Codg[8] = '0'
Codg[9] = '3'
Codg[10] = '\0'

Here we have a problem: position Codg[10] is not part of your array (remember that 10 positions run from 0 to 9). The \0 ends up in memory that is not reserved for this array, so it can be overwritten at any time (which would be bad).

Next we have:

char Regiao[10];

The interesting thing is that this array is declared right after the code array, so the two sit side by side in memory. That is, position Codg[10] is the same as Regiao[0] (Codg[10] is not part of the code array, but it is the position that comes right after Codg[9], the last element). So when you store something in Regiao[0], the \0 that terminated the code is lost. Notice how these arrays look after the region is read into Regiao:

Codg[0] = 'R'
Codg[1] = 'O'
Codg[2] = '2'
Codg[3] = '1'
Codg[4] = '/'
Codg[5] = '0'
Codg[6] = '1'
Codg[7] = '/'
Codg[8] = '0'
Codg[9] = '3'
Regiao[0] = 'N' // Codg[10] = '\0' was overwritten
Regiao[1] = 'o'
Regiao[2] = 'r'
Regiao[3] = 't'
Regiao[4] = 'e'
Regiao[5] = '\0' // Indicates the end of the string
Regiao[6] = '' // I left it empty, but in reality there will be some garbage in this place
Regiao[7] = '' // I left it empty, but in reality there will be some garbage in this place
Regiao[8] = '' // I left it empty, but in reality there will be some garbage in this place
Regiao[9] = '' // I left it empty, but in reality there will be some garbage in this place

Note that the \0 that marked the end of the code was lost. Now, if you use printf to print the code as a string, the program prints everything until it finds a \0. Since the next \0 comes after the region name, the region is printed as well.

Notice that the fundamental difference between these two arrays is that in one the \0 ends up outside the array and so can be overwritten, while in the other the \0 stays inside the array and cannot be overwritten (it could be overwritten manually, but that is not the case here).

To solve your problem, the \0 must always stay inside the array. To do this, just increase the size of each array by 1: if a string will have at most 65 characters, its array must have 66 positions (one more for the \0). In your code it would be something like this:

typedef struct {
    char Codg[10 + 1]; // +1 for the \0
    char Regiao[10 + 1]; // +1 for the \0
    char UF[2 + 1]; // +1 for the \0
    char Data[10 + 1]; // +1 for the \0
}dados_cov;

Now, this part I don't understand:

for(int i = 0; i < 3; i++) // For some reason, when I read the .csv file it starts with 3 random characters
        fgetc(file);

I ran your code without it and it worked normally.

Your final code would look something like this:

#include <stdio.h>

typedef struct {
    char Codg[10 + 1];   // +1 for the \0
    char Regiao[10 + 1];
    char UF[2 + 1];
    char Data[10 + 1];
}dados_cov;

int main(void) {

    FILE *file;
    dados_cov D[10];

    file = fopen("COV.csv", "r");
    if(!file) {
        perror("COV.csv");   // report why the file could not be opened
        return 1;
    }

    /*
    for(int i = 0; i < 3; i++) // For some reason, when I read the .csv file it starts with 3 random characters
        fgetc(file);
    */

    for(int i = 0; i < 10; i++) {
        // Stop at the first line that does not yield all four fields
        if(fscanf(file,"%10[^;];%10[^;];%2[^;];%10[^\n]\n",  D[i].Codg, D[i].Regiao, D[i].UF, D[i].Data) != 4)
            break;
        printf("%s - %s - %s - %s\n", D[i].Codg, D[i].Regiao, D[i].UF, D[i].Data);
    }
    fclose(file);   // only reached when fopen succeeded

    return 0;
}
  • Just noting that the ideal here would be a program that parses the CSV by its separator and then uses strcat to build the strings of the structure. There is no way to guarantee the correct sizes from the CSV if any of the fields changes size, so the ideal would be not to use char[] to store strings, but char* or even your own dynamic string library. Parsing is relatively simple, because CSV has a field separator and may or may not use "" around strings.
