Count how many columns are in a CSV file with C++

I am designing an electronic ballot box, and for that I need to read a CSV file that contains information about each candidate.

Since this CSV file has a lot of information that is not relevant, I decided to use only certain columns, for example: NM_CANDIDATO, NM_PARTIDO, ...

The solution I came up with was to use a counter to record the "index" of each desired column, but I can’t detect the end of the first line, so the index keeps being incremented across all the data.

"DT_GERACAO"; "SG_PARTIDO"  ;  "HH_GERACAO"
"03/09/2018"; "DC"          ;   "08:01:43"  
"03/09/2018"; "MDB"         ;   "08:01:43"
"03/09/2018"; "PODE"        ;   "08:01:43"

In this example, only the column SG_PARTIDO interests me. So a counter i is initialized to 1 and incremented on each getline() call while reading the first line. When a desired column is found, its position is saved, so that when the counter is reset to 1 on the next row, some action can be performed on the field at that position.

The code I wrote is this below:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    ifstream file("presidentes.csv");

    if(!file.is_open())
    {
        cout<<"Error! Could not open this file"<<'\n';
    }

    string buffer;
    int i = 1, p_SG_PARTIDO = 0;

    while(!file.eof())
    {
        getline(file, buffer, ';');

        if(buffer == "SG_PARTIDO" || i == p_SG_PARTIDO)
        {
            p_SG_PARTIDO = i;
            cout << buffer;
        }

        i++;

        if(buffer == "\n") i=1;
    }

    file.close();
    return 0;

}

The condition buffer == "SG_PARTIDO" is never true. I suspect the reason is the double quotes: the value read is "SG_PARTIDO" with the quotes included. When I removed the first and last characters before the comparison, the condition became true, but I still have the problem of not knowing when the first line ends.

The code in which I remove those characters is below:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    ifstream file("presidentes.csv");

    if(!file.is_open())
    {
        cout<<"Error! Could not open this file"<<'\n';
    }

    string buffer;
    int i=1;
    int p_ds_cargo = 0;

    while(!file.eof())
    {
        getline(file, buffer, ';');

        if(buffer[0]=='"') // strip the surrounding double quotes
        {
            buffer.erase(0,1);
            buffer.erase(buffer.size() - 1);
        }

        if(buffer == "DT_GERACAO")
        {
            p_ds_cargo = i;
            cout << buffer << endl;   
        }

        if(buffer == "\n") i = 1;

        i++;
    }



    file.close();
    return 0;
}

I would appreciate it if anyone knows an easier way to read only specific columns of a CSV file.

The link to the csv I’m using is this: https://gitlab.com/oofga/eps/eps_2018_2/ep1/raw/master/data/consulta_cand_2018_BR.csv?inline=false

1 answer

Problems

Every value in the CSV file is wrapped in double quotes ("), so the comparison you have will never be true:

if(buffer == "SG_PARTIDO"

One way around this is to include the quotation marks in the comparison:

if(buffer == "\"SG_PARTIDO\""

Or remove the quotes from the value read before comparing:

if(buffer.substr(1, buffer.size() - 2) == "SG_PARTIDO"

However, the line-break test does not work either:

if(buffer == "\n") i=1;

Because getline always reads up to the next ;, it will never read the line break on its own.
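To make this concrete, here is a minimal sketch that splits a text the same way the loop in the question does, with ; as the only delimiter. The function name splitOnSemicolonOnly is made up for illustration and is not part of the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the whole text using ';' as the only
// delimiter. Newlines are treated as ordinary characters, so they end
// up inside a token instead of forming a token of their own.
std::vector<std::string> splitOnSemicolonOnly(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(in, token, ';')) {
        tokens.push_back(token);
    }
    return tokens;
}
```

For the input "a;b\nc;d" the tokens are "a", "b\nc" and "d": the last field of one line and the first field of the next arrive glued together, and no token is ever equal to "\n".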

Another Approach

I suggest another approach to the problem, which turns out to be more robust and also lets you get data from multiple columns, something that would get more complicated the way you were doing it.

The idea is:

  • Read each line with getline using the default delimiter, \n
  • Split each line into columns with getline, changing the delimiter to ;
  • Store each line in a vector<string> and all the lines in a vector<vector<string> >

Reading

There are many ways to read a CSV into a matrix, but I chose one that I consider simple and that lets you get the information you want.

Implementation:

int main()
{
    //...
    string buffer;
    vector<vector<string> > linhas; // vector of vectors holding all the rows

    while (getline(file, buffer)) // read each line; the loop ends when no line is left
    {
        stringstream ss(buffer); // put the line just read into a stringstream

        vector<string> linha; // start the vector for this row
        while (getline(ss, buffer, ';')) { // read each column
            linha.push_back(buffer); // add it to the row's vector
        }

        linhas.push_back(linha);
    }
    //...
}

Note that I used both vector and stringstream, so I needed two additional includes:

#include <vector>
#include <sstream>

Another possibility here is to store the texts already stripped of their quotation marks, which makes it easier later when you need to print and compare them, by replacing linha.push_back(buffer); with linha.push_back(buffer.substr(1, buffer.size() - 2));. In the rest of the answer, though, I assume they were stored with the quotation marks.
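One caveat with substr(1, size() - 2) is that it throws std::out_of_range on an empty field, and it removes the wrong characters if a field has spaces around the quotes, as in the sample shown in the question. A more defensive variant could look like the sketch below; the helper name unquote is hypothetical and not part of the original answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: trims spaces, tabs and carriage returns around a
// field, then drops one pair of surrounding double quotes if both are there.
std::string unquote(const std::string& field)
{
    const std::string blanks = " \t\r";
    const std::size_t first = field.find_first_not_of(blanks);
    if (first == std::string::npos) {
        return ""; // empty or all-blank field
    }
    const std::size_t last = field.find_last_not_of(blanks);
    const std::string trimmed = field.substr(first, last - first + 1);

    if (trimmed.size() >= 2 && trimmed.front() == '"' && trimmed.back() == '"') {
        return trimmed.substr(1, trimmed.size() - 2);
    }
    return trimmed; // not quoted: return it as it is
}
```

With it, the row-building line could become linha.push_back(unquote(buffer));, at which point the later comparisons would use the names without quotes.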

Use of a column

A naive, not very efficient, yet simple way to get the SG_PARTIDO column for all rows is:

for (size_t i = 0; i < linhas.size(); ++i){
    for (size_t j = 0; j < linhas[i].size(); ++j){
        // if the header row has SG_PARTIDO in this column
        if (linhas[0][j] == "\"SG_PARTIDO\""){ 
            cout << linhas[i][j].substr(1, linhas[i][j].size() - 2); // print without the "
        }
    }
    cout << endl;
}

Naturally, this assumes that the first line holds the CSV headers. I used substr to show only the content, excluding the double quotes.

Test example on my machine:

[screenshot of the program output]
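The loop above compares the header cell again for every cell of every row. A less wasteful variant finds the column index once and then reads only that position in each row. The following is a sketch under the same assumptions as the answer (row 0 is the header and the cells still carry their quotes); the names findColumn and columnValues are invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: index of the header cell equal to quotedName,
// or -1 if the header has no such cell.
int findColumn(const std::vector<std::string>& header, const std::string& quotedName)
{
    for (std::size_t j = 0; j < header.size(); ++j) {
        if (header[j] == quotedName) {
            return static_cast<int>(j);
        }
    }
    return -1;
}

// Collects the values of one column from the data rows (row 0 is the header).
// Quotes are stripped with substr as in the answer; rows too short to
// contain the column are skipped.
std::vector<std::string> columnValues(const std::vector<std::vector<std::string> >& linhas,
                                      const std::string& quotedName)
{
    std::vector<std::string> values;
    if (linhas.empty()) return values;

    const int col = findColumn(linhas[0], quotedName);
    if (col < 0) return values;

    for (std::size_t i = 1; i < linhas.size(); ++i) {
        if (static_cast<std::size_t>(col) >= linhas[i].size()) continue;
        const std::string& cell = linhas[i][col];
        // Cells shorter than two characters cannot be quoted; keep them as is.
        values.push_back(cell.size() >= 2 ? cell.substr(1, cell.size() - 2) : cell);
    }
    return values;
}
```

A call such as columnValues(linhas, "\"SG_PARTIDO\"") would then return the party abbreviations of all data rows.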

Use of multiple columns

If you are interested in multiple columns, you can build a vector with the indices of those columns and then just iterate over it:

vector<int> colunasRelevantes;
for (size_t i = 0; i < linhas[0].size(); ++i){
    string nomeCol = linhas[0][i].substr(1, linhas[0][i].size() - 2);
    if (nomeCol == "SG_PARTIDO" || nomeCol == "NM_CANDIDATO" || nomeCol == "NM_PARTIDO"){
        colunasRelevantes.push_back(i);
    }
}

for (size_t i = 0; i < linhas.size(); ++i){
    for (size_t j = 0; j < colunasRelevantes.size(); ++j){
        int coluna = colunasRelevantes[j];
        string texto = linhas[i][coluna].substr(1, linhas[i][coluna].size() - 2);
        cout << texto << "\t";
    }
    cout << endl;
}

It is important to mention that I had to remove the last blank line from the file: a blank line produces a row with no columns, and accessing linhas[i][coluna] on such a row is an out-of-range access.

In C++11 these loops are simpler, but I didn’t use that syntax at first because it may be new to you. I’ll leave it here anyway as a reference:
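If editing the file is not an option, the empty rows can be discarded after reading instead. This is only a sketch: the alias Tabela and the function dropEmptyRows are names made up for this example, and it assumes a C++11 compiler because of the lambda.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef std::vector<std::vector<std::string> > Tabela; // hypothetical alias

// Removes every row that has no columns, which is what a blank line
// in the file turns into when it is split on ';'.
void dropEmptyRows(Tabela& linhas)
{
    linhas.erase(
        std::remove_if(linhas.begin(), linhas.end(),
                       [](const std::vector<std::string>& row) { return row.empty(); }),
        linhas.end());
}
```

Calling dropEmptyRows(linhas); right after the reading loop leaves the column loops unchanged. Rows that are non-empty but shorter than the header would still need their own check.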

// This part stays the same
vector<int> colunasRelevantes;
for (size_t i = 0; i < linhas[0].size(); ++i){
    string nomeCol = linhas[0][i].substr(1, linhas[0][i].size() - 2);
    if (nomeCol == "SG_PARTIDO" || nomeCol == "NM_CANDIDATO" || nomeCol == "NM_PARTIDO"){
        colunasRelevantes.push_back(i);
    }
}

// Here, the C++11 range-based for loop
for (auto linha : linhas){
    for (auto coluna: colunasRelevantes){
        string texto = linha[coluna].substr(1, linha[coluna].size() - 2);
        cout << texto << "\t\t\t";
    }
    cout << endl;
}

If you need many columns, it is easier to keep their names in a vector and build the indices with a nested for:

vector<string> nomesColunasRelevantes = {"SG_PARTIDO", "NM_CANDIDATO", "NM_PARTIDO"};
vector<int> colunasRelevantes;
for (size_t i = 0; i < linhas[0].size(); ++i){
    string nomeCol = linhas[0][i].substr(1, linhas[0][i].size() - 2);
    for (string nome : nomesColunasRelevantes){
        if (nome == nomeCol){
            colunasRelevantes.push_back(i);
        }
    }
}
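Note that the nested loop yields the indices in file order, not in the order the names were listed. If the requested order matters, one possible alternative is to index the header in a map first. The sketch below assumes C++11 and header cells that still carry their quotes; indicesFor is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the column index of each requested name, in the order the names
// were requested. Names missing from the header are skipped. If a name
// appears twice in the header, the last occurrence wins.
std::vector<int> indicesFor(const std::vector<std::string>& header,
                            const std::vector<std::string>& names)
{
    std::unordered_map<std::string, int> position;
    for (std::size_t i = 0; i < header.size(); ++i) {
        const std::string& cell = header[i];
        // Strip the surrounding quotes when the cell is long enough to have them.
        const std::string key = cell.size() >= 2 ? cell.substr(1, cell.size() - 2) : cell;
        position[key] = static_cast<int>(i);
    }

    std::vector<int> indices;
    for (const std::string& name : names) {
        const auto it = position.find(name);
        if (it != position.end()) {
            indices.push_back(it->second);
        }
    }
    return indices;
}
```

Usage would be colunasRelevantes = indicesFor(linhas[0], nomesColunasRelevantes);, after which the printing loops work as before.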

Number of columns

Now it is also easy to answer the question in your title:

Count how many columns are in a CSV file with C++

Just call size() on any of the rows:

cout << linhas[0].size();

That gives 58 for the file presented.

  • I really liked this new way of reading the relevant columns!! Thank you very much, you opened my mind!!

  • @Durvalcarvalho No problem, we’re here to help :)
