I need to read a CSV file with the following fields:
Id,OwnerUserId,CreationDate,Score,Title,Body
Id: integer
OwnerUserId: integer
CreationDate: date, which I will store as a char array
Score: integer
Title: text
Body: text
Example of a line (the file has many lines):
469, 147, 2008-08-02T15:11:16Z, 21, How can I find the full path to a font from its display name on a Mac?, "Iam using the Photoshop's.......</ul> "
The dots in the Body field are only there to shorten the example, because the actual text is much longer.
I need to store each line in an array of structs of this type:
struct Questions {
    int id;
    int ownerUsedId;
    char creationDate[30];
    int score;
    char title[100];
    char body[200];
};
For that I wrote the following function:
void loadQuestions(fstream &file, Questions *questions)
{
    string registro;
    getline(file, registro); // the first line (the header) is read and discarded
    getline(file, registro); // first data line
    // strtok modifies its input, so work on a copy sized to fit the line
    char *buffer = new char[registro.size() + 1];
    char *ptr;
    strcpy(buffer, registro.c_str());
    ptr = strtok(buffer, ",");
    cout << atoi(ptr) << " "; // Id field
    ptr = strtok(NULL, ",");
    cout << atoi(ptr) << " "; // OwnerUserId field
    ptr = strtok(NULL, ","); // CreationDate field
    cout << ptr << " ";
    ptr = strtok(NULL, ","); // Score field
    cout << atoi(ptr) << " ";
    delete[] buffer; // release the copy
} // For now I only print the fields to check the parsing; nothing is stored in the struct yet
Up to the fourth comma everything works. I am splitting on commas, but here is the problem: a comma can appear in the middle of the text in either the Title or the Body field, which makes strtok stop at that point and throws off the rest of the parsing.
Question: how do I store each field correctly in my struct, given that the Title and Body fields can contain several commas? One thing I noticed is that the Body field starts and ends with quotation marks ("), so I could use the quotation marks as delimiters to copy this field, but the body (which is free text) can itself contain quotation marks.
How can I copy each of these fields correctly?
If what you want to capture is quite specific, it becomes easier to use a regex. In your particular case, the simplest option will probably be to change the delimiter passed to strtok, from the title onward, so that it splits on " instead of ,
– Isac
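As an alternative to strtok, here is a minimal sketch of a splitter that tracks whether it is inside a quoted field, so commas between quotation marks do not end the field. It assumes the file follows the usual CSV convention (as in RFC 4180): a field containing commas is wrapped in double quotes, and a quotation mark inside such a field is written twice (""). Whether your file actually escapes quotes this way is an assumption you would need to check. The names splitCsvLine and parseQuestion are hypothetical helpers, not part of the original code, and spaces before a field are skipped because the example line has a space after each comma.

```cpp
#include <cstdio>
#include <exception>
#include <string>
#include <vector>

// Same layout as the struct in the question.
struct Questions {
    int id;
    int ownerUsedId;
    char creationDate[30];
    int score;
    char title[100];
    char body[200];
};

// Hypothetical helper: splits one CSV record into fields.
// Assumption: quoted fields use "" for a literal quotation mark.
std::vector<std::string> splitCsvLine(const std::string &line)
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                current += '"';   // doubled quote becomes one literal quote
                ++i;
            } else if (c == '"') {
                inQuotes = false; // closing quote
            } else {
                current += c;     // commas in here are ordinary text
            }
        } else if (c == ',') {
            fields.push_back(current); // an unquoted comma ends the field
            current.clear();
        } else if (c == ' ' && current.empty()) {
            // skip spaces before the field starts
        } else if (c == '"' && current.empty()) {
            inQuotes = true;      // opening quote
        } else {
            current += c;
        }
    }
    fields.push_back(current);    // last field has no trailing comma
    return fields;
}

// Hypothetical helper: fills the struct from six fields.
// Text longer than the char arrays is truncated, not overflowed.
bool parseQuestion(const std::vector<std::string> &f, Questions &q)
{
    if (f.size() != 6) return false;
    try {
        q.id = std::stoi(f[0]);
        q.ownerUsedId = std::stoi(f[1]);
        q.score = std::stoi(f[3]);
    } catch (const std::exception &) {
        return false;             // a numeric field was not a number
    }
    std::snprintf(q.creationDate, sizeof q.creationDate, "%s", f[2].c_str());
    std::snprintf(q.title, sizeof q.title, "%s", f[4].c_str());
    std::snprintf(q.body, sizeof q.body, "%s", f[5].c_str());
    return true;
}
```

Inside loadQuestions, the strtok sequence could then become something like parseQuestion(splitCsvLine(registro), questions[i]). One limitation of this sketch: it works on a single line, so if a quoted Body contains line breaks, the caller would have to keep appending lines to registro until the quotes are balanced before splitting.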