How to separate a string into pieces?

Other languages have split, explode or something similar that breaks a string into pieces according to some separator. Is there something ready-made in C, or do I have to do it by hand?

2 answers

There is nothing quite that ready-made, but there is strtok(), which scans the string and replaces one of the specified delimiter characters with a null character. What was a single string thus becomes several, since the null terminates the string at that point.

But note that it does not return an array of strings, as is common in other languages, nor does it handle all the delimiters at once. It only handles the first one it finds; to get the next piece you have to call strtok() again (passing NULL as the first argument), and so on. Of course, every C programmer ends up writing some utility function(s) to make this easier and return what they want.

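#include <stdio.h>
#include <string.h>

int main(void) {
    char frutas[] = "banana,laranja,morango";
    size_t tamanho = strlen(frutas); // measured before strtok() writes any '\0'; this display only works for a 1-character delimiter
    char *token = strtok(frutas, ",");
    // print the raw buffer: after the first call only the first ',' has become '\0'
    for (size_t i = 0; i < tamanho; i++) {
        if (frutas[i] == '\0') printf("\\0");
        else putchar(frutas[i]);
    }
    // each later call with NULL continues where the previous one stopped
    while (token != NULL) {
        printf("\n%s", token);
        token = strtok(NULL, ",");
    }
    putchar('\n');
    return 0;
}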

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

In C++ there is no native split function for strings.

Searching the subject, you find a huge variety of ways to split a string.

Here are some examples I found interesting.


Example 1

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
   // string to be split
   string tokenString { "aaa     bbb ccc" };

   // the split substrings will be placed in this vector
   vector<string> tokens;

   // input string stream initialized with the string to be split
   istringstream tokenizer { tokenString };

   // working variable
   string token;

   // operator>> splits on whitespace; each piece goes into the destination vector
   while (tokenizer >> token)
      tokens.push_back(token);

   // print the split substrings
   for (const string& token: tokens)
      cout << "* [" << token << "]\n";
}

Result from example 1:

* [aaa]
* [bbb]
* [ccc]

Example 2

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
   // string to be split
   string tokenString { "aaa, bbb, ccc,,ddd   ,   eee" };

   // the split substrings will be placed in this vector
   vector<string> tokens;

   // input string stream initialized with the string to be split
   istringstream tokenizer { tokenString };

   // working variable
   string token;

   // getline() splits on commas; each piece goes into the destination vector
   while (getline(tokenizer, token, ','))
      tokens.push_back(token);

   // print the split substrings
   for (const string& token: tokens)
      cout << "* [" << token << "]\n";
}

Result from example 2:

* [aaa]
* [ bbb]
* [ ccc]
* []
* [ddd   ]
* [   eee]

Note that the spaces inside the resulting substrings were kept, and the empty field between ",," also appears. (Removing those spaces would call for another common string function, "Trim", which also does not exist in C++.)


Example 3

#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;

int main()
{
   // string to be split
   string tokenString { "aaa, bbb, ccc,,ddd   ,   eee" };

   // the split substrings will be placed in this vector
   vector<string> tokens;

   // regular expression matching the delimiters: runs of whitespace and commas
   regex delimiters { "[\\s,]+" };

   // creates an iterator over the split substrings
   // note: this is a ready-made "recipe"; the '-1' argument makes the iterator
   // yield the text between matches (the fields) rather than the matches (the delimiters)
   sregex_token_iterator tokens_begin { tokenString.begin(), tokenString.end(), delimiters, -1 };

   // end iterator
   auto tokens_end = sregex_token_iterator {};

   // copy the split substrings into the destination vector
   for (auto token_it = tokens_begin; token_it != tokens_end; token_it++)
      tokens.push_back(*token_it);

   // print the split substrings
   for (const string& token: tokens)
      cout << "* [" << token << "]\n";
}

Result from example 3:

* [aaa]
* [bbb]
* [ccc]
* [ddd]
* [eee]

That’s all for now, folks.
