Implementation of Split() in C++

2

I was looking for a way to implement C#'s .Split() method in C++ and found the following code on Google:

std::vector<std::string> split(const std::string& text, char sep)
{
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;

    while ((end = text.find(sep, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }

    tokens.push_back(text.substr(start));
    return tokens;
}

I put the code in my little project, tested it and saw that it worked, so I then set out to understand exactly how the algorithm works. I understood most of the code, but I got stuck on the while condition:

while ((end = text.find(sep, start)) != std::string::npos)

I more or less understood that it checks whether the character sep (a ',' for example) still occurs in the string, but I couldn't understand how it does that. Could someone please explain in detail how this piece of code works? Thank you very much.

3 answers

2

There’s something simpler than that:

#include <sstream>
#include <string>
#include <vector>
using namespace std;

vector<string> split(const string& text, char separator = ' ') {
    string str;
    stringstream ss(text);
    vector<string> result;
    // getline takes the stream first, then the string to fill
    while (getline(ss, str, separator)) {
        result.push_back(str);
    }
    return result;
}
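
One behavioural difference is worth knowing before swapping one version for the other: a getline-based split drops a trailing empty token, while the find-based version from the question keeps it. Here is a minimal sketch for comparing the two side by side. The names split_find and split_getline are just labels chosen for this comparison, not anything from the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// The find-based version from the question, renamed for the comparison.
std::vector<std::string> split_find(const std::string& text, char sep)
{
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start)); // always emits the last piece, even if empty
    return tokens;
}

// A getline-based version, renamed for the comparison.
std::vector<std::string> split_getline(const std::string& text, char sep)
{
    std::vector<std::string> tokens;
    std::istringstream ss(text);
    std::string token;
    // getline fails once nothing is left to read, so nothing is emitted
    // after a trailing separator or for an empty input string
    while (std::getline(ss, token, sep))
        tokens.push_back(token);
    return tokens;
}
```

For "a,b,c" both return the same three tokens. For "a,b," the find-based version returns three tokens (the last one empty) and the getline-based version returns two. For an empty string they return one empty token and no tokens, respectively.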

I know you didn't ask for this, but there's also a way to reverse it:

#include <string>
#include <vector>
using namespace std;

string unsplit(const vector<string>& sentence, char splitter = ' ') {
    string result;
    for (size_t i = 0; i < sentence.size(); i++) {
        result = result + sentence[i] + splitter;
    }
    // drop the trailing separator, if anything was added
    if (!result.empty()) result.pop_back();
    return result;
}

2


The find method looks for the character sep, starting at index start. If it finds the character, it returns its position, which is assigned to end; otherwise it returns npos, which is assigned to end instead.

That's why the while condition keeps the loop iterating as long as end is different from npos.

The substr method copies a given number of characters from a string, starting at a given index. Here the index is start and the number of characters is end - start.

Unrolled, the loop amounts to this:

std::string text { "texto1,texto2,texto3" };

size_t start = 0, end = 0;

end = text.find(',', start); // start(0), end(6) != npos (true, enters the while)

{
  text.substr(start, end - start); // 0, (6 - 0 = 6) = texto1

  start = end + 1; // 6 + 1 = 7
}

end = text.find(',', start); // start(7), end(13) != npos (true, enters the while)

{
  text.substr(start, end - start); // 7, (13 - 7 = 6) = texto2

  start = end + 1; // 13 + 1 = 14
}

end = text.find(',', start); // start(14), end(npos) != npos (false, leaves the while)

text.substr(start); // copies from index 14 to the end, which is texto3
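
To see those values for yourself, a small sketch like the following records what find returns on each pass. The helper name separator_positions is made up for this illustration. The loop is written with the assignment pulled out of the while condition, which behaves the same way as the original one-liner.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects every value that find() returns while
// walking the string, including the final npos that ends the loop.
std::vector<std::size_t> separator_positions(const std::string& text, char sep)
{
    std::vector<std::size_t> positions;
    std::size_t start = 0;

    std::size_t end = text.find(sep, start);   // first search, from index 0
    while (end != std::string::npos)           // same test as in the original
    {
        positions.push_back(end);
        start = end + 1;                       // skip past the separator
        end = text.find(sep, start);           // search again from there
    }
    positions.push_back(end);                  // end is npos at this point
    return positions;
}
```

Called with "texto1,texto2,texto3" and ',', it returns 6, 13 and std::string::npos, matching the walk-through above.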

Links to function documentation:

https://en.cppreference.com/w/cpp/string/basic_string/find

https://en.cppreference.com/w/cpp/string/basic_string/npos

https://en.cppreference.com/w/cpp/string/basic_string/substr

  • 2

    Let me get this straight: if .find() does not find the character (end = text.find(sep, start)), it returns npos, which is assigned to end, and then the condition checks whether end is different from npos (end != std::string::npos), right?

  • @Pedro yes, that’s right.

  • 1

    thank you very much, man. I understand perfectly now!

0

I wrote a split() as if it were an STL algorithm in modern C++:

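#include <algorithm>
#include <iterator>

template<typename it, typename valueT, typename outIt>
void split(it first, it last, const valueT& e, outIt&& o) {
    auto next = first;
    while ( next != last ) {
        next = std::find(first, last, e);
        *o++ = { first, next };
        // step past the separator, but never past the end of the range
        if ( next != last ) first = std::next(next);
    }
}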

1 - Works with iterators: begin(), end(), istream_iterator...

string x = ",teste,oi,vamos";
split(x.begin(), x.end(), ',', ostream_iterator<string>(cout, "\n"));

2 - Works with empty sequences;

 string x = ",teste,oi,vamos";
 string y = ",,,,";
 string z = "";

3 - Works with any container, not just strings;

 vector v = {30, 20, 0, 40, 50, 10, 70, 0, 20, 30};
 vector<vector<int>> splitted;
 cout << "Splitting one vector into several vectors " << endl;
 split(v.begin(), v.end(), 0, back_inserter(splitted));

4 - Has lambda version;

split_if(v.begin(), v.end(), [](auto const& val) { return val < 20; }, back_inserter(w));

5 - Does not copy values (only the iterators are copied);

vector<string_view> w;  // <- string_view does not copy
split(x.begin(), x.end(), ',', back_inserter(w));

Working example: https://godbolt.org/z/x76ojc
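
The split_if used in point 4 is not shown in the answer itself. By analogy with split, one way it might look is sketched below using std::find_if. This is an assumption about its shape, not necessarily the code behind the link, and the names is_separator and split_below_20 are made up here.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical split_if: like split, but a predicate decides which
// elements act as separators. Each piece is built from an iterator pair,
// so the output's element type must be constructible from two iterators
// (std::string, std::vector<int>, ...).
template <typename It, typename Pred, typename OutIt>
void split_if(It first, It last, Pred is_separator, OutIt out)
{
    It next = first;
    while (next != last) {
        next = std::find_if(first, last, is_separator);
        *out++ = { first, next };                  // the piece before the separator
        if (next != last) first = std::next(next); // continue after the separator
    }
}

// Usage sketch: cut a vector of ints at every value below 20.
std::vector<std::vector<int>> split_below_20(const std::vector<int>& v)
{
    std::vector<std::vector<int>> pieces;
    split_if(v.begin(), v.end(),
             [](int val) { return val < 20; },
             std::back_inserter(pieces));
    return pieces;
}
```

With the vector from point 3, the values 0, 10 and 0 act as separators, so the pieces are {30, 20}, {40, 50}, {70} and {20, 30}.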
