How to tokenize a string in C++

In this shot, we are going to learn how to tokenize a string in C++, that is, how to split it into pieces (tokens) at a chosen delimiter. We could use a std::stringstream object or the C library function strtok() for this, but here we will write our own tokenizer.
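
For comparison, here is a minimal sketch of the two library-based approaches mentioned above. The function names split_with_stream() and split_with_strtok() are made up for this illustration. Note that neither behaves exactly like the hand-written tokenizer: std::getline() drops a trailing empty token, and strtok() skips runs of consecutive delimiters.

```cpp
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on one delimiter by reading from a string stream.
std::vector<std::string> split_with_stream(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(str);
    std::string token;
    // std::getline reads up to (and discards) the next delimiter.
    while (std::getline(stream, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical helper: split with strtok(). strtok() overwrites delimiters in
// the buffer it scans, so it is given a writable, null-terminated copy.
std::vector<std::string> split_with_strtok(const std::string& str, const char* delims) {
    std::vector<std::string> tokens;
    std::vector<char> buffer(str.begin(), str.end());
    buffer.push_back('\0');
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Calling split_with_stream("a b c", ' ') or split_with_strtok("a b c", " ") should give the three tokens a, b, and c. Keep in mind that strtok() keeps internal state between calls, so it is not safe to interleave two tokenizations or to use it from several threads at once.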

Follow the steps below to tokenize a string:

  • Read the complete string.

  • Select the delimiter, that is, the character at which you want to split the string. In this example, we will split the string at every space.

  • Iterate over all the characters of the string and compare each one with the delimiter.

  • If the delimiter is found, push the word built so far to a vector and start a new word. Otherwise, append the character to the current word.

  • Repeat this process until you have traversed the complete string.

  • The last token is not followed by a space, so it needs to be pushed to the vector after the loop.

  • Finally, return the vector of tokens.

Now let’s look at the code for clarity.

#include <bits/stdc++.h>
using namespace std;

vector<string> mystrtok(string str, char delim){
    vector<string> tokens;   // finished tokens
    string temp = "";        // token currently being built
    for(size_t i = 0; i < str.length(); i++){
        if(str[i] == delim){
            tokens.push_back(temp);   // the delimiter ends the current token
            temp = "";
        }
        else
            temp += str[i];
    }
    tokens.push_back(temp);   // the last token has no delimiter after it
    return tokens;
}

int main() {
    string s = "Learn in-demand tech skills in half the time";
    vector<string> tokens = mystrtok(s, ' ');
    for(const string& token: tokens)
        cout << token << endl;
}

Explanation

  • The first line includes bits/stdc++.h, a convenience header shipped with GCC that pulls in the whole standard library, so we do not need to include each header explicitly. It is not part of standard C++; portable code would include <iostream>, <string>, and <vector> instead.
  • The mystrtok() function accepts the string and the delimiter and returns a vector of tokens.
  • Inside it, the tokens vector stores the finished tokens, and temp holds the token currently being built.
  • The for loop traverses each character in the string. If the character is the delimiter, we push temp to the vector and reset it. Otherwise, we append the character to temp.
  • After the loop, the last token is still in temp because no delimiter follows it, so we push it to the vector as well.
  • Then we return the vector of tokens.
  • In the main() function, we call mystrtok() with a space as the delimiter and print every token on its own line.
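
One consequence of this loop is that mystrtok() stores an empty string whenever two delimiters are adjacent, or when the input starts or ends with the delimiter. If that is not what you want, a possible variant is sketched below. The name tokenize_skip_empty() is hypothetical and not part of the code above; the only change is that a token is stored only when it is non-empty.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of mystrtok(): same character-by-character loop,
// but empty tokens are never pushed to the result.
std::vector<std::string> tokenize_skip_empty(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    std::string temp;  // token currently being built
    for (char c : str) {
        if (c == delim) {
            if (!temp.empty()) {
                tokens.push_back(temp);
            }
            temp.clear();
        } else {
            temp += c;
        }
    }
    // The last token has no delimiter after it, so handle it after the loop.
    if (!temp.empty()) {
        tokens.push_back(temp);
    }
    return tokens;
}
```

With this version, an input such as "a  b " (two spaces in the middle, one at the end) yields only the tokens a and b. Whether empty tokens should be kept depends on the problem: for CSV-like data they usually carry meaning, while for splitting a sentence into words they usually do not.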

In this way, we can build our own string tokenizer in C++ to use in string problems.
