[C++] Imitate Split() from Python

3 comments, last by Zipster 14 years ago
Hi, I am trying to write my own split function in C++, like the one in Python. If you don't know what the split function does, you use it like this:
s = "hello. hi. Yo."
result = s.split('.')

# result[0] = "hello"
# result[1] = " hi"
# etc.
My problem is that I am having trouble coming up with an efficient algorithm. Right now my split function seems really inefficient: I check every character in the string and see whether it is equal to the target split character. What do you think Python's algorithm for its split() function would be? Can you suggest a better algorithm for splitting a string at a target character? Also, in C++ is it possible to return an array from a function? Or can I only return a pointer to an array?

#include <iostream>
#include <cctype>
#include <cstdlib>
#include <vector>
#include <string>

using namespace std;

vector<string> split(string s, char target);

int main() {
    
    string a = "m kdfjkdmdfjdhf Mhkfjdjkfh M hnkjsjkdf m jkdsjcfkm sjfjkdsh";
    
    vector <string> splitted = split(a,'m');
    
    for (size_t i=0; i<splitted.size(); i++) 
    {
        cout << splitted.at(i) << endl;
        
    }
    
    system("pause");
    return 0;
}

vector<string> split(string s, char target)
{
    // Post: This function is meant to work like the split() function in Python.
    //       The string is split at every occurrence of target (ignoring case)
    //       and each section is stored in a vector element, wrapped in quotes.
    
    vector <string> result;
    string tempStr;
    
    // Lowercase the target once so the comparison below ignores case.
    target = tolower(target);
    
    for (size_t i=0; i<s.length(); i++) 
    {
        // Lowercase the current character before comparing it to target.
        s[i] = tolower(s[i]);
        if (s[i] == target) {
           string res = "'"+tempStr+"'";
           result.push_back(res);
           tempStr = "";
        }
        else tempStr += s[i];
    }
    
    if (tempStr.length() > 0) {
         string res = "'"+tempStr+"'";
         result.push_back(res);
    }
    
    return result; 
}


Here are some examples.
Quote:Also in C++ is it possible to return an array from a function? Maybe I can only return a pointer to an array?
Returning a vector is a pretty standard thing to do if you want to return an array, especially if the size of the array isn't known at compile-time.

On a side note, when passing strings around in C++ it's best to pass them by const-reference, like:
vector<string> split(const string& s, char target)
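As a rough illustration of both points (returning a vector by value and taking the string by const reference), here is a minimal sketch. The function name positions_of is made up for this example and is not from the thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every occurrence of target in s.
// s is taken by const reference, so the caller's string is neither copied
// nor modified. The result is returned by value as a vector, because the
// number of matches is not known until run time.
std::vector<std::size_t> positions_of(const std::string& s, char target) {
    std::vector<std::size_t> result;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == target) {
            result.push_back(i);
        }
    }
    return result;
}
```

For the string "hello. hi. Yo." and the target '.', this should give the three indices 5, 9 and 13.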
Boost provides a library with string algorithms that might be worth a look (it has split).
The Python split() allows you to specify a string as the delimiter (since Python does not distinguish a separate character type from the string type; '.' is a string of length 1 rather than a character, and 'hi mom'[0][0][0][0][0][0][0][0] will not raise an exception), and is case-sensitive (so you shouldn't be doing anything with tolower()).

Also, the Python split function does not add quotes to the substrings. That's part of Python's built-in formatting for displaying a representation of strings. You should definitely not add them in the C++ version.

Finally, there are a bunch of 'find' functions in the std::string interface that you seem to be unaware of. Don't make life hard for yourself.
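For instance, a single-character split built on std::string::find could look roughly like the sketch below. The name split_char is only for illustration. It keeps empty fields (including a trailing one), which is how Python's split behaves when given an explicit separator.

```cpp
#include <string>
#include <vector>

// Sketch of a case-sensitive split on one character using string::find.
// Empty fields are kept, so "a..b" gives "a", "", "b".
std::vector<std::string> split_char(const std::string& s, char target) {
    std::vector<std::string> result;
    std::string::size_type start = 0;
    std::string::size_type found = s.find(target, start);
    while (found != std::string::npos) {
        // Copy the piece between the previous delimiter and this one.
        result.push_back(s.substr(start, found - start));
        start = found + 1;
        found = s.find(target, start);
    }
    // Whatever follows the last delimiter (possibly an empty string).
    result.push_back(s.substr(start));
    return result;
}
```

With the input "hello. hi. Yo." and '.', this gives four pieces: "hello", " hi", " Yo" and an empty string at the end. No quotes are added and nothing is lowercased.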

Quote:Also in C++ is it possible to return an array from a function?


You can wrap an array in a structure and return an instance of the structure. Or you can use a pre-made structure for that purpose: boost::array (which is designed to let you treat it just like an array, with [] subscripting and everything).
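A hand-rolled version of that wrapper might look like the sketch below. The names IntArray3 and first_three_squares are invented for this example. boost::array does the same job generically for any element type and size, and C++11 later added std::array along the same lines.

```cpp
#include <cstddef>

// Hypothetical wrapper: a plain struct holding a fixed-size array.
// The [] operators let callers index it like a built-in array.
struct IntArray3 {
    int values[3];
    int& operator[](std::size_t i) { return values[i]; }
    const int& operator[](std::size_t i) const { return values[i]; }
};

// Returns the struct by value; the array inside is copied along with it,
// which a bare built-in array cannot do as a return type.
IntArray3 first_three_squares() {
    IntArray3 a;
    for (std::size_t i = 0; i < 3; ++i) {
        a[i] = static_cast<int>((i + 1) * (i + 1));
    }
    return a;
}
```

The size is fixed at compile time here, which is exactly why this approach does not suit split.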

However, for the current task, an array is inappropriate because you do not know ahead of time how many substrings there will be.

Here's what I came up with:

vector<string> split(const string& source, const string& delimiter, int limit = -1) {
    string::size_type position = 0;
    const int delimiter_size = delimiter.size();
    if (delimiter_size == 0) { throw invalid_argument("empty delimiter"); }
    vector<string> result;
    // Note the use of '!=' here rather than '<' which allows us to treat a -1
    // limit value as "infinity". The loop will still break when the string can't
    // be found any more.
    for (int i = 0; i != limit; ++i) {
        string::size_type found_at = source.find(delimiter, position);
        if (found_at == string::npos) { break; }
        result.push_back(string(source, position, found_at - position));
        position = found_at + delimiter_size;
    }
    result.push_back(string(source, position));
    return result;
}
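To make the limit parameter easier to try out, here is the function above lightly adapted into a self-contained form. The includes and std:: qualifiers are added, and the delimiter size is stored as a size_type instead of an int. Those changes are my own assumptions about how it would be compiled, not part of the post.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Splits source at each occurrence of delimiter (a whole string, matched
// case-sensitively). At most `limit` splits are made; -1 means no limit.
std::vector<std::string> split(const std::string& source,
                               const std::string& delimiter,
                               int limit = -1) {
    std::string::size_type position = 0;
    const std::string::size_type delimiter_size = delimiter.size();
    if (delimiter_size == 0) { throw std::invalid_argument("empty delimiter"); }
    std::vector<std::string> result;
    // '!=' rather than '<' so that a limit of -1 never stops the loop;
    // the loop still ends once the delimiter can no longer be found.
    for (int i = 0; i != limit; ++i) {
        std::string::size_type found_at = source.find(delimiter, position);
        if (found_at == std::string::npos) { break; }
        result.push_back(std::string(source, position, found_at - position));
        position = found_at + delimiter_size;
    }
    // The remainder after the last split, unsplit.
    result.push_back(std::string(source, position));
    return result;
}
```

With this version, split("a,b,c,d", ",") gives "a", "b", "c", "d", while split("a,b,c,d", ",", 2) stops after two splits and gives "a", "b", "c,d". That matches what Python's split does with its maxsplit argument.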
Our library's split function looks something like this:

vector<string> split(const string& in, const string& delim) {
   string::size_type start = in.find_first_not_of(delim), end = 0;

   vector<string> out;
   while(start != in.npos) {
      end = in.find_first_of(delim, start);
      if(end == in.npos) {
         out.push_back(in.substr(start));
         break;
      } else {
         out.push_back(in.substr(start, end-start));
      }
      start = in.find_first_not_of(delim, end);
   }
   return out;
}
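One thing worth noticing about this version is that find_first_of and find_first_not_of treat delim as a set of characters rather than as one multi-character separator, and that runs of delimiters produce no empty fields. The sketch below restates the same logic in self-contained form under the made-up name split_any so the difference is easy to test.

```cpp
#include <string>
#include <vector>

// Splits `in` at any character contained in `delim`. Leading, trailing and
// repeated delimiter characters are skipped, so no empty strings are returned.
std::vector<std::string> split_any(const std::string& in, const std::string& delim) {
    std::vector<std::string> out;
    std::string::size_type start = in.find_first_not_of(delim);
    while (start != std::string::npos) {
        std::string::size_type end = in.find_first_of(delim, start);
        if (end == std::string::npos) {
            // Last field runs to the end of the string.
            out.push_back(in.substr(start));
            break;
        }
        out.push_back(in.substr(start, end - start));
        // Skip over every delimiter character before the next field.
        start = in.find_first_not_of(delim, end);
    }
    return out;
}
```

For example, split_any(",a,,b;c,", ",;") gives just "a", "b", "c", and an empty input gives an empty vector. That is closer to Python's split() with no arguments than to split with an explicit separator.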

But of course there are probably a dozen different ways you could go about writing a split function, with a ton of different parameters you could potentially pass in to tweak the behavior to precisely your needs.
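As one more of those ways, std::getline with a delimiter argument can do a single-character split through a string stream. This is only a sketch and the name split_getline is invented here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits s on a single character by reading fields from a string stream.
// Empty fields in the middle are kept, but a delimiter at the very end
// does not produce a trailing empty field.
std::vector<std::string> split_getline(const std::string& s, char delim) {
    std::vector<std::string> result;
    std::istringstream stream(s);
    std::string field;
    while (std::getline(stream, field, delim)) {
        result.push_back(field);
    }
    return result;
}
```

So split_getline("a,b,,c", ',') gives "a", "b", "", "c", while split_getline("a,b,", ',') gives only "a", "b". Whether that trailing-field behavior is what you want is one of the things to decide when picking an approach.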
