Split() function in C

5 comments, last by Russell 21 years, 8 months ago
In Perl there is a function that will split a string according to a given character. For example, I want to split the string "this is a test" into an array where:

    args[0] is "this"
    args[1] is "is"
    args[2] is "a"
    args[3] is "test"

So I might do:

    int split(char * string, char * args[], char ch);

and call it like:

    char input[1024];
    char * args[256];
    input = GetInput();
    int nargs = split(input, args, ' ');

I'm not very good with arrays of null terminated strings. Can anyone help me out?

Thanks, Russell
Look at the strtok function; it splits a string into tokens separated by any set of characters you want to use. Beware that it modifies the string in-place, so you almost certainly want to make a copy of the string before tokenizing it. Here's an example:

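    // Needs <cstdio> and <cstring>; the copy is made with C++ new[]/delete[].
    const char* srcString = "This is a test";
    char* tmpString;
    char* token;

    // Copy the source first, because strtok writes into the buffer it is given.
    tmpString = new char[strlen(srcString)+1];
    strcpy(tmpString, srcString);

    token = strtok(tmpString, " ");
    while(token != NULL) {
      printf("%s\n", token);
      token = strtok(NULL, " "); // Note the use of NULL here.
    }
    delete[] tmpString;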


Things to note are the copying of the source string and the use of NULL in the strtok function to indicate that the tokenization is a continuation of the first string. If you were to look at the string pointed to by tmpString partway through the tokenization, you'd see that the separator character " " is replaced by an ASCII \0 character. This is why you want to copy the string first.
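For the plain C signature Russell asked about, the same idea can be wrapped in a function. This is only a rough sketch: `split_in_place` and its extra `max_args` parameter are names made up for illustration, and it assumes the caller has already made a writable copy of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): tokenizes buf in place with
 * strtok and stores up to max_args token pointers in args.
 * Returns the number of tokens stored.
 * The pointers in args point INTO buf, so buf must outlive them. */
int split_in_place(char *buf, char *args[], int max_args, char ch)
{
    char delims[2];
    char *tok;
    int n = 0;

    assert(buf != NULL && args != NULL);

    /* strtok wants a delimiter string, so build one from the single char. */
    delims[0] = ch;
    delims[1] = '\0';

    for (tok = strtok(buf, delims);
         tok != NULL && n < max_args;
         tok = strtok(NULL, delims)) {
        args[n++] = tok;
    }
    return n;
}
```

You'd call it as `n = split_in_place(copy, args, 256, ' ');` where `copy` is a writable duplicate of the input. After the call, the separators in `copy` that ended a token have been overwritten with '\0'.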

Hope this helps!

Cheers, dorix

edit: clarified notes a bit.


[edited by - dorix on August 13, 2002 6:50:56 PM]
Is there any way to do it in a non-destructive manner, where the string you pass to be split won't be changed? I was able to write a function that split it, but it changed the original string, and I'd like to avoid that if at all possible.

Thanks for pointing out the strtok function though. That will come in handy

Russell
The easiest thing to do is just to make a copy of the source string and tokenize the copy. Otherwise, you'll have to write your own tokenizer, which would probably look something like this:

    // inString: Pointer to the string you want tokenized, or NULL to
    //           return subsequent tokens
    // cDelim:   For simplicity, just a single character delimiter.
    char* myTokenizer(const char* inString, char cDelim)
    {
      // Not thread safe because of this. It is const because the source
      // string is only read, never written.
      static const char* ptrCurrent = NULL;
      int tokenLength;
      const char* ptrToken;
      char* retToken;

      // A non-NULL argument starts a new tokenization; NULL continues the previous one
      if (inString != NULL) {
        ptrCurrent = inString;
      }

      // Check for end-of-string reached on a previous call
      if (ptrCurrent == NULL) return NULL;

      // Skip over any delimiters at the beginning of the current segment
      while(*ptrCurrent == cDelim && *ptrCurrent != '\0') ptrCurrent++;

      // Check for end-of-string reached
      if (*ptrCurrent == '\0') {
        ptrCurrent = NULL; // Leave a quick way to test end-of-string on the next call.
        return NULL; // End of string reached.
      }

      // Save the start of this token, and search for the end of this token
      ptrToken = ptrCurrent;
      tokenLength = 0;
      while(*ptrCurrent != cDelim && *ptrCurrent != '\0') {
        ptrCurrent++;
        tokenLength++;
      }

      // Return a new copy of the token. Don't forget to delete[] the
      // returned pointer when you're done with it.
      retToken = new char[tokenLength+1];
      strncpy(retToken, ptrToken, tokenLength);
      retToken[tokenLength] = '\0'; // Make sure it's null-terminated
      return retToken;
    }

Call that function the same way you'd call strtok, except for simplicity I changed the delimiter argument to a single char instead of a string.

I just threw that together on the spot and haven't actually tried it out, but it should work. Don't forget to delete[] the returned pointer when you're finished with each token.
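Since the tokenizer above leans on C++ `new[]`, here is roughly what the same approach might look like in plain C, filling an `args` array in one call. Treat it as a sketch rather than a tested library routine: `split_copy`, `free_tokens` and the `max_args` parameter are invented names, and it assumes a single-character delimiter just like the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees the first n tokens produced by split_copy. */
void free_tokens(char *args[], int n)
{
    int i;
    for (i = 0; i < n; i++) {
        free(args[i]);
        args[i] = NULL;
    }
}

/* Hypothetical non-destructive split: src is only read, never written.
 * Each token is copied into its own malloc'd buffer.
 * Returns the number of tokens stored (at most max_args),
 * or -1 if an allocation fails. */
int split_copy(const char *src, char *args[], int max_args, char ch)
{
    const char *p = src;
    int n = 0;

    assert(src != NULL && args != NULL);

    while (n < max_args) {
        const char *start;
        size_t len;

        /* Skip any run of delimiters before the next token. */
        while (*p == ch && *p != '\0') p++;
        if (*p == '\0') break;

        /* Find the end of this token. */
        start = p;
        while (*p != ch && *p != '\0') p++;
        len = (size_t)(p - start);

        args[n] = malloc(len + 1);
        if (args[n] == NULL) {
            free_tokens(args, n);
            return -1;
        }
        memcpy(args[n], start, len);
        args[n][len] = '\0';
        n++;
    }
    return n;
}
```

A call like `n = split_copy(input, args, 256, ' ');` leaves `input` untouched. The cost is one allocation per token, so remember to call `free_tokens(args, n)` when you're done.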

One thing I forgot to mention in my original suggestion to use strtok: don't delete the copy of your original string until you're finished with the tokens. Each token is actually a pointer into that copy, not a separately allocated block of memory.

Cheers, Dorix

edit: properly null-terminated the returned token.

[edited by - dorix on August 13, 2002 7:25:04 PM]

[edited by - dorix on August 13, 2002 7:30:45 PM]
Well, in C++ you could do this:



    #include <iostream>
    #include <vector>
    #include <string>
    using namespace std;

    vector<string> split(const string& str, const string& delim)
    {
        vector<string> ret;
        size_t beg = 0;
        size_t end = 0;
        // find() returns string::npos when there are no more delimiters
        while((end = str.find(delim, beg)) != string::npos)
        {
            ret.push_back(str.substr(beg, end - beg));
            beg = end + delim.length();
        }
        // Whatever is left after the last delimiter is the final token
        ret.push_back(str.substr(beg, string::npos));
        return ret;
    }

    int main(int argc, char** argv)
    {
        vector<string> v = split("This is a test", " ");
        for(vector<string>::iterator i = v.begin(); i != v.end(); ++i)
            cout << *i << endl;
    }
daerid@gmail.com
Yeah, in C++ it's a snap. C seems like the hard language to do it in, unless you don't mind it changing the original copy.

Russell
You really should just make a copy of your original string and strtok() the copy. strtok() is a standard function, quite possibly well optimized, and pretty much guaranteed to work the way you expect it to. Plus there's a reentrant, thread-safe version called strtok_r that you should look into as well. Note that strtok_r comes from POSIX rather than standard C, so it may not be available on every compiler.
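To give a feel for what strtok_r buys you, here is a small sketch that runs two tokenizations at once, which plain strtok can't do because it keeps a single hidden state. It assumes a POSIX system, and `count_words_nested` is a made-up name for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations such as strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: buf holds records separated by ';', and each record
 * holds words separated by ' '. Returns the total number of words.
 * Like strtok, strtok_r modifies buf, so pass in a writable copy. */
int count_words_nested(char *buf)
{
    char *rec_state = NULL;   /* saved position of the outer tokenization */
    char *word_state = NULL;  /* saved position of the inner tokenization */
    char *rec;
    int count = 0;

    assert(buf != NULL);

    for (rec = strtok_r(buf, ";", &rec_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &rec_state)) {
        char *word;
        for (word = strtok_r(rec, " ", &word_state);
             word != NULL;
             word = strtok_r(NULL, " ", &word_state)) {
            count++;
        }
    }
    return count;
}
```

Each call carries its own state pointer, so the inner loop doesn't disturb the outer one. The same property is what makes it safe to tokenize different strings from different threads.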

It's really not much of a pain to use if you copy your source string before working with it -- unless it's one *very* long string and you don't want a second copy of it in memory.

Cheers, dorix
