Russell

Split() function in C

In Perl there is a function that will split a string according to a given character. For example, I want to split the string "this is a test" into an array where:

args[0] is "this"
args[1] is "is"
args[2] is "a"
args[3] is "test"

So I might do:

int split(char * string, char * args[], char ch);

and call it like:

char input[1024];
char * args[256];
input = GetInput();
int nargs = split(input, args, ' ');

I'm not very good with arrays of null terminated strings. Can anyone help me out?

Thanks, Russell

Look at the strtok function; it splits a string into tokens separated by any set of characters you want to use. Beware that it modifies the string in-place, so you almost certainly want to make a copy of the string before tokenizing it. Here's an example:

    
const char* srcString = "This is a test"; // string literals are read-only
char* tmpString;
char* token;

// Make a writable copy, since strtok modifies the string it is given.
tmpString = new char[strlen(srcString)+1];
strcpy(tmpString, srcString);

token = strtok(tmpString, " ");
while(token != NULL) {
    printf("%s\n", token);
    token = strtok(NULL, " "); // Note the use of NULL here.
}
delete[] tmpString;


Things to note are the copying of the source string and the use of NULL in the strtok function to indicate that the tokenization is a continuation of the first string. If you were to look at the string pointed to by tmpString partway through the tokenization, you'd see that the separator character " " is replaced by an ASCII \0 character. This is why you want to copy the string first.
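For the interface asked about in the original question, a minimal sketch built on strtok might look like the one below. It is plain C and only an illustration: the max_args parameter is an assumption added here (it is not in the signature from the question) so the function cannot write past the end of args, and like strtok it modifies the string it is given.

```c
#include <assert.h>
#include <string.h>

/* Splits `string` in place on the character `ch` and stores a pointer to
 * each token in args. Returns the number of tokens stored.
 * `max_args` is a hypothetical extra parameter (not in the original
 * signature) that caps how many entries are written to args. */
int split(char *string, char *args[], int max_args, char ch)
{
    char delims[2];
    int count = 0;
    char *token;

    assert(string != NULL && args != NULL);

    /* strtok takes its delimiters as a string, so wrap the single char. */
    delims[0] = ch;
    delims[1] = '\0';

    token = strtok(string, delims);
    while (token != NULL && count < max_args) {
        args[count++] = token;          /* points into `string`, not a copy */
        token = strtok(NULL, delims);
    }
    return count;
}
```

Called as split(input, args, 256, ' ') on a writable buffer holding "this is a test", this should return 4, with args[0] through args[3] pointing at "this", "is", "a" and "test" inside that buffer.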

Hope this helps!

Cheers, dorix

edit: clarified notes a bit.


[edited by - dorix on August 13, 2002 6:50:56 PM]

Is there any way to do it in a non-destructive manner, where the string you pass to be split won't be changed? I was able to write a function that split it, but it changed the original string, and I'd like to avoid that if at all possible.

Thanks for pointing out the strtok function though. That will come in handy.

Russell

The easiest thing to do is just to make a copy of the source string and tokenize the copy. Otherwise, you'll have to write your own tokenizer, which would probably look something like this:

      
// inString: Pointer to the string you want tokenized, or NULL to
//           return subsequent tokens
// cDelim:   For simplicity, just a single character delimiter.
char* myTokenizer(const char* inString, char cDelim)
{
    static const char* ptrCurrent = NULL; // Not thread safe because of this
    int tokenLength;
    const char* ptrToken;
    char* retToken;

    // A non-NULL inString starts a new tokenization; NULL continues the previous one
    if (inString != NULL) {
        ptrCurrent = inString;
    }

    // Check for end-of-string reached on a previous call
    if (ptrCurrent == NULL) return NULL;

    // Skip over any delimiters at the beginning of the current segment
    while(*ptrCurrent == cDelim && *ptrCurrent != '\0') ptrCurrent++;

    // Check for end-of-string reached
    if (*ptrCurrent == '\0') {
        ptrCurrent = NULL; // Leave a quick way to test end-of-string on the next call.
        return NULL; // End of string reached.
    }

    // Save the start of this token, and search for the end of this token
    ptrToken = ptrCurrent;
    tokenLength = 0;
    while(*ptrCurrent != cDelim && *ptrCurrent != '\0') {
        ptrCurrent++;
        tokenLength++;
    }

    // Return a new copy of the token. Don't forget to delete[] the returned pointer when you're done with it.
    retToken = new char[tokenLength+1];
    strncpy(retToken, ptrToken, tokenLength);
    retToken[tokenLength] = '\0'; // Make sure it's null-terminated

    return retToken;
}

Call that function the same way you'd call strtok, except for simplicity I changed the delimiter argument to a single char instead of a string.

I just threw that together on the spot and haven't actually tried it out, but it should work. Don't forget to delete[] the returned pointer when you're finished with each token.
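If the static pointer and the per-token allocation are a concern, one alternative (a different technique from the function above, sketched here in plain C) is to have the caller hold the position and to report each token as a start pointer plus a length, using the standard strspn and strcspn functions. The names next_token, cursor and len are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Finds the next token at or after *cursor without modifying the string.
 * On success: stores the token length in *len, advances *cursor past the
 * token, and returns a pointer to the token's first character. The token
 * is NOT null-terminated; use the length. Returns NULL when no tokens
 * remain. next_token, cursor and len are hypothetical names. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *start;

    assert(cursor != NULL && *cursor != NULL && delims != NULL && len != NULL);

    start = *cursor + strspn(*cursor, delims);  /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = start;                        /* stay at the terminator */
        return NULL;
    }
    *len = strcspn(start, delims);              /* run up to next delimiter */
    *cursor = start + *len;
    return start;
}
```

A caller would start with a const char* cursor set to the beginning of the string, call next_token(&cursor, " ", &len) in a loop until it returns NULL, and print each token with printf("%.*s\n", (int)len, tok). Nothing is allocated and the source string is never written to, at the cost of tokens not being null-terminated.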

One thing I forgot to mention in my original suggestion to use strtok: don't delete the copy of your original string until you're finished with the tokens. Each token is actually a pointer into that copy, not a separately allocated block of memory.
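If a token does need to outlive the buffer it came from, it has to be copied out first. A small sketch in plain C, using malloc and free rather than new and delete[]; dup_token is a made-up helper name, not a library function:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the null-terminated `token`, or NULL
 * if the allocation fails. The caller owns the result and must free() it.
 * dup_token is a hypothetical helper name. */
char *dup_token(const char *token)
{
    char *copy;

    assert(token != NULL);

    copy = malloc(strlen(token) + 1);   /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, token);
    return copy;
}
```

A token returned by strtok can be passed straight to dup_token, after which the tokenized buffer can be released or reused without affecting the copy.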

Cheers, Dorix

edit: properly null-terminated the returned token.

[edited by - dorix on August 13, 2002 7:25:04 PM]

[edited by - dorix on August 13, 2002 7:30:45 PM]

well, in c++ you could do this:



  
#include <iostream>
#include <vector>
#include <string>

using namespace std;

// Splits str on every occurrence of delim (delim must not be empty).
vector<string> split(const string& str, const string& delim)
{
    vector<string> ret;
    size_t beg = 0;
    size_t end = 0;
    while((end = str.find(delim, beg)) != string::npos)
    {
        ret.push_back(str.substr(beg, end-beg));
        beg = end + delim.length();
    }
    ret.push_back(str.substr(beg, string::npos));
    return ret;
}

int main(int argc, char** argv)
{
    vector<string> v = split("This is a test", " ");
    for(vector<string>::iterator i = v.begin(); i != v.end(); ++i)
        cout << *i << endl;
}

You really should just make a copy of your original string and strtok() the copy. strtok() is a standard function, quite possibly well optimized, and pretty much guaranteed to work the way you expect it to. Plus there's a thread-safe version called strtok_r (available on POSIX systems) that you should look into as well.

It's really not much of a pain to use if you copy your source string before working with it -- unless it's one *very* long string and you don't want a second copy of it in memory.
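As a rough sketch of the copy-then-tokenize approach with strtok_r, in plain C: this assumes a POSIX system (strtok_r is not part of standard C, and other platforms may only offer a differently named equivalent), and count_tokens is a made-up name for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations such as strtok_r */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts the tokens in `src` without modifying it, by tokenizing a heap
 * copy with strtok_r. Returns -1 if the copy cannot be allocated.
 * count_tokens is a hypothetical helper name. */
int count_tokens(const char *src, const char *delims)
{
    char *copy, *token, *state;
    int count = 0;

    assert(src != NULL && delims != NULL);

    copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, src);

    /* `state` takes the place of strtok's hidden static pointer, so
     * separate tokenizations do not interfere with one another. */
    for (token = strtok_r(copy, delims, &state);
         token != NULL;
         token = strtok_r(NULL, delims, &state)) {
        count++;
    }

    free(copy);   /* safe here: no token pointer is used after this */
    return count;
}
```

For "This is a test" with " " as the delimiter string this should return 4, and the source string is left unchanged because only the copy is tokenized.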

Cheers, dorix
