
String Parsing With Strtok

This topic is 4231 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I'm trying to create a file parsing class. The class reads a file into a local char buffer. I tried using strtok to parse the buffer, and it works fine. However, the original contents are lost after the first search: I can't reuse my buffer because strtok slices half the file away. Is there a way around this? I want to avoid the MFC CString class for portability reasons.
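For readers wondering why the buffer appears truncated: strtok overwrites each delimiter that ends a token with '\0', so any later read of the buffer as a C string stops at the first token. A minimal sketch of that effect follows; the helper name visible_length_after_strtok is made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the thread): tokenises `buffer` in place with
// strtok, then reports how long the buffer looks when read as a C string.
std::size_t visible_length_after_strtok(char* buffer, const char* delims)
{
    // Walk through every token; strtok writes '\0' over the delimiter
    // that terminates each token it returns.
    for (char* t = std::strtok(buffer, delims); t != nullptr;
         t = std::strtok(nullptr, delims)) {
        // tokens are ignored here; only the side effect matters
    }
    return std::strlen(buffer);
}
```

Called on a writable array holding "key value rest" with " " as the delimiter, the function returns the length of the first token only, because the first space has been replaced by a terminator.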

You can make a copy of your buffer, load the file via an fstream and parse it as you read, create a parser using a parser generator like yacc or Spirit, or dump the buffer into a stringstream and read from there, etc.
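As a rough illustration of the stringstream option, the sketch below splits a char buffer on whitespace using operator>>. The function name split_words is hypothetical, and note that std::istringstream makes its own copy of the buffer, so this trades memory for leaving the original untouched.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns the whitespace-separated
// words in `buffer`. The buffer itself is only read, never written.
std::vector<std::string> split_words(const char* buffer)
{
    std::istringstream in(buffer);   // the stream holds its own copy of the text
    std::vector<std::string> words;
    std::string word;
    while (in >> word)               // operator>> skips spaces, tabs and newlines
        words.push_back(word);
    return words;
}
```

The same loop works on a std::ifstream directly, which avoids holding the whole file in memory at all.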

I assume you have no objection to using the C++ standard library from a portability point of view. With this in mind, it is pretty trivial to implement your own version of strtok that uses std::string and does not modify the original buffer:


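// not tested but fairly sure principle is sound :)

#include <iostream>
#include <string>

using namespace std; // just for brevity here

// Returns the next token of src starting at index, or an empty string when no
// tokens remain. index is advanced past the token; src is never modified.
string mytok(const string &src, string::size_type &index, char delim = ' ')
{
    string token;

    // skip leading delimiters without running past the end of the string
    while (index < src.size() && src[index] == delim) ++index;

    // copy characters up to the next delimiter or the end of the string
    while (index < src.size() && src[index] != delim) token.push_back(src[index++]);

    return token;
}

int main()
{
    string s = "this is a string to tokenise";

    string::size_type i = 0;
    string t = mytok(s, i);
    while (!t.empty())
    {
        cout << t << endl;
        t = mytok(s, i);
    }

    cout << s << endl; // s is unchanged

    return 0;
}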





Having said that, there may well be something in the C++ standard library that does this even better. The above would also be much improved by using string iterators instead of an integer index.
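One way the iterator suggestion might look is sketched below. The name next_token and its signature are assumptions for illustration, not something from the post above; the caller keeps a const_iterator that plays the role of the index.

```cpp
#include <algorithm>
#include <string>

// Hypothetical iterator-based variant of mytok. `pos` is advanced past the
// token that is returned; an empty string means no tokens remain.
std::string next_token(std::string::const_iterator& pos,
                       std::string::const_iterator end,
                       char delim = ' ')
{
    // skip leading delimiters
    while (pos != end && *pos == delim) ++pos;

    // the token runs from here to the next delimiter (or to the end)
    const std::string::const_iterator start = pos;
    pos = std::find(pos, end, delim);

    return std::string(start, pos);
}
```

A caller would start with pos = s.begin() and keep calling next_token(pos, s.end()) until it returns an empty string. Because the function only reads through const iterators, the source string cannot be modified.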

I wanted to avoid making a second copy of my buffer. I'll try making my own version of it. Thanks for the help.

The example I posted doesn't copy your whole buffer, it just uses a second buffer to hold the current token.

You're going to have to either copy the whole buffer or at least have a separate copy of the current token.

I suppose that you COULD do a C-style way of swapping in and out '\0' chars at the end of tokens and moving a char* pointer to the start of each, but it would be a horrible solution and completely unnecessary on a modern computer.
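For completeness, here is roughly what that swap-in, swap-out approach could look like. This is only a sketch under assumptions: TokenCursor, cursor_init and cursor_next are invented names, and the returned pointer is valid only until the next call, which is part of why the approach is fragile.

```cpp
#include <cstring>

// Hypothetical cursor (not from the thread) that remembers the one character
// it has temporarily replaced with '\0'.
struct TokenCursor {
    char* next;   // where the next scan starts
    char* cut;    // location of the temporary '\0', or nullptr if none
    char  saved;  // the character that was overwritten at `cut`
};

void cursor_init(TokenCursor& c, char* buffer)
{
    c.next = buffer;
    c.cut = nullptr;
    c.saved = '\0';
}

// Returns a pointer to the next '\0'-terminated token inside the buffer,
// or nullptr when no tokens remain.
char* cursor_next(TokenCursor& c, const char* delims)
{
    // put back the delimiter overwritten by the previous call
    if (c.cut != nullptr) {
        *c.cut = c.saved;
        c.cut = nullptr;
    }

    // skip leading delimiters
    c.next += std::strspn(c.next, delims);
    if (*c.next == '\0')
        return nullptr;

    char* token = c.next;
    char* end = token + std::strcspn(token, delims);

    // terminate the token in place, remembering what was there
    if (*end != '\0') {
        c.saved = *end;
        c.cut = end;
        *end = '\0';
    }
    c.next = end;
    return token;
}
```

Once cursor_next has returned nullptr the buffer is back to its original contents, but at any point in between exactly one delimiter is missing from it.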


#include <string>

// Splits `in` on any character found in `delimiters` and writes each token
// to `out`. The input string is not modified.
template <
    typename OutputIterator,
    typename CharT,
    typename Traits,
    typename Alloc
>
OutputIterator
stringtok(OutputIterator out,
          const std::basic_string<CharT, Traits, Alloc>& in,
          const CharT* const delimiters) {

    typedef std::basic_string<CharT, Traits, Alloc> string_type;
    typedef typename string_type::size_type size_type;

    const size_type len = in.length();
    size_type i = 0;

    while(i < len) {
        // skip leading delimiters
        i = in.find_first_not_of(delimiters, i);
        if(i == string_type::npos)
            return out; // nothing left but delimiters

        // find the end of the token
        const size_type j = in.find_first_of(delimiters, i);

        // last token: it runs to the end of the string
        if(j == string_type::npos) {
            *out = in.substr(i);
            return ++out;
        }

        // push token
        *out = in.substr(i, j - i);
        ++out;

        // set up for next loop
        i = j + 1;
    }

    return out;
}

/*************************** EXAMPLE *****************************/
#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>

int main() {

    std::string sentence("This is a sample string\njust testing\nthis is working;");

    typedef std::deque<std::string> tokens;

    tokens t;

    // split on newlines; each line becomes one token
    stringtok(std::back_inserter(t), sentence, "\n");

    // print one token per line
    std::copy(t.begin(), t.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}
