String Parsing With Strtok

4 comments, last by snk_kid 17 years, 9 months ago
I'm trying to create a file-parsing class. The class reads a file into a local char buffer. I tried using strtok to parse the file, and it works fine. However, the original contents are lost after the first search: I can't reuse my buffer because strtok slices half the file away. Is there a way around this? I want to avoid using the MFC CString class because of porting issues.
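To make the problem concrete, here is a minimal sketch of what strtok does to the buffer. The helper name tokenize_destructive is made up for illustration; only std::strtok itself is the real library call. Each delimiter that ends a token is overwritten with '\0', so afterwards the buffer reads as a C string that stops at the end of the first token.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenise buf with std::strtok and return the tokens.
// buf is modified in place: the delimiter that ends each token becomes '\0'.
std::vector<std::string> tokenize_destructive(char *buf, const char *delims)
{
    std::vector<std::string> tokens;
    for (char *tok = std::strtok(buf, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling this on a char array holding "alpha beta gamma" with " " as the delimiter yields three tokens, but the array itself then compares equal to "alpha", which matches the "half the file is gone" symptom described above.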
You can make a copy of your buffer; load the file via an fstream and parse it as you read; create a parser with a parser generator such as yacc or Spirit; or dump the buffer into a stringstream and read from there, etc.
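As a rough sketch of the stringstream option, assuming a single delimiter character is enough: the function name split_with_stream is invented for this example, while std::istringstream and std::getline are the standard library pieces. Note that the stream holds its own copy of the text, so this trades memory for leaving the original buffer untouched.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a read-only buffer on one delimiter character.
// The stream works on its own copy, so buffer itself is never modified.
std::vector<std::string> split_with_stream(const std::string &buffer, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream in(buffer);
    std::string token;
    while (std::getline(in, token, delim))
    {
        if (!token.empty())   // skip empty fields from repeated delimiters
            tokens.push_back(token);
    }
    return tokens;
}
```

Unlike strtok, std::getline reports an empty field between two adjacent delimiters, which is why the sketch filters those out to get strtok-like behaviour.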
I assume you have no objection to using the C++ standard library from a portability point of view. With that in mind, it is pretty trivial to implement your own version of strtok that uses std::string and does not modify the original buffer:

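    // not tested but fairly sure principle is sound :)
    #include <string>
    #include <iostream>
    using namespace std; // just for brevity here

    // Returns the next token starting at index, or "" when none is left.
    // index is advanced past the returned token; src is never modified.
    string mytok(const string &src, unsigned &index, char delim = ' ')
    {
        string token;
        // skip leading delimiters
        while (index < src.size() && src[index] == delim) ++index;
        // collect characters up to the next delimiter or the end of src
        while (index < src.size() && src[index] != delim) token.push_back(src[index++]);
        return token;
    }

    int main()
    {
        string s = "this is a string to tokenise";
        unsigned i = 0;
        string t = mytok(s, i);
        while (t != "")
        {
            cout << t << endl;
            t = mytok(s, i);
        }
        cout << s << endl; // s is unchanged
        return 0;
    }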


Having said that, there may well be something in the C++ standard library that does this for you even better. The above would also be much improved by using string iterators instead of an unsigned index.
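For what it's worth, an iterator-based variant of mytok might look something like the sketch below. The name next_token and its exact signature are my own choice for illustration, not anything standard. It copies only the current token, and the source string stays untouched.

```cpp
#include <string>

// Hypothetical iterator-based variant of mytok. pos is advanced past the
// token that is returned; an empty result means no tokens are left.
std::string next_token(std::string::const_iterator &pos,
                       std::string::const_iterator end,
                       char delim = ' ')
{
    // skip leading delimiters
    while (pos != end && *pos == delim) ++pos;
    std::string::const_iterator start = pos;
    // advance to the next delimiter or the end of the string
    while (pos != end && *pos != delim) ++pos;
    return std::string(start, pos);
}
```

Usage follows the same pattern as before: start with an iterator at s.begin(), keep calling next_token(it, s.end()) and stop when it returns an empty string.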
I wanted to avoid making a second copy of my buffer. I'll try making my own version of it. Thanks for the help.
The example I posted doesn't copy your whole buffer; it just uses a second buffer to hold the current token.

You're going to have to either copy the whole buffer or at least keep a separate copy of the current token.

I suppose that you COULD go the C-style route of swapping '\0' chars in and out at the end of tokens and moving a char* pointer to the start of each, but it would be a horrible solution and completely unnecessary on a modern computer.
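Purely to illustrate what that swap-in, swap-out approach would involve (not as a recommendation), here is a minimal sketch. The InPlaceTokenizer struct and its member names are hypothetical; only std::strspn and std::strcspn come from the standard library. It terminates one token at a time and restores the overwritten delimiter on the next call.

```cpp
#include <cstring>

// Hypothetical in-place tokenizer: terminates one token at a time with '\0'
// and puts the original delimiter back before moving on.
struct InPlaceTokenizer
{
    char *next;          // where the search for the next token starts
    char *cut;           // position currently overwritten with '\0', or nullptr
    char saved;          // the character that was originally at *cut
    const char *delims;  // set of delimiter characters

    InPlaceTokenizer(char *buffer, const char *delimiters)
        : next(buffer), cut(nullptr), saved('\0'), delims(delimiters) {}

    // Put back the delimiter overwritten by the previous call, if any.
    void restore()
    {
        if (cut) { *cut = saved; cut = nullptr; }
    }

    // Returns a pointer to the next token, or nullptr when none is left.
    // The pointer is only usable as a C string until the following call.
    char *token()
    {
        restore();
        next += std::strspn(next, delims);    // skip leading delimiters
        if (*next == '\0') return nullptr;
        char *start = next;
        next += std::strcspn(next, delims);   // find the end of the token
        if (*next != '\0')
        {
            cut = next;                       // remember what we overwrite
            saved = *next;
            *next = '\0';
            ++next;
        }
        return start;
    }
};
```

The buffer is back to its original contents once token() has returned nullptr, or after an explicit restore(). The catch is that each returned pointer is only valid until the next call, which is a large part of why this is so easy to get wrong.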
    #include <string>

    // Splits in on any character in delimiters and writes each token to out.
    template <typename OutputIterator, typename CharT, typename Traits, typename Alloc>
    OutputIterator
    stringtok(OutputIterator out,
              const std::basic_string<CharT, Traits, Alloc>& in,
              const CharT* const delimiters)
    {
        typedef std::basic_string<CharT, Traits, Alloc> string_type;
        typedef typename string_type::size_type size_type;

        const size_type len = in.length();
        size_type i = 0;

        while (i < len) {
            // eat leading delimiters
            i = in.find_first_not_of(delimiters, i);
            if (i == string_type::npos)
                return out;   // nothing left but delimiters

            // find the end of the token
            size_type j = in.find_first_of(delimiters, i);

            // push token
            if (j == string_type::npos) {
                *out++ = in.substr(i);
                return out;
            }
            *out++ = in.substr(i, j - i);

            // set up for next loop
            i = j + 1;
        }
        return out;
    }

    /*************************** EXAMPLE *****************************/
    #include <iterator>
    #include <algorithm>
    #include <deque>
    #include <iostream>

    int main()
    {
        std::string sentence("This is a sample string\njust testing\nthis is working;");
        typedef std::deque<std::string> tokens;
        tokens t;

        stringtok(std::back_inserter(t), sentence, "\n");

        std::copy(t.begin(), t.end(),
                  std::ostream_iterator<std::string>(std::cout, "\n"));
        return 0;
    }

