String Parsing With Strtok
I'm trying to create a file parsing class. The class reads a file into a local char buffer. I tried using strtok to parse my file. It works fine. However, the original contents are lost after the first search. I can't reuse my buffer because strtok slices half the file away. Is there a way around this? I want to avoid using the MFC CString class because of porting issues.
You can make a copy of your buffer, load the file via an fstream and parse it as you read from the file, create a parser using a parser generator like yacc or Spirit, or dump the buffer into a stringstream and read from there, etc.
I assume you have no objection to using the C++ standard library from a portability point of view. With this in mind, it is pretty trivial to implement your own version of strtok that uses std::string and does not modify the original buffer:
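// not tested but fairly sure principle is sound :)
#include <string>
#include <iostream>
using namespace std; // just for brevity here

// Returns the next token in src starting at index and advances index past it.
// Returns an empty string when no tokens remain. src is never modified.
string mytok(const string &src, unsigned &index, char delim = ' ')
{
    string token;
    // skip leading delimiters
    while (index < src.size() && src[index] == delim)
        ++index;
    // copy characters up to the next delimiter or the end of the string
    while (index < src.size() && src[index] != delim)
        token.push_back(src[index++]);
    return token;
}

int main()
{
    string s = "this is a string to tokenise";
    unsigned i = 0;
    string t = mytok(s, i);
    while (t != "")
    {
        cout << t << endl;
        t = mytok(s, i);
    }
    cout << s << endl; // s is unchanged
    return 0;
}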
Having said that, there may well be something in the C++ standard library that does this for you even better. The above would also be much improved by using string iterators instead of an unsigned index.
I wanted to avoid making a second copy of my buffer. I'll try making my own version of it. Thanks for the help.
The example I posted doesn't copy your whole buffer, it just uses a second buffer to hold the current token.
You're going to have to either copy the whole buffer or at least have a separate copy of the current token.
I suppose that you COULD do a C-style way of swapping in and out '\0' chars at the end of tokens and moving a char* pointer to the start of each, but it would be a horrible solution and completely unnecessary on a modern computer.
#include <string>

// Splits `in` on any character in `delimiters` and writes each token to `out`.
// The input string is not modified.
template <typename OutputIterator, typename CharT, typename Traits, typename Alloc>
OutputIterator
stringtok(OutputIterator out,
          const std::basic_string<CharT, Traits, Alloc>& in,
          const CharT* const delimiters)
{
    typedef std::basic_string<CharT, Traits, Alloc> string_type;
    typedef typename string_type::size_type size_type;

    const size_type len = in.length();
    size_type i = 0;

    while (i < len) {
        // eat leading delimiters
        i = in.find_first_not_of(delimiters, i);
        if (i == string_type::npos)
            return out; // nothing left but delimiters

        // find the end of the token
        size_type j = in.find_first_of(delimiters, i);

        // push token (write through the iterator so any output iterator works)
        if (j == string_type::npos) {
            *out++ = in.substr(i);
            return out;
        }
        *out++ = in.substr(i, j - i);

        // set up for next loop
        i = j + 1;
    }
    return out;
}

/*************************** EXAMPLE *****************************/
#include <iterator>
#include <algorithm>
#include <deque>
#include <iostream>

int main()
{
    std::string sentence("This is a sample string\njust testing\nthis is working;");
    typedef std::deque<std::string> tokens;
    tokens t;
    stringtok(std::back_inserter(t), sentence, "\n");
    std::copy(t.begin(), t.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}