I'm Not So Good At Writing Parsers

Started by
11 comments, last by Tradone 18 years, 1 month ago
I've never been really good at this, so I'm all ears as to how I can parse the following data robustly. A string is read from a file; it can be any length and contain any number of floats: "1.0,1.0,2.0,123.34". What is a solid way of parsing this without using any existing libraries? Dave
Use the strtok function to split the string on the "," delimiter and store the resulting tokens in a vector of some sort.

http://www.cplusplus.com/ref/cstring/strtok.html

You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.
I've never been fond of strtok. Isn't that C anyhow? I'd like to write the algorithm myself.

Dave
If you'd prefer std::strings you can use getline().

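string myData;
// Get data until a comma is encountered. Testing the stream itself rather than
// eof() keeps the last value, which ends at end-of-file instead of at a comma.
while (getline(myFloatFile, myData, ','))
{
    // convert std::string myData to a float with a stringstream or what have you, push back
}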
It only takes one mistake to wake up dead the next morning.
From the strtok-man page: Never use these functions.
The idea is good though. If you're using C++, you might find the following piece of code useful:
std::vector<std::string> StringUtils::tokenize(std::string const &str, std::string const &delims)
{
    std::vector<std::string> tokens;
    size_t pos, pos2;
    pos = str.find_first_not_of(delims, 0);
    while (pos != std::string::npos)
    {
        pos2 = str.find_first_of(delims, pos);
        if (pos2 == std::string::npos)
            pos2 = str.length();
        tokens.push_back(str.substr(pos, pos2 - pos));
        pos = str.find_first_not_of(delims, pos2);
    }
    return tokens;
}

It returns a vector of strings, each containing a number.
So, your code will be something like this:
vector<string> tokens = tokenize(myStr, ",");
for( vector<string>::iterator i = tokens.begin(); i != tokens.end(); ++i )
    doSomethingWithNumber( atof(i->c_str()) );
So where does strtok stand with us these days? I was under the impression that it was old hat now.

Dave
OK, thanks guys. It seems the simplest solution is the getline method. Thanks also for your contribution, DaBono.

Dave
Quote: Original post by Dave
So where does strtok stand with us these days?


It's non-reentrant and not thread-safe.

There's strtok_r() if you want to be re-entrant.

However, when scanning a list of values, I prefer to just do it manually. Using C style libraries:

const char * str = "1,2,3.14";
char * end;
while( str && *str ) {
  end = 0;
  double d = strtod( str, &end );
  if( !end || end == str ) {
    break; // nothing could be parsed at this position
  }
  handle_double( d ); // put in an array or whatever
  // strcspn returns a count, not a pointer, so add it to end
  str = end + strcspn( end, "-+0123456789." ); // wind forward to next number
}


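If you want to be a little more specific about what characters you accept for delimiters, you can instead use 'str = end + strspn( end, ", " )' (for when you only want to accept commas and spaces). strspn returns the number of leading delimiter characters, so adding it to end skips exactly those; anything else left in front of the next number makes the following strtod fail and ends the loop.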

You can formulate this same loop using std::string::find_first_of().
enum Bool { True, False, FileNotFound };

