I'm Not So Good At Writing Parsers

Started by
11 comments, last by Tradone 18 years, 1 month ago
I've never been really good at this, so I'm all ears as to how I can parse the following data solidly. A string is read from a file; it can be any length and contain any number of floats: "1.0,1.0,2.0,123.34" What is a solid way of parsing this without using any existing libraries? Dave
Use the strtok function to split the string on the "," character and save the results into a vector of some sort.

You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.
I've never been fond of strtok. Isn't that C, anyhoo? I'd like to write the algorithm myself.

If you'd prefer std::strings you can use getline().

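string myData;
// getline() returns the stream, which tests false once nothing more can be read.
// Testing eof() here instead would drop the last value in the file.
while (getline(myFloatFile, myData, ',')) // get data until a comma is encountered
{
    // convert std::string myData to a float with a stringstream or what have you, push back
}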
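For illustration, here is one way the getline() loop could be filled out with the stringstream conversion. This is only a sketch: the function name readFloats and the choice to silently skip fields that do not parse are assumptions, not something from the posts above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated floats from any input stream
// (an std::ifstream works just as well as an std::istringstream).
std::vector<float> readFloats(std::istream& in)
{
    std::vector<float> values;
    std::string field;
    // getline() returns the stream, which converts to false once no
    // further field can be extracted, so the final field is still handled.
    while (std::getline(in, field, ','))
    {
        std::istringstream converter(field);
        float value;
        if (converter >> value)   // assumption: skip fields that are not numbers
            values.push_back(value);
    }
    return values;
}
```

Passing an opened std::ifstream for the data file should give back the values in file order; whether a bad field ought to be skipped or reported is a design choice left open here.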
It only takes one mistake to wake up dead the next morning.
From the strtok-man page: Never use these functions.
The idea is good though. If you're using C++, you might find the following piece of code useful:
std::vector<std::string> StringUtils::tokenize(std::string const &str, std::string const &delims)
{
    std::vector<std::string> tokens;
    size_t pos, pos2;
    pos = str.find_first_not_of(delims, 0);
    while (pos != std::string::npos)
    {
        pos2 = str.find_first_of(delims, pos);
        if (pos2 == std::string::npos)
            pos2 = str.length();
        tokens.push_back(str.substr(pos, pos2-pos));
        pos = str.find_first_not_of(delims, pos2);
    }
    return tokens;
}

It returns a vector of strings, each containing a number.
So, your code will be something like this:
vector<string> tokens = tokenize(myStr, ",");
for( vector<string>::iterator i = tokens.begin(); i != tokens.end(); ++i )
    doSomethingWithNumber( atof(i->c_str()) );
So where does strtok stand with us these days? I was under the impression that it was old hat now.

OK, thanks guys. It seems that the simplest solution is the getline method. Thanks also for your contribution, DaBono.

Quote:Original post by Dave
So where does strtok stand with us these days?

non-reentrant, not thread safe

There's strtok_r() if you want to be re-entrant.
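As a rough sketch of what that looks like, assuming a POSIX system (strtok_r() is not part of standard C or C++), the state pointer takes the place of strtok's hidden static buffer. The function name parseWithStrtokR is made up for this example.

```cpp
#include <cstdlib>
#include <cstring>   // strtok_r is declared here on POSIX systems
#include <string>
#include <vector>

// Hypothetical helper: tokenizes on commas with strtok_r and converts
// each token with strtod.
std::vector<double> parseWithStrtokR(const std::string& text)
{
    std::vector<double> values;
    // strtok_r overwrites delimiters with '\0', so work on a mutable copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    char* state = nullptr;  // per-call state, which is what makes this re-entrant
    for (char* token = strtok_r(&buffer[0], ",", &state);
         token != nullptr;
         token = strtok_r(nullptr, ",", &state))
    {
        values.push_back(std::strtod(token, nullptr));
    }
    return values;
}
```

Like strtok, this treats a run of commas as a single delimiter, so empty fields are dropped without notice.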

However, when scanning a list of values, I prefer to just do it manually. Using C style libraries:

const char * str = "1,2,3.14";
char * end;
while( str && *str ) {
  end = 0;
  double d = strtod( str, &end );
  if( !end || end == str ) {
    break; // nothing could be parsed as a number here
  }
  handle_double( d ); // put in an array or whatever
  // strcspn() returns a length, not a pointer, so add it to end
  str = end + strcspn( end, "-+0123456789." ); // wind forward to next number
}

If you want to be a little more specific about what characters you accept for delimiters, you can instead use 'str = end + strspn( end, ", " )' (for when you only want to accept commas and spaces). Note that strspn() returns a count of characters, so it has to be added to the pointer.

You can formulate this same loop using std::string::find_first_of().
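One possible shape for that, as a sketch only: the function name parseDoubles is invented for the example, and it still leans on strtod for the actual conversion, with find_first_of() doing the winding forward.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: same idea as the strtod loop, but using
// std::string::find_first_of() to locate the start of each number.
std::vector<double> parseDoubles(const std::string& text)
{
    std::vector<double> values;
    const char* const numberStart = "-+0123456789.";
    std::string::size_type pos = text.find_first_of(numberStart);
    while (pos != std::string::npos)
    {
        const char* begin = text.c_str() + pos;
        char* end = nullptr;
        double d = std::strtod(begin, &end);
        if (end == begin)
            break;  // looked like the start of a number, but was not one
        values.push_back(d);
        // wind forward to the next character that can start a number
        pos = text.find_first_of(
            numberStart, pos + static_cast<std::string::size_type>(end - begin));
    }
    return values;
}
```

With the input from the first post, "1.0,1.0,2.0,123.34", this should collect four values. Anything between numbers is treated as a separator, which is forgiving but will not catch malformed input.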
enum Bool { True, False, FileNotFound };

