dave

I'm Not So Good at Writing Parsers

I've never been really good at this, so I'm all ears as to how I can parse the following data robustly. A string is read from a file; it can be any length and contain any number of floats: "1.0,1.0,2.0,123.34". What is a solid way of parsing this without using any existing libraries? Dave

Guest Anonymous Poster
Use the strtok function to split the string on the "," character and save the results into a vector of some sort.

http://www.cplusplus.com/ref/cstring/strtok.html
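A rough sketch of that suggestion, assuming C++11 and that the values end up in a std::vector<double>. The name parseWithStrtok is made up for illustration. Note that strtok modifies the buffer it scans and keeps hidden state between calls, so this version works on a private copy of the string.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits `input` on commas with
// std::strtok and converts each token with std::strtod.
std::vector<double> parseWithStrtok(const std::string& input)
{
    // strtok writes '\0' over each delimiter, so work on a mutable copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<double> values;
    for (char* token = std::strtok(buffer.data(), ",");
         token != nullptr;
         token = std::strtok(nullptr, ","))
    {
        values.push_back(std::strtod(token, nullptr));
    }
    return values;
}
```

Calling parseWithStrtok("1.0,1.0,2.0,123.34") should give back four values. Empty fields such as ",," are silently skipped, because strtok treats a run of delimiters as one.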

You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.

If you'd prefer std::strings you can use getline().


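std::string myData;
// Read up to the next comma. Testing the stream itself rather than eof() keeps
// the final value, which ends at end-of-file instead of at a comma.
while (std::getline(myFloatFile, myData, ','))
{
    // convert std::string myData to a float with a stringstream or what have you, push back
}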
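For what it's worth, here is one way the loop body could be filled in, as a sketch only. The function name readFloats and the choice to skip fields that don't convert are my assumptions, not part of the post above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated floats from any input stream
// (a std::ifstream or a std::istringstream both work).
std::vector<float> readFloats(std::istream& in)
{
    std::vector<float> values;
    std::string field;
    // getline returns the stream, which tests false once nothing more was read.
    while (std::getline(in, field, ','))
    {
        std::istringstream converter(field);
        float value;
        if (converter >> value)   // assumption: ignore fields that are not numbers
            values.push_back(value);
    }
    return values;
}
```

With a std::istringstream holding "1.0,1.0,2.0,123.34" this should yield four floats, including the last one, which has no trailing comma.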

From the strtok man page: "Never use these functions."
The idea is good though. If you're using C++, you might find the following piece of code useful:

std::vector<std::string> StringUtils::tokenize(std::string const &str, std::string const &delims)
{
    std::vector<std::string> tokens;
    size_t pos, pos2;
    pos = str.find_first_not_of(delims, 0);

    while (pos != std::string::npos)
    {
        pos2 = str.find_first_of(delims, pos);
        if (pos2 == std::string::npos)
            pos2 = str.length();

        tokens.push_back(str.substr(pos, pos2-pos));
        pos = str.find_first_not_of(delims, pos2);
    }
    return tokens;
}

It returns a vector of strings, each containing a number.
So, your code will be something like this:
std::vector<std::string> tokens = tokenize(myStr, ",");
for( std::vector<std::string>::iterator i = tokens.begin(); i != tokens.end(); ++i )
    doSomethingWithNumber( atof(i->c_str()) );

There's strtok_r() if you want to be re-entrant.

However, when scanning a list of values, I prefer to just do it manually. Using C style libraries:


const char * str = "1,2,3.14";
char * end;

while( str && *str ) {
    end = 0;
    double d = strtod( str, &end );
    if( !end || end == str ) {
        break; // nothing was converted
    }
    handle_double( d ); // put in an array or whatever
    // strcspn returns a count, not a pointer, so add it to end
    str = end + strcspn( end, "-+0123456789." ); // wind forward to next number
}


If you want to be a little more specific about what characters you accept for delimiters, you can instead use 'str = end + strspn( end, ", " )' (for when you only want to accept commas and spaces).

You can formulate this same loop using std::string::find_first_of().
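A possible std::string version of that loop, as a sketch only. The function name scanDoubles is invented here, and it still leans on strtod for the actual conversion.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: scans `text` for numbers, treating every character that
// cannot start a number as a separator.
std::vector<double> scanDoubles(const std::string& text)
{
    const char* const numberChars = "-+0123456789.";
    std::vector<double> values;

    std::string::size_type pos = text.find_first_of(numberChars);
    while (pos != std::string::npos)
    {
        const char* start = text.c_str() + pos;
        char* end = nullptr;
        double d = std::strtod(start, &end);
        if (end == start)
            break;   // nothing converted, e.g. a stray "+" or "."
        values.push_back(d);
        // Resume the search just past the characters strtod consumed.
        pos = text.find_first_of(
            numberChars, pos + static_cast<std::string::size_type>(end - start));
    }
    return values;
}
```

For "1,2,3.14" this should collect three values. Like the C loop, it gives up at the first thing that looks like a number but doesn't convert.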

Quote:
Original post by SiCrane
You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.


Assuming Python, you could just call eval() on the string and it will give you a tuple of the values.

Quote:
Original post by Flarelocke
Quote:
Original post by SiCrane
You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.


Assuming Python, you could just call eval() on the string and it will give you a tuple of the values.

And risk someone having embedded some Python code in the input to your program...

John B


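// Splits Parse_This on commas, stores up to 256 pieces in Array,
// and returns how many pieces were stored.
int ParseToArray( string Parse_This, string Array[256] ){
    int count = 0;
    while ( count < 256 && !Parse_This.empty() ){
        string::size_type comma = Parse_This.find( ",", 0 );
        Array[count++] = Parse_This.substr( 0, comma );  // text before the comma
        if ( comma == string::npos ) break;              // that was the last piece
        Parse_This = Parse_This.substr( comma + 1 );     // drop the piece and its comma
    }
    return count;
}

int main(){
    string my_string="1.0,1.0,2.0,123.34";
    string my_string_array[256];
    int my_string_array_count = ParseToArray( my_string, my_string_array );

    return 0;
}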



Does this help?

However, "without using any existing libraries" would probably mean you can't use the STL, so you can't use vectors, and fixed-size arrays are limited unless you want to write your own vector class.
Well, maybe the code above helps anyway.
Bye.
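If "no libraries" is taken literally, a hand-rolled version might look something like the sketch below. Everything in it is an assumption for illustration: the name parseFloats, the caller-supplied fixed array, and the decision to support only plain decimals (no exponents, no error reporting). An empty or non-numeric field simply comes out as 0.

```cpp
// Hypothetical sketch: parses comma-separated decimal numbers such as
// "1.0,-2.5,123.34" into a caller-supplied array without any library calls.
// Returns the number of values stored (at most `capacity`).
int parseFloats(const char* text, float* out, int capacity)
{
    int count = 0;
    while (*text != '\0' && count < capacity)
    {
        bool negative = false;
        if (*text == '-' || *text == '+')
        {
            negative = (*text == '-');
            ++text;
        }

        double value = 0.0;
        while (*text >= '0' && *text <= '9')        // integer part
            value = value * 10.0 + (*text++ - '0');

        if (*text == '.')                           // fractional part
        {
            ++text;
            double scale = 0.1;
            while (*text >= '0' && *text <= '9')
            {
                value += (*text++ - '0') * scale;
                scale *= 0.1;
            }
        }

        out[count++] = static_cast<float>(negative ? -value : value);

        while (*text != '\0' && *text != ',')       // skip anything up to the comma
            ++text;
        if (*text == ',')
            ++text;
    }
    return count;
}
```

The fixed capacity is the limitation mentioned above: values past the end of the array are not read, so the caller has to pick a size up front or grow the storage by hand.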
