I'm Not So Good At Writing Parsers

This topic is 4364 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I've never been really good at this, so I'm all ears as to how I can parse the following data solidly. A string is read from a file; it can be any length and contain any number of floats:

"1.0,1.0,2.0,123.34"

What is a solid way of parsing this without using any existing libraries?

Dave

Guest Anonymous Poster
Use the strtok function to split the string on the "," character and save the results into a vector of some sort.

http://www.cplusplus.com/ref/cstring/strtok.html
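A minimal sketch of that suggestion in C++, for reference. The helper name `split_floats_strtok` is made up for illustration. Because strtok overwrites each delimiter with a terminator, the sketch assumes it is acceptable to work on a private copy of the input.

```cpp
#include <cstdlib>   // std::strtod
#include <cstring>   // std::strtok
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated list and converts each piece.
std::vector<double> split_floats_strtok(const std::string& input)
{
    // strtok writes '\0' over each delimiter, so hand it a mutable copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<double> values;
    for (char* token = std::strtok(buffer.data(), ",");
         token != nullptr;
         token = std::strtok(nullptr, ","))
    {
        // strtod returns 0.0 for text it cannot convert; no validation here.
        values.push_back(std::strtod(token, nullptr));
    }
    return values;
}
```

Calling `split_floats_strtok("1.0,1.0,2.0,123.34")` should yield four values. Empty fields such as "1.0,,2.0" are silently skipped, since strtok treats a run of delimiters as one.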

You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.

I've never been fond of strtok. Isn't that C anyhow? I'd like to write the algorithm myself.

Dave
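For anyone who does want to write the algorithm by hand, here is one possible sketch. It is an illustration only, not a recommendation from the thread: the function name `parse_floats_by_hand` is hypothetical, it assumes plain decimal notation (optional sign, digits, at most one '.'), and it does not handle exponents or leading whitespace.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hand-rolled parser for "1.0,2.5,-3" style input.
// Assumptions: no exponents, no whitespace; long fractions lose some precision.
std::vector<double> parse_floats_by_hand(const std::string& text)
{
    std::vector<double> values;
    std::size_t i = 0;
    while (i < text.size())
    {
        bool negative = false;
        if (text[i] == '-' || text[i] == '+')
            negative = (text[i++] == '-');

        double value = 0.0;
        bool any_digit = false;
        // Integer part: shift left one decimal place per digit.
        for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i)
        {
            value = value * 10.0 + (text[i] - '0');
            any_digit = true;
        }
        // Fractional part: each digit is worth a tenth of the previous one.
        if (i < text.size() && text[i] == '.')
        {
            double scale = 0.1;
            for (++i; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i)
            {
                value += (text[i] - '0') * scale;
                scale *= 0.1;
                any_digit = true;
            }
        }
        if (any_digit)
            values.push_back(negative ? -value : value);

        // Skip everything up to and including the next comma.
        while (i < text.size() && text[i] != ',')
            ++i;
        if (i < text.size())
            ++i;
    }
    return values;
}
```

Accumulating the fraction digit by digit is simple but not correctly rounded, so results can differ from strtod in the last few bits. That trade-off is the main reason library conversion routines are usually preferred.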

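If you'd prefer std::string, you can use getline().


std::string myData;
// getline returns the stream, which tests false once nothing more can be read.
// Testing .eof() instead would drop the last value, which has no trailing comma.
while (std::getline(myFloatFile, myData, ',')) // get data until a comma is encountered
{
    // convert std::string myData to a float with a stringstream or what have you, push back
}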
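A fuller sketch of the getline approach, with the conversion step filled in. The function name `read_floats` is hypothetical, and taking a `std::istream&` is an assumption made so the same code works for a file stream or a string stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated floats from any input stream.
std::vector<float> read_floats(std::istream& in)
{
    std::vector<float> values;
    std::string field;
    // The loop ends when getline can no longer extract a field.
    while (std::getline(in, field, ','))
    {
        std::istringstream converter(field);
        float value;
        if (converter >> value)   // skip fields that do not start with a number
            values.push_back(value);
    }
    return values;
}
```

With a `std::ifstream` opened on the data file, `read_floats(file)` returns the parsed values. Fields that fail to convert are skipped here; reporting them as errors instead is an equally reasonable choice.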

From the strtok man page: "Never use these functions."
The idea is good, though. If you're using C++, you might find the following piece of code useful:

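// Splits str into tokens separated by any of the characters in delims.
// Runs of delimiters are treated as one, so empty fields are skipped.
std::vector<std::string> StringUtils::tokenize(std::string const &str, std::string const &delims)
{
    std::vector<std::string> tokens;
    size_t pos, pos2;
    pos = str.find_first_not_of(delims, 0);   // start of the first token

    while (pos != std::string::npos)
    {
        pos2 = str.find_first_of(delims, pos);   // end of the current token
        if (pos2 == std::string::npos)
            pos2 = str.length();

        tokens.push_back(str.substr(pos, pos2-pos));
        pos = str.find_first_not_of(delims, pos2);   // start of the next token
    }
    return tokens;
}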

It returns a vector of strings, each containing a number.
So, your code will be something like this:

vector<string> tokens = tokenize(myStr, ",");
for( vector<string>::iterator i = tokens.begin(); i != tokens.end(); ++i )
    doSomethingWithNumber( atof(i->c_str()) );

So where does strtok stand with us these days? I was under the impression that it was old hat now.

Dave

OK, thanks guys. It seems that the simplest solution is the getline method. Thanks also for your contribution, DaBono.

Dave

Quote:
Original post by Dave
So where does strtok stand with us these days.


It's non-reentrant and not thread-safe: strtok keeps its scan position in hidden static state between calls.
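To make the non-reentrancy concrete, here is a small sketch of what goes wrong when two strtok loops are nested. The function name and the ';' group separator are assumptions made up for this illustration.

```cpp
#include <cstring>   // std::strtok

// Hypothetical demonstration: counts the values visited when two strtok
// loops are nested. Groups are separated by ';' and values by ','.
int count_values_with_nested_strtok(char* text)
{
    int count = 0;
    for (char* group = std::strtok(text, ";");
         group != nullptr;
         group = std::strtok(nullptr, ";"))          // resumes from the inner scan's state
    {
        for (char* value = std::strtok(group, ",");  // overwrites the outer scan's state
             value != nullptr;
             value = std::strtok(nullptr, ","))
        {
            ++count;
        }
    }
    return count;
}
```

Given a buffer holding "1.0,2.0;3.0,4.0", the function visits only the two values of the first group. The inner loop replaces the saved position of the outer scan, so once the inner scan is exhausted the outer call has nothing left to resume from.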

There's strtok_r() if you want to be re-entrant.

However, when scanning a list of values, I prefer to just do it manually. Using C style libraries:


const char * str = "1,2,3.14";
char * end;

while( str && *str ) {
    end = 0;
    double d = strtod( str, &end );
    if( !end || end == str ) {
        break; // nothing could be converted, so stop
    }
    handle_double( d ); // put in an array or whatever
    // strcspn returns a length, not a pointer, so add it to end
    str = end + strcspn( end, "-+0123456789." ); // wind forward to next number
}


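If you want to be a little more specific about what characters you accept for delimiters, you can instead use 'str = end + strspn( end, ", " )' (for when you only want to accept commas and spaces). Note that strspn, like strcspn, returns a count of characters, so it has to be added to the pointer.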

You can formulate this same loop using std::string::find_first_of().
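One way that std::string formulation might look, as a sketch rather than the poster's own code. The function name `parse_doubles` is hypothetical, and collecting the results in a vector stands in for "put in an array or whatever".

```cpp
#include <cstdlib>   // std::strtod
#include <string>
#include <vector>

// Hypothetical helper: same strtod loop, driven by std::string::find_first_of.
std::vector<double> parse_doubles(const std::string& text)
{
    static const char* const number_chars = "-+0123456789.";
    std::vector<double> values;

    std::string::size_type pos = text.find_first_of(number_chars);
    while (pos != std::string::npos)
    {
        const char* start = text.c_str() + pos;
        char* end = nullptr;
        double d = std::strtod(start, &end);
        if (end == start)
            break;   // looked like a number but was not one; stop

        values.push_back(d);
        // Wind forward to the next character that can start a number.
        std::string::size_type consumed =
            static_cast<std::string::size_type>(end - start);
        pos = text.find_first_of(number_chars, pos + consumed);
    }
    return values;
}
```

For the input "1,2,3.14" this collects three values. Like the C version, it stops at the first candidate that strtod cannot convert, such as a lone "-".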

Quote:
Original post by SiCrane
You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.


Assuming Python, you could just call eval() on the string and it will give you a tuple of the values.

Quote:
Original post by Flarelocke
Quote:
Original post by SiCrane
You'll need to use *some* library, unless you plan on writing this in assembly and using raw system calls to perform file I/O. Assuming Python, you could just read it in and use split on the read string.


Assuming Python, you could just call eval() on the string and it will give you a tuple of the values.

And risk someone having embedded some Python code in the input to your program...

John B


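#include <string>
using namespace std;

// Splits Parse_This on commas and stores the pieces in Array.
// Parse_This_Count is the index of the last value, so the loop runs Count+1 times.
void ParseToArray( int Parse_This_Count, string Parse_This, string Array[] ){
    for ( int iter=0; iter <= Parse_This_Count; iter++ ){
        string::size_type comma = Parse_This.find( ",", 0 );
        Array[iter] = Parse_This.substr( 0, comma );   // text before the next comma
        if ( comma == string::npos ) break;            // no comma left: that was the last value
        Parse_This = Parse_This.substr( comma + 1 );   // drop the value and its comma
    }
}

int main(){
    string my_string="1.0,1.0,2.0,123.34";
    string my_string_array[256];
    int my_string_array_count=3;   // index of the last value (four values in total)
    ParseToArray( my_string_array_count, my_string, my_string_array );

    return 0;
}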



Does this help?

However, your "without using any existing libraries" would probably mean you can't use the STL, which means you can't use vectors, and arrays are limited in size unless you want to write your own vector class.
Well, maybe the code above helps anyway.
Bye.
