rangler

Help needed for String Tokenizer


Hi, when I use the following code my program crashes with an illegal operation error. Can someone review my code and point out my mistake, please?
void StringTokenizer::Tokenize(string String, string Delimiter)
{
    string temp;
    char *buffer = new char[String.length()];

    strcpy(buffer, String.c_str());
    temp = strtok(buffer, Delimiter.c_str());

    if(!temp.empty())
    {
        Tokens.push_back(temp);

        while(true)
        {
            temp = strtok(NULL, Delimiter.c_str());

            if(!temp.empty())
            {
                Tokens.push_back(temp);
            } else
            {
                break;
            }
        } 
    }
    delete buffer;
}

This is where I'd look first:

temp = strtok(NULL, Delimiter.c_str());

If strtok finds no more tokens, it returns NULL. I don't know how the string class handles being assigned a null pointer, so that would be my first guess.

You'd be better off using the debugger.

Ah, I'm assuming you're using an IDE like MSVC or Borland. You should read its help section (if it has one). Using the debugger is a must in C++ programming; for example, the debugger would tell you which line your program is crashing on.

This line:
char *buffer = new char[String.length()];

should be:
char *buffer = new char[String.length() + 1];

strcpy also copies the zero terminator at the end of the string, so you must leave room for it; the terminator is not included in the string's length().

That's all I've got time to check. Sorry. Hope this helps.

I do not see where 'temp' is being filled with any data whatsoever. So if you were to reference that data, you would get an error.


Also, you are not checking for a NULL value. From MSDN:


Return Value

Returns a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.


Remarks

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.

Here is their example code:

/* STRTOK.C: In this program, a loop uses strtok
 * to print all the tokens (separated by commas
 * or blanks) in the string named "string".
 */

#include <string.h>
#include <stdio.h>

char string[] = "A string\tof ,,tokens\nand some more tokens";
char seps[]   = " ,\t\n";
char *token;

int main( void )
{
    printf( "%s\n\nTokens:\n", string );
    /* Establish string and get the first token: */
    token = strtok( string, seps );
    while( token != NULL )
    {
        /* While there are tokens in "string" */
        printf( " %s\n", token );
        /* Get next token: */
        token = strtok( NULL, seps );
    }
    return 0;
}


Output

A string of ,,tokens
and some more tokens

Tokens:
 A
 string
 of
 tokens
 and
 some
 more
 tokens



I've been trying to write my own lexical analyzer and parse tree for a while; just to let you know :)

Got it to work with:


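void StringTokenizer::Tokenize(LPSTR String, char *Delimiter)
{
    // strtok writes '\0' into String, so the caller's buffer is modified
    char *temp = strtok(String, Delimiter);

    while(temp != NULL)
    {
        Tokens.push_back(temp);
        temp = strtok(NULL, Delimiter);
    }
    // No delete here: temp points into String (or is NULL) and was
    // never allocated with new.
}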


Thanks to everyone who helped!

Hi, if all you want to do is tokenize strings, then you can use this:

stringtok.hpp

#include <string>

template< typename Container >
void
stringtok(Container& container, const std::string& in,
          const char * const delimiters = " \t\n")
{
    const std::string::size_type len = in.length();
    std::string::size_type i = 0;

    while(i < len) {
        // skip leading delimiters
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
            return; // nothing left but delimiters

        // find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // push token
        if(j == std::string::npos) {
            container.push_back(in.substr(i));
            return;
        } else
            container.push_back(in.substr(i, j-i));

        // set up for next loop
        i = j + 1;
    }
}


test.cpp

#include "stringtok.hpp"
#include <list>
#include <iostream>
#include <string>

int main() {
    std::list<std::string> _tokens;
    std::string sentence;

    std::cout << "Enter a sentence:\n";

    std::getline(std::cin, sentence);

    stringtok(_tokens, sentence);

    for(std::list<std::string>::const_iterator itr = _tokens.begin(),
            end = _tokens.end();
        itr != end;
        ++itr) {
        std::cout << *itr << '\n';
    }

    return 0;
}


But if you're looking to make some kind of lexer/scanner, then I suggest having a constructor that takes a reference to a stream and retrieves its stream buffer. Then have a function, say get_token, that reads one character at a time from the stream buffer to build up a token (using the stream itself to read values such as numbers) and returns an enum value representing the type of token that was read. Have another function that lets you retrieve the current token's value.
