Help needed for String Tokenizer

7 comments, last by snk_kid 19 years, 9 months ago
Hi, when I use the following code my program crashes with an illegal operation error. Can someone review my code and point out my mistake, please?

void StringTokenizer::Tokenize(string String, string Delimiter)
{
    string temp;
    char *buffer = new char[String.length()];

    strcpy(buffer, String.c_str());
    temp = strtok(buffer, Delimiter.c_str());

    if(!temp.empty())
    {
        Tokens.push_back(temp);

        while(true)
        {
            temp = strtok(NULL, Delimiter.c_str());

            if(!temp.empty())
            {
                Tokens.push_back(temp);
            } else
            {
                break;
            }
        } 
    }
    delete buffer;
}
//My Signature
String sig = "Rangler"; System.out.println(sig);
This is where I'd look first:

temp = strtok(NULL, Delimiter.c_str());

If strtok doesn't find any more tokens, it returns NULL. I'm not sure how the string class handles being assigned NULL, so that would be my first guess.
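For comparison, here is a minimal sketch of the same loop that tests strtok's char* result for NULL before it ever reaches a std::string (constructing or assigning a std::string from a null char* is undefined behavior). It is written as a free function with the hypothetical name `tokenize`, because the StringTokenizer class declaration never appears in the thread; treat it as an illustration, not a drop-in replacement.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical free-function version of Tokenize; the StringTokenizer
// class declaration is not shown in the thread.
// Splits `text` on any of the characters in `delimiters`.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& delimiters)
{
    std::vector<std::string> tokens;

    // strtok writes '\0' into its input, so tokenize a writable copy.
    // size() + 1 also copies the terminating '\0' that c_str() provides.
    std::vector<char> buffer(text.c_str(), text.c_str() + text.size() + 1);

    // Test the char* returned by strtok for NULL *before* it becomes a
    // std::string; building a std::string from a null char* is undefined.
    for (char* tok = std::strtok(buffer.data(), delimiters.c_str());
         tok != nullptr;
         tok = std::strtok(nullptr, delimiters.c_str()))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

For example, `tokenize("a,b,,c", ",")` should give "a", "b" and "c": strtok skips runs of consecutive delimiters instead of producing empty tokens.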

You'd be best off using the debugger.
I tried changing temp to a char* and testing whether it was NULL, but it still causes the illegal operation error. For the record, I have no clue how to use the debugger.
Ahh, I'm assuming you're using an IDE like MSVC or Borland. You should read its help section (if it has one). Using the debugger is a must in C++ programming. For example, the debugger would be able to tell you which line your program is crashing on.
This line:
char *buffer = new char[String.length()];

should be:
char *buffer = new char[String.length() + 1];

strcpy copies the string including its zero terminator, so you must remember to leave room for that terminator at the end of the buffer; it is not included in the length of the string.
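A small sketch of that allocation rule, using a hypothetical helper named `make_c_copy`: allocate length() + 1 bytes, and release the result with delete[] rather than plain delete, because memory obtained with new[] must be freed with delete[] (the original code's `delete buffer;` has the same mismatch).

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copy a std::string into a newly allocated,
// zero-terminated C string. length() does not count the '\0',
// so one extra byte is allocated for it.
// The caller must free the result with delete[] (it came from new[]).
char* make_c_copy(const std::string& s)
{
    char* buffer = new char[s.length() + 1];
    std::strcpy(buffer, s.c_str());  // copies the characters plus the '\0'
    return buffer;
}
```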

That's all I've got time to check. Sorry. Hope this helps.
That last suggestion had no effect. Thank you for trying.
I do not see where 'temp' is being filled with any data whatsoever. So if you were to reference that data, you would get an error.


Also, you are not checking for a NULL value. From MSDN:


Return Value

Returns a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.


Remarks

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.
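To see the "each call modifies strToken" behavior directly, the hypothetical helper below runs strtok once on a writable copy of its input and then reads the buffer back as a C string. Only the text up to the first overwritten delimiter is still visible, which is also why strtok must never be given a string literal or the read-only pointer returned by c_str().

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical demo helper: after one strtok call, the delimiter that ends
// the first token has been overwritten with '\0', so reading the buffer
// as a C string now stops right after that token.
std::string buffer_after_first_strtok(const std::string& text,
                                      const char* delimiters)
{
    // Writable copy, including the terminating '\0'.
    std::vector<char> buffer(text.c_str(), text.c_str() + text.size() + 1);
    std::strtok(buffer.data(), delimiters);
    return std::string(buffer.data());  // stops at the first '\0'
}
```

With " ," as the delimiters, "A string,of" comes back as "A", while ",,x y" comes back as ",,x": leading delimiters are skipped, not overwritten.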

Here is their example code:
/* STRTOK.C: In this program, a loop uses strtok
 * to print all the tokens (separated by commas
 * or blanks) in the string named "string".
 */
#include <string.h>
#include <stdio.h>

char string[] = "A string\tof ,,tokens\nand some  more tokens";
char seps[]   = " ,\t\n";
char *token;

int main( void )
{
   printf( "%s\n\nTokens:\n", string );
   /* Establish string and get the first token: */
   token = strtok( string, seps );
   while( token != NULL )
   {
      /* While there are tokens in "string" */
      printf( " %s\n", token );
      /* Get next token: */
      token = strtok( NULL, seps );
   }
   return 0;
}


Output

A string of ,,tokens
and some more tokens

Tokens:
A
string
of
tokens
and
some
more
tokens

I've been trying to write my own lexical analyzer and parse tree for a while; just to let you know :)
Take back the internet with the most awesome browser around, Firefox
Got it to work with this:

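void StringTokenizer::Tokenize(LPSTR String, char *Delimiter)
{
	// strtok writes '\0' into String, so String must be a writable buffer
	char	*temp;

	temp = strtok(String, Delimiter);
	while(temp != NULL)
	{
		Tokens.push_back(temp);
		temp = strtok(NULL, Delimiter);
	}
	// no delete needed: temp points into String (and is NULL here anyway)
}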
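Since the class declaration is never posted, here is a minimal sketch of how the working version might sit in a class, assuming `Tokens` is a `std::vector<std::string>`. LPSTR is the Windows typedef for `char*`, so plain `char*` is used to keep the sketch compilable without `<windows.h>`. The key point for callers is that `String` must be writable.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical declaration: Tokens is assumed to be a vector of strings.
class StringTokenizer
{
public:
    std::vector<std::string> Tokens;

    // Same approach as the working version. String must be writable,
    // because strtok overwrites each delimiter it finds with '\0'.
    void Tokenize(char* String, const char* Delimiter)
    {
        for (char* temp = std::strtok(String, Delimiter);
             temp != nullptr;
             temp = std::strtok(nullptr, Delimiter))
        {
            Tokens.push_back(temp);  // copies the token into a std::string
        }
        // No delete: temp points into String, which this function does not own.
    }
};
```

Call it with a char array such as `char text[] = "one two three";`. Passing a string literal directly would make strtok write into read-only memory, which is one common way to get exactly this kind of crash.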


Thanks to everyone who helped!
Hi, if all you want to do is tokenize strings, then you can use this:

stringtok.hpp
#include <string>

template< typename Container >
void
stringtok(Container& container, const std::string& in,
          const char * const delimiters = " \t\n")
{
    const std::string::size_type len = in.length();
    std::string::size_type i = 0;

    while(i < len) {
        // eat leading whitespace
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
            return;   // nothing left but white space

        // find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // push token
        if(j == std::string::npos) {
            container.push_back(in.substr(i));
            return;
        } else
            container.push_back(in.substr(i, j-i));

        // set up for next loop
        i = j + 1;
    }
}


test.cpp
#include "stringtok.hpp"
#include <list>
#include <iostream>

int main() {
   std::list<std::string> _tokens;
   std::string sentence;

   std::cout << "Enter a sentence:\n";
   std::getline(std::cin, sentence);

   stringtok(_tokens, sentence);

   for(std::list<std::string>::const_iterator itr = _tokens.begin(),
       end = _tokens.end();
       itr != end;
       ++itr) {
         std::cout << *itr << '\n';
   }
   return 0;
}


But if you're looking to make some kind of lexer/scanner, then I suggest having a constructor that takes a reference to a stream and retrieves its stream buffer. Then have a function, say get_token, that reads in one character at a time from the stream buffer to build up a token (using the stream to get values) and returns an enum type that represents the type of token that was read in. Have another function that allows you to retrieve the current value.
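A minimal sketch of that design, under some assumptions: the names `Lexer`, `get_token`, `value` and the `TokenType` categories are illustrative only, and for simplicity numbers are built up character by character as text rather than extracted through the stream. The constructor keeps the stream's buffer, and `get_token` peeks and consumes one character at a time with `sgetc`/`snextc`/`sbumpc`.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding the lexer
#include <streambuf>
#include <string>

// Hypothetical token categories; a real lexer would define its own set.
enum class TokenType { Number, Identifier, Symbol, End };

class Lexer
{
public:
    // Take a stream and keep its stream buffer for character-level reads.
    explicit Lexer(std::istream& in) : buf_(in.rdbuf()) {}

    // Reads the next token and returns its type; the token text is
    // available afterwards through value().
    TokenType get_token()
    {
        using traits = std::streambuf::traits_type;
        value_.clear();

        int c = buf_->sgetc();                   // peek without consuming
        while (c != traits::eof() && std::isspace(c))
            c = buf_->snextc();                  // consume, then peek
        if (c == traits::eof())
            return TokenType::End;

        if (std::isdigit(c)) {                   // run of digits
            while (c != traits::eof() && std::isdigit(c)) {
                value_.push_back(traits::to_char_type(c));
                c = buf_->snextc();
            }
            return TokenType::Number;
        }
        if (std::isalpha(c) || c == '_') {       // identifier
            while (c != traits::eof() && (std::isalnum(c) || c == '_')) {
                value_.push_back(traits::to_char_type(c));
                c = buf_->snextc();
            }
            return TokenType::Identifier;
        }
        // Anything else is a single-character symbol such as '+' or '('.
        value_.push_back(traits::to_char_type(c));
        buf_->sbumpc();
        return TokenType::Symbol;
    }

    // Text of the token most recently read by get_token().
    const std::string& value() const { return value_; }

private:
    std::streambuf* buf_;
    std::string value_;
};
```

Fed `std::istringstream src("x1 = 42+y");`, repeated calls should yield Identifier "x1", Symbol "=", Number "42", Symbol "+", Identifier "y" and then End. Extending it to floating-point numbers or multi-character operators only changes the branches inside get_token.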

