Problems with parsing

26 comments, last by webwraith 17 years, 7 months ago
Hi, I was wondering how best to parse text files in C++. I'm doing this for my game engine, so the scripts will basically sit "on top" of it. I want the scripts to have a set of functions that call certain engine functions directly, but I also want the script-writer to be able to write their own functions, along with any variables they may need. I can handle the variables fine, but what I want to know is how to get my parser to handle:

if(((2*AVariable)/74)>20)
and treat it the same way as

if ( ( (2* AVariable)/ 74)  >20)
Do you mean how to evaluate that expression correctly (as opposed to just knowing where whitespace is allowed)? If so, you may want to look up Backus-Naur form and recursive descent parsers, or parser generators (Lex and Yacc are the traditional starting points, though far from the easiest to work with). Alternatively, consider using an existing scripting language; its authors have already done most of the hard work for you.
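As a rough illustration of the recursive descent approach, here is a minimal C++ sketch that evaluates the condition from the question. It assumes the input has already been split into tokens (a vector of strings) and that variable values live in a std::map. The grammar, the class name ExprEvaluator, and the choice to return 1.0/0.0 for comparisons are assumptions made for this example, not part of any existing engine.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal recursive descent evaluator (a sketch). One function per grammar rule:
//   comparison := additive [ ('<' | '>') additive ]
//   additive   := term { ('+' | '-') term }
//   term       := factor { ('*' | '/') factor }
//   factor     := NUMBER | IDENT | '(' comparison ')'
// Comparisons yield 1.0 (true) or 0.0 (false).
class ExprEvaluator {
public:
    ExprEvaluator(std::vector<std::string> tokens, std::map<std::string, double> vars)
        : toks_(std::move(tokens)), vars_(std::move(vars)) {}

    double evaluate() {
        double v = comparison();
        if (pos_ != toks_.size()) throw std::runtime_error("unexpected token: " + toks_[pos_]);
        return v;
    }

private:
    // Look at the current token without consuming it ("" at the end).
    std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : std::string(); }

    // Consume and return the current token.
    std::string next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of expression");
        return toks_[pos_++];
    }

    double comparison() {
        double lhs = additive();
        if (peek() == ">") { next(); return lhs > additive() ? 1.0 : 0.0; }
        if (peek() == "<") { next(); return lhs < additive() ? 1.0 : 0.0; }
        return lhs;
    }

    double additive() {
        double v = term();
        while (peek() == "+" || peek() == "-") {
            bool plus = next() == "+";
            double rhs = term();
            v = plus ? v + rhs : v - rhs;
        }
        return v;
    }

    double term() {
        double v = factor();
        while (peek() == "*" || peek() == "/") {
            bool times = next() == "*";
            double rhs = factor();
            v = times ? v * rhs : v / rhs;
        }
        return v;
    }

    double factor() {
        std::string t = next();
        if (t == "(") {
            double v = comparison();
            if (next() != ")") throw std::runtime_error("expected ')'");
            return v;
        }
        if (std::isdigit(static_cast<unsigned char>(t[0]))) return std::stod(t);
        auto it = vars_.find(t);
        if (it == vars_.end()) throw std::runtime_error("unknown variable: " + t);
        return it->second;
    }

    std::vector<std::string> toks_;
    std::map<std::string, double> vars_;
    std::size_t pos_ = 0;
};
```

With the tokens ( ( 2 * AVariable ) / 74 ) > 20 and AVariable set to 800, evaluate() returns 1.0, since 1600 / 74 is roughly 21.6. Each grammar rule maps onto one function, and operator precedence falls out of which rule calls which, which is why recursive descent is popular for hand-written parsers.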
I would recommend using a scripting language that someone else has already created. It will get you to your goal much, much faster.

Unless your goal is to learn how to write a scripting language runtime and parser, of course :-)

Typically, you parse the file by reading one token at a time and making the appropriate decision based on the current context/state. You build a tree representation of the entire program, and then walk that tree to execute your script.
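The "build a tree, then execute it" idea can be sketched like this. The Node struct and the num/var/bin helper functions are hypothetical stand-ins for what a parser would produce; the point is that parsing happens once, and the resulting tree can then be evaluated many times with different variable values.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical expression-tree node. A parser would build these once; the
// engine can then evaluate the same tree as often as it likes.
struct Node {
    enum Kind { Number, Variable, BinOp } kind = Number;
    double value = 0.0;               // Number: the literal value
    std::string name;                 // Variable: the variable's name
    char op = 0;                      // BinOp: one of + - * / < >
    std::unique_ptr<Node> lhs, rhs;   // BinOp: the two operands

    double eval(const std::map<std::string, double>& vars) const {
        switch (kind) {
        case Number:
            return value;
        case Variable: {
            auto it = vars.find(name);
            if (it == vars.end()) throw std::runtime_error("unknown variable: " + name);
            return it->second;
        }
        case BinOp: {
            double a = lhs->eval(vars), b = rhs->eval(vars);
            switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            case '<': return a < b ? 1.0 : 0.0;
            case '>': return a > b ? 1.0 : 0.0;
            }
            throw std::runtime_error(std::string("unknown operator: ") + op);
        }
        }
        throw std::runtime_error("bad node kind");
    }
};

// Helper constructors standing in for what the parser would emit.
std::unique_ptr<Node> num(double v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Number;
    n->value = v;
    return n;
}

std::unique_ptr<Node> var(const std::string& name) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Variable;
    n->name = name;
    return n;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->kind = Node::BinOp;
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}
```

For example, bin('>', bin('/', bin('*', num(2), var("AVariable")), num(74)), num(20)) builds the tree for ((2*AVariable)/74)>20. Evaluating it with AVariable = 800 gives 1.0, and with AVariable = 100 gives 0.0, without re-reading the script text.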

Generally, the "getToken()" function will skip leading whitespace before it isolates and returns the next token in the input data. A token is something like "if" or "(" or "/" or "74".
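A minimal getToken() along these lines might look like the following. It is only a sketch: the token rules (identifiers, numbers with at most one decimal point, and single-character symbols) and the tokenize() helper are assumptions chosen to match the example in the question. It uses peek() to look at the next character without consuming it, so the character that ends one token stays in the stream for the next call.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical getToken(): skip leading whitespace, then isolate one token.
//   identifier: letter or '_' followed by letters, digits or '_'
//   number:     digits with at most one decimal point
//   otherwise:  a single character such as ( ) * / > ;
// Returns "" at end of input.
std::string getToken(std::istream& in) {
    // 1. Skip whitespace. peek() looks at the next char without consuming it.
    while (std::isspace(in.peek())) in.get();

    int c = in.peek();
    if (c == std::char_traits<char>::eof()) return "";

    std::string tok;
    if (std::isalpha(c) || c == '_') {
        // 2a. Keep reading while the next char can continue an identifier.
        while (std::isalnum(in.peek()) || in.peek() == '_')
            tok += static_cast<char>(in.get());
    } else if (std::isdigit(c)) {
        // 2b. Digits, plus at most one '.'.
        bool seenPoint = false;
        while (std::isdigit(in.peek()) || (in.peek() == '.' && !seenPoint)) {
            if (in.peek() == '.') seenPoint = true;
            tok += static_cast<char>(in.get());
        }
    } else {
        // 2c. Any other character is a token on its own.
        tok += static_cast<char>(in.get());
    }
    // The character that stopped the loop is still in the stream.
    return tok;
}

// Convenience helper: split a whole string into tokens.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    for (std::string t = getToken(in); !t.empty(); t = getToken(in))
        out.push_back(t);
    return out;
}
```

With these rules, both spellings of the condition in the question, if(((2*AVariable)/74)>20) and if ( ( (2* AVariable)/ 74)  >20), produce the same 14 tokens: whitespace only separates tokens and is otherwise discarded, while "(" or "*" end an identifier or number simply because they cannot continue it.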
What I'm getting at is I don't know how to get the parser to recognise two tokens that AREN'T separated by whitespace.

And, yeah, this is a bit of a learning exercise for me :)
You could have a look at boost::spirit.
Quote: Original post by webwraith
What I'm getting at is I don't know how to get the parser to recognise two tokens that AREN'T separated by whitespace.


Put very simply, you keep reading the token until you find something that isn't part of the token, then you put that bit back. If you have a choice, you try each potential token until you find one that matches.
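The "read too far, then put it back" approach maps directly onto std::istream::unget(). In this hypothetical readOperator(), the first character always belongs to the token; one more character is read tentatively and kept only if the pair forms a known two-character operator, otherwise it is put back. The operator list is an assumption (loosely based on the test.txt posted later in the thread), and splitOperators() is just a helper for trying it out.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical readOperator(): the caller has already skipped whitespace and
// checked that input remains. Demonstrates "read one too many, put it back".
std::string readOperator(std::istream& in) {
    // Assumed set of two-character operators.
    static const std::vector<std::string> twoChar = {"==", "<=", ">=", "!=", "++", "--", "::"};

    std::string tok(1, static_cast<char>(in.get()));  // first char always belongs to the token
    int next = in.get();                                // tentatively read one more
    if (next == std::char_traits<char>::eof())
        return tok;                                     // end of input: nothing to put back

    std::string longer = tok + static_cast<char>(next);
    if (std::find(twoChar.begin(), twoChar.end(), longer) != twoChar.end())
        return longer;                                  // the longer match wins
    in.unget();                                         // not part of this token: put it back
    return tok;
}

// Helper for trying it out: split whitespace-separated runs of operators.
std::vector<std::string> splitOperators(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    while (in >> std::ws && in.peek() != std::char_traits<char>::eof())
        out.push_back(readOperator(in));
    return out;
}
```

For example, splitOperators("<= >= =! == +-") yields <=, >=, =, !, ==, + and -. Because "=!" is not a known operator, the ! is put back and becomes a token of its own. Preferring the longest candidate that matches is the usual "maximal munch" rule.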

So, one char at a time, then... OK, thanks, I'll try it and come back with the results.
OK, here is the start of my scanner class, but I keep getting back "token type not recognised". Maybe I'm blind, but I can't see what's wrong with the code. Would someone mind looking over the listing for me?

Scanner:
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

enum TOKEN_TYPE{NUM,IDENT,COND,STRING};

struct TokenStruct
{
    char delimiter;
    std::string alpha;
    std::string num;
    std::string other;
};

class Scanner
{
    protected:
        char m_delimiter;//defines the end of a line
        std::string m_alpha;//letters in both Upper and Lower case
        std::string m_num;//every numeric digit
        std::string m_other;//everything else
        TOKEN_TYPE token_type;//stores the type of token
    public:
        Scanner():m_delimiter(';'),m_alpha("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"),m_num("0123456789"),m_other("#\"=+-*/.:!<>,'{}[]()"){}
        Scanner(TokenStruct tokstr):m_delimiter(tokstr.delimiter),m_alpha(tokstr.alpha),m_num(tokstr.num),m_other(tokstr.other){}
        ~Scanner(){}
        //Note: the file pointer given to GetToken() must already be open.
        std::string GetToken(std::ifstream& fin)
        {
            std::string token="";
            char tempchar;
            bool decimalpoint=false,whtspc=true;//decimalpoint is used to check if the token already has a decimal point. if it has, then the token is returned without the second decimal
            //test if the file is actually open
            if(!(fin.is_open()))
                return std::string("file not open");
            //get rid of whitespace
            //do{
                tempchar = fin.get();
            //}while(isspace(tempchar)==0);
            if(strcmp(&tempchar,""))
                tempchar = fin.get();
            //set token type
            if(m_alpha.find(tempchar,0)!=std::string::npos)
            {
                token_type=IDENT;
                token.append((const char*)tempchar);
            }
            else if(m_num.find(tempchar,0)!=std::string::npos)
            {
                token_type=NUM;
            }
            else if(m_other.find(tempchar,0)!=std::string::npos)
            {
                if(!strcmp((const char*)tempchar,"\""))
                    token_type=STRING;
                else
                    token_type=COND;
            }
            else return std::string("token type not recognised");
            return token;
            //time to start reading in the token
        }
        std::vector<std::string> TokenizeLine(std::ifstream& fin)
        {
            std::vector<std::string> ret_vec;
            std::string temp;
            size_t line_size;
            while(!(temp[0]==m_delimiter))
            {
                temp=GetToken(fin);
            }
            return ret_vec;
        }
};


main.cpp:
#include "tokeniser.h"
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    Scanner tokens;
    ifstream fin;
    fin.open("test.txt");
    std::string myString;
    std::vector<std::string> myVector;
    myString=tokens.GetToken(fin);
    //myVector=tokens.TokenizeLine(fin);
    std::cout << myString << std::endl;
    fin.close();
}


test.txt:
a; a;	a;a;abc;ab_;ab_c;a1;a1_;a_1;a,b;_a;_a_b;_1;2;234;2.3;2.34;23.4;23.45;23.;"The (quick) br0wn f0x j&umped over_the_lazy_dog";(;::;==;<=;>=;!=;++;--;=!;<";>%;:[;+-;-+;#hell, we made it!


The "return token" at the end of the Scanner::GetToken() method is just there until I know that this section is working.

[Edited by - webwraith on July 4, 2006 3:26:34 PM]
I'm pretty sure that strcpy is not what you intended there. You shouldn't be using strcpy on std::strings anyway.
That might be it, I'll change that now. That was supposed to be a strcmp(), because tempchar is of type char, and the only time I use it is to test it against other chars.


Nope, I just changed it, and I still get the bad token response.

[Edited by - webwraith on July 4, 2006 9:55:02 PM]
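Looking at the listing, the "token type not recognised" result appears to come from the if(strcmp(&tempchar,"")) check in GetToken(). &tempchar is the address of a single char, not a null-terminated string, so strictly this is undefined behaviour; in practice strcmp returns non-zero for any character other than '\0', so the branch runs and the first character is thrown away. With test.txt starting "a;", the 'a' is skipped and GetToken() examines ';', which is in none of m_alpha, m_num or m_other (the delimiter is only stored in m_delimiter), hence the error string. A strcpy() in the same place would also leave a non-null pointer and skip the character, which is why the change didn't help. Other problems would show up next: casts such as (const char*)tempchar turn the character's value into a pointer, which is almost certain to crash when dereferenced (compare chars with == and append them with += instead); the commented-out whitespace loop has its test inverted, since isspace(tempchar)==0 keeps looping on non-whitespace; and TokenizeLine() never adds anything to ret_vec.

A revised sketch, keeping the original member names but otherwise hypothetical, might look like this. It takes std::istream& so it works with an std::ifstream as well as an std::istringstream, and it leaves out the TokenStruct constructor and token_type for brevity.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for testing without a file
#include <string>
#include <vector>

// Revised sketch of the thread's Scanner (hypothetical; TokenStruct and
// token_type omitted). Reads from any std::istream, e.g. an std::ifstream.
class Scanner {
public:
    Scanner()
        : m_delimiter(';'),
          m_alpha("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"),
          m_num("0123456789"),
          m_other("#\"=+-*/.:!<>,'{}[]()") {}

    // Returns the next token, or "" at end of input.
    std::string GetToken(std::istream& in) {
        // Skip whitespace: keep reading WHILE the next char is a space.
        while (std::isspace(in.peek())) in.get();
        if (in.peek() == std::char_traits<char>::eof()) return "";

        char first = static_cast<char>(in.get());
        std::string token(1, first);              // chars are appended directly, no casts
        if (first == m_delimiter) return token;   // ';' ends a statement

        if (isIn(first, m_alpha)) {
            // Identifier: letter/underscore, then letters, digits or underscores.
            while (isIn(in.peek(), m_alpha) || isIn(in.peek(), m_num))
                token += static_cast<char>(in.get());
        } else if (isIn(first, m_num)) {
            // Number: a second decimal point ends the token.
            bool decimalpoint = false;
            while (isIn(in.peek(), m_num) || (in.peek() == '.' && !decimalpoint)) {
                if (in.peek() == '.') decimalpoint = true;
                token += static_cast<char>(in.get());
            }
        } else if (first == '"') {
            // String literal: read up to and including the closing quote.
            int ch;
            while ((ch = in.get()) != std::char_traits<char>::eof()) {
                token += static_cast<char>(ch);
                if (ch == '"') break;
            }
        } else if (!isIn(first, m_other)) {
            return "token type not recognised";
        }
        // Anything else in m_other is returned as a one-character symbol.
        return token;
    }

    // Collects the tokens of one statement, including the delimiter.
    std::vector<std::string> TokenizeLine(std::istream& in) {
        std::vector<std::string> ret_vec;
        for (std::string tok = GetToken(in); !tok.empty(); tok = GetToken(in)) {
            ret_vec.push_back(tok);
            if (tok[0] == m_delimiter) break;
        }
        return ret_vec;
    }

private:
    // True if c is a real character (not EOF) contained in set.
    static bool isIn(int c, const std::string& set) {
        return c != std::char_traits<char>::eof() &&
               set.find(static_cast<char>(c)) != std::string::npos;
    }

    char m_delimiter;
    std::string m_alpha, m_num, m_other;
};
```

Fed the text a; abc;2.34;"The (quick) br0wn"; one TokenizeLine() call at a time, this returns {a, ;}, then {abc, ;}, then {2.34, ;}, then the quoted string followed by ;, and finally an empty vector at end of input.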

This topic is closed to new replies.
