Compiler Design - Lexical & syntactical analysis - General and Gameplay Programming

315

June 25, 2007 09:30 PM

a scanner is the first part of the compiler. It turns a character stream into a token stream. Most programming languages have pretty simple token syntax and so they can be described with a regular expression. Regular expressions are equivilent to finite state machines. Study up on finite state machines. I recommend using goto to implement your fsm. It will be far more elegant than an object oriented solution. Just make sure to make a new scope under each label, example:

state1:
{

}

this keeps things from getting ugly.

Also, I found compiler construction tools to be worse than useless when designing a new language.

ToohrVyk

1,596

June 26, 2007 01:09 AM

Quote:Original post by Glak
Regular expressions are equivilent to finite state machines. Study up on finite state machines. I recommend using goto to implement your fsm. It will be far more elegant than an object oriented solution.

If you have access to a smart language (one with tail-recursion), you can and should use mutually recursive functions, which are safer to manipulate and allow the passing of arguments. This, as well as goto and switches, however, are slow code-based solutions which will get ugly as soon as your fsm reaches a few thousand nodes (as it does in almost any non-trivial programming language) and quite a bit slower than an automaton compiled as data with a tight and small code-bit working on it.

Quote:Also, I found compiler construction tools to be worse than useless when designing a new language.

I found lexer and parser generation tools to be insanely useful when designing a language of reasonable complexity—they have the inherent advantage of letting you specify what you want done without having to explain how you want it done, and they usually do it in the most efficient way possible. The lexer side of things is not that indispensable (you can get it done quite easily with regular expressions, it's true) but the parser side involves a lot of work to get it working correctly and efficiently without sacrificing functionality.

Blog — Facebook

Glak

315

June 26, 2007 12:16 PM

"which will get ugly as soon as your fsm reaches a few thousand nodes (as it does in almost any non-trivial programming language)"

Are you talking about tokenizing? You need like 15 states max. Oh wait, operators and keywords. What I do is treat keywords like identifiers so that problem is solved. I am guessing that nearly all compilers do it that way too. Then for operators what I do it allow them to build up as one big token until a space or non operator symbol intervenes. Then I take the resulting string and greedily consume it from the start. I have a vector of strings representing my operators sorted by length. Then I check to see if the first element of that vector is a prefix of the string. If so I make it a token, remove it, and repeat. If not I go down the vector until a match is found.

"found lexer and parser generation tools to be insanely useful when designing a language of reasonable complexity—they have the inherent advantage of letting you specify what you want done without having to explain how you want it done, and they usually do it in the most efficient way possible."

Well maybe for experienced compiler writers. However in my case I know that I could not have succeeded with a parser generator like yacc or bison. I also looked into the boost one whatever it is called. However handwriting a parser was not that difficult. It was actually much shorter than the scanner.

ToohrVyk

1,596

June 26, 2007 12:48 PM

Quote:Original post by Glak
Are you talking about tokenizing? You need like 15 states max. Oh wait, operators and keywords.

Recognizing integers with an FSM is at least 6 states. Recognizing floating-point numbers with an FSM is at least 15 states. Strings and comments add a dozen states as well. It's true that keywords are often moved out of the lexer automaton, but it's not really the case for operators.

Quote:Well maybe for experienced compiler writers.

I may sound elitist, but inexperienced compiler writers seldom "develop new languages" as much as they "develop a toy language based on an existing one". The question of whether a tool is adapted to compiler development should be first and foremost applied to people who actually develop compilers as a living—one does not provide single-developer teams as a reason why CVS (or alternatives) is utterly useless in software development.

Quote:However in my case I know that I could not have succeeded with a parser generator like yacc or bison.

I find it strange that you managed to specify both the how and the what of your parser, yet couldn't specify the what alone. I usually find parser generators to be extremely simple to manipulate, much simpler than writing things by hand.

Quote:However handwriting a parser was not that difficult. It was actually much shorter than the scanner.

It casually takes me a few hundred lines explaining what the grammar of the language is—and I have trouble imagining a more concise description than that of Menhir. My only guess is that the grammar of your language was extremely short and simple, which is why the parser was as well. For instance, this is a small portion of the Caml Language syntax (there are several other bits quite similar to this one in size and complexity), and can be expressed with Menhir in just as many lines of code as there are grammar possibilities. Unless you're using a very innovative trick, a hand-written parser for a grammar like this one is bound to be much longer.

Blog — Facebook

DevFred

840

June 26, 2007 04:29 PM

Quote:Original post by Glak
handwriting a parser was not that difficult. It was actually much shorter than the scanner.

Very unlikely. Can we see a a language description and some sourcecode?

nmi

978

June 26, 2007 04:55 PM

Quote:Original post by Running_Wild
Ok I've done this for 2 years and this is my personal recommendation. The best *overall* book (and yes I know... Blasphemy!) is *not* the Dragon book. The Dragon book is very unfriendly to someone implementing a compiler and is very VERY theory driven. The best book IMHO is "Compiler Design in C" by Allen Holub. The source can be downloaded for free from here : http://www.holub.com/software/compiler.design.in.c.html

What about error handling during compilation, like a missing semicolon or something more complex. The dragon book contains only a few pages on that topic. However it seems important to let the compiler continue to parse over the source code if an error is found, so that as many errors as possible are found.

Are there any good books concerning this topic ?

Glak

315

June 26, 2007 06:16 PM

ok, I'll give you guys some info.

My language has simple syntax. It has a bunch of binary operators and some postfix unary operators. There are three outfix operators () [] {} that I call grouping operators in my code. There is also an implied adjacency operator. So the code A B means A adj B. The semicolon is the sequence operator and the comma is the tuple operator. This means that there are no statements, the entire program is always a single expression. So for example f x and f(x) are equivilent statements. () only does precedence. Now f x,y and f(x,y) are completely different. The first is a tuple, the first element of which is a function call. The second expression is a function call with a tuple as an argument. Well that is if f is a function. If f is a type then the expression f x is a variable declaration. If f is the class keyword then x had better be a class expression and the overall expression is a class definition. If f is a container type x had better be an index. It all just depends. Operators were chosen to eliminate ambiguity. That is one reason why there are no prefix operators and why < > is not a outfix operator. Ok now to reply to some quotes:

"I may sound elitist, but inexperienced compiler writers seldom "develop new languages" as much as they "develop a toy language based on an existing one". The question of whether a tool is adapted to compiler development should be first and foremost applied to people who actually develop compilers as a living—one does not provide single-developer teams as a reason why CVS (or alternatives) is utterly useless in software development."

No my language is pretty new. All of the features exist in other languages but the combination is pretty unique and I really like it.

"I find it strange that you managed to specify both the how and the what of your parser, yet couldn't specify the what alone. I usually find parser generators to be extremely simple to manipulate, much simpler than writing things by hand."

Now I would probably have no trouble using a parser generator for my language, but at the time I wasn't really sure how my language would look. Sure I had an idea, but not a formal understanding. Some stuff changed as I figured out that certain things were nearly unparseable. Also a language that is easy to write a parser for is easy to learn. So I started pretty simple and worked from there. I kept printing out the parse tree and reviewing it. Now I am using graphviz to help visualize things.

I decided to take a year off from working on it due mainly to work issues and also I wasn't sure how I was going to handle some stuff. I am not a professional programmer nor have I ever been and I have never completed a program (including this one) so my code will have a rough look. I also don't have much error checking in the the compiler (I will add it later) however if given correct input a correct parse tree is generated.

I have a lot of files, if you want them all I will post them but for now I will just give you the tokenizer and parser

first the tokenizer (btw I am using the word rune to refer to operators)

#include "../Core/global.h"#include "../CharacterStream/CharacterStream.h"#include "../Core/Runes and Articles.h"bool is_prefix_of(const string& substr, const string& str){	if( substr.size() > str.size() )		return false;	for(int i=0; i<substr.size(); ++i)	{		if( substr != str )			return false;	}	return true;}string remove_prefix_from(const string& substr, const string& str){	string s(str.begin()+substr.size(),str.end());	if(substr+s != str)		cout<<"error in remove_prefix_from"<<endl;	return s;}vector<Token> AllTokens;set<char> alpha; //$ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzset<char> numeric; //1234567890set<char> whitespace; //\t \nset<char> special; //\"\\\'set<char> alphanumeric;set<char> non_rune;Token current("","");void InitializeTokenizer(bool testing){	char a[] = "$ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";	alpha.insert(a,a+sizeof(a)-1);	char n[] = "1234567890";	numeric.insert(n,n+sizeof(n)-1);	char w[] = "\t \n";	whitespace.insert(w,w+sizeof(w)-1);	char s[] = "\"\\\'";	special.insert(s,s+sizeof(s)-1);	alphanumeric.insert(alpha.begin(),alpha.end());	alphanumeric.insert(numeric.begin(),numeric.end());	non_rune.insert(alphanumeric.begin(),alphanumeric.end());	non_rune.insert(whitespace.begin(),whitespace.end());	non_rune.insert(special.begin(),special.end());	if(testing)	{		set<char>::iterator iter;		cout<<"alpha: ";		for(iter=alpha.begin(); iter!=alpha.end(); ++iter)			cout<<(*iter);		cout<<endl;		cout<<"numeric: ";		for(iter=numeric.begin(); iter!=numeric.end(); ++iter)			cout<<(*iter);		cout<<endl;		cout<<"whitespace: ";		for(iter=whitespace.begin(); iter!=whitespace.end(); ++iter)			cout<<(*iter);		cout<<endl;		cout<<"special: ";		for(iter=special.begin(); iter!=special.end(); ++iter)			cout<<(*iter);		cout<<endl;		cout<<"alphanumeric: ";		for(iter=alphanumeric.begin(); iter!=alphanumeric.end(); ++iter)			cout<<(*iter);		cout<<endl;		cout<<"non_rune: ";		for(iter=non_rune.begin(); iter!=non_rune.end(); ++iter)			cout<<(*iter);		cout<<endl;	}}void Tokenize(CharacterStream& stream, bool testing){	InitializeTokenizer(false);//----Ready-State-------------------------------------------------	READY_STATE:	{		//check for tokenization complete		if(stream.empty())		{			return;		}		char c = stream.top();		current.type="";		current.value="";		if(c=='\"')		{			current.type="string";			stream.pop();			goto STRING;		}		if(c=='\'')		{			current.type="char";			stream.pop();			goto CHAR;		}		if(numeric.count(c))		{			current.type="int";			goto NUMBER;		}		if(alpha.count(c))		{			current.type="identifier";			goto IDENTIFIER;		}		if(whitespace.count(c))		{			stream.pop();			goto READY_STATE;		}		current.type="rune"; //not really relevant		goto RUNE;	}//----Rune-----------------------------------------------------------	RUNE:	{		char c = stream.top();		if(stream.empty() || non_rune.count(c))		{			//break up the runes here and push them onto all tokens			//also where all errors are signaled			while(current.value.size())			{				set<Token>::iterator iter;				set<Token>::iterator end=AllRunes.end();				for(iter=AllRunes.begin(); iter!=end; ++iter)				{					if(is_prefix_of((*iter).value,current.value))					{						AllTokens.push_back(*iter);						if(testing)							cout<<*iter<<endl;						current.value=remove_prefix_from((*iter).value,current.value);						goto continue_;					}				}				if(iter==end) //match not found!				{					cout<<"illegal rune("<<current.value<<") size: "<<current.value.size()<<endl;					//cout<<unsigned int(s[0])<<endl;					return;					//cout<<'('<<s<<')'<<endl;				}				continue_:;			}			goto READY_STATE;		}		current.value.push_back(c);		stream.pop();		goto RUNE;	}//----Identifier-----------------------------------------------------------	IDENTIFIER:	{		char c = stream.top();		if(alphanumeric.count(c))//found		{			current.value.push_back(c);			stream.pop();			goto IDENTIFIER;		}		AllTokens.push_back(current);		if(testing)			cout<<current<<endl;		goto READY_STATE;	}//----String----------------------------------------------------------	STRING:	{		if(stream.empty())		{			cout<<"error, EOF in string literal";		}		char c = stream.top();		stream.pop();		if(c=='\\')		{			goto STRING_ESCAPE;		}		if(c=='\"')		{			AllTokens.push_back(current);			if(testing)				cout<<current<<endl;			goto READY_STATE;		}		current.value.push_back(c);		goto STRING;		STRING_ESCAPE:		{			//put special cases here			if(stream.empty())			{				cout<<"error, EOF in string literal";			}			char c = stream.top();			stream.pop();			if(c=='\\')			{				current.value.push_back('\\');				goto STRING;			}			if(c=='\"')			{				current.value.push_back('\"');				goto STRING;			}			if(c=='\'')			{				current.value.push_back('\"');				goto STRING;			}			if(c=='t')			{				current.value.push_back('\t');				goto STRING;			}			if(c=='n')			{				current.value.push_back('\n');				goto STRING;			}			cout<<"illegal escape sequence in string literal"<<endl;			return;		}	}//----Number------------------------------------------------------------	NUMBER:	{		char c = stream.top();		if(numeric.count(c))		{			current.value.push_back(c);			stream.pop();			goto NUMBER;		}		if(c=='.')		{			current.value.push_back(c);			stream.pop();			current.type="float";			goto FLOAT;		}		AllTokens.push_back(current);		if(testing)			cout<<current<<endl;		goto READY_STATE;		FLOAT:		{			char c = stream.top();			if(!stream.empty() && numeric.count(c))			{				current.value.push_back(c);				stream.pop();				goto FLOAT;			}			if(current.value[current.value.size()-1]=='.')			{				cout<<"error: unfinished floating point literal"<<endl;			}			AllTokens.push_back(current);			if(testing)				cout<<current<<endl;			goto READY_STATE;		}	}//----Character------------------------------------------------------	CHAR:	{		if(stream.empty())		{			cout<<"error, EOF in char literal";		}		char c = stream.top();		stream.pop();		if(c=='\\')		{			goto CHAR_ESCAPE;		}		current.value.push_back(c);		goto CHAR_END;		CHAR_ESCAPE:		{			//this was left out, I just put it in			if(stream.empty())			{				cout<<"error, EOF in char literal";			}			char c = stream.top();			stream.pop();			if(c=='\\')			{				current.value.push_back('\\');				goto CHAR_END;			}			if(c=='\"')			{				current.value.push_back('\"');				goto CHAR_END;			}			if(c=='\'')			{				current.value.push_back('\"');				goto CHAR_END;			}			if(c=='t')			{				current.value.push_back('\t');				goto CHAR_END;			}			if(c=='n')			{				current.value.push_back('\n');				goto CHAR_END;			}			cout<<"invalid escape character in char literal"<<endl;		}		CHAR_END:		{			if(stream.empty())			{				cout<<"error, EOF in char literal";			}			char c = stream.top();			stream.pop();			if(c=='\'')			{				AllTokens.push_back(current);				if(testing)					cout<<current<<endl;				goto READY_STATE;			}			else			{				//signal error			}		}	}}

ok here is parser.cpp, the main parsing file. The function "show_top" is a display function.

#include "../Core/global.h"#include "../Core/Runes and Articles.h"void FixOperators(	Iterator begin,	Iterator end,	Token* operators,	int size,	string fixation="infix",	string associativity="left");void Group(Iterator begin, Iterator end);void HandleAdjacent(Iterator begin, Iterator end, Token* operators, int size);Iterator next(Iterator iter){	return ++iter;}void show_top(string text, Iterator begin, Iterator end){	cout<<text<<":\n";	++begin;	for(Iterator iter=begin; iter!=end; ++iter)	{		cout<<(*iter)->token.value<<' ';	}	cout<<endl;}void Parse(Iterator begin, Iterator end){	Token infix[] = {TIMES,MOD,DIV,PLUS,MINUS, //5			SHIFT_LEFT,SHIFT_RIGHT,LESS_THAN,LESS_THAN_EQUAL,//4			GREATER_THAN,GREATER_THAN_EQUAL,IS_EQUAL_TO,//3			NOT_EQUAL_TO,BITWISE_AND,BITWISE_OR,AND,OR,//5			MAKE_EQUAL,PLUS_EQUALS,COMMA,SEMICOLON,//4			INTERSECTION,UNION					};//2    if(next(begin)==end)    {		cout<<"nothing to parse here"<<endl;		return;    }	show_top("unparsed",begin,end);    Group(next(begin),end); //also parses each group	show_top("grouped",begin,end);	Token function_operator[] = {ARROW};    FixOperators(next(begin),end,function_operator,1);    show_top("function",begin,end);    Token postfix[] = {INCREMENT,DECREMENT,TO_REFERENCE,TO_RELATION,TO_OPTION};    FixOperators(next(begin),end,postfix,5,"postfix");	show_top("postfix",begin,end);	HandleAdjacent(next(begin),end,infix,23);	show_top("adjacentized",begin,end);//----standard-binary-operators----------------------------------------    Token scaling[] = {TIMES,MOD,DIV};    FixOperators(next(begin),end,scaling,3);    Token displacing[] = {PLUS,MINUS};	FixOperators(next(begin),end,displacing,2);	Token setwise[] = {INTERSECTION,UNION};	FixOperators(next(begin),end,setwise,2);    Token shifting[] = {SHIFT_LEFT,SHIFT_RIGHT};    FixOperators(next(begin),end,shifting,2);    Token comparison[] = {LESS_THAN,LESS_THAN_EQUAL,    						GREATER_THAN,GREATER_THAN_EQUAL};	FixOperators(next(begin),end,comparison,4);    Token equality[] = {IS_EQUAL_TO,NOT_EQUAL_TO};	FixOperators(next(begin),end,equality,2);    Token bitwise_and[] = {BITWISE_AND};	FixOperators(next(begin),end,bitwise_and,1);    Token bitwise_or[] = {BITWISE_OR};	FixOperators(next(begin),end,bitwise_or,1);    Token logical_and[] = {AND};	FixOperators(next(begin),end,logical_and,1);    Token logical_or[] = {OR};	FixOperators(next(begin),end,logical_or,1);	show_top("infix",begin,end);//----slightly-special-binary-operators------------------------    Token assignment[] = {MAKE_EQUAL,PLUS_EQUALS};	FixOperators(next(begin),end,assignment,2,"infix","right");	show_top("assignment",begin,end);	Token tupling[] = {COMMA};	FixOperators(next(begin),end,tupling,1);	show_top("tupling",begin,end);    Token sequencing[] = {SEMICOLON};	FixOperators(next(begin),end,sequencing,1);	show_top("sequencing",begin,end);}

adjacent.cpp

#include "../Core/global.h"#include "../Core/Runes and Articles.h"#include <algorithm>using namespace std;void HandleAdjacent(Iterator begin, Iterator end, Token* operators, int size){	Iterator iter=begin;	Iterator next=++begin;	while(next!=end)	{		Token& r = (*next)->token;		Token& l = (*iter)->token;		if(l==PERIOD &&r.type=="identifier")		{			(*iter)->right=(*next);			next=AllExpressions.erase(next);			++iter;			++next;			continue;		}		Expression& right = *(*next);		Expression& left = *(*iter);		bool infix=false;		if(left.match(operators,size))			infix=true;		if(right.match(operators,size))			infix=true;		if(!infix)		{			Expression* p = new Expression(ADJACENCY);//adj			p->left=(*iter);			p->right=(*next);			(*iter)=p;			next=AllExpressions.erase(next);			continue;		}		++iter;		++next;	}}//this problem should not come up because it doesn't repeat: t . r y

group.cpp, handles all outfix operators

#include "../Core/global.h"#include "../Core/Runes and Articles.h"void Parse(Iterator begin, Iterator end);//returns true if errorbool ProcessGroup(	Iterator& iter,	Iterator& end,	Iterator& start,	Token OPEN,	Token CLOSE,	Token RESULT){	int nesting=0;	while(true)	{		if(iter==end)		{			cout<<"Grouping:GROUPING: iter==end, unmatched ("<<endl;			return true;		}		Token& t( (*iter)->token );		if(t==OPEN)		{			++nesting;			++iter;			continue;		}		if(t==CLOSE)		{			if(nesting)			{				--nesting;				++iter;				continue;			}			else			{				Iterator sub_begin=start;				++sub_begin; //points to first element, or to ) if empty				Iterator sub_end=iter;//subend points to )				//Parse(sub_begin,sub_end,parent,"group");				Parse(start,iter);				//ok now sub_begin invalid				//sub_end still points to the valid )				//start still points to (				Iterator sub_expression = AllExpressions.erase(start);				//sub_expression points to the sub_expression				//unless it was empty				if(sub_expression==sub_end) //the group was empty				{					//overwrite the ) with () and contents==null					(*sub_end)= new Expression(RESULT);				}				else //non-empty				{					(*sub_end)->token=RESULT;					(*sub_end)->contents=(*sub_expression);					AllExpressions.erase(sub_expression);				}				return false;//goto READYSTATE;			}		}		//neither ( nor ) found		++iter;	}}void Group(Iterator begin, Iterator end){    Iterator iter=begin;    Iterator start;    Iterator finish;	while(true) //READYSTATE:	{		if(iter==end)		{			return;		}		cout<<"Group: ready_state"<<endl;		Token& t( (*iter)->token );	//----open-symbol---------------------------		if(t==OPENPAREN)		{			start=iter;			++iter;			if			(				ProcessGroup(iter,end,start,OPENPAREN,CLOSEPAREN,GROUP)			)cout<<"error processing GROUP"<<endl;			continue;		}		if(t==OPENBRACKET)		{			start=iter;			++iter;			if			(				ProcessGroup(iter,end,start,OPENBRACKET,CLOSEBRACKET,BOX)			)cout<<"error processing BOX"<<endl;			continue;		}		if(t==OPENBRACE)		{			start=iter;			++iter;			if			(				ProcessGroup(iter,end,start,OPENBRACE,CLOSEBRACE,BLOCK)			)cout<<"error processing BLOCK"<<endl;			continue;		}	//----close-symbol----------------------------------------------		if(t==CLOSEPAREN || t==CLOSEBRACKET || t==CLOSEBRACE)		{			cout<<"error unexpected closing symbol: "<<t<<endl;			return;		}	//----other-symbol----------------------------------------------		++iter;	}}

infix.cpp (also handles postfix operators)

#include "../Core/global.h"#include "../Core/Runes and Articles.h"//uses rotations to handle right-associative operators//not sure how this technique would work if there were//right-associative operators at multiple precedence levels//currently handles infix and postfixvoid FixOperators(	Iterator begin,	Iterator end,	Token* operators,	int size,	string fixation,	string associativity){	if(fixation!="infix" && fixation!="postfix" && fixation!="prefix")		cout<<"error, improper fixation"<<endl;    //deal with empty list    if(begin==end)        return;    Iterator iter=begin;    if(associativity!="prefix")    	++iter;    while(iter!=end)    {        if( (*iter)->match(operators,size) )        {			Iterator left=iter;			Iterator right=iter;			--left;			++right;			if(right==end && fixation!="postfix")			{				//special case, insert empty group: ()				/*if( (*iter)->token==SEMICOLON )				{					AllExpressions.insert(right,&EMPTYGROUP);					right=iter;					++right;				}*/				//special case, drop the comma				//else				if( (*iter)->token==COMMA || (*iter)->token==SEMICOLON)				{					//check this code					iter=AllExpressions.erase(iter);					goto NO_INCREMENT;				}				else					cout<<"FixOperators: no right operand"<<(*iter)->token<<endl;			}			if(fixation!="postfix")			{				(*iter)->right = (*right);				AllExpressions.erase(right);			}			if(fixation!="prefix")			{				(*iter)->left = (*left);				AllExpressions.erase(left);			}			(*iter)->fixation=fixation;			//check for chained right-associative operators			if			(				associativity=="right" &&				fixation=="infix" &&				(*iter)->left->match(operators,size)			)//rotation required			{				Expression* temp=*iter;				*iter=(*iter)->left;				Expression* right_most=*iter;				while(right_most->right->match(operators,size))					right_most=right_most->right;				temp->left=right_most->right;				right_most->right=temp;			}    	}    	++iter; //FIX ME:this probably goes here, maybe above    	NO_INCREMENT:;	}}

postparse.cpp, restructures things a little, considered part of the parser

#include "../Core/global.h"#include "../Core/Runes and Articles.h"#include "../SemanticAnalyzer/Scope.h"void RemoveParenthesis(Expression* e);void LexicallyScope(Expression* e, Scope* scope);void RestructureFunctions(Expression* e);void SetParent(Expression* e, Expression* parent);void RestructureClasses(Expression* e);void PostParse(Expression* e){	RemoveParenthesis(e);	SetParent(e,0);	RestructureClasses(e);	RestructureFunctions(e);	SetParent(e,0);	//note that root is a global	root = new Scope(0);	LexicallyScope(e,root);}void RemoveParenthesis(Expression* e){	if(!e)		return;	if(e->token==GROUP && e->contents)	{		Expression* temp=e->contents;		*e=*temp;		delete temp;		cout<<"removing ()"<<endl;	}	RemoveParenthesis(e->left);	RemoveParenthesis(e->right);	RemoveParenthesis(e->contents);}void LexicallyScopeChildren(Expression* e, Scope* scope){	LexicallyScope(e->right,scope);	LexicallyScope(e->contents,scope);	LexicallyScope(e->left,scope);}void LexicallyScope(Expression* e, Scope* scope){	if(!e)		return;	if(!scope)	{		cout<<"error in LexicallyScope, scope==0"<<endl;		return;	}	if(e->token==BLOCK)	{		Scope* new_scope = new Scope(scope);		e->scope=new_scope;		LexicallyScopeChildren(e,new_scope);		return;	}	e->scope=scope;	LexicallyScopeChildren(e,scope);}void RestructureFunctions(Expression* e){	if(!e)		return;	if //standard function	(		e->token==ADJ &&		e->left->token==ADJ &&		e->left->left->token==ARROW &&		e->right->token==BLOCK	)	{		e->right->left=e->left->left->left;		e->right->right=e->left->left->right;		Expression* p;		p=e->left->left->left;		e->left->left->left=ExtractType(p);		p=e->left->left->right;		e->left->left->right=ExtractType(p);		RestructureFunctions(e->right->contents);		return;	}	if //implicit function (block)	(		e->token==BLOCK &&		!(e->left) &&		!(e->right)	)	{		Expression* e2 = new Expression(MAKE_EQUAL);		e2->right=e->contents;		e2->left= new Expression(Token("identifier","__return_value"));		e->contents=e2;		e->left=new Expression(EMPTYGROUP);		e->right= new Expression(ADJACENCY);		e->right->right = new Expression(Token("identifier","__return_value"));		e->right->left = new Expression(TYPEOF); //creates cycle!		e->right->left->contents=e->contents->right;		RestructureFunctions(e->contents);		return;	}	RestructureFunctions(e->left);	RestructureFunctions(e->right);	RestructureFunctions(e->contents);}void LexicallyScope(Expression* e, Scope* scope);void SetParent(Expression* e, Expression* parent){	if(!e)		return;	e->parent=parent;	if(e->token==TYPEOF)		return;	SetParent(e->left,e);	SetParent(e->right,e);	SetParent(e->contents,e);}void RestructureClasses(Expression* e){	if(!e)		return;	RestructureClasses(e->right);	RestructureClasses(e->contents);	RestructureClasses(e->left);	if(e->token==KEYWORD_CLASS)	{		cout<<"RestructureClasses KEYWORD_CLASS begin"<<endl;		Expression* p=e->parent->parent;		if(p->token != MAKE_EQUAL)		{			cout<<"error, = needed in class definition"<<endl;		}		p->token=CLASS_DEF;		cout<<"RestructureClasses KEYWORD_CLASS end"<<endl;	}	if(e->token==CLASS_DEF)	{		cout<<"RestructureClasses CLASS_DEF begin"<<endl;		Expression* temp=e->left;		e->left=e->left->right;		delete temp->left;		delete temp;		cout<<"RestructureClasses CLASS_DEF end"<<endl;	}}/*void oldLexicallyScope(Expression* e, Scope* scope){	if(!e)		return;	if(!scope)	{		cout<<"error in LexicallyScope, scope==0"<<endl;		return;	}	cout<<"LexicallyScope 1"<<endl;	if(e->token==ARROW)	{		Expression* parent=e->parent;		if(!parent)			goto BLOCKLESS;		Expression* brother=parent->right;		if(!brother)			goto BLOCKLESS;		if( brother->token == Token("special","{}") )		{			Scope* new_scope = brother->scope;			e->scope=new_scope;			LexicallyScopeChildren(e,new_scope);			return;		}		Expression* grandparent=parent->parent;		if(!grandparent)			goto BLOCKLESS;		Expression* uncle=grandparent->right;		if(!uncle)			goto BLOCKLESS;		if( uncle->token == Token("special","{}") )		{			Scope* new_scope = uncle->scope;			e->scope=new_scope;			LexicallyScopeChildren(e,new_scope);			return;		}		BLOCKLESS: //just a function, no block		{			Scope* new_scope= new Scope(scope);			e->scope=new_scope;			LexicallyScopeChildren(e,new_scope);			return;		}	}	if(e->token==Token("special","{}"))	{		Scope* new_scope = new Scope(scope);		e->scope=new_scope;		LexicallyScopeChildren(e,new_scope);		return;	}	cout<<"LexicallyScope 3"<<endl;	e->scope=scope;	LexicallyScopeChildren(e,scope);	cout<<"LexicallyScope 4"<<endl;}*/

Compiler Design - Lexical & syntactical analysis

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Compiler Design - Lexical & syntactical analysis

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines