Creating a Scripting Language with YACC?

Started by
30 comments, last by Basiror 17 years, 8 months ago
Now that you point that out, Basiror, I might look into that... I have no idea what bytecode is, but that's what books and the internet are for =P. As you must have noticed, I have a lot to learn =/ but it's all fun, or else I wouldn't be here.
I would also suggest you use an existing scripting language (probably Lua).

I have written three scripting languages, one by hand, one in C using Bison, and one two-stage in C++ using spirit.
Spirit was the easiest for me to write, but it had brought my C++ compiler to its knees by the time I finished the script compiler. Bison I never quite got the hang of, and the interpreter still suffers from the occasional crash. The hand-coded one never progressed past the assembler stage; hand-parsing a complex grammar was definitely beyond my abilities at the time.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

I have also been working on a C++-like scripting language since yesterday evening.

So far I am still fiddling with the grammar to get it into decent shape.
I use ANTLR to generate the parser code for C++.

Writing a good grammar seems to be quite a complex step, especially if you want to keep it extensible.

http://www.8ung.at/basiror/theironcross.html
Well, after examining some articles, talking to my dad, and reading some of the books, I have figured out how to write a compiler that emits bytecode. I'm writing a virtual microprocessor with a thing called an accumulator that holds the current value. If this was how I had to program, I think I'd shoot myself. But I should be able to pull it off. Thank you for the pokes in the right direction. I want to write my own scripting language because I'm controlling and like bragging rights =P I'll post back when I finish the compiler and the runner for the game.
Thank you again.
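For anyone following along, an accumulator machine of the kind described above might look roughly like this. It is only a sketch: the opcode names, the Instruction struct, and the Run function are made up for illustration and are not the bytecodes planned in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode set for illustration only.
enum class Op : std::uint8_t {
    LoadConst,  // acc = operand
    Load,       // acc = memory[operand]
    Store,      // memory[operand] = acc
    Add,        // acc += memory[operand]
    Sub,        // acc -= memory[operand]
    Halt        // stop and return acc
};

struct Instruction {
    Op op;
    int operand;
};

// Executes the program against the given memory and returns the final
// accumulator value. Every operation reads or writes the single accumulator.
int Run(const std::vector<Instruction>& program, std::vector<int>& memory) {
    int acc = 0;
    for (std::size_t pc = 0; pc < program.size(); ++pc) {
        const Instruction& ins = program[pc];
        switch (ins.op) {
            case Op::LoadConst: acc = ins.operand; break;
            case Op::Load:      acc = memory.at(ins.operand); break;
            case Op::Store:     memory.at(ins.operand) = acc; break;
            case Op::Add:       acc += memory.at(ins.operand); break;
            case Op::Sub:       acc -= memory.at(ins.operand); break;
            case Op::Halt:      return acc;
        }
    }
    return acc;  // Ran off the end without an explicit Halt
}
```

A script line such as b = 2 + 3 would compile to something like LoadConst 2, Store 0, LoadConst 3, Add 0, Store 1, which is why programming such a machine by hand gets tedious quickly.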
http://www.peroxide.dk/tuts_scr.shtml
Well, I'm just coding it myself now. I'm almost done with my "Tokenizer" and I have my bytecodes planned. It's going to be a cinch, so yeah, I'm not using lex or yacc...
I am using ANTLR as a parser and lexer generator

I am still in the grammar design phase.
The lexer is done, the parser however needs another few days of work


I haven't run or compiled the generated code yet.
Maybe someone could have a look at the grammar and check for mistakes. ANTLR generates LL(1) parsers and can optionally create LL(k) parsers, where you specify k in the options:
options {
k=someconstant
}

Quote:
options {
language="Cpp";
mangleLiteralPrefix = "TK_";
}

class cross_scriptparser extends Parser;
{
}
block : LCURLY ( block | statement )* RCURLY;

statement : expression TERM
| ifstatement
;

ifstatement : TK_if LPAREN expression RPAREN (block (elseifstatement)? | expression )
;

elseifstatement : TK_else ( ifstatement | block | expression)
;

variable : identifier
;


expression : relexpr ((AND|OR) relexpr)*
;
//relational expressions
relexpr : addexpr ( (EQUAL|NOTEQUAL|GREATER|LESS|GEQUAL|LEQUAL) addexpr)*
;
//addition and subtraction
addexpr : timeexpr ( (PLUS | MINUS) timeexpr)*
;
//multiplication
timeexpr : signexpr ( (TIMES|DIVIDE|MOD) signexpr)*
;
//signed variables, constants ...
signexpr : LPAREN (PLUS|MINUS) variable RPAREN
| variable
;

declarations : "lol"
;

classdeclaration : TK_class identifier LCURLY classbody RCURLY TERM
;
classbody : (constructordeclaration| destructordeclaration| (memberdeclarations)*)
;

memberdeclarations : (TK_public SEMI| TK_protected SEMI| TK_private SEMI)? variabledeclaration
;

constructordeclaration : TK_constructor LCURLY RCURLY TERM
;

destructordeclaration : TK_destructor LCURLY RCURLY TERM
;

variabledeclaration : (TK_static)? type identifier TERM
;

constant : (PLUS | MINUS)? NUMBER
;
type : TK_int | TK_uint | TK_float | TK_byte | TK_char | TK_string | TK_array | TK_list | TK_map | identifier ;

identifier : LITERAL
;
unaryop : STAR | DIVIDE | MINUS | PLUS | MOD;

assignment_operator : ASSIGN
| TIMESEQUAL
| DIVIDEEQUAL
| MODEQUAL
| PLUSEQUAL
| MINUSEQUAL
| LEFTSHIFTEQUAL
| RIGHTSHIFTEQUAL
| BITWISEANDEQUAL
| BITWISEOREQUAL
| BITWISEXOREQUAL
;


class cross_scriptlexer extends Lexer;
options {
}
tokens {
"import";
"if"; "else"; "while"; TK_else_if="else if";
"void"; "int"; "uint"; "float"; "byte"; "char"; "array"; "list"; "map"; "string";
"class"; "constructor"; "destructor"; "public"; "private"; "protected";
"const"; "static";
TIMESEQUAL ="*=";
PLUSEQUAL ="+=";
MINUSEQUAL ="-=";
DIVIDEEQUAL ="/=";
MODEQUAL ="%=";
RIGHTSHIFTEQUAL =">>=";
LEFTSHIFTEQUAL ="<<=";
EQUAL ="==";
NOTEQUAL ="!=";
LEQUAL ="<=";
GEQUAL =">=";
LEFTSHIFT ="<<";
RIGHTSHIFT =">>";
BITWISEANDEQUAL ="&=";
BITWISEOREQUAL ="|=";
BITWISEXOREQUAL ="^=";
SCOPE ="::";
}
WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
{ $setType(Token.SKIP); };

protected
DIGIT : '0'..'9' ;

protected
INT : (DIGIT)+ ;

protected
FLOAT : INT '.' INT | '.' INT ;

NUMBER : INT { $setType(INT); } ('.' INT { $setType(FLOAT); })?
| '.' INT { $setType(FLOAT); }
;
protected
SMALL_LIT : 'a'..'z' ;

protected
BIG_LIT : 'A'..'Z' ;

LITERAL : (SMALL_LIT | BIG_LIT | '_') (SMALL_LIT | BIG_LIT | '_' | DIGIT)* ;

LCURLY : '{' ;
RCURLY : '}' ;
LSQUARE : '[';
RSQUARE : ']';
LPAREN : '(' ;
RPAREN : ')' ;
STAR : '*' ;
MOD : '%';
PLUS : '+';
MINUS : '-';
DIVIDE : '/';
ASSIGN : '=';
LESS : '<';
GREATER : '>';
TERM : ';';
SEMI : ':';
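
The layering of addexpr over timeexpr is what gives multiplication a higher precedence than addition. As a rough hand-written illustration of the same idea (not what ANTLR generates), each rule can become one function that calls the rule below it. The ExprEvaluator class is hypothetical, and it evaluates integer constants where the grammar above expects variables, purely to keep the sketch short.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical recursive-descent counterpart to addexpr/timeexpr.
class ExprEvaluator {
public:
    explicit ExprEvaluator(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}

    // addexpr : timeexpr ( (PLUS | MINUS) timeexpr )*
    int AddExpr() {
        int value = TimeExpr();
        while (Peek() == "+" || Peek() == "-") {
            const std::string op = Next();
            const int rhs = TimeExpr();
            value = (op == "+") ? value + rhs : value - rhs;
        }
        return value;
    }

private:
    // timeexpr : operand ( (TIMES | DIVIDE | MOD) operand )*
    int TimeExpr() {
        int value = Operand();
        while (Peek() == "*" || Peek() == "/" || Peek() == "%") {
            const std::string op = Next();
            const int rhs = Operand();
            if (op == "*") {
                value *= rhs;
            } else {
                if (rhs == 0) throw std::runtime_error("division by zero");
                value = (op == "/") ? value / rhs : value % rhs;
            }
        }
        return value;
    }

    // operand : NUMBER (stands in for signexpr / variable in the grammar)
    int Operand() {
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of input");
        return std::stoi(Next());
    }

    // Returns the current token without consuming it, or "" at end of input.
    std::string Peek() const {
        return pos_ < tokens_.size() ? tokens_[pos_] : std::string();
    }
    std::string Next() { return tokens_[pos_++]; }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

Because AddExpr only ever sees complete TimeExpr results, the tokens 2 + 3 * 4 are grouped as 2 + (3 * 4), and the while loops make both levels left-associative.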




Little update on the grammar.

I really wonder who rated me down :pf

[Edited by - Basiror on July 23, 2006 1:15:52 PM]
http://www.8ung.at/basiror/theironcross.html
SEMI is defined as a normal colon instead of a semicolon, and TERM is the one with the semicolon. Apart from that, you're only missing the ! operator.

While we're showing code, how's my Tokenizer?
It doesn't support two-character operators yet...
And yes, I know I can use "using namespace". I'm just too lazy to put in a time-saving piece of code. That's lazy!
void CCompiler::Tokenize (void)
{
	// Characters that always form a one-character token of their own
	// ('/' and '*' are listed explicitly; "\*" is not a valid escape sequence)
	const std::string SingleTokenCharacters = "=+-/*{}(),;<>";
	// Characters that separate tokens (escapes use a backslash: "\n", "\t")
	const std::string WhiteSpaces = " \n\t";
	std::string CurrentToken;

	// size_type rather than int, so the npos comparisons below are well defined
	for (std::string::size_type Location = 0; Location < Script.length (); Location++)
	{
		// Test the character against the whitespace definition string
		if (WhiteSpaces.find (Script [Location]) != std::string::npos) // We found that it is a whitespace
		{
			// Terminate the token
			if (CurrentToken.length () > 0) // See if the current token has anything in it
			{
				AddToken2InstructList (&CurrentToken); // Store the current token away in the stream
				CurrentToken.clear (); // Clear the temporary token buffer
			}
			continue; // Test the next character
		}
		// Test if it matches a special character
		if (SingleTokenCharacters.find (Script [Location]) != std::string::npos) // The character is a special character
		{
			// Terminate the token
			if (CurrentToken.length () > 0) // Test if there is anything in the token
			{
				AddToken2InstructList (&CurrentToken); // Add it to the stream
				CurrentToken.clear (); // Clear the buffer
			}
			// Copy into a named string first: taking the address of the
			// temporary returned by substr is not valid C++
			std::string SpecialToken = Script.substr (Location, 1);
			AddToken2InstructList (&SpecialToken); // Add the special token to the token stream
			continue; // Test the next character
		}
		// The character didn't match any of the delimiters; tack it onto the token buffer and move on
		CurrentToken += Script [Location];
	}
	// Store a token that runs right up to the end of the script
	if (CurrentToken.length () > 0)
		AddToken2InstructList (&CurrentToken);
}


[Edited by - Tocs1001 on July 24, 2006 9:55:55 PM]
Hi,
Yes, I just corrected the SEMI/TERM thing ^^

The whole grammar development is a mess; it's been 2 years since I wrote my last parser generator.
The first thing to implement is pretty printing (scanning the sources and printing them to cout to compare the output with the input). If that works OK, you are halfway through the mess :)


As for your Tokenizer, how does this work?
Do you scan for white spaces to skip them first and then you scan for tokens of your grammar?
http://www.8ung.at/basiror/theironcross.html
Well, it loops through the script, examining it character by character.

It uses the find function of std::string to see if the character matches a whitespace. If it does, the current token needs to be terminated. For example, in "int i;" the int has a space after it, which means that "int" is a single token. So it tests whether CurrentToken has anything in it. If it does, it stores it away in the Tokens vector, which is the stream of tokens. CurrentToken is the temporary buffer that each character of a token gets added to; when the token ends, the tokenizer adds CurrentToken to the token stream, clears it, and continues the loop, bringing it back to the beginning.

The next test is for "Single Character Tokens", which are characters that should be their own tokens, like ;'s and /'s and {}'s and whatnot. It uses std::string::find to check whether the current character is a "single character token". If it is, it needs to terminate the current token and add it to the stream, as well as make a token for the "Single Character Token" itself. So once again it checks whether the current token holds anything; if it does, it adds it to the stream, clears the temporary token buffer (CurrentToken), and then adds another token for the character.

If the character the tokenizer is examining is neither a whitespace nor a "Single Character Token", it's added onto the end of CurrentToken.

And presto, you have a simple tokenizer. The only problem is that it doesn't support two-character operators like >=, <= or !=. Instead it splits them into separate tokens such as "!" and "=", which is fine, but the compiler will have to look ahead for an equals sign when it reads certain characters. My dad is bringing home a book on writing a lexical analyser from scratch using a table method, which he explained a little.
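
One way to handle the two-character operators inside the tokenizer, instead of leaving it to the compiler, is to peek one character ahead before committing to a single-character token. The sketch below is a standalone variant, not the CCompiler code above: the function name, the operator list, and returning a vector of strings are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits a script into tokens, preferring a two-character operator
// over two single-character tokens when both would match.
std::vector<std::string> TokenizeWithLookahead(const std::string& script) {
    const std::string whitespace = " \n\t";
    const std::string singles = "=+-/*{}(),;<>!";
    // Assumed operator list, for illustration only.
    const std::vector<std::string> doubles = {
        "==", "!=", "<=", ">=", "+=", "-=", "*=", "/="};

    std::vector<std::string> tokens;
    std::string current;
    // Stores the pending identifier/number token, if there is one.
    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };

    for (std::size_t i = 0; i < script.size(); ++i) {
        const char c = script[i];
        if (whitespace.find(c) != std::string::npos) {
            flush();
            continue;
        }
        if (singles.find(c) != std::string::npos) {
            flush();
            // Peek at the next character before emitting a one-character token.
            if (i + 1 < script.size()) {
                const std::string pair = script.substr(i, 2);
                if (std::find(doubles.begin(), doubles.end(), pair) != doubles.end()) {
                    tokens.push_back(pair);
                    ++i;  // Skip the second character of the operator
                    continue;
                }
            }
            tokens.push_back(std::string(1, c));
            continue;
        }
        current += c;
    }
    flush();  // Keep a token that runs up to the end of the input
    return tokens;
}
```

With this, "a >= b" comes out as the three tokens a, >=, b, while "a=b" still splits into a, =, b. Three-character operators such as >>= would need a longer lookahead or the table method mentioned above.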

I need to learn to comment my code; I'm a bad coder X( I put some comments into the previous post for your viewing pleasure.

When this compiler is done, it's back to writing the seamless world engine for the actual game. The script is a side trip, but a necessary one... I can't wait till the seamless world's done; then I can post screens of a world and write the world editor.

My project's website is http://home.fuse.net/dragonstorm/main.shtml The address sucks, but when alpha's done I'm getting a domain name =). My forums are swamped by a noob who made 2 names, posts about everything ridiculous, and thinks he's part of the dev team when he has no skills... But that's off-topic ^^

P.S. You're missing the ! operator, as in if (!Boolean) {}

[Edited by - Tocs1001 on July 24, 2006 9:08:16 PM]

