Tocs1001

Creating a Scripting Language with YACC?

Well, my "MMORPG" could use a decent scripting language to easily add some content and give some of my not-so-computer-savvy friends a chance to help out =). The scripting language will be able to give special features to NPCs, going beyond the rote standard NPC conversation/kill-the-NPC: I could script special paths and special attacks, as well as things like traps and special items that aren't like potions, food, or weapons. It's not meant to create the game but to embellish it and add special features that would normally take some editing of the game's source code.

But to the question... I was working on the logic involved in writing a script parser, but my dad told me about a thing called YACC (Yet Another Compiler Compiler): I could write a set of "rules" or a "grammar" for a language in a special notation, and it would generate a C or C++ source file to parse it. He didn't have much time to explain it, and he hasn't used it, but he remembers it from a book he read. The question is: is this what I want? I've been looking at some YACC tutorials and it doesn't really seem to be. Could anyone give some pointers about it, and tell me if I'm drifting the wrong way? I've already worked out how I'm going to parse the script if YACC doesn't work out.

I could also suggest, in the same vein, Bison (and flex for lexing). Of course, you might be able to go for something much simpler if your language has certain properties (assembler: you don't need more parsing than ifstream provides, XML: use an XML parser, Python: use boost, Lua: use the default parser etc)

Consider using an existing scripting language, such as Lua. Whether you write your own language or not, you'll still spend a good chunk of time integrating that language in a useful fashion with your game. By using an existing language, you remove a lot of extra work, and you get something that is probably far more stable and useful than what you'd create yourself as a first attempt.

Also, in addition to Lex/Yacc and the various ports thereof, you may want to take a look at ANTLR, which tends to generate code that is much more maintainable (in my opinion).

Well, I don't know much about scripting, so it's a humble beginning. If I use an existing scripting language, how do I create my own functions to be used in the scripts if they're already made? (The most experience with scripting I have is Second Life, lol.)

I have a list of functions that scripts should be able to use. GiveExp (int Ammount, int SkillID) is one; I want to be able to call that in my script. If I use a premade scripting language, can I do that? I'm aiming for a simplified C-like language that is read and "compiled" when the script gets triggered. I wrote out a flow chart of how to write the parser for my script.

Basically it will look kinda like this

Func Main
{
GiveItem (ITEM_ID_SWORD,1);
SpawnNPC (MONSTER_ID_GOBLIN, <0,0,0>);
SayDialog ("Use the sword to kill the Goblin");
}

and loops would just be

Loop (3)
{

}
would loop 3 times.
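
Whichever language ends up running these scripts, calls such as GiveItem or GiveExp are usually resolved through a table that maps the script-visible name to a C++ callback. Below is a minimal sketch of that idea, assuming every argument is an int. NativeTable, RegisterNative and CallNative are made-up names for illustration and do not come from any library.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature: every native takes a list of ints and returns an int.
using NativeFn = std::function<int(const std::vector<int>&)>;

class NativeTable
{
public:
    // Make a C++ function callable from scripts under the given name.
    void RegisterNative(const std::string& name, NativeFn fn)
    {
        natives_[name] = std::move(fn);
    }

    // Called by the interpreter when it meets a call such as GiveExp (50, 3);
    int CallNative(const std::string& name, const std::vector<int>& args) const
    {
        const auto it = natives_.find(name);
        if (it == natives_.end())
            throw std::runtime_error("unknown script function: " + name);
        return it->second(args);
    }

private:
    std::map<std::string, NativeFn> natives_;
};
```

The interpreter only needs the function name and the evaluated arguments, so the same table works for a hand-written parser or a generated one. Embeddable languages such as Lua provide their own registration mechanism for the same purpose.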

I am using XML to give definitions for weapon stats, potion effects, zone details like where things are, and other things that fit within the norm of an RPG and don't need special scripting. But various things, like a portal which a demon comes out of, causing some damage as he comes through, or even special AI for a boss, would be done through the script.

The problem is I'm scratching my head over all the YACC tutorials. I wish I could use it, but I don't quite understand it. Most of them address YACC working with something called Lex, and it's very confusing.

I guess the question is: should I learn to use YACC or a similar tool, or should I just write my own OO parser in straight C++ to get it done...

PS: ANTLR seems to have a better base of documentation; its intro didn't leave me scratching my head =P

[Edited by - Tocs1001 on July 21, 2006 1:05:48 PM]

Quote:
Original post by jpetrie
Consider using an existing scripting language, such as Lua. Whether you write your own language or not, you'll still spend a good chunk of time integrating that language in a useful fashion with your game. By using an existing language, you remove a lot of extra work, and you get something that is probably far more stable and useful than what you'd create yourself as a first attempt.

Also, in addition to Lex/Yacc and the various ports thereof, you may want to take a look at ANTLR, which tends to generate code that is much more maintainable (in my opinion).



I can only second this post.

Sadly, none of my favorite scripting languages works on my current installation, Suse10.0 x86_64.
squirrel with sqplus segfaults, and angelcode seems to be pretty unportable to 64-bit platforms; it doesn't work either. I simply can't bind a function, and all function calls always return false :(

The only option now is to write my own scripting language. With bytecode, of course; this allows the construction of a language independently from the script processor, the virtual machine.

Quote:
Original post by Basiror
*snip* this allows the construction of a language independently from the script processor the Virtual Machine


Well, theoretically, yes. But unless you plan on creating a front end/back end combination with an intermediate representation of your code, this isn't really feasible. You'll need to know the opcodes and operands of the VM's bytecode language, which will be closely tied to how the virtual machine works.

Quote:
Original post by daerid
Quote:
Original post by Basiror
*snip* this allows the construction of a language independently from the script processor the Virtual Machine


Well, theoretically, yes. But unless you plan on creating a front end/back end combination with an intermediate representation of your code, this isn't really feasible. You'll need to know the opcodes and operands of the VM's bytecode language, which will be closely tied to how the virtual machine works.


A wa? You lost me. I'm just gonna write a parser in C++; it will be good learning as well... Although the Flipcode tut on scripting languages did give me more understanding of Lex and YACC.

Tocs1001: The big advantage of bytecode is that you can compile your scripts and release them without releasing the actual source code to anyone.
Bytecode can look like this, with 32 bits per line:
#########
addintref
address1-
address2-
addintval
----50050
----12334

... You should at least know assembler to do this efficiently

Writing an interpreter will probably work in most cases, we did this in our first semester at university, it was really fun :)


(Yes, I know that's a huge waste of memory, but it helps speed up the VM on 32-bit systems.
What's the block size on 64-bit systems? Also 32 bits? You can plug in the same memory.)
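
To make the layout above concrete, here is a rough sketch of a loop that executes such a stream of 32-bit words. The opcode numbers, the OP_HALT instruction and the single result register are assumptions made for illustration; the post only names addintref and addintval with two operand words each.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcode numbers; the post only names the two add instructions.
enum Opcode : std::uint32_t
{
    OP_HALT      = 0,
    OP_ADDINTREF = 1, // operands: two addresses into the VM's memory
    OP_ADDINTVAL = 2  // operands: two immediate integer values
};

// Runs a program in which every opcode and operand occupies one 32-bit word.
// Assumption: each add leaves its result in a single result register.
std::int32_t Run(const std::vector<std::uint32_t>& code,
                 const std::vector<std::int32_t>& memory)
{
    std::int32_t result = 0;
    std::size_t pc = 0;
    while (pc < code.size())
    {
        const std::uint32_t op = code[pc];
        if (op == OP_HALT)
            break;
        if (pc + 2 >= code.size())
            throw std::runtime_error("truncated instruction");
        const std::uint32_t a = code[pc + 1];
        const std::uint32_t b = code[pc + 2];
        switch (op)
        {
        case OP_ADDINTREF:
            result = memory.at(a) + memory.at(b); // at() throws on a bad address
            break;
        case OP_ADDINTVAL:
            result = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
            break;
        default:
            throw std::runtime_error("unknown opcode");
        }
        pc += 3; // one opcode word plus two operand words
    }
    return result;
}
```

Because every field is a full word, decoding is just array indexing, which is the speed-for-memory trade mentioned above.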

[Edited by - Basiror on July 22, 2006 3:53:59 AM]

Quote:
Original post by Basiror
Tocs1001: The big advantage is bytecode is you can compiler your script and release them without releasing the actual source code to anyone.


Compilering scripts sounds great! I want to learn how to compiler scripts! Sorry I couldn't help it [grin]

Quote:
Original post by deathkrush
Quote:
Original post by Basiror
Tocs1001: The big advantage is bytecode is you can compiler your script and release them without releasing the actual source code to anyone.


Compilering scripts sounds great! I want to learn how to compiler scripts! Sorry I couldn't help it [grin]


Some people just can't contribute in a constructive manner.

Now that you point that out, Basiror, I might look into that... I have no idea what bytecode is, but that's what books and the internet are for =P. As you must have noticed, I have a lot to learn =/ but it's all fun, else I wouldn't be here.

I would also suggest you use an existing scripting language (probably Lua).

I have written three scripting languages: one by hand, one in C using Bison, and one two-stage in C++ using Spirit.
Spirit was the easiest for me to write, but it brought my C++ compiler to its knees by the time I completed the script compiler. Bison I never quite got the hang of, and the interpreter still suffers from the occasional crash. The hand-coded one never progressed past the assembler stage; hand-parsing a complex grammar was definitely beyond my abilities at the time.

I have also been working on a C++-like scripting language since yesterday evening.

So far I am still fiddling with a decent grammar.
I use ANTLR to generate the parser code for C++.

Writing a good grammar seems to be quite a complex step, especially if you want to keep it extendable.

Well, after examining some articles, talking to my dad, and reading some of the books I have, I've worked out how to write a compiler to bytecode. I'm writing a virtual microprocessor with a thing called an accumulator for the current value. If this was how I had to program I think I'd shoot myself. But I should be able to pull it off. Thank you for the pokes in the right direction. I want to write my own scripting language because I'm controlling and like bragging rights =P I'll post back when I finish the compiler and the runner for the game.
Thank you again.

Well, I'm just coding it myself now. I'm almost done with my "Tokenizer" and I have my bytecodes planned. It's gonna be a cinch, so yeah, I'm not using Lex or YACC...

I am using ANTLR as a parser and lexer generator.

I am still in the grammar design phase.
The lexer is done; the parser, however, needs another few days of work.


I haven't run or compiled the generated code yet.
Maybe someone could have a look at the grammar and check for mistakes. ANTLR generates LL(1) parsers and can optionally create LL(k) parsers, where you specify k in the options:
options {
k=someconstant;
}

Quote:

options {
language="Cpp";
mangleLiteralPrefix = "TK_";
}

class cross_scriptparser extends Parser;
{
}
block : LCURLY ( block | statement )* RCURLY;

statement : expression TERM
| ifstatement
;

ifstatement : TK_if LPAREN expression RPAREN (block (elseifstatement)? | expression )
;

elseifstatement : TK_else ( ifstatement | block | expression)
;

variable : identifier
;


expression : relexpr ((AND|OR) relexpr)*
;
//relative expressions
relexpr : addexpr ( (EQUAL|NOTEQUAL|GREATER|LESS|GEQUAL|LEQUAL) addexpr)*
;
//addition and subtraction
addexpr : timeexpr ( (PLUS | MINUS) timeexpr)*
;
//multiplication
timeexpr : signexpr ( (TIMES|DIVIDE|MOD) signexpr)*
;
//signed variables, constants ...
signexpr : LPAREN (PLUS|MINUS) variable RPAREN
| variable
;

declarations : "lol"
;

classdeclaration : TK_class identifier LCURLY classbody RCURLY TERM
;
classbody : (constructordeclaration| destructordeclaration| (memberdeclarations)*)
;

memberdeclarations : (TK_public SEMI| TK_protected SEMI| TK_private SEMI)? variabledeclaration
;

constructordeclaration : TK_constructor LCURLY RCURLY TERM
;

destructordeclaration : TK_destructor LCURLY RCURLY TERM
;

variabledeclaration : (TK_static)? type identifier TERM
;

constant : (PLUS | MINUS)? NUMBER
;
type : TK_int | TK_uint | TK_float | TK_byte | TK_char | TK_string | TK_array | TK_list | TK_map | identifier ;

identifier : LITERAL
;
unaryop : STAR | DIVIDE | MINUS | PLUS | MOD;

assignment_operator : ASSIGN
| TIMESEQUAL
| DIVIDEEQUAL
| MODEQUAL
| PLUSEQUAL
| MINUSEQUAL
| LEFTSHIFTEQUAL
| RIGHTSHIFTEQUAL
| BITWISEANDEQUAL
| BITWISEOREQUAL
| BITWISEXOREQUAL
;


class cross_scriptlexer extends Lexer;
options {
}
tokens {
"import";
"if"; "else"; "while"; TK_else_if="else if";
"void"; "int"; "uint"; "float"; "byte"; "char"; "array"; "list"; "map"; "string";
"class"; "constructor"; "destructor"; "public"; "private"; "protected";
"const"; "static";
TIMESEQUAL ="*=";
PLUSEQUAL ="+=";
MINUSEQUAL ="-=";
DIVIDEEQUAL ="/=";
MODEQUAL ="%=";
RIGHTSHIFTEQUAL =">>=";
LEFTSHIFTEQUAL ="<<=";
EQUAL ="==";
NOTEQUAL ="!=";
LEQUAL ="<=";
GEQUAL =">=";
LEFTSHIFT ="<<";
RIGHTSHIFT =">>";
BITWISEANDEQUAL ="&=";
BITWISEOREQUAL ="|=";
BITWISEXOREQUAL ="^=";
SCOPE ="::";
}
WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
{ $setType(Token.SKIP); };

protected
DIGIT : '0'..'9' ;

protected
INT : (DIGIT)+ ;

protected
FLOAT : INT '.' INT | '.' INT ;

NUMBER : INT { $setType(INT); } ('.' INT { $setType(FLOAT); })?
| '.' INT { $setType(FLOAT); }
;
protected
SMALL_LIT : 'a'..'z' ;

protected
BIG_LIT : 'A'..'Z' ;

LITERAL : (SMALL_LIT | BIG_LIT | '_') (SMALL_LIT | BIG_LIT | '_' | DIGIT)* ;

LCURLY : '{' ;
RCURLY : '}' ;
LSQUARE : '[';
RSQUARE : ']';
LPAREN : '(' ;
RPAREN : ')' ;
STAR : '*' ;
MOD : '%';
PLUS : '+';
MINUS : '-';
DIVIDE : '/';
ASSIGN : '=';
LESS : '<';
GREATER : '>';
TERM : ';';
SEMI : ':';




A little update on the grammar.

I really wonder who rated me down :pf

[Edited by - Basiror on July 23, 2006 1:15:52 PM]

SEMI has a normal colon instead of a semicolon, and TERM has the semicolon. Other than that, you're just missing the ! operator.

While we're showing code, how's my tokenizer?
It doesn't support two-character operators yet...
And yes, I know I can use "using namespace"; I'm just too lazy to put in a time-saving piece of code. That's lazy!

void CCompiler::Tokenize (void)
{
    //'\*' is not a valid escape sequence; the operators meant here are '*' and '/'
    const std::string SingleTokenCharacters = "=+-*/{}(),;<>";
    //Was " /n/t", which matched '/', 'n' and 't' rather than newline and tab
    const std::string WhiteSpaces = " \n\t";
    std::string CurrentToken;
    for (std::string::size_type Location = 0; Location < Script.length (); Location++)
    {
        std::string::size_type loc = WhiteSpaces.find (Script [Location], 0); //Test the character against the whitespace definition string
        if (loc != std::string::npos) //We found that it is a whitespace
        {
            //Terminate the token
            if (CurrentToken.length () > 0) //See if the current token has anything in it
            {
                AddToken2InstructList (&CurrentToken); //Store the current token away in the stream
                CurrentToken.clear (); //Clear the temporary token buffer
            }
            continue; //Test the next character
        }
        loc = SingleTokenCharacters.find (Script [Location], 0); //Test if it matches a special character
        if (loc != std::string::npos) //The character is a special character
        {
            //Terminate the token
            if (CurrentToken.length () > 0) //Test if there is anything in the token
            {
                AddToken2InstructList (&CurrentToken); //Add it to the stream
                CurrentToken.clear (); //Clear the buffer
            }
            std::string SpecialToken = Script.substr (Location, 1); //Named copy: taking the address of the temporary returned by substr is not valid C++
            AddToken2InstructList (&SpecialToken); //Add the special token to the token stream
            continue; //Test the next character
        }
        CurrentToken += Script [Location]; //The character didn't match any of the delimiters; tack it onto the token buffer and move on
    }
    if (CurrentToken.length () > 0) //Flush a token that runs up to the end of the script
    {
        AddToken2InstructList (&CurrentToken);
        CurrentToken.clear ();
    }
}


[Edited by - Tocs1001 on July 24, 2006 9:55:55 PM]

Hi,
Yes, I just corrected the SEMI/TERM thing ^^

The whole grammar development is a mess; it's been 2 years since I wrote my last parser generator.
The first thing to implement is pretty printing (scanning the source and printing it to cout to compare it with the input). If that works OK, you are halfway through the mess :)


As for your tokenizer, how does it work?
Do you scan for whitespace to skip it first, and then scan for the tokens of your grammar?

Well, it loops through the script, examining it character by character.

It uses the find function of std::string to see if the character matches a whitespace. If it does, the current token needs to be terminated. For example, in "int i;" the "int" has a space after it, which means that "int" is a single token. So it tests whether CurrentToken has anything in it; if it does, it stores it away in the Tokens vector, which is the stream of tokens. CurrentToken is where each character of a token gets added temporarily; when the token ends, it adds CurrentToken to the token stream, clears CurrentToken, and continues the loop, bringing it back to the beginning.

The next test is for "single character tokens", which are characters that should be their own tokens: things like ;'s and /'s and {}'s and whatnot. It uses std::string::find to check if the current character is a "single character token". If it is, it needs to terminate the current token and add it to the stream, as well as make a token for the "single character token" itself. So once again it checks if the current token holds anything; if so, it adds it to the stream and clears the temporary token buffer (CurrentToken), and then adds another token for the character. If the character the tokenizer is examining is neither a whitespace nor a "single character token", it's added onto the end of CurrentToken.

And presto, you have a simple tokenizer. The only problem is that it doesn't support two-character operators like >=, <= or !=; instead it splits them into "!" and "=" as separate tokens, which is fine, but the compiler will have to check ahead for an equals sign when it reads certain characters. My dad is bringing home a book on writing a lexical analyser from scratch using a table method, which he explained a little.
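
The look-ahead for two-character operators can also live in the tokenizer itself, so the compiler never sees a split "!" and "=". Here is a minimal sketch of that variation, written as a free function returning a vector rather than as the CCompiler member. The name TokenizeWithLookahead and the operator list are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> TokenizeWithLookahead(const std::string& script)
{
    const std::string singleChars = "=+-*/{}(),;<>!";
    const std::string whitespace  = " \n\t\r";
    // Hypothetical set of two-character operators to recognise.
    const std::vector<std::string> doubleOps = { "==", "!=", "<=", ">=" };

    std::vector<std::string> tokens;
    std::string current;
    // Pushes the pending identifier/number, if any, onto the token list.
    auto flush = [&]() {
        if (!current.empty()) { tokens.push_back(current); current.clear(); }
    };

    for (std::size_t i = 0; i < script.size(); ++i)
    {
        const char c = script[i];
        if (whitespace.find(c) != std::string::npos) { flush(); continue; }
        if (singleChars.find(c) != std::string::npos)
        {
            flush();
            // Peek one character ahead before settling for a one-character token.
            const std::string pair = script.substr(i, 2);
            bool matched = false;
            for (const std::string& op : doubleOps)
                if (pair == op) { matched = true; break; }
            if (matched) { tokens.push_back(pair); ++i; } // consume both characters
            else         { tokens.push_back(std::string(1, c)); }
            continue;
        }
        current += c;
    }
    flush(); // do not lose a token that runs to the end of the script
    return tokens;
}
```

For the input "if (a>=10) b = !c;" this yields the tokens if ( a >= 10 ) b = ! c ; in that order.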

I need to learn to comment my code, I'm a bad coder X( I put some comments into the previous post for your viewing pleasure.

When this compiler is done it's back to writing the seamless world engine for the actual game. The script is a side trip, but a necessary one... I can't wait till the seamless world's done; then I can post screens of a world and write the world editor.

My project's website is http://home.fuse.net/dragonstorm/main.shtml The address sucks, but when alpha's done I'm getting a domain name =). My forums are swamped by a noob who made 2 names, posts ridiculous stuff about everything, and thinks he's part of the dev team when he has no skills... But that's off-topic ^^

P.S. You're missing the ! operator, as in if (!Boolean) {}

[Edited by - Tocs1001 on July 24, 2006 9:08:16 PM]

I also plan to write another engine sometime... I have neither the time nor the content at the moment.
I have a ton of 4-year-old textures somewhere, but that's about it.


I separated my project into 2 sections:

1. A Quake3 Radiant-like map editor; it's coming along slowly.
Just recently I ported it from Windows to Linux with success.
It shall contain the whole feature set of the Radiant editors.
That's also the primary reason why I am working on a new scripting language: so I can ship plugins as bytecode to extend the editor's functionality.

2. A FPS engine based on the content created with my editor.
Most of the rendering code will be done before even starting with the editor because I plan to use the same renderer in the editor as well.
The editor's base is up and running, pretty stable too.
Next thing would be to implement the command system:

- undo, redo, execute
- finalize the culling system
- implementing the first commands like brush creation, deletion..

A huge project for a single being with an open timeline

Scripting languages are great for so many things, guys! You can use them to program AI, particle effects, animated stuff, and if they're complex enough you can even use them to write tools (e.g. modelers and level editors)!

I'm also developing a scripting language now, and it will be fairly simple (not type-safe, no explicit variable declarations); however, it will be object-oriented.

I've found scripting languages to be generally easy to implement, unless:

1) You want functional programming (routine-based). This is how many scripting languages are, and it makes things slightly more difficult (especially when compiling to bytecode).

2) You want object orientation. My scripting lang will have OOP, and I've found it to be fairly complex - especially if you're implementing it in a non-object oriented language!

3) You want complex expressions subgrouped in parentheses like so:

x = 5+6*((sin(x/y)*5)/1);

This is where the true hell begins creeping in. Unwinding these bad boys is harder than drilling through diamonds. And surely you will want to convert these into single instructions when compiling to bytecode. Differentiating between variables, functions, and constants can be a pain. And implementing the order of operations!?!? Tough. Go with Python if you want a more complex language.
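
For what it's worth, nested parentheses and the order of operations are commonly handled with recursive descent, where each precedence level gets its own function and parentheses simply recurse back to the top level. The sketch below evaluates directly instead of emitting bytecode and only knows numbers, + - * /, unary minus and parentheses. Variables and calls like sin() are left out, and ExprParser is a made-up name.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

class ExprParser
{
public:
    explicit ExprParser(const std::string& text) : text_(text), pos_(0) {}

    double Parse()
    {
        const double value = Sum();
        SkipSpaces();
        if (pos_ != text_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    // sum := product (('+' | '-') product)*
    double Sum()
    {
        double value = Product();
        for (;;)
        {
            if (Accept('+'))      value += Product();
            else if (Accept('-')) value -= Product();
            else return value;
        }
    }

    // product := factor (('*' | '/') factor)*
    double Product()
    {
        double value = Factor();
        for (;;)
        {
            if (Accept('*'))      value *= Factor();
            else if (Accept('/')) value /= Factor();
            else return value;
        }
    }

    // factor := '-' factor | '(' sum ')' | number
    double Factor()
    {
        if (Accept('-')) return -Factor();
        if (Accept('('))
        {
            const double value = Sum(); // recursion handles any nesting depth
            if (!Accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        return Number();
    }

    double Number()
    {
        SkipSpaces();
        const std::size_t start = pos_;
        while (pos_ < text_.size() &&
               (std::isdigit(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '.'))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(text_.substr(start, pos_ - start));
    }

    // Consumes the character c if it is next (ignoring spaces).
    bool Accept(char c)
    {
        SkipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    void SkipSpaces()
    {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    std::string text_;
    std::size_t pos_;
};
```

Multiplication binds tighter than addition only because Sum() calls Product() for its operands. A compiler would emit instructions at the points where this sketch does arithmetic.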

Quote:
Original post by ouraqt
3) You want complex expressions subgrouped in parentheses like so:

x = 5+6*((sin(x/y)*5)/1);

This is where the true hell begins creeping in. Unwinding these bad boys is harder than drilling through diamonds. And surely you will want to convert these into single instructions when compiling to bytecode. Differentiating between variables, functions, and constants can be a pain. And implementing the order of operations!?!? Tough.


This is where Spirit or Bison or similar will save your bacon. You let their parsing take care of all the de-nesting, etc., and then you run an 'Optimizer' over the generated bytecode (basically just a function that replaces all operations between 2 constant values with the result).
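
Such an 'Optimizer' pass might look roughly like the following, assuming a small stack-machine bytecode. The Op names and the Instr layout are invented for illustration, and the code is assumed to be straight-line, with no jump targets between the instructions being folded.

```cpp
#include <cstddef>
#include <vector>

enum class Op { PushConst, PushVar, Add, Mul };

struct Instr
{
    Op  op;
    int operand; // constant value or variable slot; unused for Add/Mul
};

// Replaces "PushConst a, PushConst b, Add|Mul" with a single PushConst.
// Building the output incrementally lets a folded result take part in
// the next fold, so (2+3)*4 collapses all the way to 20.
std::vector<Instr> FoldConstants(const std::vector<Instr>& code)
{
    std::vector<Instr> out;
    for (const Instr& in : code)
    {
        const bool isArith = (in.op == Op::Add || in.op == Op::Mul);
        const std::size_t n = out.size();
        if (isArith && n >= 2 &&
            out[n - 1].op == Op::PushConst && out[n - 2].op == Op::PushConst)
        {
            const int b = out[n - 1].operand;
            const int a = out[n - 2].operand;
            out.pop_back();
            out.pop_back();
            out.push_back({Op::PushConst, in.op == Op::Add ? a + b : a * b});
        }
        else
        {
            out.push_back(in); // anything involving a variable is kept as-is
        }
    }
    return out;
}
```

With this layout, the code for x + 2*3 (PushVar, PushConst 2, PushConst 3, Mul, Add) shrinks to PushVar, PushConst 6, Add.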

Being a fairly low-level dude, I prefer to do the parsing myself to ensure that I know what's going on.

What I'm struggling with now is combining mathematical expressions and grouping with boolean expressions. For example:

bool x = IsCool||(Foo&&(Bar||Foobar)&&(x > 6/(5x+2)));

Despite the fact that no sane person would create something that ugly, I'd say a lexer that can successfully parse and unwind this expression is a piece of software that deserves a gold medal with a shiny trophy. Also, people have different coding styles, and some people may write ugly expressions like this (I have before), so the parsing ability is a requirement.

[Edited by - ouraqt on July 27, 2006 9:57:24 PM]

Quote:
Original post by ouraqt
What I'm struggling with now is combining mathematical expressions and grouping with boolean expressions. For example:

bool x = IsCool||(Foo&&(Bar||Foobar)&&(x > 6/(5x+2)));

Despite the fact that no sane person would create something that ugly, I'd say a lexer that can successfully parse and unwind this expression is a piece of software that deserves a gold medal with a shiny trophy. Also, people have different coding styles, and some people may write ugly expressions like this (I have before), so the parsing ability is a requirement.


How are you structuring your parser?
I would recommend that you fully tokenize this before attempting to compile it, it is much easier to compile a token stream than a long string ;)
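
Once the input is a token stream, boolean and arithmetic operators can share one loop driven by a precedence table, a technique usually called precedence climbing. The sketch below is one way to do it under some assumptions: tokens arrive as strings, all values are ints with 0 meaning false, variables come from a map, and both sides of || and && are always evaluated (no short-circuiting). TokenEvaluator and the precedence numbers are invented for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

class TokenEvaluator
{
public:
    TokenEvaluator(const std::vector<std::string>& tokens,
                   const std::map<std::string, int>& variables)
        : tokens_(tokens), variables_(variables), pos_(0) {}

    int Evaluate()
    {
        const int value = Expression(1);
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected token");
        return value;
    }

private:
    // Higher number binds tighter; 0 means "not a binary operator".
    static int Precedence(const std::string& op)
    {
        if (op == "||") return 1;
        if (op == "&&") return 2;
        if (op == ">" || op == "<") return 3;
        if (op == "+" || op == "-") return 4;
        if (op == "*" || op == "/") return 5;
        return 0;
    }

    static int Apply(const std::string& op, int a, int b)
    {
        if (op == "||") return (a != 0 || b != 0) ? 1 : 0;
        if (op == "&&") return (a != 0 && b != 0) ? 1 : 0;
        if (op == ">")  return a > b ? 1 : 0;
        if (op == "<")  return a < b ? 1 : 0;
        if (op == "+")  return a + b;
        if (op == "-")  return a - b;
        if (op == "*")  return a * b;
        if (b == 0) throw std::runtime_error("division by zero");
        return a / b; // the only operator left is "/"
    }

    // Parse one operand, then absorb operators binding at least as tightly as minPrec.
    int Expression(int minPrec)
    {
        int left = Primary();
        while (pos_ < tokens_.size())
        {
            const std::string op = tokens_[pos_];
            const int prec = Precedence(op);
            if (prec == 0 || prec < minPrec) break;
            ++pos_;
            const int right = Expression(prec + 1); // +1 keeps operators left-associative
            left = Apply(op, left, right);
        }
        return left;
    }

    int Primary()
    {
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of input");
        const std::string tok = tokens_[pos_++];
        if (tok == "(")
        {
            const int value = Expression(1);
            if (pos_ >= tokens_.size() || tokens_[pos_] != ")")
                throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        if (tok[0] >= '0' && tok[0] <= '9') return std::atoi(tok.c_str());
        const auto it = variables_.find(tok);
        if (it == variables_.end()) throw std::runtime_error("unknown variable: " + tok);
        return it->second;
    }

    std::vector<std::string> tokens_;
    std::map<std::string, int> variables_;
    std::size_t pos_;
};
```

Adding an operator is a matter of one more line in Precedence() and one in Apply(). The example expression above would need 5x written as 5*x before a tokenizer could split it into separate tokens.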
