Creating a Scripting Language with YACC?

30 comments, last by Basiror 17 years, 8 months ago
I also plan to write another engine at some point, but I don't have the time or the content at the moment.
I have a ton of four-year-old textures somewhere, but that's about it.


I separated my project into two sections:

1. A Quake3 Radiant-like map editor; it's coming along slowly.
Just recently I ported it from Windows to Linux with success.
It shall contain the whole feature set of the Radiant editors.
That's also the primary reason why I am working on a new scripting language: so I can ship plugins as bytecode to extend the editor's functionality.

2. An FPS engine based on the content created with my editor.
Most of the rendering code will be done before even starting with the editor because I plan to use the same renderer in the editor as well.
The editor's base is up and running, pretty stable too.
Next thing would be to implement the command system:

- undo, redo, execute
- finalize the culling system
- implement the first commands, like brush creation and deletion

A huge project for a single being with an open timeline
http://www.8ung.at/basiror/theironcross.html
Scripting languages are great for so many things, guys! You can use them to program AI, particle effects, animated stuff, and if they're complex enough you can even use them to write tools (e.g. modelers and level editors)!

I'm also developing a scripting language now, and it will be fairly simple (not type-safe, no explicit variable declarations); however, it will be object-oriented.

I've found scripting languages to be generally easy to implement, unless:

1) You want function-based (routine-based) programming. This is how many scripting languages work, and it makes things slightly more difficult (especially when compiling to bytecode).

2) You want object orientation. My scripting lang will have OOP, and I've found it to be fairly complex - especially if you're implementing it in a non-object oriented language!

3) You want complex expressions subgrouped in parentheses like so:

x = 5+6*((sin(x/y)*5)/1);

This is where the true hell begins creeping in. Unwinding these bad boys is harder than drilling through diamonds. And surely you will want to convert these into single instructions when compiling to bytecode. Differentiating between variables, functions, and constants can be a pain. And implementing the order of operations!?!? Tough. Go with Python if you want a more complex language.
Quote:Original post by ouraqt
3) You want complex expressions subgrouped in parentheses like so:

x = 5+6*((sin(x/y)*5)/1);

This is where the true hell begins creeping in. Unwinding these bad boys is harder than drilling through diamonds. And surely you will want to convert these into single instructions when compiling to bytecode. Differentiating between variables, functions, and constants can be a pain. And implementing the order of operations!?!? Tough.


This is where Spirit or Bison or similar will save your bacon. You let their parsing take care of all the de-nesting, etc., and then you run an 'optimizer' over the generated bytecode (basically just a function that replaces every operation between two constant values with its result).
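
To make the 'optimizer' idea concrete, here is a rough sketch of constant folding over stack-style bytecode. It is in Python only to keep it short, and the instruction names (PUSH, LOAD, STORE, ADD, ...) and the tuple layout are made up for illustration; they are not what Spirit or Bison hand you. It assumes the parser has already emitted postfix code.

```python
import operator

# Hypothetical binary opcodes for an imaginary stack machine.
BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def fold_constants(code):
    """Replace 'PUSH a, PUSH b, OP' with a single 'PUSH result'."""
    out = []
    for instr in code:
        op = instr[0]
        # If the two instructions just emitted are constant pushes, they are
        # exactly the operands this binary operator would pop at run time.
        if (op in BINARY_OPS and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            lhs, rhs = out[-2][1], out[-1][1]
            # Leave division by zero alone so it still fails at run time.
            if not (op == "DIV" and rhs == 0):
                out[-2:] = [("PUSH", BINARY_OPS[op](lhs, rhs))]
                continue
        out.append(instr)
    return out


# x = 2*3 + y, already compiled to postfix bytecode.
program = [("PUSH", 2), ("PUSH", 3), ("MUL",), ("LOAD", "y"), ("ADD",), ("STORE", "x")]
folded = fold_constants(program)
# folded == [("PUSH", 6), ("LOAD", "y"), ("ADD",), ("STORE", "x")]
```

Because a folded result is itself a PUSH, chains such as 1 + 2*3 collapse in a single pass.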

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Being a fairly low-level dude, I prefer to do the parsing myself to ensure that I know what's going on.

What I'm struggling with now is combining mathematical expressions and grouping with boolean expressions. For example:

bool x = IsCool||(Foo&&(Bar||Foobar)&&(x > 6/(5x+2)));

Despite the fact that no sane person would create something that ugly, I'd say a lexer that can successfully parse and unwind this expression is a piece of software that deserves a gold medal with a shiny trophy. Also, people have different coding styles, and some people may write ugly expressions like this (I have before), so the parsing ability is a requirement.

[Edited by - ouraqt on July 27, 2006 9:57:24 PM]
Quote:Original post by ouraqt
What I'm struggling with now is combining mathematical expressions and grouping with boolean expressions. For example:

bool x = IsCool||(Foo&&(Bar||Foobar)&&(x > 6/(5x+2)));

Despite the fact that no sane person would create something that ugly, I'd say a lexer that can successfully parse and unwind this expression is a piece of software that deserves a gold medal with a shiny trophy. Also, people have different coding styles, and some people may write ugly expressions like this (I have before), so the parsing ability is a requirement.


How are you structuring your parser?
I would recommend that you fully tokenize this before attempting to compile it; it is much easier to compile a token stream than a long string ;)

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:Original post by ouraqt
Being a fairly low-level dude, I prefer to do the parsing myself to ensure that I know what's going on.

What I'm struggling with now is combining mathematical expressions and grouping with boolean expressions. For example:

bool x = IsCool||(Foo&&(Bar||Foobar)&&(x > 6/(5x+2)));

Despite the fact that no sane person would create something that ugly, I'd say a lexer that can successfully parse and unwind this expression is a piece of software that deserves a gold medal with a shiny trophy. Also, people have different coding styles, and some people may write ugly expressions like this (I have before), so the parsing ability is a requirement.

I've written a couple of compilers that could handle a statement like that. One was based on flex and bison, the other was a recursive descent parser. Perhaps you are trying to do too much with your lexer? The job of a lexer is normally just to tokenise a string, not to make any sense out of it.
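
To show how little it takes, here is a minimal sketch of the recursive descent approach (in Python for brevity; it is not the code from either of those compilers). The grammar and the names Parser, binary_level and parse are assumptions made up for this example. There is one method per precedence level, lowest first, and mixing boolean and arithmetic operators needs nothing special: they are just more levels. Note that it expects the multiplication written out as 5*x; a bare 5x is rejected as a syntax error.

```python
import re

# The lexer only splits the text into tokens; it makes no sense of them.
TOKEN_RE = re.compile(r"\s*(\|\||&&|[-+*/<>()]|\d+|[A-Za-z_]\w*)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def binary_level(self, operators, operand):
        # Left-associative chain: operand (op operand)*
        node = operand()
        while self.peek() in operators:
            op = self.advance()
            node = (op, node, operand())
        return node

    # One method per precedence level, loosest binding first.
    def or_expr(self):
        return self.binary_level(("||",), self.and_expr)

    def and_expr(self):
        return self.binary_level(("&&",), self.comparison)

    def comparison(self):
        return self.binary_level(("<", ">"), self.additive)

    def additive(self):
        return self.binary_level(("+", "-"), self.term)

    def term(self):
        return self.binary_level(("*", "/"), self.factor)

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.or_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not (tok[0].isalnum() or tok[0] == "_"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok) if tok.isdigit() else tok


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.or_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


tree = parse("IsCool||(Foo&&(Bar||Foobar)&&(x > 6/(5*x+2)))")
# tree is a nested tuple of the form (operator, left, right)
```

The result is a tree of (operator, left, right) tuples; walking it depth-first and emitting the operator after both operands gives the single-instruction bytecode discussed earlier.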

Another interesting way to handle precedence that I heard someone was using is to regex the expression and replace ? * ? with (? * ?) and so on in the correct order until all the brackets are in the right place. Although I wouldn't use this method, I thought it was a novel solution.

[Edited by - umbrae on July 29, 2006 8:31:59 PM]
I suppose I should write a tokenizer to parse the input stream before trying to make sense out of it. The problem is that I've written a whole bunch of code already! Darn, maybe I should start over.

First, let me make sure I understand the process of 'tokenizing'. So a tokenizer basically stores all the elements of an input source to a list? For example:

x = Direction/Speed;

So to the tokenizer they would look like this:

x
=
Direction
/
Speed
;

Is this correct? So basically it provides an easy way to parse the source without dealing with things like comments and whitespace. How does it deal with strings? And how does it differentiate between the operator "+=" and just a plus sign and an equal sign? Surely you wouldn't want an operator to span multiple tokens.

So basically, how much of the language syntax does the tokenizer need to know? I can see how this would make parsing the source much easier.
Actually the tokenizer would give you something more like this:

ID('x') EQUALS_SIGN ID('Direction') DIVIDE_SIGN ID('Speed') END_STATEMENT

I suggest learning about parsing as you will find a little theory makes what you're trying to do a whole lot easier.
ouraqt:

A tokeniser would 'look ahead' past the + and see the = sign and output an ADD_EQUALS token. A string literal would be tokenised the same as anything else, but could be (at that time) decoded into its real characters, e.g. "string\n" would output a STRING token with the value of "string" followed by a newline. Getting to a real newline in the middle of a string would probably be an error.

So your tokeniser doesn't need to know any of your language syntax, just what type of tokens you have and how to parse them.
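
A hand-rolled sketch of that, in Python to keep it compact. The token names follow the ones used above where they exist (ID, EQUALS_SIGN, DIVIDE_SIGN, END_STATEMENT, ADD_EQUALS, STRING); PLUS, NUMBER, the // comment style and the escape table are assumptions added for the example.

```python
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
TWO_CHAR = {"+=": "ADD_EQUALS"}
ONE_CHAR = {"+": "PLUS", "=": "EQUALS_SIGN", "/": "DIVIDE_SIGN", ";": "END_STATEMENT"}


def tokenize(src):
    """Turn source text into a list of (kind, value) pairs."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif src.startswith("//", i):
            # Comment: skip to end of line (checked before the '/' operator).
            while i < len(src) and src[i] != "\n":
                i += 1
        elif src[i:i + 2] in TWO_CHAR:
            # Look ahead one character so '+=' becomes a single token.
            tokens.append((TWO_CHAR[src[i:i + 2]], None))
            i += 2
        elif ch in ONE_CHAR:
            tokens.append((ONE_CHAR[ch], None))
            i += 1
        elif ch == '"':
            i += 1
            chars = []
            while True:
                if i >= len(src) or src[i] == "\n":
                    raise SyntaxError("unterminated string literal")
                if src[i] == '"':
                    i += 1
                    break
                if src[i] == "\\":
                    # Decode the escape into its real character right here.
                    i += 1
                    if i >= len(src) or src[i] not in ESCAPES:
                        raise SyntaxError("bad escape sequence")
                    chars.append(ESCAPES[src[i]])
                else:
                    chars.append(src[i])
                i += 1
            tokens.append(("STRING", "".join(chars)))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(("ID", src[start:i]))
        elif ch.isdigit():
            start = i
            while i < len(src) and src[i].isdigit():
                i += 1
            tokens.append(("NUMBER", int(src[start:i])))
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


tokens = tokenize("x = Direction/Speed;")
# [("ID", "x"), ("EQUALS_SIGN", None), ("ID", "Direction"),
#  ("DIVIDE_SIGN", None), ("ID", "Speed"), ("END_STATEMENT", None)]
```

Whitespace and comments never reach the parser, and nothing in here knows what a statement or an expression is.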
Another nice way to accomplish tokenisation is to write a custom tokenising function for boost::tokenizer.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
