Creating a Scripting Language

4 comments, last by Atrix256 14 years ago
Hey, I am planning on making a scripting language in my free time. Below is the BNF I came up with. I just wanted to get some feedback on things I might be missing, things I could do better, any problems I'm setting myself up for, etc. Thanks!

***** DEFINITIONS 
definition	=> <functionDef> | <objDef> | <varDef>
functionDef	=> (<dataType> | "void") <identifier> "(" (<dataType> <identifier> ("," <dataType> <identifier>)*)? ")" <block>
objDef		=> "object" <identifier> "{" <varDef>* "}"
varDef		=> <dataType> <identifier> ("=" <expression>)? ("," <identifier> ("=" <expression>)?)* ";"
dataType        => ("string" | "number" | <identifier>) ("[]")?

***** STATEMENTS 
statement	=>(("case" (<string>|<number>) | "default") ":")*
			| <expression> ";"
			| <block>
			| "if" "(" <expression> ")" <statement> ("else" <statement>)?
			| "switch" "(" <expression> ")" <statement>
			| "while" "(" <expression> ")" <statement>
			| "do" <statement> "until" "(" <expression> ")"
			| "repeat" "(" <expression> ")" <statement>
			| "for" "(" <expression>? ";" <expression>? ";" <expression>? ")" <statement>
			| "continue" ";"
			| "break" ";"
			| "return" <expression> ";"

block		=> "{" (<varDef> | <statement>)* "}"

***** EXPRESSIONS 
expression	=> (<unary> (<assignOp>))* <ternary>
ternary		=> <or> ("?" <expression> ":" <expression>)?
or		=> <and> ("||" <and>)*
and		=> <equality> ("&&" <equality>)*
equality	=> <relational> (<equalOp> <relational>)*
relational	=> <additive> (<relatOp> <additive>)*
additive	=> <multiplicative> (<addOp> <multiplicative>)*
multiplicative	=> <unary> (<multOp> <unary>)*
unary		=> <prefixOp>? (<identifier> | <string> | <number> | <funcCall> | "(" <expression> ")")
							("[" <expression> "]" | "." (<identifier> | <funcCall>) | <postfixOp>)*
funcCall	=> <identifier> "(" (<expression> ("," <expression>)*)? ")"

***** TERMINALS 
string		=> '"' ([!'"'] | <escapeSeq>)* '"'
escapeSeq	=> [!"\"]("\\" | '\"')
number		=> [0-9]* "."? [0-9]+
assignOp	=> "="  | "+=" | "-=" | "*=" | "/="
equalOp		=> "==" | "!="
relatOp		=> "<"  | "<=" | ">"  | ">="
addOp		=> "+"  | "-"
multOp		=> "*"  | "/"  | "%"  | "div"
prefixOp	=> "-"  | "!"
postfixOp	=> "++" | "--"
identifier	=> [a-zA-Z_][a-zA-Z_0-9]*
whitespace	=> " " | "\n" | "\t" | "//" [!"\n"]* "\n" | "/*" [!"*/"]* "*/"
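
To see how the layered rules above turn into precedence, here is a minimal recursive-descent sketch in Python (not the planned C# implementation). It covers only a subset of the grammar: <additive>, <multiplicative>, a reduced <unary> with the "-" prefix, numbers, and parentheses. The names tokenize, Parser, and evaluate are made up for this illustration, and evaluating while parsing is a shortcut; a real implementation would more likely build a tree.

```python
import re

# <number> from the TERMINALS section, plus the handful of operators used here.
TOKEN = re.compile(r"\s*(?:(?P<num>[0-9]*\.?[0-9]+)|(?P<op>[-+*/()]))")
OPERATORS = ("+", "-", "*", "/", "(", ")")


def tokenize(text):
    """Split source text into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group("num") or match.group("op"))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def additive(self):
        # additive => <multiplicative> (<addOp> <multiplicative>)*
        value = self.multiplicative()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    def multiplicative(self):
        # multiplicative => <unary> (<multOp> <unary>)*   ("%" and "div" omitted)
        value = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):
        # unary => <prefixOp>? (<number> | "(" <expression> ")")   (reduced)
        if self.peek() == "-":
            self.take()
            return -self.primary()
        return self.primary()

    def primary(self):
        token = self.take()
        if token == "(":
            value = self.additive()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or token in OPERATORS:
            raise SyntaxError(f"expected a number, got {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.additive()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Each grammar rule becomes one method, and a rule that mentions a "tighter" rule simply calls it, which is why evaluate("1 + 2 * 3") gives 7.0 while evaluate("(1 + 2) * 3") gives 9.0. The loops make "-" and "/" left-associative, so evaluate("10 - 4 - 3") gives 3.0.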

***** Changes
<3/30/10>
- Changed dataType to: ("string" | "number" | <identifier>) ("[]")?
			from ("string" | "number") ("[]")?
- Renamed "struct" to "object" and <structDef> to <objDef>
- Changed block to: "{" (<varDef> | <statement>)* "}"
			from "{" <varDef>* <statement>* "}"
- Renamed <conditional> to <ternary>
- Changed ternary to: <or> ("?" <expression> ":" <expression>)?
			from <or> ("?" <expression> ":" <ternary>)?
- Changed functionDef to: (<dataType> | "void") <identifier> ...
			from <dataType> <identifier> ...
- Changed string to: '"' ([!'"'] | <escapeSeq>)* '"'
			from '"' [!'"']* '"'
- Added escapeSeq => [!"\"]("\\" | '\"')
- Changed prefixOp to: "-"  | "!"
			from "++" | "--" | "-"  | "!"
[Edited by - dangerdan9631 on March 30, 2010 7:38:11 PM]
I have mostly critiques, but don't become discouraged ;)

BTW: I think a syntax description shows only one aspect of a language.

From the syntax I can see that
* functions are not 1st class values,
* function nesting isn't supported,
* hence I assume that closures are not possible,
* co-routines are not supported.

Scripters typically love the missing features listed above, so to speak.

I can't examine the exact type system. I see the types string, number, arrays of the former types, and structs. Is the type system weak or strict? Does type coercion happen in some way?

It seems to me that structs cannot be nested. Is this correct?

Does call-by-reference or call-by-value happen for strings, arrays, and/or structs? I'm not sure, but it seems to me that functions must return a string or number or array; no "void" functions allowed?

Nothing is said about the integration with the application.

Nothing is said about execution (although that need not necessarily be part of a language specification): E.g. source interpretation, bytecode interpretation, or what?

Nothing is said about supporting packages/libraries (although again not necessarily part of a language specification).


Besides that (and issues I've not seen yet), using a syntax similar to C may be well accepted by programmers, and non-programmers may already have seen it once or twice, too.

[Edited by - haegarr on March 30, 2010 9:44:23 AM]
Thanks a lot! After looking at what you wrote I have changed some things:
redefined dataType as:
     dataType => ("string" | "number" | <identifier>) ("[]")?

where <identifier> is used to look up the name of a struct.

I also renamed "struct" to "object", as that seems a little less C-like and sounds a little more high-level. Purely an aesthetic choice.


Quote:* functions are not 1st class values,
* function nesting isn't supported,
* hence I assume that closures are not possible,
* co-routines are not supported.


I chose not to do function nesting and co-routines because I felt they might be too complicated for the people who might be using this. (Yes, this is just for fun, but if it does get used, it will be by people who have never programmed in their lives and thus wouldn't even know when to use a nested function or coroutine.) Also, this is my first attempt at this, so I figured I could keep it a little simpler by not including those. Maybe in a later iteration.

As for making functions 1st class values, that might actually be useful. I'll have to think about that and how I can add it in.

Quote:I can't examine the exact type system. I see the types string, number, arrays of the former types, and structs. Is the type system weak or strict? Does type coercion happen in some way?


The three types would be string, number, and object. All type checking would be done at compile time. The only conversion that would be allowed is an implicit conversion of number to string for the purpose of concatenation.

Ex.
number a = 1, b = 2, c = 3;
string position = "(" + a + ", " + b + ", " + c + ")";

would assign "(1, 2, 3)" to position
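
One way that single conversion rule could look inside a compile-time type checker is sketched below in Python (not the C# implementation). It assumes types are represented as plain strings and that "+" is the only operator that mixes string and number; the function names and the "Point" object type are hypothetical.

```python
from functools import reduce


def type_of_add(left, right):
    """Return the static type of `left + right`, or raise TypeError."""
    if left == "number" and right == "number":
        return "number"  # ordinary arithmetic
    if left in ("string", "number") and right in ("string", "number"):
        # At least one side is a string here, so the number side (if any)
        # is implicitly converted and the result is a concatenation.
        return "string"
    raise TypeError(f"cannot apply '+' to {left} and {right}")


def type_of_chain(operand_types):
    """Type of a left-to-right chain a + b + c + ..., as <additive> parses it."""
    return reduce(type_of_add, operand_types)
```

For the example above, type_of_chain(["string", "number", "string", "number", "string", "number", "string"]) yields "string", while type_of_add("Point", "string") raises TypeError because objects get no implicit conversion. Note that under a left-to-right rule like this, a + b + ")" would add the two numbers first and only then concatenate.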

Quote:It seems to me that structs cannot be nested. Is this correct?


Again, for the sake of simplicity, no, they cannot. However, I did change my dataType definition to allow for recursive struct definitions, which were not possible before.

Quote:Nothing is said about the integration with the application.

Nothing is said about execution (although that need not necessarily be part of a language specification): E.g. source interpretation, bytecode interpretation, or what?

Nothing is said about supporting packages/libraries (although again not necessarily part of a language specification).


I guess I probably should go over some of that. I am programming it in C#, and I am working on a Windows Forms editor where you can write, save, and run the program and test its value. When you run it there, it uses source interpretation. However, to integrate it with other things, the plan was to translate it into C# and then use MSBuild to compile it into a .dll.

Maybe this isn't the best way to do things, but both of those are things that I have wanted to mess around with for a while, so I figured I'd try to kill two birds.

For supporting packages/libraries, I was planning on just using a #include "filename" directive. That would just insert the text of the file at that spot in the code.
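
A rough Python sketch of that kind of textual #include follows (again, not the C# implementation). Two things in it are assumptions that go beyond the plan as stated: paths are resolved relative to the including file, and a file that includes itself (directly or through a chain) is reported as an error instead of recursing forever. The names INCLUDE and expand_includes are made up.

```python
import re
from pathlib import Path

# Matches a whole line of the form:  #include "filename"
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand_includes(path, active=frozenset()):
    """Return the text of `path` with every #include line replaced by file contents."""
    path = Path(path).resolve()
    if path in active:
        raise ValueError(f"circular #include of {path}")
    active = active | {path}  # files on the current include chain only
    lines = []
    for line in path.read_text().splitlines():
        match = INCLUDE.match(line)
        if match:
            # Assumption: the name is relative to the file containing the directive.
            lines.append(expand_includes(path.parent / match.group(1), active))
        else:
            lines.append(line)
    return "\n".join(lines)
```

Because this is plain text insertion, a file included from two different places is pasted twice, so its definitions would appear twice in the combined source. Whether that should be an error or silently skipped is a design choice the sketch leaves open.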


Thanks for the comments! That was exactly what I was looking for!
Whoops, forgot to reply to this part:

Quote:Does call-by-reference or call-by-value happen for strings, arrays, and/or structs? I'm not sure, but it seems to me that functions must return a string or number or array; no "void" functions allowed?


Good point, perhaps I could redefine functionDef as
functionDef => (<dataType> | "void") <identifier> ...


hmmm... That seems a little hacky. The other option would be to just add "void" to datatypes, and then do checking to make sure no "void" variables are declared? hmmm... I'll have to think about that some more when I get a chance.
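
For comparison, the second option (let the parser accept "void" wherever a dataType can appear, then reject bad uses afterwards) could look roughly like this Python sketch. The (kind, type_name, name) tuples are an assumed, simplified stand-in for whatever the parser actually produces, and check_no_void is a made-up name.

```python
def check_no_void(declarations):
    """Return error messages for every illegal use of "void".

    `declarations` is a list of (kind, type_name, name) tuples, where kind is
    "function", "variable", or "parameter" and type_name may end in "[]".
    """
    errors = []
    for kind, type_name, name in declarations:
        base = type_name[:-2] if type_name.endswith("[]") else type_name
        if base != "void":
            continue
        # The only legal use is a plain "void" function return type.
        if kind == "function" and type_name == "void":
            continue
        errors.append(f"{kind} '{name}' cannot have type {type_name}")
    return errors
```

With this approach the grammar stays uniform and the restriction lives in one semantic check; with the functionDef approach the grammar itself rules out void variables, so no extra pass is needed for them.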
Can you give examples of short programs written in your language? Looking at BNF is OK I guess, but whenever I'm trying to come up with a language, I just start writing stuff in it and see how it looks.

In lieu of 1st class functions, function pointers/references might be useful so you can at least do callbacks. Or put them in a struct and make a lightweight object.
Making a scripting language is fun, and cool, and a great learning process.

However!

It's a lot of work, and in the end you (probably!) won't have as good a thing as other scripting languages already out there.

If you get tired of making your language and decide you want to start focusing more on your GAME, check out Lua. It's easy to use and open source, and benchmarks show it's one of the fastest scripting languages out there.

I have it integrated into my current project and it's great.

World of Warcraft also uses Lua for its UI system, so you can rely on it being a stable, mature language as well.

My 2 cents for ya!
