Sign in to follow this  

Working control structures

Recommended Posts

Zotoaster    148
I'm having some trouble getting control structures to work in my scripting system, mainly if statements just now. My main idea was this: If I have a piece of code, like this: print "hello"; if(name == "Zotoaster"){ print "Cool name"; } print "done.."; Everytime you find an if, get whats inside the braces, and use that as the script, which means nested ifs would work too. It all works fine, apart from when printing what's after the if statement, in this case, the print "done.."; part. Anyone have an idea how to fix this, or have a better idea of how to work it? Thanks

Share this post

Link to post
Share on other sites
ToohrVyk    1596
This is usually done by making your "if" a statement. A typical grammar looks like this:

'statement' :=
| if ('condition') 'block'
| print 'text' ;

'block' :=
| 'statement'
| { 'statement*' }

Then, when executing the if-statement, you'd check the condition and insert the associated block before the remaining statements in the current block. Operational semantics would be:

exec('if(X) Y' :: next,S) =
if eval(X,S) then exec(Y :: next,S)
else exec(next,S)

Where :: is a concatenation operator, Y is the bound statement list, next is the statements to be executed after the current one, and S is the program state.

Share this post

Link to post
Share on other sites
Zotoaster    148
I don't quite understand that code there fully, but I think I have a rough idea of what you mean, but not enough to try and implement it. Do you mind giving a little more info? :)

Share this post

Link to post
Share on other sites
ToohrVyk    1596
Right. So, a few questions:

  • What's your language grammar?
  • How do you construct an AST from the source code?
  • What does your evaluation function look like?

Share this post

Link to post
Share on other sites
Zotoaster    148
Well, if by grammar you mean syntax, that it's very C like.

I looked up AST, and got abstract syntax tree.. To be honest, I don't know what that is.. I'm fairly new to this whole scripting business, sorry.

Evaluation as in.. maths evaluation? It's a whole class, and it involved the whole code. I can post it if you like :)

Share this post

Link to post
Share on other sites
ToohrVyk    1596
Right, so you really do not have a clue. I really advise that you get a book on programming language development. I'll write a quick primer for you, though.

The basic operations behind any programming language is as follows: the input file is read, character by character, and split up into tokens (characters or groups of characters which have particular meanings) by a lexer. The tokens are then transformed into an abstract syntax tree by a parser. This transform is described by means of a grammar which is a complete description of the language syntax.

Note — Some modern parsers are able to handle tokenizing themselves instead, and some complex languages (such as C++) must interpret the language at the same time to build the AST.

Once the abstract syntax tree exists, the compiler performs various static bindings (linking the usage and declaration of variables and functions) and checks (type-safety, existence of the objects), and then moves on to the transform part.

Note — Again, some languages don't do static binding at all (PHP, PERL, SH) while most others delay the actual binding another step (a dedicated linking step). Also, not all perform any checking whatsoever. On the contrary, some perform complete proofs about the language.

The transformation consists in taking the abstract syntax tree, altering it, and outputting an executable graph. This might require several successive transforms (moving from front-end to back-end to binary in GCC), or it might be immediate (for example, in λ-calculus, the AST is executed almost as-is). The result is the executable structure, which is then passed on to an evaluator.

The evaluator is either the processor (when the executable structure is machine code), a virtual machine (when the executable structure is bytecode), or an evaluator defined by means of operational semantics. Note that it's perfectly plausible to define an evaluator through operational semantics, and then implement it as a virtual machine, or by compiling the AST to machine code.

Now, this is the simple part. The first difficult part is providing a grammar which can handle what you want to do. The typical imperative language grammar considers blocks to be statements or lists of statements, and statements to be function calls, assignments, or control sequences (which may include blocks). So, the AST of your example would be as such:

+-STATEMENT-IF (name == Zotoaster)
| |
| |
| \-STATEMENT-PRINT "Cool name"

A typical operational semantics definition of such a language is to consider that you are executing a sequence of statements. This sequence is called, say, P (program). Every time you execute a statement, you alter the state of the program (its memory, its output, its input), which we'll call S (state). Then, the execution function here consists in telling the program what the final state is, given an initial state S and an initial program P.

So, here:

execute(program P, state S) is:

execute([],S) = S // no code left? return the state

execute([print x]::rest,S) =
execute(rest,x::S) // we "print x", thereby changing S,
// and then determine what the rest of the program does

execute([if(x) Y]::rest,S) =
if eval(x,S) // Looking at the state of the program,
// is the expression 'x' true or false?
then execute(Y::rest,S) // It's true: execute the if-statement code
// 'Y' before the rest of the code
else execute(rest,S) // It's false, don't bother with 'Y' and move on
// to the rest

Note that the execution function never forgets about 'rest', which is what code will be executed after the current statement, even if the current statement is an if-clause.

Now, once this is known, you translate the above execution system into whatever language and/or VM you wish to use.

Now, I go to sleep. I'll check again on this thread in 8 hours or so.

Share this post

Link to post
Share on other sites
Zotoaster    148
Oh, I really am far behind aren't I? O_o

Atleast I understand that syntax tree thing, which should help.

I admit this scripting system wont be the best around, but it's fun to try and make it heheh.

Share this post

Link to post
Share on other sites
ToohrVyk    1596
Well, it's not so much that you're far behind. Language semantics is an entire field of study which has been around for quite a few decades now, so people have naturally evolved tools and mindsets towards this.

Look at it this way. You don't use punch cards anymore to give orders, and your CPU doesn't respond with a 100dpi printer: your computer has a screen and keyboard and mouse, because these are simpler to use when communicating with the machine. You don't write all your code in assembly language anymore, you write most simple programs in medium-level or high-level languages because it's simpler to use and it saves time.

In the same way, language developers and semanticians have their own tools: lexer generators (because writing an equivalent lexer by hand takes years of experience and a couple of days), parser generators (because writing an equivalent parser takes years of experience and a couple of weeks), adapted tree manipulation languages (just because C++, C# or Java definitely won't cut it until they have access to pattern matching and/or garbage collection), as well as many human-spoken languages to communicate the various concepts (tokens, grammar, operational/denotational semantics).

The idea is this: where the average language can be written by hand in a month from scratch, these tools allow us to whip up a working prototype in half a day, which leaves us the rest of the month to think about the actually smart things (such as improving the speed of compiled code by 10% in addition to what the optimizer may do, or perhaps writing a program that proves your code doesn't contain any bugs, or allowing the programmer to express 1500 lines of C in 10 lines of the prototype language, or allowing the code to be distributed to a hundred machines at compile time, and so on). Besides, the tools are so flexible, we can change the grammar in five seconds tops. Which is why, although I consider a fine occupation to reinvent these wheels, I usually advise people to use them if they intend to get work done quick.

So, let's consider your example language, and see how it would fit. This is more of an untested outline, but the idea is here. First, you seem to want a language that has a C-like syntax, and we'll add a few concepts of our own:

  • Imperative language: functions are sequences of statements. Statements are either control structures (if, while, return), function calls, printing, reading, or assigning a variable a value.

  • Our values are NULL, strings, integers and arrays. We'll go for a PHP-like approach where arrays are in fact hash tables, and conversion between strings and integers is automatic. Variables can be used before being initialized (they're initially NULL).

  • We use no overloading (since we have a single type) and I'm too lazy to do argument-count-based overload.

  • The language is interpreted by a virtual machine. It is not type-checked at all.

So, the first step is to decide which grammar we'll use. The typical grammar is split into three sections: expressions (things which have a value), statements (things which perform actions) and definitions (things which explain what a function is). Here, I'll use Menhir, which is a very nice parser generator for OCaml, to describe the grammar. The idea is that I define non-terminals to represent each of the "entities" that can appear in the language, using other non-terminals (for instance, an expression can be an expression plus another expression) and terminals (the symbols read from the file, also called tokens).

// --- First, I describe the tokens I'll use...

// Tokens which are not associated with any value
TIf // 'if'
TWhile // 'while'
TFunc // 'function'
TSet // '='
TReturn // 'return'
TPrint // 'print'
TRead // 'read'
TBBlock // '{'
TSBlock // '}'
TBParen // '('
TEParen // ')'
TBSubs // '['
TESubs // ']'
TSColon // ';'
TComma // ','
TNull // 'null'
TAdd TMin TMul TDiv // + - * /
TAnd TOr TNot // && || !
TEqu TGt TLt TGet TLet TNeq // == > < >= <= !=
TEof // End of file

// Tokens which are bound to a string value
TIdent // Identifier (foobar,x,Value)
TString // String literal "hello"

// Tokens which are bound to an integer value
TInt // Integer literal 3

// --- Then, I describe the priorities of the operators

%left TMul TDiv // highest
%left TAdd TMin
%left TEqu TGt TLt TGet TLet TNeq
%unary TNot
%left TAnd
%left TOr // lowest

// --- Finally, the non-terminal that corresponds to a program

%start program


// --- I now describe how the non-terminals are defined.
// Between braces at the end of each possible definition is the
// OCaml code executed when this definition is encountered.

// A program is a list of function definitions
| defs = list(definition) TEof { AST.register (defs) }

// A definition is "function", a name, an argument list and a body
| TFunc
name = TIdent
args = separated_list(TIdent,TComma)
body = block { AST.make_function (name,args,body) }

// A block is either a single statement, or a list of statements
// between braces
| x = statement { [x] }
| TBBrace
statements = statement*
TEBrace { statements }

// A statement can be an assignment, reading, printing, or
// a control sequence (or an expression with ignored value)
| into = assignable TSet valu = expression TSColon
{ AST.make_assign (into,valu) }
| TRead into = assignable TSColon
{ AST.make_read (into) }
| TPrint valu = expression TSColon
{ AST.make_print (valu) }
| TIf
TBParen cond = expression TEParen
exec = block
{ AST.make_if (cond,exec) }
| TWhile
TBParen cond = expression TEParen
exec = block
{ AST.make_while (cond,exec) }
| TReturn TSColon
{ AST.return_nothing }
| TReturn valu = expression TSColon
{ AST.return (valu) }
| expr = expression
{ AST.eval (expr) }

// An assignable expression is either a variable, or
// a member of an assignable expression
| name = TIdent
{ AST.make_assignable_var (name) }
| array = assignable
TBSubs index = expression TESubs
{ AST.make_assignable_sub (array,index) }

// An expression works by prioritizing: first, we group together
// literals, calls and parantheses as "simple" expressions
| TBParen expr = expression TEParen
{ expr }
| valu = TInt
{ AST.int_literal (valu) }
| valu = TString
{ AST.string_literal (valu) }
| TNull
{ AST.null }
| name = TIdent
TBParen args = separated_list(expression,TComma) TEParen
{ AST.make_call (name,args) }

// Then, using simple expressions, we apply subscripting
| same = simpleExpression {same}
| array = applyExpression
TBSubs index = expression TESubs
{ AST.make_sub (array,index) }

// Then, once we have these, we can apply our operators
// happily
| same = applyExpression {same}
| a = expression
op = binary
b = expression
{ AST.make_operation2 (a,op,b) }
| TNot e = expression
{ AST.make_operation1 ("!",e) }

%inline binary:
| TAdd { "+" }
| TMul { "*" }
| TSub { "-" }
| TDiv { "/" }
| TGt { ">" }
| TLt { "<" }
| TGet { ">=" }
| TLet { "<=" }
| TEqu { "==" }
| TNeq { "!=" }
| TAnd { "&&" }
| TOr { "||" }


// --- There we go, this should describe the grammar nicely.

Sure, there are several niceties missing above, because I'm short on time (for instance, we don't have an unary minus, or an 'else' clause, or a 'break' statement, or a 'for' loop, or a 'switch' statement, oh well). But these are quickly added with minor alteration of the grammar. Also, I might have overlooked some stupid mistakes, but well, that happens.

Now, the next step is writing a lexer, which we'll plug into the parser. Ideally, I'll be using ocamllex, which is a lexer generator for OCaml:

// --- We define the different tokens as regular expressions
// Between braces, the associated token

rule token =
// Ignore spaces and comments
| '\t'
| ' '
| '\n'
| "//" [^ '\n']
{ token lexbuf }

// An integer literal
| '-' ? [ '0' - '9' ] + as int
{ Parser.TInt (int_of_string int) }

// A string literal
| '"' ( ([^ '"' '\\'] | '\\' _) * as lit ) '"'
{ Parser.TString (lit) }

// An identifier
| [ 'a'-'z' 'A'-'Z' '_' ] ['a'-'z' 'A'-'Z' '0'-'9' '_']* as id
{ Parser.TIdent (id) }

// Various keywords
| ";" { Parser.TSColon }
| "," { Parser.TComma }
| "if" { Parser.TIf }
| "while" { Parser.TWhile }
| "function" { Parser.TFunc }
| "(" { Parser.TSParen }
| ")" { Parser.TEParen }
| "[" { Parser.TSSub }
| "]" { Parser.TESub }
| "{" { Parser.TSBlock }
| "}" { Parser.TEBlock }
| "+" { Parser.TAdd }
| "-" { Parser.TSub }
| "*" { Parser.TMul }
| "/" { Parser.TDiv }
| "==" { Parser.TEqu }
| "!=" { Parser.TNeq }
| "<=" { Parser.TLet }
| ">=" { Parser.TGet }
| "<" { Parser.TLt }
| ">" { Parser.TGt }
| "=" { Parser.TSet }
| "print" { Parser.TPrint }
| "read" { Parser.TRead }
| "return" { Parser.TReturn }
| "null" { Parser.TNull }

// End of file
| eof { Parser.TEof }

// An error!
| _ { failwith "Lexing error" }

This lexer should transform happily any input file (or standard input) into a sequence of parser-recognized tokens, which we then plug into the parser to construct our AST! So, there we go: a few hundred lines of description and we've got a near-optimal lexer and parser.

Now, the fun part is defining the AST. We'll actually define a few distinct parts: statements, assignable expressions, and expressions. These work differently: statements are executed, assignable expressions are modified and expressions are evaluated. Our abstract tree should reflect this. I'll also write the generation functions which are used by the parser, so everything works.

(* What our values are *)
type value =
| Val_Null
| Val_Array of (value,value) Hashtbl.t
| Val_String of string
| Val_Int of int

(* Our tree definition! *)
type statement =
| Set of assignable * expression
| Return of expression
| If of expression * (statement list)
| While of expression * (statement list)
| Eval of expression
| Print of expression
| Read of assignable
and assignable =
| AVariable of string
| AIndexed of assignable * expression
and expression =
| Variable of string
| Literal of value
| Call of string * (expression list)
| Binary of (value -> value -> value) * expression * expression
| Unary of (value -> value) * expression
| Indexed of expression * expression

type function =
name : string;
args : string list;
body : statement list;

(* Our building functions *)

let make_function (name,args,body) =
{ name = name; args = args; body = body }

let make_assign (into,value) = Set (into,value)
let make_read (into) = Read (into)
let make_print (value) = Print (value)
let make_if (cond,exec) = If (cond,exec)
let make_while (cond,exec) = While (cond,exec)
let return_nothing = Return (Literal Val_Null)
let return (valu) = Return (valu)
let eval (expr) = Eval (expr)
let make_assignable_var (n) = AVariable (n)
let make_assignable_sub (e,i) = AIndexed (e,i)
let int_literal (i) = Literal (Val_Int i)
let string_literal (s) = Literal (Val_String s)
let null = Literal (Val_Null)
let make_call (f,arg) = Call (f,arg)
let make_sub (e,i) = Indexed (e,i)

let make_operation2 (a,op,b) =
Binary ((func_of_op2 op),a,b)

let make_operation1 (op,e) =
Unary ((func_of_op1 op),e)

(* We have a list of loaded functions, which is accessible
by name. Registering the functions consists in binding
them to their own names. *)
let functions = Hashtbl.create 10

let register =
(fun f ->
ignore (Hashtbl.find functions;
failwith ("Function '"^^"' defined twice!");
with Not_found ->
Hashtbl.add functions f)

let get_function (name) =
Hashtbl.find functions name
with Not_found ->
failwith "Could not find function '"^name^"'!"

Now, our AST is up and running, and can be generated in a breeze simply by launching the parser on a token stream. We can start defining operational semantics for the language. We basically want to implemnt two functions: eval and exec. Eval reads in an expression, and returns the value of that expression. Exec runs a list of statements, altering the world along the way, and returns the value returned by the function either when a "return" is encountered, or when it runs out of statements (implicit 'return;'). First, I'll assume for the sake of the eval function that we have access to a 'call(f,args)' function which calls a function with arguments and returns the return value. So, let's write that function:

(* First, we implement the operators. *)
let do_op2 ~op_n ~op_ii ~op_is ~op_ss = fun

(* All null values *)
| Val_Null o | o Val_Null -> op_n o

(* All array values *)
| (Val_Array _) _ -> | _ (Val_Array _) -> Val_Null

(* The rest *)
| (Val_Int a) (Val_Int b) -> op_ii a b
| (Val_Int a) (Val_array _) -> op_ii 0 a
| (Val_Int a) (Val_String b)
| (Val_string b) (Val_int a) -> op_is a b
| (Val_String a) (Val_String b) -> op_ss a b

let null1 = fun _ -> Val_Null
let null2 = fun _ _ -> Val_Null
let of_bool (b) = if b then Val_Int 1 else Val_Null

let op_add =
~op_n = null1
~op_ii = (fun a b -> Val_Int (a+b) )
~op_ss = (fun a b -> Val_String (a^b) )
~op_is = (fun i s ->
if s = "" then Val_string (string_of_int i)
else try Val_Int (int_of_string s + i) with _ -> (Val_int i)

let op_sub =
~op_n = null1
~op_ii = (fun a b -> Val_Int (a-b))
~op_ss = null2
~op_is = null2

let op_mul =
~op_n = null1
~op_ii = (fun a b -> Val_Int (a*b))
~op_ss = null2
~op_is = null2

let op_div =
~op_n = null1
~op_ii = (fun a b -> Val_Int (a/b))
~op_ss = null2
~op_is = null2

let op_and =
~op_n = null1
~op_ii = (fun a b -> of_bool (not (a*b = 0)))
~op_ss = (fun a b -> of_bool (not (a^b = "")))
~op_is = (fun i s -> of_bool (not (a = 0 || b = "")))

let op_or =
~op_n = (fun o -> op_and o o)
~op_ii = (fun a b -> of_bool (a*b <> 0))
~op_ss = (fun a b -> of_bool (a^b <> ""))
~op_is = (fun i s -> of_bool (not (a = 0 && b = "")))

let op_equ = fun i j -> of_bool (i = j)
let op_neq = fun i j -> of_bool (i <> j)

let comp (c) =
~op_n = null1
~op_ii = (fun i j -> of_bool (c i j))
~op_ss = (fun i j -> of_bool (c i j))
~op_is = null2

let op_lt = comp (<)
let op_gt = comp (>)
let op_let = comp (>=)
let op_get = comp (<=)

let op_not x = op_neq (op_and x x) Val_Null

(* These operators are automatically inserted into the AST by the
binary operation, which calls func_of_op* *)

let func_of_op1 = function
| "!" -> op_not
| op -> failwith ("Unknown unary operator '"^op^"'!")

let func_of_op2 = function
| "+" -> op_add
| "-" -> op_sub
| "*" -> op_mul
| "/" -> op_div
| "&&" -> op_and
| "||" -> op_or
| "==" -> op_equ
| "!=" -> op_neq
| "<" -> op_lt
| ">" -> op_gt
| ">=" -> op_get
| "<=" -> op_let
| op -> failwith ("Unknown binary operator '"^op^"'!")

(* For the sake of it, finding a variable in a
(name|value)-variable association *)

let find_var (vars) (name) =
Hashtbl.find vars name
with Not_found -> Var_Null

(* Finally, we implement the evaluation function. Vars is a
hashtable that binds variables to values. *)

let rec eval (vars) = function
| Variable name -> find_var vars name
| Literal v -> v
| Call (func,args) -> call func ( (eval vars) args)
| Binary (f,a,b) -> f (eval vars a) (eval vars b)
| Unary (f,a) -> f (eval vars a)
| Indexed (a,i) ->
begin match (eval vars a) with
| Val_Array a -> find_var a (eval vars b)
| _ -> ignore (eval vars b); Var_Null

The choices of interaction are fairly arbitrary (I decided that an array
automatically turned into a 'null' whenever it was involved in an
operation, except for equality) but the functions are there and can be
amended to perform whatever you wish. The point is, describing the
operations in a loosely typed language such as this one is necessarily
a long deal, since there are N² combinations to consider for N types.

Once the evaluation function is done, we can write a mutually recursive
execution function. The execution function concerns itself
with one call only. It keeps a list of stack positions, represented
as the code left to be executed within each block it is present in.
It then pops an element from the topmost block, executes it, and moves
on to the rest (evaluating objects in the process, and altering the
memory state if necessary).

(* Execute a stack of operations given the name-value
association 'vars' *)
and exec (vars) = function
| [] -> Val_Null (* Nothing left to execute *)
| []::above -> exec vars above (* Block finished, move up one. *)
| (h::t)::above ->
(* We still have commands here: h is the next command, t is the rest
of the commands in this block, above is the blocks above this one. *)
begin match h with
| Set (a,e) -> assign vars a (eval vars e); eexec vars (t::above)
| Return e -> eval vars e
| If (e,l) ->
if (op_and (eval vars e) <> Val_Null)
then exec vars (l::t::above)
else exec (t::above)
| While (e,l) ->
if (op_and (eval vars e) <> Val_Null)
then exec vars (l::(h::t)::above)
else exec (t::above)
| Eval e -> ignore (eval vars e); exec vars (t::above)
| Print e -> print_val (eval vars e); exec vars (t::above)
| Read a -> assign vars a (read_val ()); exec vars (t::above)

(* Call a function given by a name *)
and call (func) (args) =

(* Try to find the function by name *)
let f = get_function func in

(* Create an environment from the argument list *)
let environment = Hashtbl.create 10 in
(Hashtbl.add environment)

(* Execute the function *)
exec environment [f.body];;

I'll leave to you to choose how to implement print_val and read_val, and I'll concentrate on the assignment function instead. The objective of this function is to alter the variable or object described by its second argument (as part of the environment passed as first variable) and set its value to its third argument. Since we want the left-hand object to be created (if it's an array-of-array-of-array-of-...) we'll write a function which builds the object (if it doesn't exist) and returns the current value and a function to modify it.

and assign vars target value =
let rec helper = function

| AVariable v ->
(* Return a function which performs the modification *)
(find_var vars v),
(* Value modification *)
(fun x -> Hashtbl.remove vars v; Hashtbl.add vars v x)

| AIndexed (a,i) ->
(* Evaluate the index *)
let i = eval vars i in
(* Build whatever is below *)
let (value,modify) = helper a in
begin match value with
(* Value is alredy an array, so keep it and add to it! *)
| Val_array a ->
(find_var a i),(fun x -> Hashtbl.remove a i; Hashtbl.add a i x)

(* Value is not an array, we need to index into it so we alter
it. *)
| _ -> let a = Hashtbl.create 10 in
modify (Val_array a);
(Val_Null),(fun x -> Hashtbl.add a i x)

(* Retrieve the modification function and call it on the value. *)
snd (helper target) value

This should probably work. It'd probably take one more hour to correct all the various syntax errors, typos and typical mistakes (the code isn't tested), as well as binding code to execute the main function on startup and the usual command line arguments to specify which script to load. Perhaps adding a subscripting option for strings might be a good idea as well. Total time: three hours. Note that the values are passed by reference, but only arrays are mutable, so you can pass a mutable reference by wrapping it in an array, as such:

function alter(foo)
foo[0] = "Ha!";

function main()
foo[0] = null;
print foo[0]; // Prints "Ha!"

print bar[0]; // Prints "null", since bar wasn't an array when it was passed to 'alter'

Share this post

Link to post
Share on other sites
Zotoaster    148
Wow. That's alot of stuff. I really appreciate the help. I'm going to properly read over this :)

p.s. Is any of it actually coded in C++ ? :P

Share this post

Link to post
Share on other sites
ToohrVyk    1596
p.s. Is any of it actually coded in C++ ? :P

C++ is a pretty unexpressive, verbose and slow language to do these things in. So no, I'd rather not [wink]

Share this post

Link to post
Share on other sites
Zotoaster    148
Fair enough. I just happen to be a guy that doesn't like everything done for him. It doesn't feel right if I don't make it from scratch, otherwise I would just use Lua or Angelscript or something, which both are probably better than what this will end up like :P

To be honest, I don't even know what I'm going to do with it :p, probably make version 2, then 3, then use it to make another scripting system, then version 2, then 3, etc ^^

Share this post

Link to post
Share on other sites
Ahnfelt    176
That was a pretty nice primer ToohrVyk. Somebody sticky this!

I agree with pretty much everything ToohrVyk said, but I want to emphasise one of his many points: Functional languages are much better suited for this kind of task, and serveral ML variants (like OCaml and Standard ML) have very nice lexer and parser generators. You'd be doing yourself a huge favor by learning one of these, and not just for scripting languages, but for many tasks in general.

I've been reinventing the scripting language wheel a lot in the past, and I found it quite amusing. But if you want results, use lexer and parser generators and a functional language :) Go get 'em!

Share this post

Link to post
Share on other sites
wodinoneeye    1689
Original post by Zotoaster
I'm having some trouble getting control structures to work in my scripting system, mainly if statements just now.

My main idea was this: If I have a piece of code, like this:

print "hello"; if(name == "Zotoaster"){ print "Cool name"; } print "done..";

Everytime you find an if, get whats inside the braces, and use that as the script, which means nested ifs would work too. It all works fine, apart from when printing what's after the if statement, in this case, the print "done.."; part.

Anyone have an idea how to fix this, or have a better idea of how to work it?


You have to recursively scan the code and when you pop out of each level resume at the correct context. You identify the 'if' statement and scan first for the 'test' value and then a statement block and then for the optional 'else' block. Not finding an 'else' you resume scanning in the context which the entire 'if' statement existed in (function/program level or whatever that line is inside of). The print "done.."; would then execute.

For some reason it sounds like you were skipping to the end of line and not scanning the print "done.."; part correctly.

Share this post

Link to post
Share on other sites
Zotoaster    148
Well it works fine now. (Forgot about else :P I'll be sure to add that in).

My idea sorta follows that tree idea, istead I wouldn't call it a tree exactly.

For example, this code:

print "hello";
if((name == "Zotoaster")&&(age = X))
print "I think I know you";
if (1==1)
print "well done genius...";
print "heh";
print "done";

...could be formatted in a similar way to ToohrVyk's post.
Like this:

0 - if - (name == Zotoaster)
1 - print - "I think I know you"
1 - if - (1==1)
2 - print - "well done genius"
1 - print - "heh"
0 - print - "done"

So if you look at the numbers at the left, they basically reflect how you could indent. When a { is found, the number goes up, and when a } is found, the number goes down. This means that if the current statement is an "if", it can check it's correspoding data (like an equation). If the data returns 1 (true), it will keep executing the next statements. If it returns 0, it will pass through all the statements but they wont be executed, until the number on the left is back to the number it was at when the if statement was found.

This seems to give me incredible flexibility, even if it's not how it's supposed to be done. I can use it for ifs, loops, probably functions, and if I ever get that far, classes too. I like the idea :)

The only problems I'm having now is getting my maths parser finished. It doesn't seem to like the ! operator, and functions dont seem to be stable enough yet, but it shouldn't take too long to figure it out.

Share this post

Link to post
Share on other sites
Kylotan    10008
Yes, I've used the 'nesting depth' method for simple parsing of control structure blocks, and it works well. Make sure you test it with combinations of ifs with else blocks and ifs without else blocks, and with multiple if blocks inside another, as you'll need to ensure it doesn't skip or execute the wrong ones.

All you're essentially doing is exploiting the fact that the syntax tree is laid out linearly, which is a reasonable approach as long as you keep a clear idea of what exactly you're doing, and how that block depth value relates to the implicit tree structure of the code.

Share this post

Link to post
Share on other sites
Zotoaster    148
It seems to work well. The if statements can be nested, and you can basically put them anywhere. On a side note, they also seem to repond really well to some really complex conditions, like "if((3 == 2) || ((3 < 5) && ((5 == (10/2)) || ((2+2) > 5) )))" :) I am a happy chap now heh.

Since this post is still active, anyone know how I could implement functions in RPN?

The basic idea is that is you have this "3 * cos( sin( 0 ) )" (the result being 3), the RPN would look like this:


But it doesn't seem to work.
The Shunting Yard Algorithm is on Wikipedia ( but I cant seem to understand anything when it mentions functions :S Anyone know about this?

Thanks in advance :)

Share this post

Link to post
Share on other sites
Kylotan    10008
Er, why have you gone from comparing if-statement blocks to expression parsing? The two have some things in common but are completely different in how you execute them.

There's nothing complicated about functions; they work exactly the way operators do.

In your example, '*' is an operator that takes the 2 previous values and returns 1. And cos and sin are effectively operators that take 1 previous value and return 1. I see no reason why what you've posted wouldn't work just fine.

Share this post

Link to post
Share on other sites
Zotoaster    148
Nah it was just while the thread was active I thought I'd post my question instead of starting a new one.

I don't have a problem with actually solving functions once they are in RPN form, it's just converting them to RPN form that is getting to me.

Share this post

Link to post
Share on other sites
Kylotan    10008
Just do exactly what it says.

- If the token is a left parenthesis, then push it onto the stack.

I know you can do this.

- If the token is a function argument separator (e.g., a comma), until the topmost element of the stack is a left parenthesis, pop the element onto the output queue

Surely simple enough - keep popping stack to queue until you reach a parenthesis on the top of the stack, and don't pop that.

- If the token is a right parenthesis, then pop operators off the stack, onto the output queue, until the token at the top of the stack is a left parenthesis, at which point it is popped off the stack but not added to the output queue. At this point, if the token at the top of the stack is a function token, pop it too onto the output queue.

Exactly the same, except this time you pop the parenthesis too, ignore it, then pop the function name onto the queue.

Share this post

Link to post
Share on other sites
Zotoaster    148
Annnddd..... it works!!

I had a minor problem when doing things like "add(1,2)+1", as the algorithm didn't mention that if you find an operator, and the item at the top of the stack is a function, it has a higher precedence, but by luck I managed to figure it out myself, cos I'm cool :)

This is an example of my finished (just about) maths parser's capabilities:
(add( sub(6,4),3 ) + ( ((2+2) == 4) || (3 < 2 ) ) ) && (add(3,4) > ((10 / 2)*(5==5)))

It looks like the kinda thing that coding it out would be helish, but RPN seems to be a life saver :)

Share this post

Link to post
Share on other sites
Kylotan    10008
I think you're overcomplicating matters; the algorithm as it stands will evaluate a function's arguments before any operators that operate on the result of the function. This is because the right-parenthesis rule pops the entire function and its arguments off the stack in one go. By the time you find that operator in your example, the whole function should already be in the output queue. There should be no function on the stack.

Share this post

Link to post
Share on other sites
Zotoaster    148
I don't think anything is wrong with it. It's result should be 4. This is the output queue that I got when I tested it: 1 2 add 1 +

And if I try that out:

1 1
2 1 2
add 3
1 3 1
+ 4

Works good.

Share this post

Link to post
Share on other sites
Kylotan    10008
Just because it gives you the right answer, doesn't mean you've implemented the algorithm correctly. If you had "2+2" and used multiplication instead of addition, that would "work good" too, even though the algorithm is wrong.

As I said, by the time you get to the operator after the function, there should be no function on the stack, and therefore no need at all to worry about precedence. If you're having to use precedence rules to compare the operator to a function on the stack, then you haven't implemented the algorithm as it was documented.

Share this post

Link to post
Share on other sites
Zotoaster    148
I'm not entirelly sure what happened with the algorithm. All I did was add "||(cur.type == FUNCC)" and it worked fine.

I tested that expression out, and you were right:


add add
( ( add
1 1 ( add
, 1 ( add
2 1 2 ( add
) 1 2 add
+ 1 2 +
1 1 2 add 1 +

I don't know what happened to the algorithm, but, aslong as it works, and it's robust, I don't have a problem with it :)

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this