• entries
    743
  • comments
    1924
  • views
    580126

Control Structures in Om

Sign in to follow this  
Aardvajk

1426 views

Control Structures in Om

I've just finished up the basic control structures for the Om language and virtual machine, namely while, if/else and two forms of for loop. The second form of the for loop, what we'll call the for-each, is a little bit more interesting but I'll explain how the normal structures work first.

The control structures are all based around two virtual machine instructions that allow for dynamic branching in the execution of the VM. We have [font='courier new']OpCode::Jmp[/font] and [font='courier new']OpCode::JmpZ[/font], both of which have an unsigned int parameter following them in the bytecode which is the target for the program counter to be set to after the operation.

[font='courier new']OpCode::Jmp[/font] is an absolute jump, so simply sets the program counter to its parameter value. [font='courier new']OpCode::JmpZ[/font] is the real workhorse of the control structure system, as it only sets the program counter to its parameter if the current value on the top of the stack is false, when run through the [font='courier new']typedValueToBool[/font] method.

This method looks like this:
bool typedValueToBool(const TypedValue &v){ switch(v.type()) { case Om::Null: return false; case Om::Int: return v.toInt() ? true : false; case Om::Float: return v.toFloat() ? true : false; case Om::Bool: return v.toBool(); default: return true; }}Since [font='courier new']null[/font] is a special type in Om, any of the reference types, [font='courier new']Om::Object[/font], [font='courier new']Om::Array[/font], [font='courier new']Om::String[/font] and [font='courier new']Om::Function[/font] are always considered to be true when handled as a boolean. Because the type information is bound up to the value rather than the variable in this language, it is not possible to have a value that is both, say, an [font='courier new']Om::Function[/font] and also [font='courier new']null[/font], since [font='courier new']Om::Null[/font] is a distinct type. For example:
var f = func { };var g = "hello";if(f) do_something();f = null;if(!f) do_something_else();There is a bit of a complication with the control structures in that I decided I wanted to support the declaration of a variable as part of the condition, as per C++, as it is a very nice feature for a language because you can avoid polluting the enclosing scope with variables that will only be referenced in the loop.
var a = 10;while(var n = a){ do_something_with_n(n); --a;}This is a bad example, but you're all familiar with how useful this is in practice.

To deal with this, we've first of all had to add some methods to the compiler to allow it to permit a [font='courier new']var[/font] statement as well as an expression when grabbing the expression. Excuse my terse naming on the [font='courier new']Context[/font] object. The scanner which provides tokens from the source is [font='courier new'].sc[/font] and the symbol table is [font='courier new'].st[/font].
NodePtr conditionalExpr(Context &c, bool get){ if(!c.sc.next(c, get)) return NodePtr(); switch(c.sc.token().type()) { case Token::RwVar: return varNode(c, VarNode::Expression, true); default: return expr(c, false); }}We have also had to add a type ([font='courier new']Expression[/font] or [font='courier new']Declaration[/font]) to the [font='courier new']VarNode[/font] node class, as it needs to behave slightly differently in this case. The existing [font='courier new']VarNode[/font] implementation pushes its value (or null) onto the stack, adds the variable to the symbol table and returns. If this is done in the context of a var-declaration-expression, this would mean the variable would then be eaten by the [font='courier new']OpCode::Pop[/font] that will be performed at the end of the expression, so we need to duplicate the value onto the stack so that the variable survives the process correctly. We can just use [font='courier new']OpCode::Peek[/font] and 0 to do this.
bool VarNode::generate(Context &c){ if(c.st.findLocal(name).addr != invalid_uint) { c.error(pos, stringFormat("Symbol already defined - ", name)); return false; } if(value) { if(!value->generate(c)) return false; } else { c.pm() << OpCode::Push << TypedValue(Om::Null, 0); c.st.increment(); } uint id = c.s.symbolIds.takeId(); c.st.add(Symbol(name, id, 0)); c.pm() << OpCode::SetId << id; if(type == Expression) { c.pm() << OpCode::Peek << 0; c.st.increment(); } return true;}Taking the while loop as the simplest example, the compiler section puts together a [font='courier new']WhileNode[/font] structure which has two child nodes, the expression and the block which represents the code inside of the loop. [font='courier new']BlockNode[/font] is basically a vector of nodes and is assembled in a compiler function that checks for an opening brace and gathers until the matching close, or just takes a single construct if not so it works in a uniform manner.
NodePtr statementBlock(Context &c, bool get){ if(!c.sc.next(c, get)) return NodePtr(); BlockNode *b = new BlockNode(c.position()); NodePtr bn(b); if(c.sc.token().type() == Token::LeftBrace) { if(!c.sc.next(c, true)) return NodePtr(); while(c.sc.token().type() != Token::RightBrace) { if(!construct(c, b, false)) return NodePtr(); } if(!c.sc.next(c, true)) return NodePtr(); } else { if(!construct(c, b, false)) return NodePtr(); } return bn;}All of the compiler functions that eat up constructs rather than expressions take a pointer to a [font='courier new']BlockNode[/font] as one of their arguments and add their produced nodes to this block rather than return them as a [font='courier new']NodePtr[/font], so if you give a node a [font='courier new']BlockNode[/font] child, you can then compile into this block using the same methods as the main program is compiled.

When we walk the abstract syntax tree and generate the bytecode, the while loop needs to do the following in simple terms:

  • Generate the expression to the top of the stack
  • Emit [font='courier new']OpCode::JmpZ[/font] with a placeholder unsigned parameter
  • Emit the code for the block
  • Go back and patch the [font='courier new']OpCode::JmpZ[/font] parameter to the current bytestream position
    To make the patching a bit more attractive to the eye, I have augmented the [font='courier new']ByteStream[/font] class that the program is generated into with a [font='courier new']ByteStreamPatch[/font] class. This can be inserted by a non-const reference into a [font='courier new']ByteStream[/font], at which point it puts a 0 in the stream and stores a pointer to the [font='courier new']ByteStream[/font] and the position of the 0. It can then later be assigned an [font='courier new']unsigned int[/font], which it will put in the place the placeholder 0 went. Just eye candy really. Here's a stripped down, simplified version of [font='courier new']WhileNode::generate[/font] showing this.
    bool WhileNode::generate(Context &c){ uint top = c.pm().position(); if(!expr->generate(c)) return false; uint exprTotal = c.st.currentScopeTotal(); ByteStreamPatch end; c.pm() << OpCode::JmpZ << end; if(!block->generate(c)) return false; c.pm() << OpCode::Jmp << top; end = c.pm().position(); return true;}Where things get more complicated is in dealing with the symbol table since we want the while loop to have its own symbol scope. Normally we just call [font='courier new']push()[/font] on the symbol table, generate code, the call [font='courier new']pop()[/font] on the symbol table, and [font='courier new']pop()[/font] automatically generates the bytecode to remove all the local symbols from the stack.

    However, if the while loop expression is a var declaration, we want to add the var to the while block's symbol scope and each time the loop restarts, this value is going to be regenerated. This means that the value is pushed prior to the test, so if the test is false and we jump past the loop to the end, we then need to deal with the value.

    So what we can do is push the new symbol scope before we generate the expression, then make a note of the number of items in the local (top) symbol stack afterwards. This will always be a 0 or a 1. We can then use this later on to decide whether to apply an extra [font='courier new']OpCode::Pop[/font] after the loop. At the end of the loop, before we jump back up to test the expression again for the next run through it, we can also remove all the values in the scope ready for them to be generated again as the loop executes:
    bool WhileNode::generate(Context &c){ uint top = c.pm().position(); ByteStreamPatch end; c.st.push(); if(!expr->generate(c)) return false; uint exprTotal = c.st.currentScopeTotal(); c.pm() << OpCode::JmpZ << end; c.st.decrement(); if(!block->generate(c)) return false; c.pm() << OpCode::PopN << c.st.currentScopeTotal(); c.pm() << OpCode::Jmp << top; end = c.pm().position(); if(exprTotal) { c.pm() << OpCode::PopN << exprTotal; } c.st.decrement(c.st.currentScopeTotal()); c.st.discard(); return true;}The symbol table's [font='courier new']discard()[/font] method just pops the top scope without emitting any bytecode because we are handling it manually in this case.

    There are similar manual shenanigens going in with if/else and the C-style for(x;y;z) loops to get the correct behaviour with regard to the possibility of a var-declare-expression. Probably be a bit harder to explain these without a more complete understanding of the architecture so I'll leave this to your imagination for now. But they both also still just use combinations of [font='courier new']OpCode::Jmp[/font] and [font='courier new']OpCode::JmpZ[/font] to function.


    The For-Each Style Loop

    The shining jewel in the control structure set is the one I was really looking forward to working on, but I was disciplined enough to get the "boring" control structures working and tested first. This form of the for loop works on [font='courier new']Om::Arrays[/font].
    var a = [ 1, 2, 3, 4 ];for(var n: a) out n;Stealing syntax from modern C++ here, if the first clause is followed by a colon instead of a semicolon, we construct a [font='courier new']ForEachNode[/font] instead of a [font='courier new']ForNode[/font]. The compiler is a bit complicated by having to delay the decision about what sort of node to create, but since we can store nodes in local [font='courier new']NodePtr[/font] variables, it wasn't too bad.

    [font='courier new']ForEachNode[/font] has three child nodes: expression, value and block. The expression is the left side of the colon and the value is the right-hand side and has to be an [font='courier new']Om::Array[/font]. However, this second point has to be checked at runtime since this is perfectly legal:
    var f = func(x){ if(x) return [ 1, 2, 3 ]; else return "hello";};var b = some_unknown_value_at_compile_time();for(var n: f(b)){ // code}This is the joy of dynamic typing. Very flexible but means we have to defer a lot of things to runtime that a static typed language can catch at compile time.

    I agonised a bit and decided to add a couple of specialized [font='courier new']OpCodes[/font] to make the for-each loop a bit more efficient. We have [font='courier new']OpCode::FeChk[/font] and [font='courier new']OpCode::FeGet[/font].

    After the expression has been dealt with, the ForEachNode generates the value (hopefully an [font='courier new']Om::Array[/font]) to the stack top, then pushes an integer 0 on top and executes [font='courier new']OpCode::FeChk[/font]. At runtime then, the VM first checks to see if the value at stack position 1 is an [font='courier new']Om::Array[/font] or not, and errors out with an appropriate message if not.

    [font='courier new']OpCode::FeChk[/font] takes an unsigned program position parameter, just like [font='courier new']OpCode::JmpZ[/font]. Having confirmed that there is an array at stack position 1, [font='courier new']OpCode::FeChk[/font] then gets the integer at position 0 and, if this is greater than or equal to the length of the array pointed at by the position 1 stack value, sets the program counter to this parameter, thereby jumping out of the loop.

    [font='courier new']OpCode::FeGet[/font] is generated at the top of the loop body. This is just like the subscript get opcode, [font='courier new']OpCode::GetSc[/font], except the latter pops its stack value arguments. The value and counter for the for-each stay alive on the stack until the loop exits, just to save some unnecessary pushing and popping, so [font='courier new']OpCode::FeGet[/font] just retrieves the value from the array whose index is in the integer value and pushes that onto the stack.

    We then use the expression's [font='courier new']generateAssign()[/font] method to take whatever is on top of the stack and store it in the variable, which is the same routine as is used for normal assignments.

    We then generate the body of the loop, then remove all its locals so we are back to having the counter integer on the top of the stack followed by the array again. We can use the existing [font='courier new']OpCode::Inc[/font] to add 1 to the integer value on the top of the stack, then [font='courier new']OpCode::Jmp[/font] back to the [font='courier new']OpCode::FeChk[/font] instruction to run the loop again, first assigning the next array member to the expression value then executing the loop body, or if the [font='courier new']OpCode::Inc[/font] has set the counter to the length of the array, jumping out of the loop and clearing up.

    Using the [font='courier new']generateAssign()[/font] method to store the value means anything else that can be the target of an assignment can be the expression in a for-each loop. For example:
    var ob ={ var x = 10; var y = 23.5; var name = "Paul"; var f = func(x) { out "Say ", x, ", ", this.name; };};var nb ={ var x = [ "one", "two", "three" ];};for(nb.x[1]: ob.members){ out "member ", nb.x[1], ": ", ob[nb.x[1]];}Here we have an object with some members, then another object that has an array member (x). There is a built-in property [font='courier new']members[/font] that you can call on any object that returns an [font='courier new']Om::Array[/font] of O[font='courier new']m::Strings[/font] of the names of the members of that object, sorted alphabetically, so this is a first glimpse of actual reflection in Om up and working. I'll maybe write up a bit about the built-in properties and functions soon, have a good, easy to extend system in place there.

    We can use the element at position 1 of that array to be our iterator variable in the for-each loop because the underlying nodes are already set up to allow an array member to be the target of assigning. Twisted example but you get the idea.

    Again there is some manual tweaking of the symbol table going on in [font='courier new']ForEachNode::generate()[/font], but this is all compile-time only stuff. The symbol table is actually discarded at the end of compilation and plays no part of the actual execution of the code so we don't have to be too concerned about the work done here.

    I could have techincally avoided adding [font='courier new']OpCode::FeChk[/font] and [font='courier new']OpCode::FeGet[/font] and used the existing codes to perform the same actions, but that would have meant a) longer bytecode and b) lots of redundant pushing and popping of values onto the stack, including things like pushing a 1 then using the [font='courier new']BinaryOps[/font] system to add values together and so on. I'm keeping the VM instruction set down to as much of a minimum as I can, but when significant performance benefits can be gained by some specialising, I'd be foolish to not do so.

    So there you have a good overview of the control structures we currently have in the language. I've set up a test system where an ever increasing script program is run every time I launch my driver program, and its output directed to a file, which is then compared to a stored version. It breaks out with a message if they differ so I'm constantly making sure any changes are not breaking existing features as I go. As I finish a new feature, I add a few lines using it to the unittest.txt script and regenerate the comparison file, check it is correct then carry on with the next feature. Makes me feel a lot more confident that I'm not disturbing other things as I develop.

    Thanks, as always, for reading. I hope this stuff is of interest to any fellow virtual machine enthusiasts smile.png
Sign in to follow this  


0 Comments


Recommended Comments

There are no comments to display.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now