• entries
    743
  • comments
    1924
  • views
    580354

Implementing "break" in the Scripting Engine

Sign in to follow this  

1131 views

Just finished implementing [font='courier new']break[/font] in Om, for breaking out of [font='courier new']while[/font] and [font='courier new']for[/font] loops. Thought a write-up might be an interesting insight into the internals of the system, although I'm not sure who to smile.png.

First a quick overview. [font='courier new']Om::Value Om::Engine::evalute(const std::string &text, Om::Engine::EvaluteType type)[/font] is the main entry point into this scripting system from the user's point of view. You can pass [font='courier new']FromString[/font] or [font='courier new']FromFile[/font] as the type, and text contains either the script directly or the path to the file.

Internally this method works in a two-stage manner. First up, it runs the script through the compiler module, which converts the script into an abstract syntax tree structure, based on classes inheriting from a [font='courier new']Node[/font] parent class, then recursively walks the tree, spitting out the bytecode for the virtual machine. Secondly, the resulting top-level function bytecode is executed by the virtual machine and the return value is then converted to an [font='courier new']Om::Value[/font] and returned to the user.

One possible return is an [font='courier new']Om::Type::Function[/font], which you can then execute later using [font='courier new']Om::Value::Execute()[/font], which takes two parameters, the object to be used as the [font='courier new']this[/font] value and an [font='courier new']Om::ValueList[/font] which represents the parameters, if any.

[font='courier new']Om::Engine::evalute()[/font] firstly creates a stack variable of the [font='courier new']Context[/font] class, which contains all the shared information needed to compile the script. A reference to this [font='courier new']Context[/font] is passed down through all the recursive descent compiler functions, and also passed down to each [font='courier new']Node[/font] during the generation stage. When compilation is completed, the [font='courier new']Context[/font] is then abandoned.

[font='courier new']Context[/font] contains various things. Because functions can be nested indefinitly in Om, one thing [font='courier new']Context[/font] provides is a stack of [font='courier new']FuncData[/font] instances. [font='courier new']FuncData[/font] contains a [font='courier new']ByteStream[/font] instance to which the bytecode is streamed, a pointer to the [font='courier new']FunctionEntity[/font] which will represent the function in the entity heap at run-time and a [font='courier new']SymbolStack[/font] representing the local variables to the function.

Hopefully that is enough context to understand the details of how the [font='courier new']break[/font] keyword is implemented.

To implement [font='courier new']break[/font], we need two pieces of information - the number of values to pop from the stack (i.e. the number of locals that have been created since the enclosing loop began) and the address in the bytecode to jump to to exit the enclosing loop.

Loops are obviously also nestable in Om, but one slight complication here is it is only [font='courier new']while[/font] and [font='courier new']for[/font] loops we can break from. Other scopes, such as [font='courier new']if[/font] and [font='courier new']else[/font], should be ignored by [font='courier new']break[/font] and we need to jump right back to the enclosing [font='courier new']while[/font] or [font='courier new']for[/font].

So again, we need a stack, this time of [font='courier new']BreakData[/font] instances to keep track of this that is independant of anything else that tracks scopes, so we can ignore [font='courier new']if[/font] and [font='courier new']else[/font] scopes.

To establish the number of values to pop before the jump out, all we need to do is record the [font='courier new']SymbolStack[/font] depth when the loop begins. We can then just deduct this from the [font='courier new']SymbolStack[/font] depth at the point that [font='courier new']BreakNode[/font] is generated. So [font='courier new']BreakData[/font] firstly has to have a [font='courier new']uint stackTop[/font] member that is initialised when we first start generating in [font='courier new']WhileNode[/font], [font='courier new']ForNode[/font] and [font='courier new']ForEachNode[/font].

[font='courier new']WhileNode[/font], [font='courier new']ForNode[/font] and [font='courier new']ForEach[/font] node just push a new instance of BreakData onto the [font='courier new']Context[/font]'s stack, and pop it off once they are finished generating.

The second is slightly trickier since we won't know the address at the end of the loop until the loop has finished being generated, so we need to go back and patch these when the loop completes.

As a helper class to assist patching the [font='courier new']ByteStream[/font], which we have to do quite a lot, we have a [font='courier new']ByteStreamPatch[/font] class. There is an [font='courier new']operator<<[/font] overload in [font='courier new']ByteStream[/font] that takes a non-const reference to a [font='courier new']ByteStreamPatch[/font]. This writes a 0 to the [font='courier new']ByteStream[/font], but also initialises the [font='courier new']ByteStreamPatch[/font] with a pointer to the [font='courier new']ByteStream[/font] and the address the 0 was written at.

[font='courier new']ByteStreamPatch[/font] also overloads [font='courier new']operator=(uint i)[/font] which causes it to use the pointer and address it has stored to overwrite the 0 it wrote to the [font='courier new']ByteStream[/font] with the [font='courier new']i[/font] arguement.

For example:

void f(ByteStream &stream){ ByteStreamPatch patch; stream << OpCode::Jmp << patch; // generate more to the stream patch = stream.position();}Just makes the code a bit cleaner and easier to read than alternative approaches.

[font='courier new']ByteStreamPatch[/font] contains a pointer to memory it doesn't own and a uint so is trivially copyable so the other thing we add to [font='courier new']BreakData[/font] is a [font='courier new']pod_vector patches[/font] member. [font='courier new']BreakNode[/font] can then insert it into the stream after the [font='courier new']OpCode::Jmp[/font] instruction, then add it to the [font='courier new']pod_vector[/font] of the [font='courier new']BreakData[/font] instance currently on the top of the [font='courier new']Context[/font]'s stack.

So, [font='courier new']BreakData[/font]'s generate method looks like:

bool BreakNode::generate(Context &c){ if(c.breaks.empty()) { return c.error(pos, "break statement not within loop"); } c.pop(c.locals().functionPopTotal() - c.breaks.back().stackTop); ByteStreamPatch patch; c.pm() << OpCode::Jmp << patch; c.breaks.back().patches.push_back(patch); return true;}First we check to ensure we are actually inside a loop, and throw out an error if not. Then we calculate the amount to pop. [font='courier new']Context::pop()[/font] just outputs an [font='courier new']OpCode::Pop[/font] or [font='courier new']OpCode::PopN[/font] instruction, or does nothing if its arguement is zero.

Next we create the patch instance, insert the [font='courier new']OpCode::Jmp[/font] and the patch (which adds a dummy 0 to the stream) then add the patch to the current top of stack of [font='courier new']BreakData[/font] in the [font='courier new']Context[/font].

Next up, in the [font='courier new']WhileNode[/font], [font='courier new']ForNode[/font] and [font='courier new']ForEach[/font] nodes, we need to take care of patching these.

bool WhileNode::generate(Context &c){ // snip pod_vector &patches = c.breaks.back().patches; for(auto &p: patches) { p = c.pm().position(); } // snip}We now know the end position to jump to to exit the loop, so we just run through and assign it to all the patches in the vector on the top of the [font='courier new']BreakData[/font] stack.

We then pop the top from the [font='courier new']BreakData[/font] stack, and we're done. We can now do the following in the script:

var a = 0;while(a < 10){ var b = "hello"; out a; if(a == 5) { var c = "foobar", d = { destructor = func { out "Bye"; }; }; var x = 10; while(x) { for(var i = 0; i < 5; ++i) { out "forloop ", i; if(i == 2) break; for(var k: [ "one", "two", "three" ]) { var a = " ", b = "barfoo", c = " foo"; out k, a, b, c; break; } } out "subloop ", x; --x; if(x == 3) break; } out "at the first break"; break; } ++a;}out this;out "end";return 100;Just a tiny part of what is turning into a very large project here. Probably of interest to nobody but me, as I guess you need a better understanding of the whole thing for this to mean anything, but a nice example of a very small part of a much larger whole I hope.

Thanks for stopping by.

Sign in to follow this  


4 Comments


Recommended Comments

A really interesting read, it always amazes me some of the dev blogs that go up on here smile.png


Thank you. I'm never sure if this kind of internal detail stuff is interesting or not.

Share this comment


Link to comment

TODO: multilevel break to eliminate the only legit use of goto biggrin.png


Hmm, worth considering. Not something I've ever really missed in C++ to be honest, prefer in situations like that to factor loops into separate functions.

Fairly easy to implement in Om though I guess, just a case of walking further up the BreakStack.

But thanks for the suggestion. smile.png

Share this comment


Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now