Om: Short Circuit Evaluation, Ternary Operators and Native Functions

Published April 11, 2015
Short Circuit Evaluation

The way I'd previously been doing [font='courier new']and[/font] and [font='courier new']or[/font] in the VM was with specific instructions, [font='courier new']OpCode::Land[/font] and [font='courier new']OpCode::Lor[/font], which popped the top two stack values, then pushed the boolean result of and-ing or or-ing their boolean interpretations. Something like:

bool Machine::land()
{
    TypedValueGuard a(s, vs.pop().value);
    TypedValueGuard b(s, vs.pop().value);

    push(typedValueToBool(a.value()) && typedValueToBool(b.value()));
    return true;
}
[font='courier new']TypedValueGuard[/font] is just a convenience object that decrements its value in its destructor, so you can be sure the value is decremented at the end of the function, even with multiple return paths.
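To illustrate the pattern in isolation, here is a minimal sketch of the idea; the [font='courier new']State[/font] and [font='courier new']Guard[/font] names below are hypothetical stand-ins, not Om's actual types:

```cpp
#include <cassert>

// Hypothetical stand-in for the VM state that tracks reference decrements.
struct State { int decrements = 0; };

// A minimal RAII guard in the spirit of TypedValueGuard: whichever path
// leaves the enclosing scope, the decrement happens exactly once.
class Guard
{
public:
    Guard(State &s, int value) : s(s), v(value) { }
    ~Guard(){ ++s.decrements; }

    int value() const { return v; }

private:
    State &s;
    int v;
};

bool useValues(State &s, bool early)
{
    Guard a(s, 1);
    Guard b(s, 2);

    if(early) return false; // both guards still decrement on this path

    return a.value() < b.value();
}
```

The point is that every return path, including early-outs, releases both values without any explicit cleanup code.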

All very well, but it prevented implementation of short circuit evaluation. Having short circuit evaluation is a very nice feature in a dynamically typed language since it allows for things like:

var v;
if(v.type == "array" && v.length) do_something();
Calling [font='courier new']length[/font] on a value that isn't an [font='courier new']Om::Array[/font] or an [font='courier new']Om::String[/font] is a runtime error, so you couldn't write things like this before, since both operands were fully evaluated before the and operation.

First of all, I created a [font='courier new']LogicNode[/font] class to represent [font='courier new']and[/font] and [font='courier new']or[/font] operations in the abstract syntax tree, rather than the generic [font='courier new']BinaryNode[/font] which was previously used, so I could give its [font='courier new']generate()[/font] method different behaviour.

It turns out we can actually do without [font='courier new']OpCode::Land[/font] and [font='courier new']OpCode::Lor[/font] altogether if we instead implement this by using [font='courier new']OpCode::JmpT[/font] and [font='courier new']OpCode::JmpF[/font], which change the program counter based on whether the value on the top of the stack is true or false when run through [font='courier new']typedValueToBool()[/font].

The behaviour we need is for the generated bytecode to end with one value on the top of the stack, which will be consumed by the [font='courier new']OpCode::JmpF[/font] generated by the [font='courier new']if[/font], [font='courier new']while[/font] or whatever construct the expression is the conditional for. So what we can do instead is:

bool LogicNode::generate(Context &c)
{
    if(!left->generate(c)) return false;

    c.pm() << OpCode::Peek << 0;
    c.st.increment();

    ByteStreamPatch end;

    c.pm() << (type == And ? OpCode::JmpF : OpCode::JmpT) << end;
    c.st.decrement();

    c.pm() << OpCode::Pop;
    c.st.decrement();

    if(!right->generate(c)) return false;

    end = c.pm().position();
    return true;
}
Basically, we generate the first expression, then jump to the end of the section if its value alone decides the result: on false if this is an [font='courier new']and[/font], since we then know the whole expression is false, or on true if this is an [font='courier new']or[/font], for the same reason.

If we don't jump, we can just pop the stack top, then generate the second expression. This leaves a value that is either true or false on top of the stack so we are then done and can carry on. But the difference here is that the bytecode for the second expression is only executed when it needs to be and is skipped if the first expression proves the overall expression is true or false.
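To make the layout concrete, here is a sketch of the bytecode generated for an [font='courier new']and[/font] expression like [font='courier new']a && b[/font] (instruction names as above, label purely illustrative):

```
    <code for a>   ; leaves a's value on the stack
    Peek 0         ; push a copy of the stack top
    JmpF end       ; pops the copy; if false, jump, leaving a as the result
    Pop            ; a was true: discard it
    <code for b>   ; b's value becomes the overall result
end:
```

For an [font='courier new']or[/font], the same shape is emitted with [font='courier new']JmpT[/font] in place of [font='courier new']JmpF[/font].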

So quite a nice outcome: we add a new, useful feature and find it actually reduces rather than enlarges the instruction set, although the generated bytecode is slightly longer.


Ternary Operator

The C-style ternary operator, namely [font='courier new']expr ? expr : expr[/font], is also very useful in a dynamically typed language as you can do things like:

var v;
out v.type == "array" ? v.length : 0;
It was a breeze to implement. Just a case of adding an extra level at the very top of the compiler expression parser to gather the relevant nodes:

NodePtr tern(Context &c, Node *node, bool get)
{
    TernaryNode *t = new TernaryNode(c.position(), node);
    NodePtr tn(t);

    NodePtr p = expr(c, get);
    if(!p) return NodePtr();

    t->positive = p.release();

    if(!c.sc.match(c, Token::Colon, false)) return NodePtr();

    NodePtr n = expr(c, true);
    if(!n) return NodePtr();

    t->negative = n.release();
    return tn;
}

NodePtr ternary(Context &c, bool get)
{
    NodePtr n = logical(c, get);

    while(true)
    {
        switch(c.sc.token().type())
        {
            case Token::Question: n = tern(c, n.release(), true); break;
            default: return n;
        }

        if(!n) return NodePtr();
    }
}

NodePtr expr(Context &c, bool get)
{
    return ternary(c, get);
}
[font='courier new']logical()[/font] was the previous top-level expression parser function, so it is now the method that [font='courier new']ternary()[/font] calls.

We end up with a new node which has three child nodes, expr, positive and negative. All we need do then when we [font='courier new']generate()[/font] is emit the expression bytecode, then use [font='courier new']OpCode::JmpF[/font] to jump over the emit of the positive node, then an absolute [font='courier new']OpCode::Jmp[/font] at the end of the positive node to jump over the negative node bytecode, so we end up with the correct value on the top of the stack. Again, only the chosen option is actually executed:

bool TernaryNode::generate(Context &c)
{
    if(!expr->generate(c)) return false;

    ByteStreamPatch neg;

    c.pm() << OpCode::JmpF << neg;
    c.st.decrement();

    if(!positive->generate(c)) return false;

    ByteStreamPatch end;

    c.pm() << OpCode::Jmp << end;
    neg = c.pm().position();
    c.st.decrement();

    if(!negative->generate(c)) return false;

    end = c.pm().position();
    return true;
}
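As a sketch, the layout this produces looks roughly like this (labels illustrative):

```
    <code for expr>       ; condition value on the stack
    JmpF neg              ; pops it; if false, run the negative branch
    <code for positive>   ; chosen value on the stack
    Jmp end               ; skip over the negative branch
neg:
    <code for negative>   ; chosen value on the stack
end:
```

Exactly one of the two branch bodies executes, and either way a single value is left on the stack top.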
There is an extra decrement of the symbol table stack positions required in there. I must confess I can't fully picture why it is needed, but this stuff is hard to keep track of and the solution is well tested now, so it must be correct.


Calling Native Functions from Script

Also been working on the native API side of Om, adding support for creating an [font='courier new']Om::Value[/font] representing a native C++ function that can then be called from within the script using the normal syntax.

At the moment, the only way to get [font='courier new']Om::Values[/font] into the script is to pass them as function parameters, so first of all we need a script that returns a function; that function treats its parameter as another function and calls it; then we supply a wrapped-up native function as the parameter when we call the script function externally. So the script looks something like:

return func(x)
{
    out "calling";
    out x("string from the script");
    out "end";
};
Then, in the C++ program hosting the scripting, we can first of all get an [font='courier new']Om::Value[/font] representing this function in the normal way:

void f()
{
    Om::Engine e;
    Om::Value v = e.evaluate(script);
}
[font='courier new']v[/font] now points to the function we returned, whose life in the entity heap is prolonged past the end of the script's execution, up to the point at which the [font='courier new']Om::Value[/font] goes out of scope in the C++ program.

Next we define our native function, which has to have a specific signature:

Om::Value function(Om::Engine &e, Om::Value &object, const std::vector<Om::Value> &params)
{
    std::cout << "The native function: " << params[0].toString() << "\n";
    return e.makeString("Text from native back to the script");
}
Note that to create referenced objects like [font='courier new']Om::String[/font], we have to use the engine to create them rather than construct an [font='courier new']Om::Value[/font] directly, so they can be added to the entity heap, tracked and removed when there are no more references to them. This is invisible to the user of the library as [font='courier new']Om::Value[/font] takes care of it by implementing the rule of three (or five).
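As an illustration of what the rule of three buys here, a minimal reference-counted handle in this spirit might look like the following; [font='courier new']Handle[/font] is a hypothetical stand-in, not Om's actual implementation:

```cpp
#include <cassert>

// Hypothetical reference-counted handle illustrating the rule of three:
// the copy constructor, copy assignment operator and destructor together
// keep the shared count correct, invisibly to the user of the handle.
class Handle
{
public:
    explicit Handle(int data) : data(new int(data)), count(new int(1)) { }

    Handle(const Handle &h) : data(h.data), count(h.count) { ++(*count); }

    Handle &operator=(const Handle &h)
    {
        if(this != &h)
        {
            release();

            data = h.data;
            count = h.count;
            ++(*count);
        }

        return *this;
    }

    ~Handle(){ release(); }

    int refs() const { return *count; }
    int value() const { return *data; }

private:
    void release()
    {
        if(--(*count) == 0)
        {
            delete data;
            delete count;
        }
    }

    int *data;
    int *count;
};
```

When the last copy goes out of scope, the underlying storage is freed, which is the same guarantee [font='courier new']Om::Value[/font] gives for entities in the heap.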

Now we can call [font='courier new']execute()[/font] on [font='courier new']v[/font] which checks it is a function and calls it. [font='courier new']execute()[/font] takes an [font='courier new']Om::Value[/font] representing the this object for the duration of the function, and a [font='courier new']std::vector[/font] representing the parameters. We'll pass a null this object so use the [font='courier new']execute()[/font] overload that just takes the parameters, and we can use the initializer list syntax to make this tidier:

void f()
{
    Om::Engine e;

    Om::Value v = e.evaluate(script);
    Om::Value n = e.makeFunction(function);

    v.execute({ n });
}
So the script function is called, passing an object representing the native function as the parameter to the function. The script then calls this function in the normal way. We'll grab the return value as well now:

return func(x)
{
    out "calling";

    var s = x("string from the script");

    out "back in the script";
    out s;
    out "end";
};
And we get the following output:

calling
The native function: string from the script
back in the script
Text from native back to the script
end
So there we have an example of calling a native function, accessing values from the script environment that are passed to the function, creating objects within the native function and returning them to the script environment where they can be accessed like normal.

In terms of how this is implemented, when the compiler finds a left bracket following an entity in the script, it generates a [font='courier new']CallNode[/font]. When [font='courier new']CallNode[/font]'s generate is called, it emits bytecode to set up the function return on the stack, then the this value and the function parameters, then generates the entity and emits the [font='courier new']OpCode::Call[/font] instruction.

[font='courier new']OpCode::Call[/font] expects the stack top to be either an [font='courier new']Om::Function[/font] or one of a couple of hidden, internal types. These types are defined in an internal-only file like this:

const uint InternalMethodType = static_cast<uint>(Om::Unknown + 1);
const uint ExternalFunctionType = static_cast<uint>(Om::Unknown + 2);
This keeps them out of the external [font='courier new']Om::Type[/font] interface so they are not visible to the user of the library, and things like [font='courier new']Om::Value::type()[/font] are hardcoded to return [font='courier new']Om::Function[/font] if the type is either of these, so the user can't distinguish these special types from a normal [font='courier new']Om::Function[/font].

If the [font='courier new']TypedValue[/font] on the top of the stack is of one of these types, the value is a pointer to a C++ function, [font='courier new']reinterpret_cast[/font]-ed to an unsigned int. The function pointer is retrieved, the necessary setup work is done and the function is called by the VM, then the return value is copied back into the appropriate place on the [font='courier new']ValueStack[/font].
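A sketch of that round trip, using [font='courier new']std::uintptr_t[/font] rather than a plain unsigned int since the pointer must fit in the integer type; the names here are hypothetical, not Om's actual internals:

```cpp
#include <cassert>
#include <cstdint>

// A native function with a fixed signature, as the VM might expect.
typedef int (*NativeFunction)(int);

int addOne(int x){ return x + 1; }

// Store the function pointer as an integer, the way the VM embeds it in
// a TypedValue's value field.
std::uintptr_t storeFunction(NativeFunction f)
{
    return reinterpret_cast<std::uintptr_t>(f);
}

// Recover the pointer from the integer and call through it, as OpCode::Call
// does when it finds one of the internal function types on the stack top.
int callStored(std::uintptr_t v, int arg)
{
    NativeFunction f = reinterpret_cast<NativeFunction>(v);
    return f(arg);
}
```

Casting a function pointer to an integer is only conditionally supported by the standard, but works on the mainstream platforms a VM like this targets.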

The reason for having two different types is that the external function has to work via the [font='courier new']Om::Value[/font] interface, which adds some small overhead, whereas the internal types can deal directly with the [font='courier new']ValueStack[/font] and [font='courier new']TypedValues[/font], since I'm writing them all, and can take responsibility for stack management themselves, which is slightly more efficient.

For example, all reference types have a built-in [font='courier new']clone()[/font] method that does a deep copy of the object and returns the result. The equivalent internal method that gets called looks like this:

bool generic_clone(uint p, ValueStack &vs, State &s, Om::Value &err)
{
    TypedValueGuardStack params(p + 1, vs, s);

    uint id = invalid_uint;

    if(params[0].type() == Om::Function)
    {
        id = params[0].toUint();
    }
    else
    {
        Entity *e = s.eh[params[0].toUint()].clone(s);
        if(!e)
        {
            err = Om::ValueFactory::makeError(&s, stringFormat("Cannot clone type - ", Om::typeToString(params[0].type())));
            return false;
        }

        id = s.eh.add(e);
    }

    return top(params[0].type(), id, vs, s);
}
So if you have a script line like this:

var v = s.clone();
The [font='courier new']DotNode[/font] built from [font='courier new']s[/font] and [font='courier new']clone[/font] turns [font='courier new']clone[/font] into an unsigned int by looking up the word "clone" in the text cache, then emits [font='courier new']OpCode::GetMb[/font] with this unsigned id as its parameter, which checks whether the id is recognised as a built-in method or property in the context of the current stack-top object type. In this case it is, so a [font='courier new']TypedValue[/font] is pushed with the type [font='courier new']InternalMethodType[/font] and the pointer to [font='courier new']generic_clone[/font] embedded in the value.
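The text cache part of this amounts to simple string interning. A hypothetical sketch of the idea (not Om's actual structure):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical text cache: each distinct string maps to a stable unsigned
// id, so the bytecode can carry small ids instead of string data.
class TextCache
{
public:
    unsigned id(const std::string &s)
    {
        std::map<std::string, unsigned>::iterator i = m.find(s);
        if(i != m.end()) return i->second;

        unsigned n = static_cast<unsigned>(m.size());
        m[s] = n;

        return n;
    }

private:
    std::map<std::string, unsigned> m;
};
```

Looking up the same word twice yields the same id, which is what lets [font='courier new']OpCode::GetMb[/font] compare a single integer rather than a string at runtime.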

The brackets then cause the compiler to make the [font='courier new']DotNode[/font] the left hand child of a [font='courier new']CallNode[/font], which will ultimately generate [font='courier new']OpCode::Call[/font]. When [font='courier new']OpCode::Call[/font] executes, it finds a [font='courier new']TypedValue[/font] with type [font='courier new']InternalMethodType[/font] on top of the stack, so it decodes the function pointer from the value and calls it, passing it the [font='courier new']ValueStack[/font] from the currently executing [font='courier new']Machine[/font].

[font='courier new']ExternalFunctionType[/font] is handled almost the same, except that the VM has to do a bit more work converting the stack parameters into [font='courier new']Om::Value[/font] instances, passing these to the external function, then decoding the result back into a [font='courier new']TypedValue[/font], with the required incrementing and decrementing of reference counts. The result is that the library user never needs to be aware of the [font='courier new']ValueStack[/font] at all, and just works with [font='courier new']Om::Value[/font] instances, which provide a safe interface.

Having [font='courier new']TypedValue[/font] not automatically call the increment and decrement methods on its contents means we can find all sorts of optimisations internally in the VM, and [font='courier new']TypedValueGuard[/font] augments this to ensure we don't forget the decrement. But we have to be able to remove a value from the stack, use it, then decrement it, so the stack removal step can't decrement automatically.

Not sure what is next up. Plenty more to do, including the rather boring grunt work of implementing all of the internal methods that [font='courier new']Om::String[/font] and [font='courier new']Om::Array[/font] need to support to make them fully functional, but at least they are trivial to add now the [font='courier new']InternalMethod[/font] system is fully in place and working.

Thanks for reading.

Comments

Ashaman73

You made great progress with your scripting language. I have one question (I do a quick scan of your last 5 posts):

Do you plan to include multithreading support ?

The lack of game-usable multithreading support is annoying about the usual scripting languages (yes, I'm looking at you LUA). I'm not talking about the (low performance) high-level synchronisation like semaphores, but about a more low-level sync. Once you have the ability to call native code in your engine you have the option to use mutex and atomic operations, but the more important issue for me would be to run multiple VMs on a shared piece of data. It would be okay to leave out all out-of-the-box sync features for shared data access, but it would be nice to have at least the option to access shared memory (unguarded) from multiple VMs.

I could imagine a shared memory pool. E.g. if you create a new VM you give it a shared memory pool and the type of synchronisation (read-only, write-only, safe-read-write, unsafe-read-write etc.). Either store all variables in the shared memory pool or declare a shared var, with the restriction that you can only reassign shared vars to other shared vars...

Sorry, got day-dreaming ;-)

April 13, 2015 06:11 AM
Aardvajk
It's an interesting idea. Sadly I'm not really sufficiently experienced in threading to attempt this at the moment; it's not an area I know more than the absolute basics about.

If this project matures sufficiently though, I'd be more than happy to think about some collaboration with someone who did have the necessary skills to make this happen.

All the state for an execution context is local, i.e. when you execute a script function, a Machine object is created on the stack which contains all the state for the actual execution, so you can run multiple functions at the same time. But some kind of synchronisation of access to the shared state (basically the EntityHeap) would be necessary for threading support, as I understand it.
April 13, 2015 07:44 AM