• entries
    743
  • comments
    1924
  • views
    580021

Inside the Om Virtual Machine

Sign in to follow this  
Aardvajk

2190 views

Inside the Om Virtual Machine

I've basically rewritten the entire project from scratch since my last entry since I really felt I could do better in terms of organisation and wanted to pay more attention to correctly releasing resources when an error occurred. I'm nearly back at the same point of functionality I had before, but in a much better structured form.

I thought I'd write up a little about the way the handling of binary operations (+, *, ==, and, or etc) has evolved today but I'll need to provide a rough overview of the inner workings of the VM for this to make much sense.


A Virtual Stack Machine

OmVM is a virtual stack machine, quite unfashionable these days as everyone seems to be proving that virtual register machines can provide more efficient execution, but the advantages of a VSM make it the natural choice for as recursive a language as Om. The main advantage is that code generation is very context-free - what you need to emit for an Int for example is the same whether it is being assigned to a variable, being passed to a function or being used as a subscript index.

The first step, which I'll write more about in the future, is to read the script text and convert it into a [font='courier new']Node[/font] tree representing the program. Virtual methods on the [font='courier new']Node[/font]s are then used to traverse the tree and spit out the [font='courier new']OpCode[/font] program that is actually executed by the VM.

Internally, the VM maintains a stack of the fairly primitive [font='courier new']TypedValue[/font] class. This is just an aggregation of an [font='courier new']Om::Type[/font] and a [font='courier new']char[4][/font] buffer that can contain an [font='courier new']int[/font], [font='courier new']float[/font] or [font='courier new']unsigned int[/font]. Simple objects ([font='courier new']Om::Null[/font], [font='courier new']Om::Int[/font], [font='courier new']Om::Float[/font], [font='courier new']Om::Bool[/font]) are stored directly in this buffer. Complex types ([font='courier new']Om::String[/font], [font='courier new']Om::Function[/font], [font='courier new']Om::Object[/font] and [font='courier new']Om::Array[/font]) store an unsigned int ID that is basically an index into what is known as the [font='courier new']EntityHeap[/font]. The [font='courier new']EntityHeap[/font] stores pointers to objects deriving from [font='courier new']Entity[/font] but because we always know what the type is at runtime, we can [font='courier new']static_cast[/font] rather than [font='courier new']dynamic_cast[/font] to get the real entity so this is as efficient as possible.

In the case of a binary operation, for example adding, the emit of the [font='courier new']BinaryNode[/font] class looks like this:
bool BinaryNode::generate(Context &c){ if(!left->generate(c)) return false; if(!right->generate(c)) return false; c.pm() << type; c.st.decrement(); return true;}[font='courier new']type[/font] in this case will be [font='courier new']OpCode::Add[/font]. The rule for [font='courier new']generate()[/font] is that it generates the bytecode for pushing its value onto the [font='courier new']ValueStack[/font] so when [font='courier new']OpCode::Add[/font] is called, the right then the left values will be the top two values on the stack. Binary operations pop the top two values off the stack then push the result onto the stack, so we then have a single decrement on the symbol table class which is tracking where everything is on the stack.


Implementing Binary Operations

All types in Om are dynamic, in that they carry types around with them and it is not possible to predict ahead of runtime what these types will be, so the binary operations need to perform differently depending on a large possible combination of types, performing the appropriate conversions. For example, if one side is [font='courier new']Om::Float[/font] and the other is [font='courier new']Om::Int[/font], the [font='courier new']Om::Int[/font] needs to be converted to [font='courier new']Om::Float[/font] and the result should be [font='courier new']Om::Float[/font].

[font='courier new']TypedValue[/font] is a low level class and, like a union, it is undefined behaviour to call [font='courier new']toFloat()[/font] on one that is an [font='courier new']int[/font] and so on. Only the correct [font='courier new']to*()[/font] can be called.

So naively we might start with something like this:
bool add(ValueStack &vs, TypedValue &result){ TypedValue b = vs.pop(); TypedValue a = vs.pop(); if(a.type() == Om::Float) { switch(b.type()) { case Om::Int: result = TypedValue(Om::Float, a.toFloat() + static_cast(b.toInt()); return true; case Om::Float: result = TypedValue(Om::Float, a.toFloat() + b.toFloat(); return true; } } else if(a.type() == Om::Int) { /// etc } error(); return false;}With all the different types we have to handle, this quickly blows out of proportion for each binary operation, so we need a way to share this mess among all the binary operations but ideally have it as efficient as if we'd coded it all manually.

So this is a good use of templates. We can begin like this:
struct Addition{ template static T op(const T &a, const T &b){ return a + b; } };struct Subtraction{ template static T op(const T &a, const T &b){ return a + b; } };template bool binaryOp(ValueStack &vs, TypedValue &result){ TypedValue b = vs.pop(); TypedValue a = vs.pop(); if(a.type() == Om::Float) { switch(b.type()) { case Om::Int: result = TypedValue(Om::Float, T::op(a.toFloat(), static_cast(b.toInt())); return true; case Om::Float: result = TypedValue(Om::Float, T::op(a.toFloat(), b.toFloat()); return true; } } else if(a.type() == Om::Int) { /// etc } error(); return false;}So now we can add new operations by just adding a new struct, and we specialize at the point of call inside the main VM loop:
template void evaluate(ValueStack &vs){ TypedValue r; binaryOp(vs, r); vs.push(r); incRef(r);}void execute(){ while(true) { OpCode::Type op = pc.get(); switch(op) { case OpCode::Add: evaluate(vs); break; case OpCode::Sub: evaluate(vs); break; } }}So far so good, but now we want ==, != and so on to be handled by this system. Instead of these resulting in an [font='courier new']Om::Int[/font] or [font='courier new']Om::Float[/font], we want these results to be an [font='courier new']Om::Bool[/font]. So we need a way to get the template struct specified to define the type of the result as well.

First up, we need to have a way to convert a C++ type back to an Om::Type where possible. Template specialisation to the rescue:
template Om::Type nativeToType(){ return Om::Unknown; }template<> Om::Type nativeToType(){ return Om::Int; }template<> Om::Type nativeToType(){ return Om::Float; }template<> Om::Type nativeToType(){ return Om::Bool; }Now we can derive our structs from a [font='courier new']Base[/font] to provide a default implementation:
struct Base{ template static Om::Type type(T){ return nativeToType(); }};struct Addition : Base{// etc};Lets split evaluate up a bit so we can do this more concisely. We need to be able to pass in the types of the actual values but also tell the function what type to convert them both to internally to save some coding:
template bool doOp(const L &left, const R &right, TypedValue &r){ r = TypedValue(T::type(A()), T::op(static_cast(left), static_cast(right)); return true;}template bool binaryOp(ValueStack &vs, TypedValue &result){ TypedValue b = vs.pop(); TypedValue a = vs.pop(); TypedValue result; if(a.type() == Om::Float) { switch(b.type()) { case Om::Int: return doOp(a.toFloat(), b.toInt(), result); case Om::Float: return doOp(a.toFloat(), b.toFloat(), result); } } else if(a.type() == Om::Int) { switch(b.type()) { case Om::Int: return doOp(a.toInt(), b.toInt(), result); case Om::Float: return doOp(a.toInt(), b.toFloat(), result); } } error(); return false;}This will map the type back to the equivalent [font='courier new']Om::Type[/font] that equates to the [font='courier new']A[/font] template parameter.

Now we can add a struct that handles, for example, equality, that overrides this behaviour to make the result an [font='courier new']Om::Bool[/font]:
struct Equality : Base{ template static Om::Type type(T){ return Om::Bool; } template static bool op(const T &a, const T &b){ return a == b; } };The T type is ignored in [font='courier new']T::type()[/font] and [font='courier new']Om::Bool[/font] is always used. So [font='courier new']r[/font] gets assigned with [font='courier new']TypedValue(Om::Bool, bool)[/font]. Similarly adding the other comparisons is just a case of changing the [font='courier new']T::op()[/font] method.
struct Comparison : Base{ template static Om::Type type(T){ return Om::Bool; }};struct Equality : Comparison{ template static bool op(const T &a, const T &b){ return a == b; } };struct Inequality : Comparison{ template static bool op(const T &a, const T &b){ return a != b; } };And so on it goes. The real system is more complex that this, in that it checks to make sure you aren't trying to add an [font='courier new']Om::Bool[/font] to an [font='courier new']Om::Int[/font] and so on, but this is just a case of adding default static methods to the [font='courier new']Base[/font] then overriding in the sub-structs, for example we have:
struct Base{ template static bool check(const T &a, const T &b, Om::Value &err, State &s){ return true; } static bool supports(Om::Type a, Om::Type b){ return true; }};The [font='courier new']T::check()[/font] method is overriden by the [font='courier new']Division[/font] struct and returns false if the right-hand side is zero for example. Different structs respond to [font='courier new']T::supports()[/font] with different results depending on what they support. These are used inside [font='courier new']doOp()[/font] so that the methods can return false with an error message populated under these circumstances.

We also have to deal with the complex types. The real system has some more static methods for [font='courier new']string_op[/font] for example, that can get the [font='courier new']toUint()[/font] value from the [font='courier new']TypedValue[/font], use that to retrieve the actual string data from the [font='courier new']EntityHeap[/font] and do a string equality or inequality test.

We can also provide specialisations to ensure that comparing two other complex objects, for example [font='courier new']Om::Function[/font] or [font='courier new']Om::Object[/font], is just a comparision of their [font='courier new']toUint()[/font] value so we have reference equality.

So the main switch is still a fairly complex function, but it is reused for all of the binary operations we support and the compiler generates a fresh version for each one, so no runtime cost is paid for this convenience.

Here's the real thing if any one is interested. Not quite finished, need to do some more specialising to support adding items to arrays:
#ifndef BINARYOPS_H#define BINARYOPS_H#include "framework/StringFormat.h"#include "internal/ValueFactory.h"#include "internal/TypedValueGuard.h"#include "machine/State.h"#include "machine/components/ValueStack.h"#include "machine/entity/StringEntity.h"#include "TypeTemplates.h"namespace BinaryOps{bool isIntFloat(Om::Type t){ return t == Om::Int || t == Om::Float; }std::string error(const char *type, Om::Type a, Om::Type b){ return a == b ? stringFormat(capitalize(type), " on invalid type - ", Om::typeToString(a)) : stringFormat(capitalize(type), " on invalid types - ", Om::typeToString(a), " and ", Om::typeToString(b));}struct Base{ template static bool check(const T &a, const T &b, Om::Value &err, State &s){ return true; } static bool supports(Om::Type a, Om::Type b){ return true; } static Om::Type string_op_type(){ return Om::Unknown; } static bool string_bool_op(const std::string &a, const std::string &b){ return false; } static std::string string_string_op(const std::string &a, const std::string &b){ return std::string(); }};struct MathOp : Base{ template static Om::Type type(T){ return nativeToType(); } static bool supports(Om::Type a, Om::Type b){ return isIntFloat(a) && isIntFloat(b); }};struct Addition : MathOp{ static const char *desc(){ return "addition"; } template static T op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a + b; } static Om::Type string_op_type(){ return Om::String; } static std::string string_string_op(const std::string &a, const std::string &b){ return a + b; }};struct Subtraction : MathOp{ static const char *desc(){ return "subtraction"; } template static T op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a - b; }};struct Multiplication : MathOp{ static const char *desc(){ return "multiplication"; } template static T op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a * b; }};struct Division : MathOp{ static const char *desc(){ return "division"; } template static T op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a / b; } template static bool check(const T &a, const T &b, Om::Value &err, State &s) { if(b == static_cast(0)) { err = Om::ValueFactory::makeError(&s, stringFormat("Division by zero - ", Om::typeToString(nativeToType()))); return false; } return true; }};struct Comparison : Base{ template static Om::Type type(T){ return Om::Bool; } static Om::Type string_op_type(){ return Om::Bool; }};struct Equality : Comparison{ static const char *desc(){ return "equality"; } template static bool op(Om::Type ta, Om::Type tb, const T &a, const T &b) { if(isIntFloat(ta) && isIntFloat(tb)) return a == b; if(ta != tb) return false; return a == b; } static bool string_bool_op(const std::string &a, const std::string &b){ return a == b; }};struct Inequality : Comparison{ static const char *desc(){ return "equality"; } template static bool op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return !Equality::op(ta, tb, a, b); } static bool string_bool_op(const std::string &a, const std::string &b){ return a != b; }};struct MathComparison : Base{ template static Om::Type type(T){ return Om::Bool; } static bool supports(Om::Type a, Om::Type b){ return isIntFloat(a) && isIntFloat(b); }};struct LessThan : public MathComparison{ static const char *desc(){ return "less"; } template static bool op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a < b; }};struct GreaterThan : public MathComparison{ static const char *desc(){ return "greater"; } template static bool op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a > b; }};struct LessThanEquals : public MathComparison{ static const char *desc(){ return "less or equal"; } template static bool op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a <= b; }};struct GreaterThanEquals : public MathComparison{ static const char *desc(){ return "greater or equal"; } template static bool op(Om::Type ta, Om::Type tb, const T &a, const T &b){ return a >= b; }};template bool operation(const TypedValue &a, const TypedValue &b, const L &left, const R &right, TypedValue &r, Om::Value &err, State &s){ if(!T::supports(a.type(), b.type())) { err = Om::ValueFactory::makeError(&s, error(T::desc(), a.type(), b.type())); return false; } if(!T::check(static_cast
(left), static_cast(right), err, s)) return false; r = TypedValue(T::type(A()), T::op(a.type(), b.type(), static_cast(left), static_cast(right))); return true;}template bool string_operation(const TypedValue &a, const TypedValue &b, TypedValue &r, Om::Value &err, State &s){ std::string sa = s.entity(a.toUint()).value; std::string sb = s.entity(b.toUint()).value; switch(T::string_op_type()) { case Om::Bool: r = TypedValue(Om::Bool, T::string_bool_op(sa, sb)); break; case Om::String: r = TypedValue(Om::String, s.eh.add(new StringEntity(T::string_string_op(sa, sb)))); break; default: err = Om::ValueFactory::makeError(&s, error(T::desc(), a.type(), b.type())); return false; } return true;}template bool perform(const TypedValue &a, const TypedValue &b, TypedValue &r, Om::Value &err, State &s){ if(a.type() == Om::Null) { switch(b.type()) { case Om::Null: return operation(a, b, a.toInt(), 0, r, err, s); case Om::Int: return operation(a, b, 0, b.toInt(), r, err, s); case Om::Float: return operation(a, b, 0, b.toFloat(), r, err, s); case Om::Bool: return operation(a, b, 0, b.toBool(), r, err, s); case Om::String: case Om::Function: case Om::Object: case Om::Array: return operation(a, b, 0, b.toUint(), r, err, s); default: break; } } else if(a.type() == Om::Int) { switch(b.type()) { case Om::Null: return operation(a, b, a.toInt(), 0, r, err, s); case Om::Int: return operation(a, b, a.toInt(), b.toInt(), r, err, s); case Om::Float: return operation(a, b, a.toInt(), b.toFloat(), r, err, s); case Om::Bool: return operation(a, b, a.toInt(), b.toBool(), r, err, s); case Om::String: case Om::Function: case Om::Object: case Om::Array: return operation(a, b, a.toInt(), b.toUint(), r, err, s); default: break; } } else if(a.type() == Om::Float) { switch(b.type()) { case Om::Null: return operation(a, b, a.toFloat(), 0, r, err, s); case Om::Int: return operation(a, b, a.toFloat(), b.toInt(), r, err, s); case Om::Float: return operation(a, b, a.toFloat(), b.toFloat(), r, err, s); case Om::Bool: return operation(a, b, a.toFloat(), b.toBool(), r, err, s); case Om::String: case Om::Function: case Om::Object: case Om::Array: return operation(a, b, a.toFloat(), b.toUint(), r, err, s); default: break; } } else if(a.type() == Om::Bool) { switch(b.type()) { case Om::Null: return operation(a, b, a.toBool(), 0, r, err, s); case Om::Int: return operation(a, b, a.toBool(), b.toInt(), r, err, s); case Om::Float: return operation(a, b, a.toBool(), b.toFloat(), r, err, s); case Om::Bool: return operation(a, b, a.toBool(), b.toBool(), r, err, s); case Om::String: case Om::Function: case Om::Object: case Om::Array: return operation(a, b, a.toBool(), b.toUint(), r, err, s); default: break; } } else if(a.type() == Om::String) { switch(b.type()) { case Om::Null: return operation(a, b, a.toUint(), 0, r, err, s); case Om::Int: return operation(a, b, a.toUint(), b.toInt(), r, err, s); case Om::Float: return operation(a, b, a.toUint(), b.toFloat(), r, err, s); case Om::Bool: return operation(a, b, a.toUint(), b.toBool(), r, err, s); case Om::String: return string_operation(a, b, r, err, s); case Om::Function: case Om::Object: case Om::Array: return operation(a, b, a.toUint(), b.toUint(), r, err, s); default: break; } } else if(a.type() == Om::Function || a.type() == Om::Object || a.type() == Om::Array) { switch(b.type()) { case Om::Null: return operation(a, b, a.toUint(), 0, r, err, s); case Om::Int: return operation(a, b, a.toUint(), b.toInt(), r, err, s); case Om::Float: return operation(a, b, a.toUint(), b.toFloat(), r, err, s); case Om::Bool: return operation(a, b, a.toUint(), b.toBool(), r, err, s); case Om::String: case Om::Function: case Om::Object: case Om::Array: return operation(a, b, a.toUint(), b.toUint(), r, err, s); default: break; } } err = Om::ValueFactory::makeError(&s, error(T::desc(), a.type(), b.type())); return false;}template bool evaluate(ValueStack &vs, State &s, Om::Value &err){ TypedValueGuard b(s, vs.pop().value); TypedValueGuard a(s, vs.pop().value); TypedValue r; if(perform(a.value(), b.value(), r, err, s)) { vs.push(StackValue(r)); s.inc(r); return true; } return false;}}#endif // BINARYOPS_HHope this was of interest. Thanks for reading.
Sign in to follow this  


4 Comments


Recommended Comments

Thanks. Performance probably isn't too great at the moment and plenty of opportunities to optimise once I have all the base features in place. I'll try to keep posting detailed journals on this. Thanks to the site for featuring this.

Share this comment


Link to comment
[quote name="jbadams" timestamp="1428489900"][quote name="Aardvajk" timestamp="1428339376"] Thanks to the site for featuring this. [/quote]   No problem, thanks for sharing such an interesting entry! :)[/quote] Really enjoying writing these up actually. Plenty more to come.

Share this comment


Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now