The Om Programming Language :)

Published January 04, 2017

Rather a long journal entry today. I hope that someone sticks with it as I'm really getting quite excited about how my scripting language is starting to develop now.

I started trying to write a general overview of Om last night as I've been posting about my scripting language here for a while and never really provided such a thing, but I was quickly overwhelmed. It's really hard to write a concise overview, partly because the language has developed far more features than I realised until I took a step back, but also because it is generally hard to know where to start and what order to discuss things in, given the interrelated nature of language features.

Om is designed to be a lightweight scripting language that is easy to integrate into existing C++ projects, is implemented entirely in standard C++ itself and provides a reasonable level of efficiency of code execution in terms of speed and resource usage. It is not trying to compete with Google V8 and similar. I wouldn't be that foolish. But while the syntax of Om is quite similar to Javascript, there are a number of differences that motivated the development of the language in the first place.

Firstly, there is no garbage collection in Om and it offers completely deterministic destruction of objects to enable RAII style coding - something I personally find hard to live without. Complex objects are carefully tracked by reference count and are released at the exact point their reference count reaches zero and, in the case of an Om::Type::Object, an Om::Type::Function can be set up to be called at this point.
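To make the timing concrete, here is a minimal sketch in plain C++ of the same idea. It uses std::shared_ptr as a stand-in for Om's reference tracking, and the Object struct, its destructor hook and runDemo() are all hypothetical names for illustration, not Om's actual implementation.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a script object; not the real Om::Type::Object.
struct Object
{
    std::string name;
    std::function<void(const Object&)> destructor; // optional hook, like the script-side destructor

    ~Object() { if(destructor) destructor(*this); }
};

// Records events in order so the exact point of destruction is visible.
std::vector<std::string> runDemo()
{
    std::vector<std::string> log;

    auto jill = std::make_shared<Object>();
    jill->name = "Jill";
    jill->destructor = [&log](const Object &o){ log.push_back("goodbye from " + o.name); };

    std::shared_ptr<Object> alias = jill; // reference count is now 2

    jill.reset();                         // count drops to 1, nothing is destroyed
    log.push_back("after first release");

    alias.reset();                        // count reaches 0, the hook runs right here
    log.push_back("after second release");

    return log;
}
```

The log comes back as "after first release", "goodbye from Jill", "after second release": the hook runs at the release that takes the count to zero, not at some later collection pass.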

Secondly, the syntax for using objects is slightly simpler than Javascript in that there is no need for a new keyword. The closest we come to constructors in Om are free functions that return an instance of an Om::Type::Object.

Finally, extending the language by the provision of native-side code is designed to be extremely simple. There is one Om::Function typedef, defined as:

Om::Value someFunction(Om::Engine &engine, const Om::Value &object, const Om::ValueList &parameters);

This single type of function can be used to provide rich modules of shared native code to be accessed from the script as well as allowing the script to pass pretty much anything back to the host application.

I'm going to focus on Om::Type::Object in this post and gloss over the other details which I hope will be fairly obvious from the example code. The only thing to bear in mind is that Om is entirely dynamically typed, with variables inferring their type from what is assigned to them, carrying their type around along with their value.

Also bear in mind that Om::Type::Functions are entirely first-class entities in Om and can be assigned to variables or passed as parameters as simply as one would pass an Om::Type::Int or any other type.

So, to pick a simple example, here is how one might implement a Person class, entirely in script for now.

Om::Type::Objects are declared with the syntax { }, and are essentially a string-value mapping with some special properties discussed later on. Unlike Javascript, Om::Type::List, declared with [ ] syntax, is a completely separate type from Om::Type::Object and functions as a resizable, heterogeneous array of values.


    import print;

    var makePerson = func(name, age)
    {
        return { name = name; age = age; };
    };

    var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

    for(var p: people)
    {
        print(p.name, " is ", p.age, " years old");
    }

Okay, so let's now think in a more OO way and make the description a method on a person instead.



    import print;

    var makePerson = func(name, age)
    {
        return
        {
            name = name;
            age = age;

            describe = func
            {
                print(this.name, " is ", this.age, " years old.");
            };
        };
    };

    var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

    for(var p: people)
    {
        p.describe();
    }

Om supports prototype-based inheritance in a very simple fashion.



    import print;

    var proto =
    {
        hair = "brown";
    };

    var makePerson = func(name, age)
    {
        return
        {
            prototype = proto;

            name = name;
            age = age;

            describe = func
            {
                print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
            };
        };
    };

    var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

    people[1].hair = "blonde";

    for(var p: people)
    {
        p.describe();
    }

Note that all instances of the person object now share the "brown" value when we are reading, but when we write the "blonde" value to Eddie before the output loop, Eddie then has his own "hair" property which overrides the one in the prototype. This is a very simple system to implement but extremely flexible.
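The read-through, write-local rule can be sketched in a few lines of plain C++. This is only an illustrative model under my own assumptions (a string-to-string map and a single prototype pointer); the Object struct, get(), set() and demo() are hypothetical and say nothing about how Om stores properties internally.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model of an object with an optional prototype link.
struct Object
{
    std::map<std::string, std::string> properties;
    std::shared_ptr<Object> prototype;

    // Reads walk up the prototype chain until a match is found.
    const std::string *get(const std::string &key) const
    {
        for(const Object *o = this; o; o = o->prototype.get())
        {
            auto it = o->properties.find(key);
            if(it != o->properties.end()) return &it->second;
        }
        return nullptr;
    }

    // Writes always land on the object itself, shadowing the prototype.
    void set(const std::string &key, const std::string &value)
    {
        properties[key] = value;
    }
};

// Mirrors the script: Eddie overrides "hair", Paul keeps reading the shared value.
std::pair<std::string, std::string> demo()
{
    auto proto = std::make_shared<Object>();
    proto->set("hair", "brown");

    Object paul;
    paul.prototype = proto;

    Object eddie;
    eddie.prototype = proto;
    eddie.set("hair", "blonde");

    return { *paul.get("hair"), *eddie.get("hair") };
}
```

Here demo() yields "brown" for Paul and "blonde" for Eddie, while the prototype itself is never modified by the write.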


Note the this.value syntax has to be explicit in Om. The reason is that the function has no idea it is a member function as it is being compiled. Indeed it is quite possible to call the same function once as a method on an object and then again as a free function.


    import print;

    var proto =
    {
        hair = "brown";
    };

    var makePerson = func(name, age)
    {
        return
        {
            prototype = proto;

            name = name;
            age = age;

            describe = func
            {
                print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
            };
        };
    };

    var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

    people[1].hair = "blonde";

    var speak = func(word)
    {
        if(this.type == "object")
        {
            print(this.name, " says ", word);
        }
        else
        {
            print(word, " is generally spoken :)");
        }
    };

    speak("hello");

    people[0].prototype.speak = speak;

    for(var p: people)
    {
        p.describe();
        p.speak("hello");
    }

This outputs:



    Om: hello is generally spoken :)
    Om: Paul is 42 years old and has brown coloured hair.
    Om: Paul says hello
    Om: Eddie is 23 years old and has blonde coloured hair.
    Om: Eddie says hello
    Om: Jill is 78 years old and has brown coloured hair.
    Om: Jill says hello

Lastly for now, if we assign an Om::Type::Function taking no parameters to an object's destructor property, this will be called when the object is destroyed.



    import print;

    var proto =
    {
        hair = "brown";

        destructor = func
        {
            print("goodbye from the prototype");
        };
    };

    var makePerson = func(name, age)
    {
        return
        {
            prototype = proto;

            name = name;
            age = age;

            describe = func
            {
                print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
            };

            destructor = func
            {
                print("goodbye from ", this.name);
            };
        };
    };

    var people = [ makePerson("Paul", 42), makePerson("Eddie", 23), makePerson("Jill", 78) ];

    people[1].hair = "blonde";

    var speak = func(word)
    {
        if(this.type == "object")
        {
            print(this.name, " says ", word);
        }
        else
        {
            print(word, " is generally spoken :)");
        }
    };

    speak("hello");

    people[0].prototype.speak = speak;

    for(var p: people)
    {
        p.describe();
        p.speak("hello");
    }

    people[2] = null;

    print("end of program");

The above program will output the following:



    Om: hello is generally spoken :)
    Om: Paul is 42 years old and has brown coloured hair.
    Om: Paul says hello
    Om: Eddie is 23 years old and has blonde coloured hair.
    Om: Eddie says hello
    Om: Jill is 78 years old and has brown coloured hair.
    Om: Jill says hello
    Om: goodbye from Jill
    Om: end of program
    Om: goodbye from Paul
    Om: goodbye from Eddie
    Om: goodbye from the prototype

Note how assigning null to people[2] destroys Jill at that point, since that causes Jill's reference count to drop to zero.


Om::Type::Object has a built in members property that returns an Om::Type::List of the names of its members. Om::Type::Object supports lookup by both the dot operator and via dynamic text and the subscript operator so you can use these together to implement a form of reflection.


    import print;

    var o =
    {
        name = "Paul";
        age = 41;
        car = "Rover";
    };

    for(var m: o.members)
    {
        print(m, " = ", o[m]);
    }

Using the subscript operator is far less efficient than the dot operator so should only be employed when the name of the property is not known. Using the dot operator in the VM equates to doing a binary search for an unsigned integer in a sorted array whereas using the subscript operator requires actual text comparisons at runtime.
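For anyone curious what that difference looks like, here is a rough sketch in standalone C++. It assumes property names are turned into small integer ids ahead of time and that each object keeps its properties sorted by id. The Interner and PropertyTable classes are hypothetical illustrations of the technique, not Om's real data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical string interner: each distinct name gets a small integer id.
class Interner
{
public:
    std::uint32_t intern(const std::string &name)
    {
        auto it = ids.find(name);
        if(it != ids.end()) return it->second;

        const std::uint32_t id = static_cast<std::uint32_t>(ids.size());
        ids.emplace(name, id);
        return id;
    }

private:
    std::unordered_map<std::string, std::uint32_t> ids;
};

using Entry = std::pair<std::uint32_t, int>;

inline bool idLess(const Entry &e, std::uint32_t key) { return e.first < key; }

// Properties are kept sorted by id, so lookup is a binary search over integers.
class PropertyTable
{
public:
    void set(std::uint32_t id, int value)
    {
        auto it = std::lower_bound(entries.begin(), entries.end(), id, idLess);
        if(it != entries.end() && it->first == id) it->second = value;
        else entries.emplace(it, id, value);
    }

    const int *get(std::uint32_t id) const
    {
        auto it = std::lower_bound(entries.begin(), entries.end(), id, idLess);
        return (it != entries.end() && it->first == id) ? &it->second : nullptr;
    }

private:
    std::vector<Entry> entries;
};

int demo()
{
    Interner names;
    PropertyTable o;

    // "Compile time": the names used with the dot operator become ids once.
    const std::uint32_t age = names.intern("age");
    const std::uint32_t name = names.intern("name");

    o.set(name, 1);
    o.set(age, 42);

    const int *viaDot = o.get(age);                       // integer comparisons only
    const int *viaSubscript = o.get(names.intern("age")); // text must be resolved first

    return (viaDot && viaSubscript && *viaDot == *viaSubscript) ? *viaDot : -1;
}
```

Both routes find the same value (42 here); the subscript route simply pays for hashing or comparing the text on every access, which is the cost described above.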


A final note on Om::Type::Object is that, like Om::Type::List and Om::Type::String, default copy is by reference.


    import print;

    var o = { name = "Paul"; };
    var c = o;

    o.name = "Eddie";

    print(c.name);

This will output "Eddie", not "Paul". However, all types support the clone() method so we can explicitly perform a deep copy here instead.



    import print;

    var o = { name = "Paul"; };
    var c = o.clone();

    o.name = "Eddie";

    print(c.name);

This will output "Paul" as expected.
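The same two behaviours can be mimicked in plain C++ with a shared pointer, which may help if you think in C++ terms. This is only an analogy: Object, ObjectRef and clone() below are hypothetical names, and the copy is deep here simply because the values are plain strings.

```cpp
#include <map>
#include <memory>
#include <string>

using Object = std::map<std::string, std::string>;
using ObjectRef = std::shared_ptr<Object>;

// Explicit copy: allocates a second object with the same contents.
ObjectRef clone(const ObjectRef &o)
{
    return std::make_shared<Object>(*o);
}

std::string copyByReference()
{
    ObjectRef o = std::make_shared<Object>();
    (*o)["name"] = "Paul";

    ObjectRef c = o;       // c and o share one underlying object

    (*o)["name"] = "Eddie";
    return (*c)["name"];   // the change is visible through c
}

std::string copyByClone()
{
    ObjectRef o = std::make_shared<Object>();
    (*o)["name"] = "Paul";

    ObjectRef c = clone(o); // c has its own copy from here on

    (*o)["name"] = "Eddie";
    return (*c)["name"];    // c is unaffected
}
```

copyByReference() returns "Eddie" and copyByClone() returns "Paul", matching the two script snippets.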


clone() is supported by every type, although it does nothing in the case of the value types. Even constants can use the dot operator in Om and the following is all perfectly legal and well-defined:


    import print;

    print(10.type); // prints "int"

    var n = 10.clone(); // equivalent to var n = 10 :)
    var s = "hello".length; // s = 5

    print({ name = "Paul"; age = 42; }.members.length); // prints 2
    print({ name = "Eddie"; age = 23; }.members.length.type); // prints "int"

Now we have a bit of an overview of the language itself, let's take a look at how the C++ API is used to integrate Om scripting into an existing C++ application.


The two key classes exposed by the API are Om::Engine and Om::Value.


    #include <iostream>
    #include "om/OmEngine.h"

    int main()
    {
        Om::Engine engine;
        Om::Value v = engine.evaluate("return (1 + 2) * 3;", Om::Engine::EvaluateType::String);

        if(v.type() == Om::Type::Error)
        {
            std::cerr << "Error: " << v.toError().text << "\n";
            return -1;
        }

        std::cout << "v is " << v.toInt() << "\n"; // will print "v is 9"
    }

When reference types like Om::Type::String or Om::Type::Object are stored in Om::Values, the Om::Value takes care of keeping track of reference counts and so on, seamlessly from the user's point of view.


Om::Value can directly construct value types, but the constructors are marked explicit to avoid accidental conversions.


    void f()
    {
        Om::Value i(123); // Om::Type::Int
        Om::Value f(12.34f); // Om::Type::Float
        Om::Value b(true); // Om::Type::Bool
    }

Reference types have to be generated from the Om::Engine.



    void f(Om::Engine &engine)
    {
        Om::Value s = engine.makeString("hello");
        Om::Value o = engine.makeObject();

        o.setProperty("name", engine.makeString("Paul"));
        o.setProperty("age", Om::Value(42));
    }

If we construct an Om::Value with an Om::Type::Function, it is compiled and stored, but not executed until we choose to later on.



    int main()
    {
        Om::Engine engine;
        Om::Value f = engine.evaluate("return func(a, b){ return a + b; };", Om::Engine::EvaluateType::String);

        if(f.type() != Om::Type::Error)
        {
            Om::Value r = engine.execute(f, Om::Value(), { Om::Value(2), Om::Value(3) });

            std::cout << "result " << r.toInt() << "\n"; // prints "result 5"
        }
    }

In more detail, the execute method is Om::Value Om::Engine::execute(const Om::Value &function, const Om::Value &object, const Om::ValueList &parameters), allowing you to pass in an optional this-object and a parameter list to the function.


Om provides a simple but flexible mechanism for writing and reusing modular code. There is no preprocessing or file inclusion in Om. The compiler is only ever looking at exactly one source file (or string) at a time.

Om::Engine provides the addModule(const Om::String &id, const Om::Value &value) method. Any type of Om::Value can be added to the modules list and then imported into another script.

For example, all the previous examples begin with import print; As Om is entirely unaware of the context in which it is running, I have set up a simple native-side function to print values to std::cout in the test bed. C++-side, this looks like this:


    void out(std::ostream &os, const Om::Value &value)
    {
        if(value.type() == Om::Type::List)
        {
            os << "[";
            for(int i = 0; i < value.count(); ++i)
            {
                os << " ";
                out(os, value.property(i));
            }

            os << " ] ";
        }
        else if(value.type() == Om::Type::Data)
        {
            os << value.toData();
        }
        else
        {
            os << value.toString(); // toString() provides a text representation of most types
        }
    }

    Om::Value printFunc(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        std::cout << "Om: ";
        for(auto p: params)
        {
            out(std::cout, p);
        }

        std::cout << "\n";
        return Om::Value();
    }

    int main()
    {
        Om::Engine engine;
        Om::Engine::OutputFlags flags(Om::Engine::OutputFlag::HideDefinedStrings);

        engine.addModule("print", engine.makeFunction(printFunc));

        engine.evaluate("sample.txt", Om::Engine::EvaluateType::File);
    }

The import keyword is scope aware and only introduces the symbol into the import's scope.



    var n = 20;

    if(n > 10)
    {
        import print;
        print(n);
    }

    print("end"); // compile error - print symbol not found

The actual module lookup is performed at runtime so it is quite possible to compile a function that references modules that have not yet been added to the engine, as long as they are added before the function is executed. As a result it is possible to create two-way relationships between modules without issues with circular dependency.
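Here is a small standalone C++ sketch of why late lookup sidesteps circular dependency. It assumes nothing about Om beyond "modules are found by name when the call happens"; the Modules class, its add() and call() methods and the isEven/isOdd pair are hypothetical and only there to show two modules referring to each other.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical module table; the real addModule() stores Om::Values instead.
class Modules
{
public:
    using Function = std::function<int(int)>;

    void add(const std::string &id, Function f) { table[id] = std::move(f); }

    // The name is resolved on every call, not when the caller was created.
    int call(const std::string &id, int arg) const
    {
        auto it = table.find(id);
        if(it == table.end()) throw std::runtime_error("module not found: " + id);

        return it->second(arg);
    }

private:
    std::map<std::string, Function> table;
};

int demo()
{
    Modules m;

    // "isEven" mentions "isOdd" before "isOdd" has been added. That is fine,
    // because nothing is looked up until the function actually runs.
    m.add("isEven", [&m](int n){ return n == 0 ? 1 : m.call("isOdd", n - 1); });
    m.add("isOdd",  [&m](int n){ return n == 0 ? 0 : m.call("isEven", n - 1); });

    return m.call("isEven", 10); // 1 means "yes"
}
```

If "isOdd" were never added, the failure would only show up when "isEven" is executed with a non-zero argument, which is the same trade-off as in the script: the error moves from compile time to run time.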


In the print example, the module is simply an Om::Type::Function.

Let's look at a slightly more complex example using the script to define a module instead - a modular reworking of the Person examples above. Firstly we define the Person module in a normal script file:


    import print;

    return
    {
        base =
        {
            hair = "brown";
        };

        make = func(name, age)
        {
            return
            {
                prototype = this.base;

                name = name;
                age = age;

                describe = func
                {
                    print(this.name, " is ", this.age, " years old and has ", this.hair, " coloured hair.");
                };

                destructor = func
                {
                    print("goodbye from ", this.name);
                };
            };
        };
    };

Note we are returning an Om::Type::Object here, which gives us a place to store our prototype instance as well as the make function. The make function is the Om equivalent of a constructor here.


In the C++ setup, we can simply do:


    int main()
    {
        Om::Engine engine;
        Om::Engine::OutputFlags flags(Om::Engine::OutputFlag::HideDefinedStrings);

        engine.addModule("print", engine.makeFunction(printFunc));
        engine.addModule("person", engine.evaluate("person.txt", Om::Engine::EvaluateType::File));

        engine.evaluate("sample.txt", Om::Engine::EvaluateType::File);
    }

Note in the real world, one would evaluate person.txt into an Om::Value so one could check for compiler errors. The evaluate method will return an Om::Type::Error rather than the object if errors are thrown up by the compiler.


We can now use this module in sample.txt as follows:


    import person;

    var people = [ person.make("Paul", 42), person.make("Eddie", 23), person.make("Jill", 78) ];

    people[1].hair = "blonde";

    for(var p: people)
    {
        p.describe();
    }

Note that there is no need to import print; into sample.txt now as it is not used directly.


It is also possible to extend Om with types implemented in native code. For example, because Om::Type::Strings are immutable, it is not optimal to concatenate lots of strings together in the script as it produces a great many temporary values. Much as in other languages, what we really need is a stringBuilder that can do this kind of concatenation more efficiently. I'll now describe how to create such a facility in native C++ to make available to the scripts.

A special Om::Type provided for use in the C++ API is Om::Type::Data. This allows the user to store a void* pointer in an Om::Value. We can then access this data from the object instance in the usual way and use it to implement custom object types that interface with C++ code.

Our string builder is going to be based on std::ostringstream, so first of all we can define a representation in C++.


    class Rep
    {
    public:
        Rep(){ }
        Rep(const std::string &s){ os << s; }

        std::ostringstream os;
    };

Next, we need to provide a function that the script can call to create an instance of the string builder. In this function, we assign the properties of the string builder, using other native functions. I didn't want to provide a void* constructor for Om::Value as that could potentially lead to some dangerous conversions, even with an explicit constructor, so there is a static fromData() method instead to make this even more explicit.



    Om::Value makeObject(Om::Engine &engine, const std::string &init)
    {
        Om::Value o = engine.makeObject();

        o.setProperty("data", Om::Value::fromData(new Rep(init)));

        return o;
    }

Now we can add the methods the script needs to be able to call on the string builder, specifically add() and value(). Om::Value provides a convenience template function, toUserType<T>(), to make it slightly more concise to cast back the pointer.



    Om::Value add(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        if(params.count() != 1 || params[0].type() != Om::Type::String) return engine.makeError("incorrect parameters");

        Rep *rep = object.property("data").toUserType<Rep>();
        rep->os << params[0].toString().c_str();

        return Om::Value();
    }

    Om::Value value(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        Rep *rep = object.property("data").toUserType<Rep>();

        return engine.makeString(rep->os.str().c_str());
    }

    Om::Value makeObject(Om::Engine &engine, const std::string &init)
    {
        Om::Value o = engine.makeObject();

        o.setProperty("data", Om::Value::fromData(new Rep(init)));

        o.setProperty("add", engine.makeFunction(add));
        o.setProperty("value", engine.makeFunction(value));

        return o;
    }

For the final step, we need to define a destructor for the object so we can clean up the allocated memory. Note we are really just using existing features of the language here rather than having to implement any special functionality.



    Om::Value destroy(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        delete object.property("data").toUserType<Rep>();

        return Om::Value();
    }

    Om::Value makeObject(Om::Engine &engine, const std::string &init)
    {
        Om::Value o = engine.makeObject();

        o.setProperty("data", Om::Value::fromData(new Rep(init)));

        o.setProperty("add", engine.makeFunction(add));
        o.setProperty("value", engine.makeFunction(value));

        o.setProperty("destructor", engine.makeFunction(destroy));

        return o;
    }

Caution needs to be taken here though. Much like C++'s rule of three (or five), if you are providing a custom destructor and using the Om::Type::Data type, you almost certainly also need to override clone() or terrible things will happen. The built-in clone() will do a by-value copy of the "data" property, meaning you end up with a double-delete if you clone the object in the script.
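The hazard is easier to see with the engine stripped away. The following standalone C++ sketch is not Om API code: Handle is a hypothetical stand-in for an object whose "data" property holds a void*, and the Rep here has an extra live-instance counter added purely for the demonstration.

```cpp
#include <sstream>
#include <string>

// Same shape as the string builder's Rep, plus a counter for the demo.
struct Rep
{
    explicit Rep(const std::string &s){ os << s; ++live; }
    ~Rep(){ --live; }

    std::ostringstream os;
    static int live;
};

int Rep::live = 0;

// Hypothetical handle standing in for the object's "data" property.
struct Handle
{
    void *data;
};

// What a by-value copy of "data" amounts to: two handles, one Rep.
Handle shallowCopy(const Handle &h)
{
    return Handle{ h.data };
}

// What a custom clone has to do: allocate a second Rep with the same text.
Handle deepClone(const Handle &h)
{
    const Rep *rep = static_cast<const Rep*>(h.data);
    return Handle{ new Rep(rep->os.str()) };
}

void destroy(Handle &h)
{
    delete static_cast<Rep*>(h.data);
    h.data = nullptr;
}

int demo()
{
    Handle a{ new Rep("one") };
    Handle b = deepClone(a);   // two Reps alive, each handle owns its own
    Handle c = shallowCopy(a); // still two Reps: c merely aliases a's Rep
    (void)c;

    destroy(a);
    destroy(b);
    // Calling destroy(c) here would delete a's Rep a second time.

    return Rep::live;
}
```

demo() returns 0 because every Rep is deleted exactly once. The shallow copy is only safe here because it is never destroyed, which a script-side destructor cannot promise.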


Since Om::Type::Objects can override any of the built-in methods with their own properties, we can simply add:


    Om::Value clone(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        Rep *rep = object.property("data").toUserType<Rep>();

        return makeObject(engine, rep->os.str());
    }

    Om::Value makeObject(Om::Engine &engine, const std::string &init)
    {
        Om::Value o = engine.makeObject();

        o.setProperty("data", Om::Value::fromData(new Rep(init)));

        o.setProperty("add", engine.makeFunction(add));
        o.setProperty("value", engine.makeFunction(value));

        o.setProperty("destructor", engine.makeFunction(destroy));
        o.setProperty("clone", engine.makeFunction(clone)); // will override the built-in clone() for this object specifically

        return o;
    }

Now we are safe to clone() the object inside the script.


We can define all of the above in a cpp file and provide a simple interface function in the header.


    Om::Value omStringBuilder(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        return makeObject(engine, "");
    }

Then business as usual setting up the module in main().



    int main()
    {
        Om::Engine engine;

        engine.addModule("print", engine.makeFunction(printFunc));
        engine.addModule("stringBuilder", engine.makeFunction(omStringBuilder));

        engine.evaluate("sample.txt", Om::Engine::EvaluateType::File);
    }

Now off we can go into the script and use our custom type:



    import print;
    import stringBuilder;

    var s = stringBuilder();

    s.add("one, ");
    s.add("two ");
    s.add("and three.");

    print(s.value()); // prints "one, two and three."

For types that should not be cloned, for example a wrapper around a file stream or similar, one can instead provide a clone implementation in C++ like this:



    Om::Value clone(Om::Engine &engine, const Om::Value &object, const Om::ValueList &params)
    {
        return engine.makeError("unable to clone object");
    }

Then, in the script, if an attempt is made to clone the object, a runtime error will be generated and the script will exit. The destructors will still be called though, so the memory clean up will still take place.


A couple of other snippets to mention. Script functions can be defined to take a variable number of parameters using the following syntax:


    var f = func(a, b, c...){};

The ellipsis must be attached to the right-most parameter and, when calling the function, you must provide values for the normal parameters. Any additional parameters (more than two in this example) are then accessible using the 'c' symbol, which will be an Om::Type::List containing the additional parameters.
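Seen from the host side, the binding rule amounts to splitting the argument list at the number of ordinary parameters. The C++ below is a hedged sketch of just that rule, using ints in place of real values; Bound and bindArguments() are hypothetical names and not part of the Om API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical result of binding call arguments to func(a, b, c...).
struct Bound
{
    std::vector<int> fixed; // values for the ordinary parameters (a and b)
    std::vector<int> rest;  // list bound to the trailing "..." parameter (c)
};

Bound bindArguments(std::size_t fixedCount, const std::vector<int> &args)
{
    // The ordinary parameters are mandatory; only the tail is optional.
    if(args.size() < fixedCount) throw std::invalid_argument("too few arguments");

    const auto split = args.begin() + static_cast<std::ptrdiff_t>(fixedCount);

    Bound b;
    b.fixed.assign(args.begin(), split);
    b.rest.assign(split, args.end()); // empty when no extras were passed

    return b;
}
```

With two ordinary parameters, passing (1, 2) leaves the rest list empty and passing (1, 2, 3, 4) puts 3 and 4 in it, while passing only (1) is rejected.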


This is better explained with an example script:


    import print;

    var f = func(a, b, c...)
    {
        print(a);
        print(b);
        print(c);
    };

    f(1, 2); // prints 1, 2, [ ] (an empty list)
    f(1, 2, 3, 4); // prints 1, 2, [ 3, 4 ]

    var x = func(p...)
    {
        for(var i: p) print(i);
    };

    x(); // prints nothing
    x(1, 2, 3, 4); // prints 1 2 3 4

The C++ style ternary operator is supported in Om, as well as short-circuit evaluation of and and or. These are of particular use in a dynamically typed language as they can be used to concisely avoid evaluating expressions that would generate a runtime error.



    var f = func(a)
    {
        print(a.type == "object" ? a.name : "no name");

        if(a.type == "object" and a.name == "Paul") doStuff();
    };

In both cases the dot operator would throw a runtime error if the variable was not an object, so avoiding evaluating these is useful.


I think that is enough information for now. If anyone has made it through, thank you for your perseverance and I'm keen to answer any queries anyone may have.

[EDIT] I've just finished implementing Om's version of the switch statement, and also more general break statements for early-exiting out of loops, both of which work much as you would expect. Unlike C and C++, Om's switch is never turned into a jump table so while it isn't quite the efficient beast we know and love, it also doesn't have the C++ switch limitations - the switch expression and even the case expressions can be of absolutely any expression type. Fallthrough works the same as in C++ though.
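To show roughly what a switch without a jump table looks like, here is a small C++ model. It is a sketch under my own assumptions rather than Om's VM code: Value is a hypothetical dynamic value that is either an int or a string, each Case carries a label that is already evaluated, and default cases are left out.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical dynamic value: an int or a string.
using Value = std::variant<int, std::string>;

struct Case
{
    Value label;      // in a VM this would be an expression evaluated at runtime
    std::string body; // stand-in for the statements in the case
    bool breaks;      // false means fall through into the next case
};

// Compares the subject against each label in order, then runs bodies
// from the first match until one of them breaks.
std::vector<std::string> runSwitch(const Value &subject, const std::vector<Case> &cases)
{
    std::vector<std::string> executed;

    std::size_t i = 0;
    while(i < cases.size() && !(cases[i].label == subject)) ++i;

    for(; i < cases.size(); ++i)
    {
        executed.push_back(cases[i].body);
        if(cases[i].breaks) break;
    }

    return executed;
}
```

Because matching is a linear run of equality tests, labels of different types can sit side by side, at the price of checking them one by one. A case without a break simply lets execution carry on into the next body.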

Next Time: If anyone is interested, I'll maybe start to lift the lid on how all of this is actually implemented. Here, as a sneak peek, is the current complete OpCode set for the virtual machine, upon which everything above is based. A surprisingly small set of codes I think.

    namespace OpCode
    {

    enum class Type
    {
        Call, Ret, Push, Pop, PopN, Peek, Jmp, JmpT, JmpF,
        GetLc, PutLc, GetMb, PutMb, GetSc, PutSc, GetNl, PutNl, GetMd,
        Math, Cmp, Una, Bool, MkEnt, AddCh, FeChk, FeGet, Inc, Dec,
        Invalid
    };

    const char *toString(Type type);
    const char *parameters(Type type);

    enum class Math { Add, Sub, Mul, Div, Invalid };

    const char *toString(Math type);

    enum class Cmp { Eq, Neq, Lt, LtEq, Gt, GtEq, Invalid };

    const char *toString(Cmp type);

    enum class Una { Neg, Not, Invalid };

    const char *toString(Una type);

    template<class T> const char *text(uint id){ return toString(static_cast<T>(id)); }

    }


Comments

dmatter

Nice writeup and a good read!

It looks like Om has come a long way and, actually, seems to be maturing nicely! Any plans to release it?

Would be interesting ("interesting" :P) to get some perf comparisons compared to V8 or Lua or Python.

This is a topic close to my heart at the moment as I've recently begun building my own programming language too (although it compiles down to .Net CIL rather than a custom VM instruction set).

Keep up the great work!

January 04, 2017 10:58 PM
Aardvajk
Hey dmatter. Thanks for the comment.

I vaguely remember coming across some sites that had standard tests for scripting languages and published the results of tests against the major ones. Would be interesting to find these and compare Om to the big players, although speed of execution was never the primary aim of this project, just a reasonable level of efficiency.

I haven't even started optimising either the generated instructions or the VM yet though so would be a bit premature at this point.
January 05, 2017 07:18 AM
Aardvajk
Damn.

I just realised why Javascript doesn't have deterministic destructors. If two Om::Type::Objects reference each other, they are never released by the reference tracker and I'm not actually sure this is possible to solve.

A reference cycle could involve any number of objects or lists and will be impossible to detect efficiently so it is also impossible to prevent.

Ho hum. Om could be in serious trouble here. Will have to ponder.
January 05, 2017 02:50 PM
Navyman

Creating your own language, wow that is a massive project!

January 09, 2017 04:49 PM