
Modules in the Om scripting language


Modules in Om

I never wanted to add include statements to Om. I was always very keen to ensure that each script file could be compiled in isolation without having to do any kind of old-fashioned text-insertion.

I've spent quite a long time pondering the best way to approach a module system for Om. What I have finally come up with is, I hope, simple, efficient enough and flexible. It supports circular dependencies, and referencing a module symbol has approximately the same lookup cost as an object member access.

An Om module can be any Om value. Most commonly I would expect this to be an [font='courier new']Om::Object[/font], in order to provide a grouping of functions and values behind a common namespace, but it can equally be an [font='courier new']Om::Function[/font], an [font='courier new']Om::String[/font], or even an [font='courier new']Om::Int[/font] if you really wanted.

Writing a module is just a case of writing a normal Om script and returning the module from the script. Nothing extra is needed. For example, here is a module we will call ios that has a method [font='courier new']print()[/font] that passes its parameters to the [font='courier new']out[/font] operator (pointless, of course, but it will do as an example).

    return
    {
        print = func(v...)
        {
            for(var n: v) out n;
        };
    };

This script just returns an [font='courier new']Om::Object[/font] with a member called [font='courier new']print[/font] that points to an Om variadic function. We compile this to an [font='courier new']Om::Value[/font], then add it to the system using [font='courier new']Om::Engine::addModule()[/font].

    int main()
    {
        Om::Engine e;

        Om::Value ios = e.evaluate("ios.om", Om::Engine::EvaluateType::File);
        e.addModule("ios", ios);
    }

To then use this module in another Om script, we have added the keyword [font='courier new']import[/font].

    import ios;

    return func
    {
        ios.print(10, 20, "hello");
    };

An import statement is scope-aware: the symbol becomes available in the scope containing the import and in any child scopes, but not in enclosing scopes. So you can import a module locally into a function, or into just the else case of an if, and so on. Putting the import at the top of the file makes it available to everything in that file.

As we'll see in the implementation details, the module is not actually required to be loaded until it is first accessed (in the [font='courier new']ios.print[/font] statement above, for example). So we could compile the above script without ios being installed, as long as we install it before we execute the function. In this way, circular dependencies can be easily set up. For example, assume the above script is in a file called test.om:

    int main()
    {
        Om::Engine e;

        // fine to compile without ios installed
        Om::Value v = e.evaluate("test.om", Om::Engine::EvaluateType::File);

        // can check the return for errors if needed
        e.addModule("ios", e.evaluate("ios.om", Om::Engine::EvaluateType::File));

        // fine to execute now ios is installed
        v.execute(Om::Value(), { });

        // can manually remove, checking for errors if required,
        // else automatically cleaned up when engine dies
        e.releaseModule("ios");
    }

Note that by the time we get to the [font='courier new']v.execute()[/font] statement, everything has already been compiled and we are dealing purely with bytecode and the in-memory representation of entities.

Later, we can imagine having the system scan a directory or path list looking for *.om files to install as modules dynamically on-demand so we don't have to manually install them and so on. Quite a few possibilities there but we'll stick with manual installation for now.

How it Works

The reference-counted [font='courier new']TextCache[/font] system is central to many of Om's features. The module system really revolves around it, so I'll briefly recap how it works.

The [font='courier new']TextCache[/font] is part of the [font='courier new']State[/font] object that exists as long as the [font='courier new']Om::Engine[/font] exists and represents shared information that persists between different compilation and execution operations. The [font='courier new']TextCache[/font] has a (poorly named perhaps) [font='courier new']add()[/font] method that takes a string and returns a [font='courier new']uint[/font] id. If, when [font='courier new']add()[/font] is called, the string is not already in the cache, it is added and its new id is returned, otherwise the existing id is returned.

The [font='courier new']TextCache[/font] also has [font='courier new']inc()[/font] and [font='courier new']dec()[/font] methods that, for a given id, increment or decrement the reference-count on that entry, removing it when the reference-count gets to zero.
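To make the add/inc/dec behaviour concrete, here is a minimal C++ sketch of a reference-counted string cache. It is not Om's actual [font='courier new']TextCache[/font]: the class name [font='courier new']TextCacheSketch[/font], the [font='courier new']contains()[/font] helper, the free-list of recycled ids and the use of standard containers are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the TextCache described above; layout is assumed.
class TextCacheSketch
{
public:
    using Id = std::uint32_t;

    // Returns the existing id for text, or stores it and returns a new id.
    // add() itself does not change the reference count.
    Id add(const std::string &text)
    {
        auto it = ids.find(text);
        if(it != ids.end()) return it->second;

        Id id;
        if(!freeIds.empty()) { id = freeIds.back(); freeIds.pop_back(); }
        else { id = static_cast<Id>(entries.size()); entries.push_back(Entry{}); }

        entries[id].text = text;
        entries[id].refs = 0;
        ids.emplace(text, id);
        return id;
    }

    void inc(Id id) { ++entries[id].refs; }

    // Drops one reference and removes the entry when none remain.
    void dec(Id id)
    {
        Entry &e = entries[id];
        if(e.refs > 0 && --e.refs == 0)
        {
            ids.erase(e.text);
            e.text.clear();
            freeIds.push_back(id);
        }
    }

    bool contains(const std::string &text) const { return ids.count(text) != 0; }
    const std::string &text(Id id) const { return entries[id].text; }

private:
    struct Entry { std::string text; unsigned refs = 0; };

    std::vector<Entry> entries;
    std::vector<Id> freeIds;                  // ids available for reuse
    std::unordered_map<std::string, Id> ids;  // text -> id
};

// Usage: two holders reference the same string; it survives until both let go.
inline void demoTextCache()
{
    TextCacheSketch tc;
    TextCacheSketch::Id a = tc.add("ios");
    assert(tc.add("ios") == a);   // same text, same id

    tc.inc(a);
    tc.inc(a);
    tc.dec(a);
    assert(tc.contains("ios"));   // one reference still held

    tc.dec(a);
    assert(!tc.contains("ios"));  // last reference released
}
```

The point to take from the sketch is that the id for a string stays the same for as long as anything still holds a reference to it.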

[font='courier new']Om::Function[/font] and [font='courier new']Om::Object[/font] entities internally both have a [font='courier new']pod_vector trefs[/font] which keeps track of any references they need to strings. When these entities are destroyed, they decrement all of these ids in the [font='courier new']TextCache[/font].

We also have a [font='courier new']Context[/font] class which is created per compilation run, and passed around among all the compiler system functions. This stores the shared information that is needed for one particular pass of a script by the compiler.

The [font='courier new']Context[/font] maintains the local variable [font='courier new']SymbolStacks[/font] (basically a stack of stacks of [font='courier new']Symbols[/font] which have name and id/address) for all the functions. When we enter a new scope, we push the current local [font='courier new']SymbolStack[/font] and we pop it when we exit, which is how local variable scoping works.

First up, we added a new [font='courier new']SymbolStack[/font] to the [font='courier new']Context[/font] called [font='courier new']modules[/font]. Whenever we push the local [font='courier new']SymbolStack[/font], we also push the [font='courier new']modules[/font] stack, and whenever we pop the local one, we pop [font='courier new']modules[/font] as well.
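The scoping behaviour this gives can be sketched in a few lines of C++. This is only an illustration under assumptions: [font='courier new']ScopedSymbols[/font] and [font='courier new']ContextSketch[/font] are hypothetical names, symbols are reduced to plain strings, and the real [font='courier new']SymbolStack[/font] also tracks an id/address per symbol and is kept per function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a SymbolStack: a stack of scopes, each a list of names.
class ScopedSymbols
{
public:
    ScopedSymbols() { push(); }  // start with one outermost scope

    void push() { scopes.emplace_back(); }
    void pop() { scopes.pop_back(); }

    void add(const std::string &name) { scopes.back().push_back(name); }

    // Looks only in the innermost scope (used to reject redefinitions).
    bool findInScope(const std::string &name) const
    {
        return contains(scopes.back(), name);
    }

    // Looks from the innermost scope outwards (used to resolve a symbol).
    bool find(const std::string &name) const
    {
        for(auto s = scopes.rbegin(); s != scopes.rend(); ++s)
        {
            if(contains(*s, name)) return true;
        }
        return false;
    }

private:
    static bool contains(const std::vector<std::string> &scope, const std::string &name)
    {
        for(const auto &n : scope)
        {
            if(n == name) return true;
        }
        return false;
    }

    std::vector<std::vector<std::string>> scopes;
};

// Keeps the locals and modules stacks in step with each other.
struct ContextSketch
{
    ScopedSymbols locals;
    ScopedSymbols modules;

    void enterScope() { locals.push(); modules.push(); }
    void exitScope() { locals.pop(); modules.pop(); }
};

// Usage: an import inside a nested scope is gone once that scope closes.
inline void demoScopedImport()
{
    ContextSketch c;

    c.enterScope();
    c.modules.add("ios");
    assert(c.modules.find("ios"));   // visible inside the scope

    c.exitScope();
    assert(!c.modules.find("ios"));  // not visible outward
}
```

Because both stacks are pushed and popped together, an imported module name follows exactly the same visibility rules as a local variable declared at the same point.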

When the compiler finds an import statement, it transforms each entry in the comma-separated list of symbols into an [font='courier new']ImportNode[/font], which is added to the abstract syntax tree. When this [font='courier new']ImportNode[/font] is then processed during the code-generation stage, it doesn't actually do anything in terms of loading or finding modules. It just puts the symbol in the [font='courier new']TextCache[/font], increments the reference count for its id and adds the id to the current function's [font='courier new']trefs[/font] vector. It also adds the symbol to the [font='courier new']modules SymbolStack[/font].

    bool ImportNode::generate(Context &c)
    {
        c.update(pos);

        if(c.locals().findInScope(name) || c.modules.findInScope(name))
        {
            return c.error(pos, stringFormat("Symbol already defined - ", name));
        }

        uint id = c.state.tc.add(name);
        c.state.tc.inc(id);

        c.entity().trefs.push_back(id);
        c.modules.add(name);

        return true;
    }

Later, when we come across the usage of a symbol in the script, it is handled by [font='courier new']SymbolNode[/font]. This first searches the local [font='courier new']SymbolStack[/font] to see if the symbol is a local variable and, if not, searches up the [font='courier new']Context[/font]'s function stack, checking each function's [font='courier new']SymbolStack[/font]. If it finds the symbol in this pass, it knows it is a non-local variable and has sufficient information to generate the bytecode.

After these checks, we now also check to see if the symbol is found anywhere in the [font='courier new']modules SymbolStack[/font]. If so, we pass the symbol through the [font='courier new']TextCache[/font] to retrieve the id, then generate the [font='courier new']OpCode::GetMd[/font] instruction, using the id as the parameter. Again, no check is done at this stage to actually load or find the module.

    bool implement(Context &c, const Source::Position &pos, const pod_string &name, OpCode::Type local, OpCode::Type nonLocal, OpCode::Type module)
    {
        // snip, check for local and non-local symbols

        const Symbol *m = c.modules.find(name);
        if(m)
        {
            if(module == OpCode::NullCode)
            {
                return c.error(pos, stringFormat("Unable to write to module - ", name));
            }

            uint id = c.state.tc.add(name);
            c.pm() << module << id;

            c.state.tc.inc(id);
            c.entity().trefs.push_back(id);

            return true;
        }

        return c.error(pos, stringFormat("Symbol not defined - ", name));
    }

When a module is installed (more on this later), we also put the symbol in the [font='courier new']TextCache[/font], retrieve its id and increment the reference count.

The key point is that no matter what order the import statements, module usage and module installation happen in, we know the particular [font='courier new']TextCache[/font] id associated with a module symbol will always map to the same value on any given run. This might be hard to visualise based on this description (if you have even made it this far) but it works.
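One way to visualise this is with a heavily reduced C++ sketch. Everything here is hypothetical: [font='courier new']StateSketch[/font] and the three free functions are illustrative names, module values are plain ints, and reference counting is left out entirely (in the system described above, the references held in [font='courier new']trefs[/font] and by the installed module are what keep the id alive).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

using Id = std::uint32_t;

// Hypothetical, much reduced stand-in for the shared State.
struct StateSketch
{
    std::unordered_map<std::string, Id> ids;  // name -> id
    std::map<Id, int> modules;                // id -> module value (an int for brevity)

    // Same name always yields the same id, whoever asks first.
    Id intern(const std::string &name)
    {
        auto it = ids.find(name);
        if(it != ids.end()) return it->second;

        Id id = static_cast<Id>(ids.size());
        ids.emplace(name, id);
        return id;
    }
};

// "Compiling" a use of a module only records the id; nothing is looked up yet.
inline Id compileModuleUse(StateSketch &s, const std::string &name)
{
    return s.intern(name);
}

// Installing a module interns the same name, so it lands on the same id.
inline void installModule(StateSketch &s, const std::string &name, int value)
{
    s.modules[s.intern(name)] = value;
}

// "Executing" the recorded id is the first point at which the module must exist.
inline bool executeModuleUse(const StateSketch &s, Id id, int &out)
{
    auto m = s.modules.find(id);
    if(m == s.modules.end()) return false;

    out = m->second;
    return true;
}

// Usage: compile first, install later, and the recorded id still resolves.
inline void demoLateBinding()
{
    StateSketch s;
    int v = 0;

    Id id = compileModuleUse(s, "ios");   // compiled before ios is installed
    assert(!executeModuleUse(s, id, v));  // executing now would fail

    installModule(s, "ios", 42);
    assert(executeModuleUse(s, id, v) && v == 42);
}
```

Swapping the order, so that the module is installed before the script is compiled, produces the same id and the same result, which is all the compiler and the machine need to agree on.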

Finally, we added a [font='courier new']pod_map[/font] to the [font='courier new']State[/font] class to hold the installed modules. When we call [font='courier new']Om::Engine::addModule[/font], it creates or retrieves the [font='courier new']TextCache[/font] id, increments its reference count, then adds the actual module value to the map using the id as the key.

    Om::Value Om::ValueProxy::addModule(State &s, const pod_string &name, const Value &value)
    {
        TRACE;

        uint id = s.tc.add(name);
        s.tc.inc(id);

        TypedValue t = toTypedValue(value);

        s.modules.insert(id, t);
        ::inc(s, t);

        return value;
    }

Note that [font='courier new']Om::Value[/font] is the external user's view of a value, which automates incrementing and decrementing the reference count, whereas [font='courier new']TypedValue[/font] is the internal representation that has to be manually incremented and decremented.

Now, finally, we just have to implement the [font='courier new']OpCode::GetMd[/font] instruction, which is simplicity itself:

    bool Machine::getMd(uint id, Om::Value &res)
    {
        auto m = state.modules.find(id);
        if(m == state.modules.end())
        {
            res = Om::ValueProxy::makeError(state, stringFormat("Module not found - ", state.tc.text(id)));
            return false;
        }

        vs.push_back(m->value);
        inc(state, vs.back());

        return true;
    }

The [font='courier new']pod_map[/font] is a flat map that maintains a list of key/value pairs sorted by key, so this is ultimately a binary search of a contiguous block of memory, and it seems very performant from some profiling I have been doing. I'll talk a bit more about profiling later, as it is a big topic and one I have only just started out on. But this is essentially the same as an [font='courier new']Om::Object[/font] member lookup, so it is an acceptable cost.
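For readers unfamiliar with flat maps, the lookup described above can be sketched as follows. This is not the real [font='courier new']pod_map[/font]: [font='courier new']FlatMapSketch[/font] is a hypothetical name, its [font='courier new']find()[/font] returns a pointer rather than an iterator, and it is built on [font='courier new']std::vector[/font] and [font='courier new']std::lower_bound[/font] purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat map: a vector of pairs kept sorted by key.
template<class V> class FlatMapSketch
{
public:
    using Id = std::uint32_t;
    using Item = std::pair<Id, V>;

    // Inserts or overwrites, keeping the vector sorted by key.
    void insert(Id key, const V &value)
    {
        auto it = std::lower_bound(items.begin(), items.end(), key, keyLess);

        if(it != items.end() && it->first == key) it->second = value;
        else items.insert(it, Item(key, value));
    }

    // Binary search over contiguous storage; returns nullptr when absent.
    const V *find(Id key) const
    {
        auto it = std::lower_bound(items.begin(), items.end(), key, keyLess);

        return (it != items.end() && it->first == key) ? &it->second : nullptr;
    }

private:
    // Orders a stored item against a bare key, as std::lower_bound requires.
    static bool keyLess(const Item &a, Id k) { return a.first < k; }

    std::vector<Item> items;
};

// Usage: keys can be inserted in any order and are still found by binary search.
inline void demoFlatMap()
{
    FlatMapSketch<int> modules;
    modules.insert(7, 70);
    modules.insert(2, 20);
    modules.insert(5, 50);

    assert(modules.find(5) && *modules.find(5) == 50);
    assert(modules.find(3) == nullptr);
}
```

Insertion is linear in the worst case because elements may have to shift, but modules are installed rarely and looked up often, so a design like this trades cheap reads for more expensive writes.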

So, to sum up: you can import symbols in a scope-aware manner, and you can compile such code without the modules being present. You only have to ensure the module is installed before you execute the code, because the first time the module is actually looked up is the first time it is used.

And there you have it. Thanks for stopping by.
