• entries
743
1924
• views
582938

# The "Outer Scope" problem in Om

1145 views

Just finished up Om's solution to what seems to be known as the outer scope problem. Consider:

var outer = func{ var a = "hello"; var inner = func { out a; }; inner(); return inner;};var x = outer();x();Anyone familiar with the problem will be able to spot it here, but in case you are not, here's the same again with some comments explaining it:

var outer = func{ var a = "hello"; var inner = func { out a; // refers to 'a' in the containing function }; inner(); // okay because 'a' still on the stack at this point return inner; // just the inner function being returned}; // outer exits, popping 'a' from the stackvar x = outer();x(); // problem, calls inner(), which references 'a' which no longer existsDifferent languages take different approaches to address this. For example, Lua does some terribly clever thing where it turns any referenced outer variable into a static in a function closure or something.

For Om, to keep it simple, the desire is to have the first call to [font='courier new']inner()[/font] (inside [font='courier new']outer())[/font] work as expected, but the second call to throw a runtime error telling the user that a referenced variable does not exist in the current context, with reasonable efficiency.

The first thing we need is to provide a unique identifier to each function. This is separate from its entity ID. I was going to just use an incrementing uint, but I don't like the fact this will eventually expire (after 4.2 billion times, yes, I know) so I wrote a small [font='courier new']GuidCache[/font] that can hold reference-counted IDs.

#include "framework/CoreTypes.h"#include "framework/PodVector.h"class GuidCache{public: GuidCache(){ } uint take(){ if(!free.empty()) return free.pop_back(); v.push_back(0); return v.size() - 1; } void inc(uint id){ ++v[id]; } void dec(uint id){ if(--v[id] == 0) free.push_back(id); } void output(std::ostream &os) const;private: pod_vector v; pod_vector free;};Most of the costs associated with using this will be at compile-time. In fact, the only method called during runtime will be the [font='courier new']dec()[/font] method so it is fairly simple and efficient.

Next up, we add two members to the [font='courier new']FunctionEntity[/font] class - a [font='courier new']uint guid[/font] and a [font='courier new']pod_vector grefs[/font]. When the [font='courier new']FunctionEntity[/font] is destroyed, it decrements both the [font='courier new']guid[/font] member and anything in the [font='courier new']grefs[/font] array.

bool FunctionEntity::release(State &s, Om::Value &res){ for(auto n: nodes) { if(!s.dec(TypedValue(Om::Type::Function, n), res)) { return false; } } s.guids.dec(guid); for(auto &t: trefs) { s.tc.dec(t); } for(auto &g: grefs) { s.guids.dec(g); } return true;}Next up, in the virtual machine, we already have a call-stack which contains the required information about the current [font='courier new']FunctionEntity[/font] being executed. We add a [font='courier new']pod_vector cids[/font] member to the machine and, whenever we push onto the call-stack, we push the associated [font='courier new']FunctionEntity guid[/font] onto the cids stack.

So now, at runtime, given a function [font='courier new']guid[/font], we can search the [font='courier new']cids[/font] stack backwards to find it and, if we do, its index in the [font='courier new']cids[/font] stack yields the index of the given [font='courier new']FunctionEntity[/font] on the call-stack. One of the things we store in the call-stack is the address on the value stack that the function's local variables begin at, so given the [font='courier new']guid[/font] and the local offset of a variable, we can add the stacktop and local offset together to get the address of the outer variable.

Next, we add two new virtual machine instructions - [font='courier new']GetNl[/font] and [font='courier new']PutNl[/font] (get non-local, put non-local) which implement the read/write methods.

bool Machine::execute(uint id, const Om::Value &object, const Om::ValueList ¶ms, Om::Value &res){ // snip while(true) { OpCode::Type oc = cc().get(); TypedValue v; uint u; switch(oc) { // snip case OpCode::GetNl: u = cc().get(); u = nonLocal(u, cc().get(), res); if(u == invalid_uint) return abort(res); vs.push_back(vs); inc(state, vs.back()); break; case OpCode::PutNl: u = cc().get(); u = nonLocal(u, cc().get(), res); if(u == invalid_uint) return abort(res); if(!dec(state, vs, res)) return abort(res); vs = vs.back(); inc(state, vs); break; // snip } } //snip}uint Machine::nonLocal(uint guid, uint addr, Om::Value &res) const{ if(!cids.empty()) { uint i = cids.size() - 1; while(true) { if(cids == guid) return cs.off + addr; if(i == 0) break; --i; } } res = Om::ValueProxy::makeError(state, "Variable does not exist in current context"); return invalid_uint;}Finally, in [font='courier new']SymbolNode[/font], we update it so that if the variable isn't found in the current function's [font='courier new']SymbolStack[/font], we search back up the [font='courier new']FuncData[/font] stack to see if we can find it in an enclosing function:

namespace{typedef std::pair non_local_pair;non_local_pair findNonLocal(Context &c, const std::string &name){ if(c.funcs.size() < 2) return non_local_pair(0, invalid_uint); uint i = c.funcs.size() - 2; while(true) { const Symbol *s = c.funcs.locals.find(name); if(s) return non_local_pair(s, c.funcs.entity->guid); if(i == 0) break; --i; } return non_local_pair(0, invalid_uint);}bool implement(Context &c, const Source::Position &pos, const std::string &name, OpCode::Type local, OpCode::Type nonLocal){ c.update(pos); const Symbol *s = c.locals().find(name); if(s) { c.pm() << local << s->addr; return true; } auto i = findNonLocal(c, name); if(i.first) { c.pm() << nonLocal << i.second << i.first->addr; c.funcs.back().entity->grefs.push_back(i.second); c.state.guids.inc(i.second); return true; } return c.error(pos, stringFormat("Symbol not defined - ", name));}}bool SymbolNode::generate(Context &c){ return implement(c, pos, name, OpCode::GetLc, OpCode::GetNl);}bool SymbolNode::generateAssign(Context &c){ return implement(c, pos, name, OpCode::PutLc, OpCode::PutNl);}Note that if we issue the [font='courier new']GetNl[/font] or [font='courier new']PutNl[/font] instruction, we also call [font='courier new']GuidCache::inc()[/font] and add the ID to the current function's [font='courier new']grefs[/font] vector so that the ID won't be reused until the current function is destroyed as well as the function the guid refers to.

And we're done. The output from the Om code I posted at the very start of this entry is now:

helloError 5: variable does not exist in current contextA minor improvement I might make next is to output the variable name as part of the error. We can achieve this fairly easily by adding the variable name to the [font='courier new']TextCache[/font] inside [font='courier new']SymbolNode::generate()[/font] and change [font='courier new']GetNl[/font] and [font='courier new']PutNl[/font] to take three [font='courier new']uint[/font] parameters, the last being the [font='courier new']TextCache[/font] ID of the name. We can just add the [font='courier new']TextCache[/font] ID to the [font='courier new']FunctionEntity[/font] [font='courier new']trefs[/font] vector so it releases it when the function in question goes out of scope.

So hopefully a nice, robust, reasonably efficient and acceptable solution there. Means accessing an outer scope variable is far less efficient than accessing a local one or a function parameter, but unless the call stack is absolutely huge, it shouldn't cause any particular bottle-neck as far as I can see.

I could have just stored the guid in the call-stack and searched that to be fair, but I figure it is better for cache coherency to search a vector of uints than a vector of [font='courier new']CallContext[/font] objects, which are significantly larger.

Hope this was of interest and thanks for stopping by.

There are no comments to display.

## Create an account

Register a new account