• entries
    743
  • comments
    1924
  • views
    580047

Om and the FunArg Problem

Sign in to follow this  
Aardvajk

1281 views

Om and the FunArg Problem

There is a problem with languages that support functions as first class citizens which turns out to be very well studied and even has a proper name according to Wikipedia - the funarg problem. After weeks of pondering, I've finally come up with a solution for Om that fulfills my criteria of being logical, simple to implement, reasonably efficient and adds no cost to the accessing of normal variables.

Here's the problem in its simplest example:
var a = func{ var x = 10; return func { out x; };};var b = a();b();It is quite logical for the inner function to be able to access [font='courier new']x[/font], since it is visible and in scope. But when we retreive the inner function as a value and assign it to [font='courier new']b[/font] in the outer scope, then call it, at that point [font='courier new']x[/font] does not exist.

I'm quite happy for now for this to be a runtime error in Om, as long as we can detect it and alert the user with a good error message. Lua does some clever stuff with things called upvalues that turn the local into a hidden static in a closure constructed from the function but there is no need for that in Om. As long as we can efficiently access the variable when it does exist, and detect and show an error when it does not, that's fine for now.

There are more complicated examples of this problem involving functions nested to deeper levels. For example:
var a = func{ var x; var g = func { var y = func { out x; }; y(); }; var h = g(); h();};A little hard to follow, but here the function assigned to [font='courier new']y[/font] is called twice, once with the call stack being [font='courier new'][global, func a, func g, func y][/font] then again with the call stack being [font='courier new'][global, func a, func y][/font], so the distance on the call stack from the invocation of [font='courier new']y[/font] to the position of [font='courier new']x[/font] on the stack can be different for different runs of the same function.

I'm trying to keep function call overhead as low as possible, so I don't want to have to modify the code when a function is called. I decided that some kind of dynamic searching was inevitable so was a case of finding the simplest and most efficient way to do this.

Thankfully in Om, there are no such thing as reference parameters to functions so it is always possible to tell at compile time when you are accessing a non-local var. This means I can leave the existing op-codes as is for accessing vars and globals and emit special op-codes when accessing non-locals. These op-codes have to do a bit more work so this is important to me.

What I decided to do, and have implemented is as follows. I'm generating a unique unsigned int id for each function, taken from a pool that recycles old ids when functions are destroyed. Will run out of memory long before we run out of ids so this is okay. This id is stored in the function entity at compile time.

At run time, when an [font='courier new']OpCode::Call[/font] is issued, as well as the normal pushing of the call stack, I also push the function's unique id onto another stack called [font='courier new']cids[/font] (call ids). This stack is just a [font='courier new']pod_vector[/font] so this is a tiny operation. So if we locate the position of a function id in this stack, we know what position on the call stack the function currently resides at.

When the compiler searches for a symbol and fails to find it in the current function's symbol stack, it goes on to search backwards up the compile-time function stack until it finds it in another function's symbol table. It then emits an [font='courier new']OpCode::GetNl[/font] (get non-local) or [font='courier new']OpCode::PutNl[/font]. These op-codes take two parameters - the unique id of the function and the address in that function's stack space of the variable.
typedef std::pair non_local_pair;// c.funcs is a compile-time maintained stack of information about the functions currently// being compiled, including a pointer to the function entity in the entity heapnon_local_pair findNonLocal(Context &c, const Source::Location &location, const raw_string &name){ if(c.funcs.size() < 2) return non_local_pair(0, invalid_uint); uint i = c.funcs.size() - 2; while(true) { const Symbol *s = c.funcs.locals.find(name); if(s) return non_local_pair(s, c.funcs.entity->id); if(i == 0) break; --i; } c.error(location, stringFormat("Symbol not defined - ", name)); return non_local_pair(0, invalid_uint);}bool SymbolNode::generate(Context &c){ const Symbol *s = find(c, location, name); // local function search if(s) { c.pm() << get(s->type) << s->addr; return true; } auto n = findNonLocal(c, location, name); if(!n.first) return false; switch(n.first->type) { case Symbol::Local: c.pm() << OpCode::GetNl << n.second << n.first->addr << c.s.tc.add(name.data()); break; // name is already in text cache from declaration, so tc.add just returns the address in this case case Symbol::Global: c.pm() << OpCode::GetGb << n.first->addr; break; default: return false; } return true;}This is actually if it turns out the value is a local. If it references a global it can just emit the normal [font='courier new']OpCode::GetGb[/font] or [font='courier new']OpCode::PutGb[/font] since the position of globals is fixed and can always be accessed, but this is a bit beyond the scope and point of this journal entry.

Om stack access works by storing an upwards offset from the start of the stack when the function executes, and when we call a function, we store the current stack position as an offset and all local accesses just add their address to that value at runtime to find the actual stack position. This has turned out to be vastly superior to accessing vars backwards from the top of the stack for a number of reasons, including the fact that you don't need to update the stack positions in the compiler as you push and pop things from the stack, but it also makes this system possible.

So, for example, [font='courier new']OpCode::GetNl[/font] reads its function id and stack address parameters, then searches backwards up the [font='courier new']cids[/font] stack until it finds the id. Should be a very fast operation since this should compile down to very simple array access. If it fails to find the id, we can runtime error out. These commands actually take an extra parameter too which is the address in the text-cache of the variable name involved, so we can tell the user at runtime what variable has caused the problem.
uint Machine::nonLocal(const triple &u) const{ uint i = cids.size() - 1; while(true) { if(cids == u.first) return cs.off + u.second; if(i == 0) break; --i; } std::cout << "Var does not exist in current context - " << s.tc[u.third] << "\n"; return invalid_uint;}If we find it, the index in [font='courier new']cids[/font] it is found at tells us the position in the call-stack that the function currently exists at, so we can then lookup and find that function's current stack offset, then just add the address parameter to this to find the actual stack position and get or set it as required.

Searching [font='courier new']cids[/font] backwards has the advantage that if we have a chain of functions that are called recursively that involve a non-local reference, we should always map to the most recent instantiation of the local variable, which is as good a defined behaviour as any in these circumstances.

Seems to work very well and has no extra cost added to accessing of local variables or global variables. Its a shame we have to do the dynamic search of the cids array, but this is highly unlikely to be significant. One could, of course, keep a dynamic mapping of the ids and call stack offsets but I suspect it wouldn't be any faster than searching a small array for a uint.

Took weeks to decide on and about an hour to implement which is a good sign, wanted a simple solution here. I may decide later to do something more clever with accessing expired values, like Lua does, but if I do it will just extend this system.

Hope that made sense, not the easiest thing to follow when you aren't down and deep in the code. Thanks for reading.
Sign in to follow this  


2 Comments


Recommended Comments

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now