DeyjaScript 10 : Lambdas

posted in Jemgine
Published October 11, 2011
Advertisement
I started thinking, okay, I've written this thing. Now how can I actually use it? Wasn't I supposed to be making a mud engine? Yeah, well, I already don't want to anymore. Instead, I'm going to use DeyjaScript in this thing, https://www.gamedev.net/blog/342/entry-2250747-gnome-colony-build-demo/ . I'll use it to implement an in-game console where the player can assign macros - bits of DeyjaScript code - to keys. This is also how I'll implement key binding and re-mapping. This decision led me to the next required feature in DeyjaScript - Lambdas. The goal is to be able to type something like 'bindkey [key] [lambda]' into the console.

The first thing I need to support lambdas is function pointers. A function pointer will be able to point to a method or a lambda function. To make this possible, lambdas will masquerade as methods. Like arrays, I'll first create a type-less raw system type, and then wrap it in a generic type. Since it will be impossible for scripts to do anything with this type (though they could create one, since the language lacks any sort of access control) it doesn't have any methods. It's just a data carrier. I add two instructions to create them. pushMemberFunctionPointer and pushLambdaPointer. The first looks up the member, wraps it's code pointer in the system type, and pushes it. The second takes the current code pointer, changes it's start index, wraps it and pushes it. Since lambdas will be smashed onto the end of the function they are declared in, and can't be referenced anywhere else, this works great. A final instruction, invokeFunctionPointer, takes the wrapped code pointer and jumps to it.

Next, the wrapper generic.


Func {
funcPtr raw;
object _this;
void construct(object _this, funcPtr ptr)
{
raw = ptr;
this._this = _this;
}
R __invoke(T0 t0) {
//What goes here??
}
}


There's another reserved method name. I'll use '__invoke' later to support operator(). Two places in the grammar create Func<...> objects. Accessing members, and lambda declarations. Both of these emit code to create the funcPtr type, then to create a Func<...> with the correct types, and call it's constructor, so that it appears to the script that the type of a method or lambda is 'Func<...>'.


//Inside LambdaExpressionNode.emitBytecode...
bytecode.AddBytecode(Instruction.pushVariableToStack, (byte)125); //push this
callIndex = bytecode.Count + 1;
bytecode.AddBytecode(Instruction.pushLambdaPointer, BitConverter.GetBytes((Int16)0));
bytecode.AddBytecode(Instruction.pushNewObject);
NewStatementNode.emitIndexStack(bytecode, funcType, 0, context);
var constructor = funcType.findFunctionBySimpleName("construct");
constructor.EmitCallBytecode(bytecode);

//replace the raw function pointer with the Func<..> object.
bytecode.AddBytecode(Instruction.loadRegisterFromStack, (byte)Register.returnValue);
bytecode.AddBytecode(Instruction.popN, 3); //pop Func<...>, funcPtr, this
bytecode.AddBytecode(Instruction.pushRegisterToStack, (byte)Register.returnValue);


The lambda has an address of 0 because the enclosing function needs to fill that value in later. The actual syntax of a lambda looks like this -

var func = [int a, int b, int c, int r] { return a + b + c; };

I'm not thrilled with having to specify the return value as the last type argument, or with having to name it. The name is never used. Maybe I'll find a way to deal with that later. I want to support calling Funcs with variable() syntax, so I have to throw a bunch of code all over. A new grammar construction 'expression + ( + parameter list + )' catches cases where the func is the result of an operation, but when it's just a variable, the parser mistakes it for an ordinary function call. The function call node gets to deal with it too. I could replace the function call node with the new version entirely, but only at the cost of highly ineffecient function calls (Create a function pointer, invoke it, throw it away - compared to just that middle step.)

To compile lambdas, the enclosing function keeps track of all the lambdas declared in itself, and after it compiles it's own code it compiles all of them and tacks them onto the end of it's bytecode.

I skipped over an important detail earlier, the actual implementation of Func<...>.__invoke. How, exactly, do I invoke the raw funcPtr object? I have an instruction to do it, but I need to prepare the stack and actually call it. The answer turns out to be inline bytecode. To support that, I need to build a table of all the instructions and the number and sizes of their arguments. Then I add it to the grammar and compile it. It does no checking of any sort, it just spits opcodes into the bytecode stream, so it's very dangerous. But it allows me to write this


R __invoke(T0 t0) {
inline {
pushVariableToStack 126; //push return pointer
loadRegisterFromStack 4; //Remember return pointer
pop;

pushVariableToStack 125; //push this
pushMemberToStack 0; //push raw
loadRegisterFromStack 5; //Remember func pointer
pop;

pushVariableToStack 125; //push this
pushMemberToStack 1; //push _this
loadRegisterFromStack 6; //Remember _this;
pop;

loadRegisterFromStack 0; //recover frame pointer
popN 3; //remove frame pointer, return pointer, and this pointer

pushRegisterToStack 6; //Push _this;
pushRegisterToStack 4; //Push return pointer
pushRegisterToStack 5; //Push func pointer

invokeFunctionPointer;
}
}


I re-write the top of the stack to remove the current frame, and then I invoke the function. When that function returns, it will go directly to the code that called __invoke, as if __invoke never existed at all. Since the parameters to __invoke are on the stack, I never have to touch them. I use a few magic constants (The this pointer is always at stack position 125, which is the same as -3 - it's encoded as 128 + offset, to support negative offsets), and I know where __invoke's frame pointer and return value are at on the stack. By time I invoke, the stack looks as if the code that called __invoke called the function pointed to directly, complete with it's this pointer.

So, lambdas are a go. Maybe next I'll start integrating it with that other thing. Or maybe I'll add loops. I'm not out of 'advanced' features I can implement before the basics at all, maybe this language will never have loops.

Aside : As it turns out, deducing the return type of a lambda from the contents of that lambda is impossible. Even in the trivial case of a lambda that returns an int, the system can't handle it. Does it return int? Does it return object? Either is valid. The only way to deduce the return type will be from what the lambda is assigned to, or from the type of the argument it is in a function call. Presumably, all types could be deduced this way, but it would mean type resolution becomes compilation order dependent.
0 likes 3 comments

Comments

ApochPiQ
Epoch solves the return-type inference problem by not having a shared "object" base for everything. Since there's no such thing as a boxed/unboxed primitive type, anything that purports to return an int will definitely be returning an int.

Maybe not the most amazing way to do it, but I like the results :-)
October 12, 2011 12:41 AM
Deyja
That works for primitives, but how does it work when an object hierarchy is involved?

I suppose I'll end up having the make sure whatever the lambda is being passed to is fully resolved first. Thankfully type resolution isn't order dependent because of my 4-pass compilation...
October 12, 2011 03:36 AM
ApochPiQ
Epoch's object model is a bit weird, but oddly powerful. I should probably write up a full description on it someday :-)
October 13, 2011 09:53 PM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Profile
Author
Advertisement
Advertisement