Implementing Higher Order Functions on Top of LLVM

Posted by ApochPiQ, 07 April 2013 · 1,111 views

One of my favorite features of the Epoch programming language is the inclusion of first-class higher order functions:
apply : string param, (thefunction : string -> string) -> string ret = thefunction(param)

mutate : string param -> string ret = param ; " foo"

entrypoint :
    string s = apply("test", mutate)
    assert(s == "test foo")
In the old VM model (which, as I've written about quite a bit recently, is going away) this is pretty easy to do. I just store the name of the function I want to invoke on the stack, and the VM knows how to jump into that function using an instruction called INVOKE_INDIRECT.

This works because the calling of functions is not checked at runtime - it's verified statically by the compiler, but once the bytecode is generated, there's no sanity checking to ensure that I've bound the correct number of parameters to the stack before invoking a function. Not having any checking allows higher-order functions to be invoked trivially in Epoch's calling convention: the callee simply reads off the expected parameters, and everything just works.
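To make the mechanism concrete, here is a toy model of name-based indirect invocation. It is a minimal sketch in Python, not Epoch's actual VM: the `run` loop, the `PUSH` opcode, and the tuple-based bytecode layout are all assumptions made for illustration. Only the `INVOKE_INDIRECT` name comes from the real instruction set.

```python
# Toy stack VM. Everything except the INVOKE_INDIRECT name is hypothetical.

def run(program, functions):
    """Execute a list of (opcode, operand) pairs against a shared stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "INVOKE_INDIRECT":
            # The function name sits on top of the stack. Nothing here checks
            # that the right number of parameters lies beneath it.
            name = stack.pop()
            functions[name](stack)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


def mutate(stack):
    # The callee simply reads off the parameter it expects.
    param = stack.pop()
    stack.append(param + " foo")


program = [
    ("PUSH", "test"),           # parameter for the callee
    ("PUSH", "mutate"),         # name of the function to invoke
    ("INVOKE_INDIRECT", None),
]

result = run(program, {"mutate": mutate})
```

If the compiler ever pushed too few parameters, `mutate` would happily pop whatever happened to be on the stack, which is exactly the kind of trust the static check is there to justify.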

However, moving to a JIT native code implementation on LLVM has proven tricky in this particular area. LLVM does not let you simply call arbitrary functions without knowing their signatures - which is a totally reasonable restriction.

Our saving grace is that we know the signature of the function already - it was checked at compile time, after all - so we just need to look at the Epoch bytecode metadata to figure out how to tell LLVM to invoke a function.

Remember how I mentioned that the VM used to call functions by name? This is a minor headache, because while the function signature is statically known, the mapping of the name to actual code can change dynamically at runtime. (If it couldn't, higher order functions would be kind of useless.) Suppose my example program selected a function to pass to apply() based on user input; the JIT layer wouldn't know which function call to embed into the LLVM bitcode.
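The distinction between a fixed signature and a moving target can be shown in a few lines. This is a hedged sketch in Python rather than Epoch. The `shout` function and the `pick` helper are hypothetical and stand in for "some other function selected by user input".

```python
from typing import Callable

# The signature string -> string is known statically...
StringFunction = Callable[[str], str]


def mutate(param: str) -> str:
    return param + " foo"


def shout(param: str) -> str:  # hypothetical second candidate
    return param.upper()


def apply(param: str, thefunction: StringFunction) -> str:
    return thefunction(param)


def pick(choice: str) -> StringFunction:
    # ...but which function fills that slot is only decided at runtime.
    return shout if choice == "shout" else mutate


s1 = apply("test", pick("mutate"))
s2 = apply("test", pick("shout"))
```

Both calls go through the same call site in `apply`, so a code generator can know how to make the call without knowing what it will end up calling.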

The lazy solution would be to store the mapping in memory and use a thunk to translate the function name on the stack back into a code address. However, this would represent an annoying runtime cost, and might obliterate the possibility of doing certain inlining optimizations for simple uses of higher-order functions. So that option is out.
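For comparison, the rejected thunk approach might look roughly like the following. This is an illustrative Python sketch only. `FUNCTION_TABLE`, `register`, and `thunk` are made-up names, not part of Epoch or LLVM.

```python
# Hypothetical name -> code table that must stay alive at runtime.
FUNCTION_TABLE = {}


def register(name, fn):
    FUNCTION_TABLE[name] = fn


def thunk(name, *args):
    # Every indirect call pays for this lookup, and the real target stays
    # hidden behind the table, so an optimizer cannot easily inline it.
    return FUNCTION_TABLE[name](*args)


def mutate(param):
    return param + " foo"


def apply(param, function_name):
    # The higher-order parameter is still just a name at this point.
    return thunk(function_name, param)


register("mutate", mutate)
s = apply("test", "mutate")
```

It works, but the table lookup sits on the hot path of every higher-order call, which is the runtime cost being avoided.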

The proper fix is to change the compiler a bit. We'll still emit function names and push them onto the stack in the bytecode; however, we'll attach a type annotation marking that this isn't just another string handle, but actually a function name handle.

Next, once the bytecode is loaded into the JIT layer, we'll detect these annotations and note each time a function's name is used as a higher order function parameter. Instead of emitting code to store the name on the machine stack, we'll look up the LLVM generated function for the target, and emit a function pointer instead. Remember, we know the signature statically, so we can get away with this without violating type safety in LLVM.

Last but not least, we implement INVOKE_INDIRECT in such a way that it looks at the function pointer it's being asked to call, and then fixes up the machine stack and calls it.
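The three steps (annotate, resolve at load time, invoke through the pointer) can be sketched as a toy pipeline. This is a Python model under stated assumptions, not the real JIT: the `Compiled` record stands in for an LLVM-generated function, the three-element bytecode tuples and the `"function"` annotation tag are invented for the example, and arity is used as a crude proxy for the full signature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Compiled:
    """Stand-in for a generated function plus its statically known signature."""
    code: Callable
    arity: int


def resolve(program, compiled):
    """Load-time pass: swap annotated function names for direct references."""
    resolved = []
    for op, arg, annotation in program:
        if op == "PUSH" and annotation == "function":
            arg = compiled[arg]  # name handle -> function "pointer"
        resolved.append((op, arg, annotation))
    return resolved


def run(program):
    stack = []
    for op, arg, _ in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "INVOKE_INDIRECT":
            target = stack.pop()
            # Fix up the stack: peel off exactly the arguments the known
            # signature asks for, then call the target directly.
            split = len(stack) - target.arity
            if split < 0:
                raise ValueError("stack underflow")
            args = stack[split:]
            del stack[split:]
            stack.append(target.code(*args))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


compiled = {"mutate": Compiled(code=lambda param: param + " foo", arity=1)}

program = [
    ("PUSH", "test", "string"),
    ("PUSH", "mutate", "function"),  # annotated as a function name handle
    ("INVOKE_INDIRECT", None, None),
]

resolved = resolve(program, compiled)
result = run(resolved)
```

After `resolve`, no name lookup remains at call time. The second instruction carries the callable itself, which is the property that keeps the door open for inlining.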

Unfortunately, it turns out that getting function signatures emitted as metadata is a nontrivial modification to the code. But it's the right thing to do, so I'm going to tackle it either way.

So far I'm an hour or so into this project and it looks like it might take a hefty couple of days to get it done properly. Such is life, I suppose.

I look forward to having 3 or 4 more of the compiler tests in the suite pass once this change is in!

[edit a few hours later] It's now 3:22 AM. 51 of 54 tests are passing. I am a happy, happy hacker.
