Epoch runtime: now with StupidFast

Published April 08, 2012
Did some more experimentation with LLVM and Epoch's JIT system, and damn, I am pleased with the results.

After finally muddling through enough of the llvm-opt source code to figure out how to order all the optimization passes that LLVM offers, I got a lot of basic optimization and dead-code elimination working on the bitcode that the Epoch VM JITs out. The single biggest contributor to the new-found speed, though, is my addition of simple flow control to the native code emitter.

In a nutshell, this means you can write while() loops in Epoch code that is tagged as [native]. The practical upshot of this is that loop unrolling and constant propagation can now apply to Epoch code, thanks to LLVM. In turn, this means that my nice little brute-force benchmark from the last few days now boils down to a couple of multiplications and nothing else.
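For reference, here's a minimal sketch of the kind of pass pipeline involved, written against the LLVM 3.x-era FunctionPassManager API. The pass selection and ordering below are illustrative (basically the stock Kaleidoscope pipeline plus a couple of loop passes and DCE), not necessarily what the Epoch JITter actually configures:

// Minimal sketch: per-function optimization passes via the LLVM 3.x-era
// FunctionPassManager. Pass selection and ordering are illustrative only.
#include "llvm/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/Transforms/Scalar.h"

llvm::FunctionPassManager* SetupFunctionPasses(llvm::Module* module)
{
    llvm::FunctionPassManager* fpm = new llvm::FunctionPassManager(module);

    fpm->add(llvm::createBasicAliasAnalysisPass());      // alias info for the passes below
    fpm->add(llvm::createPromoteMemoryToRegisterPass()); // mem2reg: stack slots -> SSA values
    fpm->add(llvm::createInstructionCombiningPass());    // peephole/combining cleanup
    fpm->add(llvm::createReassociatePass());             // reassociate expressions
    fpm->add(llvm::createGVNPass());                     // kill redundant loads and computations
    fpm->add(llvm::createIndVarSimplifyPass());          // canonicalize loop counters
    fpm->add(llvm::createLoopUnrollPass());              // unroll small constant-trip-count loops
    fpm->add(llvm::createAggressiveDCEPass());           // dead-code elimination
    fpm->add(llvm::createCFGSimplificationPass());       // merge blocks, remove unreachable code

    fpm->doInitialization();
    return fpm;
}

Each function the JIT emits would then presumably get pushed through fpm->run(*function) before the module is handed off for native code generation.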


To recap, here's the modified benchmark using native code generation for loops:

//
// JIT.EPOCH
//
// Just in time compilation test for Epoch
//


timeGetTime : -> integer ms = 0 [external("WinMM.dll", "timeGetTime")]


entrypoint :
{
    integer four = 1 + 3
    assert(four == 4)

    vmbench(2, 3)
    jitbench(2, 3)
}


vmbench : integer a, integer b
{
    integer begintime = timeGetTime()
    integer result = vmmath(a, b)
    integer duration = timeGetTime() - begintime
    string durationstr = cast(string, duration)
    print("VM benchmark lasted: " ; durationstr)
    string resultstr = cast(string, result)
    print("Result: " ; resultstr)
}


jitbench : integer a, integer b
{
    integer begintime = timeGetTime()
    integer result = jitmath(a, b)
    integer duration = timeGetTime() - begintime
    string durationstr = cast(string, duration)
    print("JIT benchmark lasted: " ; durationstr)
    string resultstr = cast(string, result)
    print("Result: " ; resultstr)
}


vmmath : integer a, integer b -> integer ret = 0
{
    integer counter = 0
    integer result = 0

    while(counter < 1000000)
    {
        result = a * b
        ret = ret + result
        counter = counter + 1
    }
}


jitmath : integer a, integer b -> integer ret = 0 [native]
{
    integer counter = 0
    integer result = 0

    while(counter < 1000000)
    {
        result = a * b
        ret = ret + result
        counter = counter + 1
    }
}



Since the two seed values come from outside LLVM's scope of knowledge, it can't quite turn the benchmark into a no-op, but it can get it down to this:

; ModuleID = 'EpochJIT_4181'

define void @dostuff(i32, i32) nounwind {
entry:
  %2 = inttoptr i32 %0 to i32**
  %3 = load i32** %2, align 4
  %4 = load i32* %3, align 4
  %5 = getelementptr i32* %3, i32 1
  %6 = load i32* %5, align 4
  %7 = mul i32 %6, %4
  %8 = mul i32 %7, 1000000
  store i32 %8, i32* %5, align 4
  store i32* %5, i32** %2, align 4
  ret void
}


In the quite-likely scenario that you haven't memorized LLVM's assembly syntax, here's what's going on (there's a rough C++ rendering right after the list):

  • We have a comment indicating that this is an Epoch-JIT module
  • There's a function definition indicating that the JITted function takes two parameters and returns void
  • There's an entry-point label for the function
  • We take the 0th parameter to the function and cast it from a 32-bit integer to an int** (in C parlance)
  • We then dereference this to get an int*, which is the stack pointer from the VM
  • Next we load the value off the stack into a local register
  • Then we increment the stack pointer by one int
  • ...and grab another value off the VM stack into a local register
  • Then multiply the two values
  • Then multiply by 1 million to account for the loop that used to be there
  • Then we store the result back onto the VM's stack, as it expects
  • Lastly, we update the VM's stack pointer
  • And return!
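Put back together, here's roughly the same thing rendered as C++. The 32-bit integer-to-pointer cast and the two-slot stack layout are just my reading of the IR above, not a look inside the real VM:

// Rough C++ rendering of the optimized IR above. Assumes a 32-bit process,
// since the JITted function receives the address of the VM stack pointer as an i32.
#include <cstdint>

void dostuff(std::int32_t param0, std::int32_t /*param1 is unused*/)
{
    // %2: the first parameter is really a pointer to the VM's stack pointer
    std::int32_t** stackptrslot = reinterpret_cast<std::int32_t**>(param0);

    std::int32_t* stackptr = *stackptrslot;  // %3: the VM stack pointer itself
    std::int32_t a = stackptr[0];            // %4: first operand on the VM stack
    std::int32_t b = stackptr[1];            // %6: second operand, one slot up
    std::int32_t result = (b * a) * 1000000; // %7 and %8: the entire loop, folded

    stackptr[1] = result;                    // leave the result where the VM expects it
    *stackptrslot = stackptr + 1;            // update the VM's stack pointer
}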



Here's a dump of running the new JIT test:

Epoch Language Project
Command line tools interface

Executing: D:/Epoch/Programs/JIT Tests/jit.epoch

Parsing... finished in 7ms
Validating semantics... finished in 2ms with 0 error(s)
Generating code... finished in 0ms

DEBUG: VM benchmark lasted: 3780
DEBUG: Result: 6000000
DEBUG: JIT benchmark lasted: 0
DEBUG: Result: 6000000


That's right... counting to 6 million takes the VM 3.78 seconds, and the JITted native code is so fast it can't even be measured by the benchmark. Which stands to reason, considering it's just a couple of imul instructions.


I now stand by my tentative decision to completely deprecate the VM as an execution model. I might keep it as a convenient IR for code and as a jumping-off point for the JITter, but... going from over 3 seconds to unmeasurably fast is just too cool to pass up. There are really no major wins to be had in hacking on the VM anymore if I can get such excellent performance from JITted code and invest my time in making the JITter more powerful instead.


Next up, I need to do some major refactoring of the JITter and make it not... suck. Then I'll see about finishing up the remaining work items for R12, and kick that puppy out the door.


Woot!

Comments

Alpha_ProgDes
You really should trademark that!
April 14, 2012 02:33 PM