After finally muddling through enough of the llvm-opt source code to figure out how to order all the optimization passes that LLVM offers, I got a lot of basic optimization and dead-code elimination working on the bitcode that the Epoch VM JITs out. The single biggest contributor to the new-found speed, though, is my addition of simple flow control to the native code emitter.
In a nutshell, this means you can write while() loops in Epoch code that is tagged as [native]. The practical upshot of this is that loop unrolling and constant propagation can now apply to Epoch code, thanks to LLVM. In turn, this means that my nice little brute-force benchmark from the last few days is basically now a couple of multiplications and nothing else.
To recap, here's the modified benchmark using native-code-generation of loops:
//
// JIT.EPOCH
//
// Just in time compilation test for Epoch
//
timeGetTime : -> integer ms = 0 [external("WinMM.dll", "timeGetTime")]

entrypoint :
{
    integer four = 1 + 3
    assert(four == 4)
    vmbench(2, 3)
    jitbench(2, 3)
}

vmbench : integer a, integer b
{
    integer begintime = timeGetTime()
    integer result = vmmath(a, b)
    integer duration = timeGetTime() - begintime
    string durationstr = cast(string, duration)
    print("VM benchmark lasted: " ; durationstr)
    string resultstr = cast(string, result)
    print("Result: " ; resultstr)
}

jitbench : integer a, integer b
{
    integer begintime = timeGetTime()
    integer result = jitmath(a, b)
    integer duration = timeGetTime() - begintime
    string durationstr = cast(string, duration)
    print("JIT benchmark lasted: " ; durationstr)
    string resultstr = cast(string, result)
    print("Result: " ; resultstr)
}

vmmath : integer a, integer b -> integer ret = 0
{
    integer counter = 0
    integer result = 0
    while(counter < 1000000)
    {
        result = a * b
        ret = ret + result
        counter = counter + 1
    }
}

jitmath : integer a, integer b -> integer ret = 0 [native]
{
    integer counter = 0
    integer result = 0
    while(counter < 1000000)
    {
        result = a * b
        ret = ret + result
        counter = counter + 1
    }
}
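To see why LLVM can chew through this so easily, here's a hypothetical C rendering of the benchmark loop (function names are mine, not Epoch's): the multiplication inside the loop is loop-invariant, so once it's hoisted out, the loop is just "add the same constant a million times," which folds into a single multiply.

```c
#include <stdint.h>

/* Hypothetical C rendering of the Epoch benchmark loop above. */
int32_t mathloop(int32_t a, int32_t b)
{
    int32_t counter = 0;
    int32_t ret = 0;
    while (counter < 1000000)
    {
        int32_t result = a * b;   /* loop-invariant: hoisted out of the loop */
        ret = ret + result;       /* summing a constant one million times */
        counter = counter + 1;
    }
    return ret;
}

/* The closed form LLVM reduces it to: a*b added 1000000 times is a*b*1000000. */
int32_t mathclosed(int32_t a, int32_t b)
{
    return a * b * 1000000;
}
```

With the seed values from the benchmark, both forms produce 2 * 3 * 1000000 = 6000000.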
Since the two seed values come from outside LLVM's scope of knowledge, it can't quite turn the benchmark into a no-op, but it can get it down to this:
; ModuleID = 'EpochJIT_4181'

define void @dostuff(i32, i32) nounwind {
entry:
  %2 = inttoptr i32 %0 to i32**
  %3 = load i32** %2, align 4
  %4 = load i32* %3, align 4
  %5 = getelementptr i32* %3, i32 1
  %6 = load i32* %5, align 4
  %7 = mul i32 %6, %4
  %8 = mul i32 %7, 1000000
  store i32 %8, i32* %5, align 4
  store i32* %5, i32** %2, align 4
  ret void
}
In the quite-likely scenario that you haven't memorized LLVM's assembly syntax, here's what's going on:
- We have a comment indicating that this is an Epoch-JIT module
- There's a function definition indicating that the JITted function takes two parameters and returns void
- There's an entry-point label for the function
- We take the 0th parameter to the function and cast it from a 32-bit integer to an int** (in C parlance)
- We then dereference this to get an int*, which is the stack pointer from the VM
- Next we load the value off the stack into a local register
- Then we increment the stack pointer by one int
- ...and grab another value off the VM stack into a local register
- Then multiply the two values
- Then multiply by 1 million to account for the loop that used to be there
- Then we store the result back onto the VM's stack, as it expects
- Lastly, we update the VM's stack pointer
- And return!
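Put back into C terms, the steps above are equivalent to the following sketch (the VM stack layout here is my reading of the IR, and the function name mirrors the dump; nothing below is verbatim Epoch runtime code):

```c
#include <stdint.h>

/* C equivalent of the JITted function above. The real code receives the
 * address of the VM's stack-pointer variable as a 32-bit integer and
 * casts it back to a pointer (the inttoptr step); here we just take the
 * pointer directly. */
void dostuff_c(int32_t **stackptrslot)
{
    int32_t *sp = *stackptrslot;        /* load the VM stack pointer     */
    int32_t a = sp[0];                  /* first operand off the stack   */
    int32_t b = sp[1];                  /* second operand, one slot down */
    int32_t product = b * a;            /* the surviving multiply        */
    int32_t result = product * 1000000; /* the collapsed million-pass loop */
    sp[1] = result;                     /* overwrite the deeper operand
                                           with the result, as the VM
                                           expects                       */
    *stackptrslot = sp + 1;             /* two values consumed, one
                                           pushed: bump sp by one slot   */
}
```

Two loads, two multiplies, two stores: that's the entire benchmark after optimization.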
Here's a dump of running the new JIT test:
Epoch Language Project
Command line tools interface
Executing: D:/Epoch/Programs/JIT Tests/jit.epoch
Parsing... finished in 7ms
Validating semantics... finished in 2ms with 0 error(s)
Generating code... finished in 0ms
DEBUG: VM benchmark lasted: 3780
DEBUG: Result: 6000000
DEBUG: JIT benchmark lasted: 0
DEBUG: Result: 6000000
That's right... counting to 6 million takes the VM 3.78 seconds, and the JITted native code is so fast it can't even be measured by the benchmark. Which stands to reason, considering it's just a couple of imul instructions.
I now stand by my tentative decision to completely deprecate the VM as an execution model. I might keep it as a convenient IR for code and as a jumping-off point for the JITter, but... going from over 3 seconds to unmeasurably fast is just too cool to pass up. There are really no major wins to be had in hacking on the VM anymore if I can get such excellent performance from JITted code and invest my time in making the JITter more powerful instead.
Next up, I need to do some major refactoring of the JITter and make it not... suck. Then I'll see about finishing up the remaining work items for R12, and kick that puppy out the door.
Woot!