AngelScript JIT/AOT implementation details (technical)

48 comments, last by quarnster 14 years, 9 months ago
Letting the jit manipulate the callstack doesn't need to be that hard; it could just call into asCContext::CallScriptFunction directly. Whether it is worth going to that extent I don't know, but it is definitely something to try out.

Compiling nanojit into an executable produces a 168 KB file on win32 x86 using MSVC, and around 100 KB on win32 arm. Compiling libjit into a dll using mingw32 (I won't go through the effort of trying to make it compile with msvc until I decide that libjit is for me) creates a dll that's around 1 MB on both win32 xp and arm, but ~400 KB on my ppc mac.

I haven't decided which one of the two I am going to use yet, so I'm just going to go ahead and create a small jit capable of running a simple benchmark and compare both of them in terms of ease of use, the quality of the code they output, and of course the execution speed. I'll report back with a more detailed pro/con table of the two systems once I've got them both tested.

BTW, for libjit I recommend using the head git version (http://git.savannah.gnu.org/cgit/dotgnu-pnet/libjit.git), as all the others have given me nothing but trouble.
Ok, I've decided that nanojit is in too much flux at the moment for me to start using it. I don't want to commit to a moving target and have to change all of my code when updating to the latest nanojit to get the latest bug fixes.

There's Adobe's version here: http://hg.mozilla.org/tamarin-redux, Mozilla's version here: http://hg.mozilla.org/tracemonkey, and then the merge that is supposed to happen between them here (which doesn't compile currently): https://developer.mozilla.org/en/NanojitMerge.

If someone wants to experiment with nanojit I would recommend Mozilla's version, as it doesn't have as many dependencies on other code. In fact, someone ripped Mozilla's nanojit and all its dependencies out and placed them here for easy access: http://github.com/doublec/nanojit/tree/master.

Libjit it is for me then.
WitchLord, is the type of a temp "register" (v3 for example) constant over the whole function or can it change mid-run from say int to float or 32 bit to 64 bit?

The problem, if it can change, comes when saving the value back to the VM from a native register: the value can live in either an integer or a float register, for example, and an unimplemented instruction or a suspend might happen in between the two.

I guess one way to manage that is to keep a separate value (possibly packed) indicating which type the register currently holds.

Loading is fine as we can always load the value into each one of the separate registers needed to represent it.
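
As a rough illustration of the "separate value (possibly packed)" idea, the sketch below packs a 2-bit tag per temporary into one 32-bit word. The names `RegClass` and `TempTypeMap`, the four register classes, and the limit of 16 temporaries are all assumptions made for this example; none of them come from AngelScript or from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag: which native register class currently holds the
// live copy of a VM temporary.
enum class RegClass : std::uint8_t { None = 0, Int32 = 1, Int64 = 2, Float = 3 };

// Packs a 2-bit tag for up to 16 temporaries into one 32-bit word.
class TempTypeMap {
public:
    void set(unsigned temp, RegClass c) {
        assert(temp < 16);
        const unsigned shift = temp * 2;
        bits_ = (bits_ & ~(0x3u << shift)) |
                (static_cast<std::uint32_t>(c) << shift);
    }

    RegClass get(unsigned temp) const {
        assert(temp < 16);
        return static_cast<RegClass>((bits_ >> (temp * 2)) & 0x3u);
    }

private:
    std::uint32_t bits_ = 0;  // all temporaries start as RegClass::None
};
```

On an exit back to the VM, the write-back code could consult `get(temp)` to decide which native register to store from.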

Btw, as a side note, I was wrong about the size of the libjit dll, as I had compiled it with debug info enabled; it is 327 KB on xp. I had not made that mistake with LLVM though...
The stack space for temporary variables is reused for different primitive types, so you can't rely on a temp being an integer or float forever.

The type of the value should preferably be inferred from the use of the value, rather than having to be encoded somewhere.

I'll think about what can be done to aid the construction of a JIT compiler with regards to knowing the type and lifetime of temporary variables.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

I could add a couple of byte codes to give hints to the JIT compiler about the type and scope of variables, e.g.

 asBC_RESERVE var, typeid
 asBC_DISCARD var


These would only be added in case the application has indicated the intention to use JIT compilation, so it won't affect other applications.
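
To make the proposal concrete, here is a minimal sketch of how a JIT front end might consume such hints while walking the bytecode. The hint instructions are only a proposal in this thread, so `Hint`, `HintInstr`, `TempType` and `TempTracker` are hypothetical stand-ins and not part of the AngelScript API.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for the proposed hint bytecodes; these are not
// real AngelScript instructions.
enum class Hint { Reserve, Discard };
enum class TempType { Int32, Int64, Float, Double };

struct HintInstr {
    Hint     op;
    int      var;   // stack offset of the temporary
    TempType type;  // only meaningful for Reserve
};

// Tracks which temporaries are live, and with what type, as the JIT
// walks the instruction stream.
class TempTracker {
public:
    void apply(const HintInstr& h) {
        if (h.op == Hint::Reserve) {
            live_[h.var] = h.type;
        } else {
            live_.erase(h.var);
        }
    }

    bool isLive(int var) const { return live_.count(var) != 0; }

    TempType typeOf(int var) const {
        auto it = live_.find(var);
        assert(it != live_.end());  // caller must check isLive() first
        return it->second;
    }

private:
    std::map<int, TempType> live_;
};
```

With this information the JIT would know, at any exit point, which stack slots hold live temporaries and which native register class each one belongs to.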

Would that work for you?


Hmm, not so sure about that.

So imagine we use this basic construct for compiled functions:

1) call conv prologue
2) load AS stack variables into native registers
3) jump table for resuming at the correct jitEntry
4) jitentry1 code
jitentry2 code
...
5) save the temp stack variables back to the VM (I believe this can be skipped if we reach a "RET", but must be done if the function exits for any other reason)
6) save the value register and other non-temp variables
7) call conv epilogue
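
The layout above can be modelled in plain C++ to see how the pieces fit together. This is only a simulation under simplifying assumptions: `VmFrame`, `ExitInfo` and `runCompiled` are made-up names, local variables stand in for native registers, and a `switch` stands in for the jump table. Steps 1 and 7 are whatever the C++ compiler emits for the function itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified VM frame.
struct VmFrame {
    std::int32_t temp[2];         // temporary stack slots
    std::int32_t valueRegister;   // stand-in for the VM's value register
    bool         suspendRequested;
};

enum ExitReason { Returned, Suspended };
struct ExitInfo { ExitReason reason; int resumeEntry; };

ExitInfo runCompiled(VmFrame& f, int jitEntry) {
    // 2) load AS stack variables into "registers" (locals here)
    std::int32_t r0 = f.temp[0];
    std::int32_t r1 = f.temp[1];
    ExitInfo info{Returned, 0};

    // 3) jump table for resuming at the correct jitEntry
    switch (jitEntry) {
    case 0:  // 4) jitentry1 code
        r0 = r0 + 1;
        if (f.suspendRequested) { info = {Suspended, 1}; break; }
        // fall through into the next block
    case 1:  // jitentry2 code, ends in "RET"
        r1 = r0 * 2;
        break;
    default:
        assert(false && "unknown jitEntry");
    }

    // 5) write temps back, except when we left through "RET"
    if (info.reason != Returned) {
        f.temp[0] = r0;
        f.temp[1] = r1;
    }
    // 6) save the value register
    f.valueRegister = r1;
    return info;
}
```

A caller that gets `Suspended` back would later call `runCompiled` again with `resumeEntry`, and the reload in step 2 picks up whatever step 5 stored.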

The problematic step is 5, as we could get there from different blocks in the code where the temp stack value is in different registers. Basically it is the same problem as the Phi function solves when merging two blocks in SSA form (http://en.wikipedia.org/wiki/Static_single_assignment_form)... I have yet to find anyone who describes how the Phi function is actually implemented.

Maybe the solution is as simple as making the temporaries that can change type be loaded/stored for each code block rather than at the global level.

And as yet another side point when it comes to my library evaluations, libjit produces sub-optimal code for ARM, so it is out of the question for me. Continuing with my own home-rolled ARM code is starting to look better and better for the needs I have, as neither libjit nor nanojit seems to solve the problems I originally hoped they would solve. I've gotten some good ideas from them though.
I think it will be extremely difficult for you to load the variables from the stack into the registers at global level. Instead I suggest you load the variable into the registers as they are used.

You can eliminate the use of the stack for temporary variables if, and only if, there is no way for control to pass to or from the JIT function during the lifetime of the variable.

While the control is with the VM the application can actually go in and change the values of variables through the debug interface. Of course, you may choose to ignore that with your JIT compiler if your application won't use it.

Fortunately most temporary variables are short lived. They are allocated and freed with each expression, sometimes even sub expressions. The life cycle of most temporary variables is this: allocated -> written to -> read from -> deallocated. Very rarely is a temporary variable read multiple times.

Without the hints of when a temporary variable is allocated and deallocated, all you will see is the space on the stack that they occupy. But in reality this space is used by multiple temporary variables at different times.

Quote:
The problematic step is 5, as we could get there from different blocks in the code where the temp stack value is in different registers. Basically it is the same problem as the Phi function solves when merging two blocks in SSA form (http://en.wikipedia.org/wiki/Static_single_assignment_form)... I have yet to find anyone who describes how the Phi function is actually implemented.


From the article I understood that the Phi function is not really a function; instead it is just a hint to tell the compiler that all variables in the argument list should occupy the same space. That is, if the variable Y is written to in two distinct branches, the Phi function tells the compiler that both of these new instances should occupy the same space (memory or register).
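
One common way to act on that reading is to coalesce the phi result and its operands into a single storage class. The sketch below uses a small union-find for that; `PhiCoalescer` is a hypothetical name. It deliberately ignores interference checks: a real allocator has to verify that the coalesced values are never live at the same time, and otherwise insert copies at the end of the predecessor blocks instead.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Minimal union-find over SSA value ids. Values joined by a phi end up
// in the same set, which a register allocator can map to one location.
class PhiCoalescer {
public:
    explicit PhiCoalescer(int valueCount) : parent_(valueCount) {
        std::iota(parent_.begin(), parent_.end(), 0);  // each value alone
    }

    int find(int v) {
        assert(v >= 0 && v < static_cast<int>(parent_.size()));
        while (parent_[v] != v) {
            parent_[v] = parent_[parent_[v]];  // path halving
            v = parent_[v];
        }
        return v;
    }

    // result = phi(operands...): result and operands share storage.
    void addPhi(int result, const std::vector<int>& operands) {
        for (int op : operands) {
            const int a = find(op);
            const int b = find(result);
            parent_[a] = b;
        }
    }

    bool sameLocation(int a, int b) { return find(a) == find(b); }

private:
    std::vector<int> parent_;
};
```

For `y3 = phi(y1, y2)`, calling `addPhi(3, {1, 2})` puts all three values in one set, so both branches write to the same register or stack slot and no move is needed at the merge point.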


Quote:Original post by WitchLord
I think it will be extremely difficult for you to load the variables from the stack into the registers at global level.


Really? Why? This is what I'm already doing, and it has been proven to work with the line callback/suspend. I don't see a problem with loading the most commonly used temp stack variables from the VM at the global scope: if a value can be both a float and an int, it could just be loaded into both native registers.

Quote:Instead I suggest you load the variable into the registers as they are used.


This is exactly what I want to avoid: if I need to load a variable into a register when it is used, I also need to write it back every time it changes.

As I see it, writing a value back is only necessary when:
a) We give back control to the VM
b) We run out of registers and need to flush something back to free one (or more) registers up
c) We need to resolve a Phi function

And thus that's what I will be aiming for.
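
Conditions (a) and (b) can be modelled with a small write-back cache that marks registers dirty and only stores them when forced to. This is a runtime model of a decision a real JIT would make at compile time, and `RegisterCache`, its round-robin eviction and its `storeCount` counter are assumptions for the example. Condition (c) is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cache of VM temporaries held in "native registers".
// A temp is stored back to the VM stack only when it has to be.
class RegisterCache {
public:
    RegisterCache(std::vector<std::int32_t>& vmStack, std::size_t regCount)
        : vm_(vmStack), regs_(regCount) {
        assert(regCount > 0);
    }

    std::int32_t read(int slot) { return regs_[acquire(slot)].value; }

    void write(int slot, std::int32_t v) {
        Reg& r = regs_[acquire(slot)];
        r.value = v;
        r.dirty = true;  // the VM copy is now stale
    }

    // (a) control goes back to the VM: store every dirty register.
    void flushAll() {
        for (Reg& r : regs_) spill(r);
    }

    int storeCount() const { return stores_; }

private:
    struct Reg {
        int          slot  = -1;  // -1 means the register is free
        std::int32_t value = 0;
        bool         dirty = false;
    };

    void spill(Reg& r) {
        if (r.slot >= 0 && r.dirty) {
            vm_[r.slot] = r.value;
            ++stores_;
        }
        r.dirty = false;
    }

    // Returns the register holding `slot`; on a miss, (b) evicts a
    // register round-robin and loads the slot from the VM stack.
    std::size_t acquire(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < vm_.size());
        for (std::size_t i = 0; i < regs_.size(); ++i) {
            if (regs_[i].slot == slot) return i;
        }
        const std::size_t victim = next_;
        next_ = (next_ + 1) % regs_.size();
        spill(regs_[victim]);
        regs_[victim].slot  = slot;
        regs_[victim].value = vm_[slot];
        return victim;
    }

    std::vector<std::int32_t>& vm_;
    std::vector<Reg>           regs_;
    std::size_t                next_   = 0;
    int                        stores_ = 0;
};
```

Repeated writes to the same temp cost no stores at all until an eviction or a `flushAll()` happens, which is the saving being argued for.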

Quote: You can eliminate the use of the stack for temporary variables if, and only if, there is no way for control to pass to or from the JIT function during the lifetime of the variable.


I meant that if we reach the RET bytecode during runtime, the temp variables don't have to be written back to the VM as they are no longer used for anything. All other exits out of the jit function would restore the stack to exactly the state that the original VM would have left it in; otherwise the jit is fundamentally broken.

Quote:Fortunately most temporary variables are short lived. They are allocated and freed with each expression, sometimes even sub expressions. The life cycle of most temporary variables is this: allocated -> written to -> read from -> deallocated. Very rarely is a temporary variable read multiple times.


Which sounds to me like there aren't that many temporary variables around at a time. In other words, there won't be too many register spills as most (if not all) temp variables will fit in native registers.

Quote: Without the hints of when a temporary variable is allocated and deallocated, all you will see is the space on the stack that they occupy. But in reality this space is used by multiple temporary variables at different times.


Indeed, and as this memory is non-volatile while the jit function is executing there's no need to load or store them more than once unless absolutely necessary as the only thing changing the meaning of the temp variables is the jit itself.
Well, it's really just a hunch of mine. I've never tried writing a JIT compiler so I can't say what the best way of doing it is. However, I still believe that with the hints of when a temp is allocated and freed, you will have a much easier time optimizing things, because most of the time you won't have to load or store the value on the stack at all.

But, you're the one who is writing the JIT compiler, you're the one that knows what you need.


Quote:Original post by WitchLord
I've never tried writing a JIT compiler so I can't say what the best way of doing it is.


That's a coincidence, neither have I ;)
Up until now that is.

Quote:However, I still believe that with the hints of when a temp is allocated and freed, you will have a much easier time optimizing things, because most of the time you won't have to load or store the value on the stack at all.


After giving this some thought I think this is better if the assumption is made that the jit will need to break out to the VM often. Each jitEntry must be treated separately when it comes to loading/storing values, but as this is the case it'll only load/store the values actually used in this block.

Globally loading/storing the temp variables is better if we assume that we don't need to break out to the vm as variables shared across jitEntry blocks will only be loaded/stored once.
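
The trade-off between the two strategies can be put into numbers with a deliberately crude cost model. It assumes that every block runs exactly once, that the per-block strategy loads each temp a block uses on entry to that block, and that the global strategy reloads every temp used anywhere each time the compiled function is entered (the initial call plus one resume per breakout to the VM). `perBlockLoads` and `globalLoads` are illustrative names, and stores are assumed to mirror loads.

```cpp
#include <cassert>
#include <set>
#include <vector>

using TempSet = std::set<int>;  // temps used by one jitEntry block

// Per-block strategy: each block loads the temps it uses on entry.
int perBlockLoads(const std::vector<TempSet>& blocks) {
    int n = 0;
    for (const TempSet& b : blocks) n += static_cast<int>(b.size());
    return n;
}

// Global strategy: all temps used anywhere are loaded at each entry
// into the compiled function.
int globalLoads(const std::vector<TempSet>& blocks, int breakouts) {
    assert(breakouts >= 0);
    TempSet all;
    for (const TempSet& b : blocks) all.insert(b.begin(), b.end());
    return static_cast<int>(all.size()) * (breakouts + 1);
}
```

For three blocks using temps {1, 2}, {1, 2} and {2, 3}, the per-block strategy costs 6 loads regardless of breakouts, while the global strategy costs 3 loads with no breakouts and 9 with two, which matches the intuition in the two paragraphs above.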

I don't know, I think the globally loading/storing makes for a cleaner implementation, but maybe I'll change my mind if I run into some unforeseen problem.

Quote:But, you're the one who is writing the JIT compiler, you're the one that knows what you need.


Possibly. Throwing the ball around to generate new ideas always helps, though, so I appreciate your input.
