AngelScript JIT/AOT implementation details (technical)

48 comments, last by quarnster 14 years, 8 months ago
Hello! I know the topic of JITs has been up on the table before, but I couldn't find any details concerning actually implementing one. This is mostly directed to you, Andreas, but as this information will interest others wanting to add JIT/AOT compilation for their processor of choice, I chose to post it here (and in English!) rather than emailing you directly.

I started playing around with AOT compiling for AS as a fun spare-time project to get to know the ARM architecture a bit better. But before I get too deep into the actual implementation (and would have to rewrite the whole shebang to fit with your ideas), I figured I'd ask you what your thoughts are when it comes to JIT/AOT compiling and how it all would hook into the AngelScript engine.

I think writing a JIT/AOT compiler is a big project to take on unless you can split it into chunks and take it one step at a time. As such, one of the features I'm interested in is being able to mix native machine code with AS bytecode and switch between them. Thus I suggest that a new AS bytecode instruction, "JIT" (or something else, but I'll use that name from now on), is implemented. I've not thought this through 100%, so the actual implementation will likely change during the course of the work. But my current idea is that the "JIT" bytecode would:
    - if JIT is enabled:
        - save l_bc, l_sp, l_fp to the context
        - Call "ExecuteJIT(this, machineCode)",
             where "this" is the asCContext and "machineCode" is a pointer to
             the array containing the native code to execute (where this code
             buffer is actually stored is to be decided...)
        - load l_bc, l_sp, l_fp from the context
    - else:
        - nop
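
The steps in the list above can be sketched in C++ as follows. This is only a toy model, not AngelScript code: the `Context` struct, the opcode values, the `JitSection` signature and `NativeTwoIncs` are all assumptions made up for illustration; only the names `ExecuteJIT`, `l_bc`, `l_sp` and `l_fp` come from the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for asCContext: only the three VM registers
// mentioned in the post are modelled.
struct Context {
    const uint32_t* bc;  // saved l_bc (bytecode pointer)
    uint32_t* sp;        // saved l_sp (stack pointer)
    uint32_t* fp;        // saved l_fp (stack frame pointer)
};

// Assumed signature of a native section: it gets the context and must
// advance ctx->bc past every bytecode it implemented.
using JitSection = void (*)(Context*);

// Made-up opcodes for the toy VM.
constexpr uint32_t OP_JIT = 1;  // one argument word: index of the native section
constexpr uint32_t OP_INC = 2;  // increments the value at fp[0]
constexpr uint32_t OP_RET = 3;  // returns the value at fp[0]

int g_nativeCalls = 0;  // counts how often native code ran (for demonstration)

void ExecuteJIT(Context* ctx, JitSection machineCode) {
    assert(machineCode != nullptr);
    machineCode(ctx);
}

// Toy interpreter loop showing the JIT bytecode next to ordinary bytecodes.
uint32_t Run(const uint32_t* program, const JitSection* sections, bool jitEnabled) {
    uint32_t frame[1] = {0};
    Context ctx{program, frame, frame};
    const uint32_t* l_bc = program;
    uint32_t* l_sp = frame;
    uint32_t* l_fp = frame;
    for (;;) {
        switch (*l_bc) {
        case OP_JIT:
            if (jitEnabled) {
                // Save the VM registers, run native code, load them back.
                ctx.bc = l_bc; ctx.sp = l_sp; ctx.fp = l_fp;
                ExecuteJIT(&ctx, sections[l_bc[1]]);
                l_bc = ctx.bc; l_sp = ctx.sp; l_fp = ctx.fp;
            } else {
                l_bc += 2;  // nop: skip the opcode and its argument
            }
            break;
        case OP_INC:
            l_fp[0] += 1;
            l_bc += 1;
            break;
        case OP_RET:
        default:
            return l_fp[0];
        }
    }
}

// "Native" replacement for two OP_INC instructions: it does the work and
// skips the JIT instruction (2 words) plus the two bytecodes it covered.
void NativeTwoIncs(Context* ctx) {
    ++g_nativeCalls;
    ctx->fp[0] += 2;
    ctx->bc += 4;
}
```

Running the program `{OP_JIT, 0, OP_INC, OP_INC, OP_RET}` with `NativeTwoIncs` as section 0 gives the same result with the JIT enabled or disabled, which is the property the proposal relies on.
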
The JIT/AOT compiler does not replace the implemented bytecodes but rather just injects the JIT bytecode before the section that it does support natively (see further down for an actual example). This allows us to disable the native code at runtime and treat it as a nop. Otherwise, as exactly what happens to l_bc can be unknown at compile time, l_bc in the context must be updated by the machineCode to skip the AS bytecodes and their arguments that were implemented by the JIT/AOT. It could also potentially work when implementing jumps/calls/suspend in native code, although exactly how the return/resume would work I've not figured out yet. But figuring out how to really compile calls into native code is IMO a separate issue, as there are plenty of bytecodes to implement that would not be affected no matter if script calls are actually done within bytecode or machine code, so I've not really spent any time thinking about this.

In any case, having this ability to mix native code and AS bytecode would allow me to implement one AS bytecode at a time, would not break as soon as a new bytecode is added (although if it's a frequently used one it would obviously hurt perf), would still allow for co-routines, and would allow you to disable the JIT at runtime if you want to step through the AS bytecode with a debugger.

So with this new bytecode instruction in mind, let's turn our attention to which part of the code would emit it. As I'm only in the very early testing phase, I added a call into my AStoARM compiler between Optimize and ResolveJumpAddresses in asCByteCode's Finalize method. This is easy and straightforward; however, the biggest problem I can think of offhand is that this will mess with cross-platform loading and saving of the bytecode (when this becomes available). Surely code saved on one platform and loaded on another would want to have the JIT/AOT step redone for maximum perf.

The other alternative I can think of is to save/load a version that does not have any JIT instructions and do the native compilation step as a post-process operation. However, it's important to note that all PC-relative instructions would have to be patched when the new instruction is inserted between, for example, a jump and its destination. As I don't want the JIT to break once a new PC-relative instruction is added, I suggest a common interface for adding the JIT instruction that will take care of this patching so that the JIT doesn't have to know how to do it itself. The problem I can think of with this solution is that it would have to shuffle memory around when inserting the JIT instruction, although it could keep a buffer around with the JIT instructions to insert and not actually do it until the compiler says it is done. This or something similar is probably a must for writing an actual JIT rather than an AOT. Any other suggestions? For the second alternative, where would the native compiler hook in?

Another interface needed is one for taking ownership of the buffer that is to contain the native machine code. I suggest that the JIT instruction's parameter is an index that identifies which buffer to use, although exactly whom the JIT/AOT compiler should talk to in order to allocate the buffer and get the index I don't know. The script function, the module, the engine? What do you suggest?

I'm sure there was something I forgot, but this has been a long post already. What do you think? Any alternative solutions you've been thinking of?

Disregarding the problems with having the compilation done before ResolveJumpAddresses, and native machine code buffer management (i.e. I only have one buffer), as a test I have implemented two of AS's bytecode instructions, MULi and ADDi. The AngelScript source code used in the test was:
int TestBasic(int a, int b, int c)
{
    return a + b * c;
}
Which compiles into:
Temps: 1

    0   0 *    PUSH     1
- 3,5 -
    1   1 *    SUSPEND
    2   0      JIT      0               ; Notice the new instruction
    3   1 *    MULi     v1, v-1, v-2
    5   1 *    ADDi     v1, v0, v1
    7   1 *    CpyVtoR4 v1
    8   1 * 0:
    8   0 *    RET      3
With the native code for the JIT-supported section (from MULi up to and including ADDi) being:

0xe92d4030 stmdb     sp!, {r4, r5, lr}  ; Prologue
                                        ; Save return pointer and scratched
                                        ; registers we are required to save on
                                        ; the (native) stack

0xe5902028 ldr       r2, [r0, #0x28]    ; Load AS's stack frame pointer from the asCContext
0xe5921004 ldr       r1, [r2, #0x4]     ; Load v-1
0xe5923008 ldr       r3, [r2, #0x8]     ; Load v-2
0xe0040391 mul       r4, r1, r3         ; Perform the multiplication
0xe5925000 ldr       r5, [r2]           ; Load v0
0xe0854004 add       r4, r5, r4         ; Perform the add
    ; Epilogue
    ; Here at this code we notice that we don't support the next AS bytecode
    ; As such we need to flush any changed data and return from the code
    ;
    ; If on the other hand the next opcode would also have been supported, we
    ; would have continued executing even further and would not write any data
    ; back until it is needed, either by running out of native registers to
    ; use, or when it is time to exit
0xe5024004 str       r4, [r2, #-0x4]    ; Save v1 to the stack as it changed
0xe5904020 ldr       r4, [r0, #0x20]    ; Load byteCode pointer from the context
0xe2844014 add       r4, r4, #0x14      ; Add the number of AS bytecode data to
                                        ; skip (includes the JIT instruction)
0xe5804020 str       r4, [r0, #0x20]    ; Save back to the context
0xe8bd8030 ldmia     sp!, {r4, r5, pc}  ; Restore scratched registers we had to
                                        ; save (according to the ARM calling
                                        ; convention) and return (by setting
                                        ; pc = lr)
This piece of code has been verified as working. Obviously this is not optimized assembly code for the ARM platform, as multiple successive loads could be done with a single load-multiple instruction, and the multiply and add could be done with a single multiply-accumulate too. But optimizing the generated native code is a completely different topic.
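
The `#0x14` added to the bytecode pointer in the epilogue can be checked against the bytecode listing. The sketch below assumes that the first column of the listing is a position counted in 32-bit words (an inference from the listing, not something stated in the post); `BytesToSkip` is a hypothetical helper name.

```cpp
#include <cstdint>

// Assumption: bytecode positions in the listing are counted in 32-bit
// words, so a byte offset is the word distance times 4.
constexpr uint32_t BytesToSkip(uint32_t jitPos, uint32_t firstUnsupportedPos) {
    return (firstUnsupportedPos - jitPos) * 4u;
}

// JIT sits at position 2; CpyVtoR4, the first bytecode left to the VM,
// sits at position 7. That is 5 words, or 20 bytes.
static_assert(BytesToSkip(2, 7) == 0x14,
              "matches 'add r4, r4, #0x14' in the epilogue");
```

So the skip covers the JIT instruction itself (1 word), MULi (2 words) and ADDi (2 words).
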
Wow, this is a difficult one. I didn't have plans for JIT compiler support for a long time yet, so I haven't really given it that much thought.

I think that the JIT compiler should be registered by the application, rather than being linked directly within the AngelScript engine. Perhaps something like this:

void asIScriptEngine::SetJITCompiler(asIJITCompiler *compiler);


where asIJITCompiler is an interface defined in angelscript.h with all the methods that AngelScript needs to call in order to allow the JIT compiler to do its work.

The memory allocations needed to store the machine code must be allocated by the JITCompiler itself, because it will probably have special needs that AngelScript shouldn't have to worry about, e.g. data alignment, execution bits, etc. The AngelScript engine needs to store a pointer to the buffer so the VM can call it. AngelScript will also need a way of telling the JITCompiler when a buffer is no longer needed, e.g. when a script module is discarded.
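
As a rough illustration of that ownership split, here is a minimal C++ sketch. Every name in it (`IJitCompilerSketch`, `Compile`, `Release`, `CountingJit`, `ModuleSketch`) is hypothetical and is not taken from angelscript.h; it only shows the JIT compiler allocating the buffers, the engine side storing the pointers, and the engine side telling the JIT compiler when a buffer is no longer needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface; the method names are illustrative only.
class IJitCompilerSketch {
public:
    virtual ~IJitCompilerSketch() = default;
    // Returns an opaque buffer owned by the JIT compiler, or nullptr.
    virtual void* Compile(const uint32_t* bytecode, std::size_t length) = 0;
    // Called by the engine side when the buffer is no longer needed.
    virtual void Release(void* buffer) = 0;
};

// Dummy compiler: "compiles" by copying the bytecode, and counts how many
// buffers are still alive so that leaks are easy to spot.
class CountingJit : public IJitCompilerSketch {
public:
    int live = 0;
    void* Compile(const uint32_t* bytecode, std::size_t length) override {
        assert(bytecode != nullptr);
        ++live;
        return new std::vector<uint32_t>(bytecode, bytecode + length);
    }
    void Release(void* buffer) override {
        delete static_cast<std::vector<uint32_t>*>(buffer);
        --live;
    }
};

// Stand-in for a script module: it only stores the pointers and hands
// them back to the JIT compiler when the module is discarded.
class ModuleSketch {
public:
    explicit ModuleSketch(IJitCompilerSketch* jit) : jit_(jit) {}
    ~ModuleSketch() { Discard(); }
    void AddFunction(const std::vector<uint32_t>& bytecode) {
        if (void* buffer = jit_->Compile(bytecode.data(), bytecode.size()))
            buffers_.push_back(buffer);
    }
    void Discard() {
        for (void* buffer : buffers_) jit_->Release(buffer);
        buffers_.clear();
    }
private:
    IJitCompilerSketch* jit_;
    std::vector<void*> buffers_;
};
```

A real compiler would allocate executable, suitably aligned memory inside `Compile`; the engine side never needs to know how.
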

I think your idea with the JIT instruction is a good one, and is in line with my own thoughts of not losing the ability to suspend scripts, debug, use co-routines, etc. It can probably be combined with the SUSPEND instruction so there is no need for an extra bytecode that would hurt performance where JIT compilation isn't available. The JIT instruction should be added by the compiler, or possibly in the asCByteCode post-compilation processing. But the actual JIT compilation must happen at the very end, so that it will work both for script compilation and for loading pre-compiled bytecode.

To perform the JIT compilation AngelScript should probably do a scan over the bytecode, and for each JIT instruction it finds it should make a callback to the JIT compiler to provide a machine code buffer for execution. AngelScript will then store this buffer in the engine and update the JIT instruction argument to point to the buffer.
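
A minimal sketch of such a scan, under loudly stated assumptions: the toy bytecode uses fixed two-word instructions (real AngelScript instructions vary in length), `ResolveJitSections` and the callback signature are hypothetical, and the instruction argument is filled with an index into a buffer table rather than a raw pointer, with index 0 reserved for "no native code".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy encoding: every instruction is two words {opcode, argument}.
constexpr uint32_t OP_OTHER = 0;
constexpr uint32_t OP_JIT = 1;

// Scans the bytecode and asks the callback for a machine code buffer for
// each JIT instruction (identified by its position). Buffers are stored in
// bufferTable and the instruction argument is set to the table index.
// Returns the number of JIT instructions that received a buffer.
std::size_t ResolveJitSections(std::vector<uint32_t>& bytecode,
                               std::vector<void*>& bufferTable,
                               const std::function<void*(std::size_t)>& provideBuffer) {
    assert(bytecode.size() % 2 == 0);
    if (bufferTable.empty()) bufferTable.push_back(nullptr);  // slot 0 = "no jit"
    std::size_t resolved = 0;
    for (std::size_t pos = 0; pos < bytecode.size(); pos += 2) {
        if (bytecode[pos] != OP_JIT) continue;
        void* buffer = provideBuffer(pos);
        if (buffer == nullptr) {
            bytecode[pos + 1] = 0;  // the VM will treat this one as a nop
            continue;
        }
        bufferTable.push_back(buffer);
        bytecode[pos + 1] = static_cast<uint32_t>(bufferTable.size() - 1);
        ++resolved;
    }
    return resolved;
}
```

Because the scan runs on finished bytecode, the same routine would serve both freshly compiled scripts and bytecode loaded from disk.
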

For JIT compilation I would also like to allow the application to substitute entire script functions with machine code. This, however, has to be initiated by the application itself, as this kind of code will lose the ability to be suspended/debugged. This will probably be supported in a completely different way, possibly by having the application read the compiled bytecode, produce a JIT compiled version and then substitute the bytecode in the script function for the JIT compiled version.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

I like the idea of having the application register the JIT; it feels like a very flexible solution and I believe it would make it even easier for people to write their own JITs. Well, hook them in at least :)

The SUSPEND and JIT instructions could most likely be combined. You have no plans on using the argument for SUSPEND for anything, do you? If not, it can be used to identify a JIT segment: 0 means "no jit", any other number is an identifier used to select which machine code buffer to use.

Also, do you know of any way to "execute" a buffer from within C/C++? I couldn't figure it out, so I just created an assembly file for my ExecuteJIT function (as described in the previous post) that does "bx r1" (a branch without setting the lr register, i.e. when the subroutine in r1 (our JITed machine code) returns, it'll return back into ExecuteNext in asCContext). If there's no such thing in C/C++, asCContext would have to call into the JIT compiler, as it will know how to run the code.

"The JIT instruction should be added by the compiler, or possibly in the asCByteCode post-compilation processing. But the actual JIT compilation must happen at the very end, so that it will work both for script compilation and for loading pre-compiled bytecode."

With "compiler" in this paragraph, do you refer to the AS compiler or the JIT compiler?

The caveat I see with the former is that if the very next instruction after the AS compiler issued JIT instruction is not supported by the actual JIT compiler, then all AS bytecodes between there and the next JIT instruction will not be JITed even though the others might be supported.

The caveat with the latter is that a JIT compiler supporting only every other instruction would insert lots of JIT instructions, making the saved bytecode perform worse when loaded in an environment that doesn't support JIT compilation.

A mixture of both might be what is wanted. If the JIT instruction is SUSPEND then these will be inserted already by the former, and while the JIT compiler then wouldn't be required to, it could insert extra SUSPEND as it sees fit.

Can the SUSPEND instruction be ignored by a JIT if it chooses to? I know the ability to abort a script gets lost, as will the linecallback stuff. Anything else?

As the SUSPEND is issued per line, and we have for example
    int one   = 1;
    int two   = 2;
    int three = 3;


Which turns into
    2   6 *    SUSPEND
    3   6 *    SetV4    v1, 0x1          (i:1, f:1.4013e-045)
- 4,5 -
    5   6 *    SUSPEND
    6   6 *    SetV4    v3, 0x2          (i:2, f:2.8026e-045)
- 5,5 -
    8   6 *    SUSPEND
    9   6 *    SetV4    v4, 0x3          (i:3, f:4.2039e-045)


If the JIT must respect the suspend, then these will all be three different JIT segments, and if we don't actually need to suspend we'll either be going in and out of machine code unnecessarily, causing a lot of unneeded data flushes and reads or something clever would have to be thought up so that one code buffer can jump into the next. This would require that the JIT patches up the previous section as needed and the new section loads the data the previous section and this one share up front, so that those loads can be skipped by the previous section with a jump right into the second section after the loads. Unknown if this actually would work or not, but I guess it could :)

BTW, as this is a spare time project I'm likely several weeks if not months from something that is actually useful, so there is plenty of time to discuss how this should be designed. Please do write down your thoughts if you come up with something.
Quote:Original post by quarnster
The SUSPEND and JIT instructions could most likely be combined. You have no plans on using the argument for SUSPEND for anything, do you? If not, it can be used to identify a JIT segment: 0 means "no jit", any other number is an identifier used to select which machine code buffer to use.


SUSPEND is currently used to let the VM check for the need to suspend execution at safe points, i.e. when no loose references are hanging around. The script compiler adds a SUSPEND instruction before each script statement.

If you wish, you can just add a JIT instruction for your tests. When I merge it into the final code I'll make the optimizations for the bytecode generation.

Quote:Original post by quarnster
Also, do you know of any way to "execute" a buffer from within C/C++? I couldn't figure it out, so I just created an assembly file for my ExecuteJIT function (as described in the previous post) that does "bx r1" (a branch without setting the lr register, i.e. when the subroutine in r1 (our JITed machine code) returns, it'll return back into ExecuteNext in asCContext). If there's no such thing in C/C++, asCContext would have to call into the JIT compiler, as it will know how to run the code.


Just cast the buffer pointer to a function pointer, and then call it. Of course, the JIT compiler must make sure to allocate the buffer in a way that the OS will allow the execution of the code. In Windows there are special allocation routines for this (I don't remember their names just now). I'm not sure if there is anything similar on the ARM system you're working with.
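
A minimal C++ sketch of that cast. To stay runnable on any architecture it does not emit real machine code: an ordinary compiled function (`FakeGenerated`, a made-up name) stands in for the JIT output, and `Context`, `NativeEntry` and `CallBuffer` are likewise hypothetical. For real generated code the buffer would have to come from executable memory, for example `VirtualAlloc` with `PAGE_EXECUTE_READWRITE` on Windows or `mmap` with `PROT_EXEC` on POSIX systems.

```cpp
#include <cassert>

// Hypothetical context type; the generated code receives a pointer to it.
struct Context {
    int result;
};

// Assumed signature of the generated code's entry point.
using NativeEntry = void (*)(Context*);

// Casts an untyped code buffer to a callable entry point and runs it.
void CallBuffer(void* codeBuffer, Context* ctx) {
    assert(codeBuffer != nullptr);
    NativeEntry entry = reinterpret_cast<NativeEntry>(codeBuffer);
    entry(ctx);
}

// Ordinary compiled function standing in for JIT output.
void FakeGenerated(Context* ctx) {
    ctx->result = 42;
}
```

Usage would be along the lines of `CallBuffer(reinterpret_cast<void*>(&FakeGenerated), &ctx);`. Converting between function pointers and `void*` is only conditionally supported by the C++ standard, but the common desktop and ARM toolchains accept it.
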

Quote:Original post by quarnster
"The JIT instruction should be added by the compiler, or possibly in the asCByteCode post-compilation processing. But the actual JIT compilation must happen at the very end, so that it will work both for script compilation and for loading pre-compiled bytecode."

With "compiler" in this paragraph, do you refer to the AS compiler or the JIT compiler?


I meant the AS compiler. :)

Quote:Original post by quarnster
The caveat I see with the former is that if the very next instruction after the AS compiler issued JIT instruction is not supported by the actual JIT compiler, then all AS bytecodes between there and the next JIT instruction will not be JITed even though the others might be supported.

The caveat with the latter is that a JIT compiler supporting only every other instruction would insert lots of JIT instructions, making the saved bytecode perform worse when loaded in an environment that doesn't support JIT compilation.

A mixture of both might be what is wanted. If the JIT instruction is SUSPEND then these will be inserted already by the former, and while the JIT compiler then wouldn't be required to, it could insert extra SUSPEND as it sees fit.


I don't think we should put much weight on JIT compilers that don't support the complete set of instructions. A compiler like that is obviously an unfinished product and will likely produce inefficient code anyway.

The JIT compiler should not be allowed to modify the bytecode, though maybe AngelScript can insert extra JIT instructions if needed, e.g. after CALL instructions, as the JIT compiler shouldn't try to implement script function calls. Inserting instructions after the bytecode is completed requires a bit of work: the bytecode buffer needs to be resized, the relocation addresses need to be updated, etc., so I think we should try to avoid this as much as possible.
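
To give an idea of the bookkeeping involved, here is a small C++ sketch of inserting one instruction into finished bytecode and fixing up relative jumps. The encoding is a made-up toy (one slot per instruction, jump offsets relative to the following instruction), and `InsertAndPatch` is a hypothetical helper, not AngelScript code. It also makes one design choice: a jump that targeted the insertion point keeps pointing at the original instruction rather than the inserted one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Nop, Jmp, Jit };

struct Instr {
    Op op;
    int arg;  // for Jmp: offset relative to the following instruction
};

// Inserts 'extra' at index 'pos' and patches every relative jump whose
// source or target is shifted by the insertion.
void InsertAndPatch(std::vector<Instr>& code, std::size_t pos, Instr extra) {
    assert(pos <= code.size());
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].op != Op::Jmp) continue;
        long target = static_cast<long>(i) + 1 + code[i].arg;
        // Both the jump itself and its target move by one slot if they
        // sit at or after the insertion point.
        long newIndex = static_cast<long>(i) + (i >= pos ? 1 : 0);
        long newTarget = target + (target >= static_cast<long>(pos) ? 1 : 0);
        code[i].arg = static_cast<int>(newTarget - (newIndex + 1));
    }
    code.insert(code.begin() + static_cast<std::ptrdiff_t>(pos), extra);
}
```

With variable-length instructions and several jump flavours, the same idea applies but every PC-relative opcode has to be known to the patching code, which is why keeping this inside the engine rather than in each JIT compiler is attractive.
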

Quote:Original post by quarnster
Can the SUSPEND instruction be ignored by a JIT if it chooses to? I know the ability to abort a script gets lost, as will the linecallback stuff. Anything else?

As the SUSPEND is issued per line, and we have for example
    int one   = 1;
    int two   = 2;
    int three = 3;


Which turns into
    2   6 *    SUSPEND
    3   6 *    SetV4    v1, 0x1          (i:1, f:1.4013e-045)
- 4,5 -
    5   6 *    SUSPEND
    6   6 *    SetV4    v3, 0x2          (i:2, f:2.8026e-045)
- 5,5 -
    8   6 *    SUSPEND
    9   6 *    SetV4    v4, 0x3          (i:3, f:4.2039e-045)


If the JIT must respect the suspend, then these will all be three different JIT segments, and if we don't actually need to suspend we'll either be going in and out of machine code unnecessarily, causing a lot of unneeded data flushes and reads or something clever would have to be thought up so that one code buffer can jump into the next. This would require that the JIT patches up the previous section as needed and the new section loads the data the previous section and this one share up front, so that those loads can be skipped by the previous section with a jump right into the second section after the loads. Unknown if this actually would work or not, but I guess it could :)

BTW, as this is a spare time project I'm likely several weeks if not months from something that is actually useful, so there is plenty of time to discuss how this should be designed. Please do write down your thoughts if you come up with something.


I don't think the JIT compiler should ignore the SUSPEND instruction. AngelScript should be responsible for removing unnecessary SUSPEND instructions as part of its bytecode optimization. Which SUSPEND instructions get removed is also controllable by the application through engine properties. Currently there is only one property for this, asEP_BUILD_WITHOUT_LINE_CUES, which removes all SUSPEND instructions except one at the start of each function and loop.

I'll probably add another property for removing suspend instructions between trivial statements (i.e. those statements that don't call any functions). That would optimize your example into:

    2   6 *    SUSPEND
    3   6 *    SetV4    v1, 0x1          (i:1, f:1.4013e-045)
    6   6 *    SetV4    v3, 0x2          (i:2, f:2.8026e-045)
    9   6 *    SetV4    v4, 0x3          (i:3, f:4.2039e-045)


Besides, if complete JIT compilation is wanted, then the second approach should be used, i.e. the one that substitutes the entire function, rather than just small sections.

If you haven't done so already, I think you really should have a look at LLVM. I was thinking about using it myself when I started working on JIT compilation support, though I haven't really looked into the details of it yet.


Quote:Original post by WitchLord
If you wish, you can just add a JIT instruction for your tests. When I merge it into the final code I'll make the optimizations for the bytecode generation.


As the JIT shouldn't be allowed to modify the bytecode, I don't see the need for that :)

Quote:Original post by WitchLord
Just cast the buffer pointer to a function pointer, and then call it. Of course, the JIT compiler must make sure to allocate the buffer in a way that the OS will allow the execution of the code. In Windows there are special allocation routines for this (I don't remember their names just now). I'm not sure if there is anything similar on the ARM system you're working with.


Cool, got it to work. So far I've been able to execute code from a normal memory allocation, so I don't think there's a need for any special memory allocation on my system.

Quote:Original post by WitchLord
The JIT compiler should not be allowed to modify the bytecode, though maybe AngelScript can insert extra JIT instructions if needed, e.g. after CALL instructions, as the JIT compiler shouldn't try to implement script function calls.


Yep inserting them after a function call would be good.

Quote:Original post by WitchLord
Besides, if complete JIT compilation is wanted, then the second approach should be used, i.e. the one that substitutes the entire function, rather than just small sections.


I see your point. Good to know this is already supported. Agreed on the rest of the points you made.

Quote:Original post by WitchLord
If you haven't done so already, I think you really should have a look at LLVM. I was thinking about using it myself when I started working on JIT compilation support, though I haven't really looked into the details of it yet.


What, and give up all the fun bits? :)
Quote:Original post by quarnster
What, and give up all the fun bits? :)


He he, I perfectly understand what you mean. :)


So I've been thinking a lot about cross section branching (where a section is separated by SUSPEND/JIT instructions) lately. The reason is this:

for (int i = 0; i < 100000000; i++)
{
    // ...
    // some expensive code here
    // more expensive code...
    // ...
    if (func1())
    {
        func2();
    }
    else
    {
        func3();
    }
}


Now as JITs shouldn't try to implement function calls there will be 4 sections here that define where the JIT can jump in and resume executing native code, one at the very start and one after each function call. However all the branches both for the if/else and for the for loop will be cross section branches.

So let's say that one section has no knowledge whatsoever about the other sections and that cross section branching thus is not allowed. Then that means that the branches in this example cannot be native branch instructions but must be implemented as in AS. That in turn means that if a branch target is in the middle of a section, the JIT can't resume and perf will suck.

Adding SUSPEND/JIT instructions at every branch target would let the JIT resume; however, now we have more sections, meaning that if cross section branching is not allowed, we'll enter and exit native code all the time, doing unneeded flushes and reads of data shared between the sections.

As such I'd like to allow cross section branching for best performance. The JIT shouldn't implement the actual SUSPEND code, but it should be able to detect whether it should suspend or not and if it shouldn't it could jump straight into the next section.

This would produce something like (pseudo):

section1:
    int i = 0;
forloop:
    if (i >= 100000000)
    {
        // Yep, even this is a cross section branch
        flushDataUsedBySection1butNot4();
        goto end;
    }
    int tmp = func1();
    // Here a SUSPEND/JIT instruction is inserted and splitting the two sections
    // however section1 checks the suspend to see whether it is actually a suspend
    // or a nop and acts accordingly
    if (yesReallySuspend)
    {
        flushData();
        return to scriptEngine;
    }
    else
    {
        flushSomeData(); // But only the data used by section1 that is not used by section2
        goto section2_shared_data;
    }
section2:
    // It'll enter here if we actually had to suspend
    loadDataExclusiveToSection2();
section2_shared_data:
    loadDataSharedBetween1and2();
    if (tmp)
    {
        func2();
        // Again, another suspend is inserted for section 3, but we
        // check in this section whether it is an actual suspend or not
        if (yesReallySuspend)
        {
            flushData();
            return to scriptEngine;
        }
        else
        {
            flushDataUsedBySection2butNot3();
            goto section3_shared_data;
        }
    }
    goto section3_shared_data;
section3:
    loadDataExclusiveToSection3();
section3_shared_data:
    loadDataSharedBetween2and3();
    func3();
    // Another suspend..
    if (yesReallySuspend)
    {
        flushData();
        return to scriptEngine;
    }
    else
    {
        flushDataUsedBySection3butNot4();
        goto section4_shared_data;
    }
section4:
    loadDataExclusiveToSection4();
section4_shared_data:
    loadDataSharedBetween3and4();
    goto forloop;
end:
    flushData();
    return to scriptEngine;


To make this work, the JIT must know about all the script sections up front to be able to create the cross section branching. As such, for best code/data cache usage too, I believe that each bytecode segment (I believe they are split per function?) has one and only one machine code buffer associated with it and then when the script engine wants to resolve where the SUSPEND/JIT instruction should execute, the JIT would return the offset into this buffer.

Even if cross section branching is not supported, the single-buffer, multiple-section approach would still work: rather than branching into the next section, it'd just always take the "yesReallySuspend" path.
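
The "one buffer per function, one entry offset per section" idea could be sketched like this in C++. `FunctionCode`, `AddSection` and `EntryOffset` are hypothetical names, offsets are counted in 32-bit words, and sections are identified by the bytecode position of their SUSPEND/JIT instruction; none of this is taken from the AngelScript sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One native code buffer per script function, holding all of its sections.
class FunctionCode {
public:
    // Appends the native words of the section that starts at the given
    // bytecode position and remembers where it begins in the buffer.
    void AddSection(uint32_t bytecodePos, const std::vector<uint32_t>& words) {
        assert(entry_.count(bytecodePos) == 0);
        entry_[bytecodePos] = buffer_.size();
        buffer_.insert(buffer_.end(), words.begin(), words.end());
    }

    // Looks up the entry offset (in words) for the SUSPEND/JIT instruction
    // at bytecodePos. Returns false if no native section starts there.
    bool EntryOffset(uint32_t bytecodePos, std::size_t& offset) const {
        auto it = entry_.find(bytecodePos);
        if (it == entry_.end()) return false;
        offset = it->second;
        return true;
    }

    std::size_t Size() const { return buffer_.size(); }

private:
    std::vector<uint32_t> buffer_;             // all sections, back to back
    std::map<uint32_t, std::size_t> entry_;    // bytecode position -> offset
};
```

Since all sections of a function live in one contiguous buffer, a cross section branch is just an ordinary relative branch inside that buffer, and the script engine only needs the offsets to know where to enter after a real suspend.
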

What do you think?
I agree. We should really try to avoid having to switch back to the VM needlessly.

And yes, there is one continuous bytecode buffer for each script function.

In the future it may even be possible to inline function calls to application registered functions. But for this I really think the application needs to be able to flag which functions can be inlined and which cannot. For example, a call to sqrt() can safely be inlined by the JIT compiler, but a call to CreateCoRoutine() probably shouldn't be. By inlining I mean calling the application function directly without switching back to the VM first. Doing an actual inlining, the way C++ would do it, may be possible if the JIT compiler knows the implementation of a function, but that is not of concern to AngelScript.


Status report.


int TestBasic(int a, int b, int c)
{
    for (int i = 0; i < 30000; i++)
       a += i;
    return a;
}


Becomes AngelScript bytecode:
Temps: 2, 3

    0   0 *    SUSPEND
    1   0 *    PUSH     3
- 3,5 -
    2   3 *    SetV4    v1, 0x0          (i:0, f:0)
    4   3 * 1:
    4   3 *    SUSPEND
    5   3 *    CMPIi    v1, 30000
    7   3 *    TS
    8   3 *    ClrHi
    9   3 *    JZ       +5              (d:16)
- 4,8 -
   11   3 *    ADDi     v0, v0, v1
   13   3 * 3:
   13   3 *    IncVi    v1
   14   3 *    JMP      -12              (d:4)
- 5,5 -
   16   3 * 2:
   16   3 *    CpyVtoR4 v0
   17   3 * 0:
   17   0 *    RET      3



Which becomes ARM native code:
0x00000000 0xe92d4000 stmdb       sp!, {lr}
0x00000004 0xe5903060 ldr         r3, [r0, #0x60]
0x00000008 0xe243300c sub         r3, r3, #0xc
0x0000000c 0xe5803060 str         r3, [r0, #0x60]
0x00000010 0xe3a01000 mov         r1, #0x0
0x00000014 0xe5d0301c ldrb        r3, [r0, #0x1c]
0x00000018 0xe3530000 cmp         r3, #0x0
0x0000001c 0xe5903028 ldr         r3, [r0, #0x28]
0x00000020 0xe5031004 str         r1, [r3, #-0x4]
0x00000024 0xe5901020 ldr         r1, [r0, #0x20]
0x00000028 0xe2811010 add         r1, r1, #0x10
0x0000002c 0xe5801020 str         r1, [r0, #0x20]
0x00000030 0x08bd4000 ldmiaeq     sp!, {lr}
0x00000034 0x0a000000 beq          0x0
0x00000038 0xe8bd8000 ldmia       sp!, {pc}
0x0000003c 0xe92d4010 stmdb       sp!, {r4, lr}
0x00000040 0xe5903028 ldr         r3, [r0, #0x28]
0x00000044 0xe5131004 ldr         r1, [r3, #-0x4]
0x00000048 0xe59f3040 ldr         r3, [pc, #0x40]
0x0000004c 0xe1510003 cmp         r1, r3
0x00000050 0x03a0c000 moveq       r12, #0x0
0x00000054 0xb3e0c001 mvnlt       r12, #0x1
0x00000058 0xc3a0c001 movgt       r12, #0x1
0x0000005c 0xe1b0c00c movs        r12, r12
0x00000060 0x43a0c001 movmi       r12, #0x1
0x00000064 0x53a0c000 movpl       r12, #0x0
0x00000068 0xe20cc0ff and         r12, r12, #0xff
0x0000006c 0xe1b0c00c movs        r12, r12
0x00000070 0xe5903020 ldr         r3, [r0, #0x20]
0x00000074 0x02833014 addeq       r3, r3, #0x14
0x00000078 0xe5904028 ldr         r4, [r0, #0x28]
0x0000007c 0xe580c030 str         r12, [r0, #0x30]
0x00000080 0xe283301c add         r3, r3, #0x1c
0x00000084 0xe5803020 str         r3, [r0, #0x20]
0x00000088 0xe8bd4010 ldmia       sp!, {r4, lr}
0x0000008c 0xea000000 b            0x0
0x00000090 0x00007530 andeq       r7, r0, r0
0x00000094 0x0a00000e beq          0xe
0x00000098 0xe92d4000 stmdb       sp!, {lr}
0x0000009c 0xe590c028 ldr         r12, [r0, #0x28]
0x000000a0 0xe51c1004 ldr         r1, [r12, #-0x4]
0x000000a4 0xe59c2000 ldr         r2, [r12]
0x000000a8 0xe0822001 add         r2, r2, r1
0x000000ac 0xe2811001 add         r1, r1, #0x1
0x000000b0 0xe590c028 ldr         r12, [r0, #0x28]
0x000000b4 0xe50c1004 str         r1, [r12, #-0x4]
0x000000b8 0xe58c2000 str         r2, [r12]
0x000000bc 0xe590c020 ldr         r12, [r0, #0x20]
0x000000c0 0xe24cc01c sub         r12, r12, #0x1c
0x000000c4 0xe580c020 str         r12, [r0, #0x20]
0x000000c8 0xe8bd4000 ldmia       sp!, {lr}
0x000000cc 0xeaffffff b            0xffffff
0x000000d0 0xeaffffd9 b            0xffffd9
0x000000d4 0xe92d4000 stmdb       sp!, {lr}
0x000000d8 0xe5901028 ldr         r1, [r0, #0x28]
0x000000dc 0xe5912000 ldr         r2, [r1]
0x000000e0 0xe1a0c002 mov         r12, r2
0x000000e4 0xe5901028 ldr         r1, [r0, #0x28]
0x000000e8 0xe580c030 str         r12, [r0, #0x30]
0x000000ec 0xe590c020 ldr         r12, [r0, #0x20]
0x000000f0 0xe28cc004 add         r12, r12, #0x4
0x000000f4 0xe580c020 str         r12, [r0, #0x20]
0x000000f8 0xe8bd8000 ldmia       sp!, {pc}



There are some oddities here and there, and obviously I have not started doing any sort of optimization yet. But that tiny script works in the JIT and I do get the correct result back from it. Hooooray! \o/

Branching took forever to get working, and will still need a lot more work (as will the rest of it).

I should really go to bed, but I have not even had supper yet... It's dangerous being "really close" to getting something to work ;)


Bytecode and native code mixed in case you are interested in following what happens (note that some instructions are patched up in a postprocessing step, and the epilogue printout isn't always an epilogue, just a data flush which is why it comes up twice):
New section: 0, 4, (pos: 1)
---------------
        ; adding 0xe92d4000 stmdb       sp!, {lr}
  v1 (first written to: 1, written to at all: 1, score: 1, native: 1)
---------------
PUSH
        ; adding 0xe5903060 ldr         r3, [r0, #0x60]
        ; adding 0xe243300c sub         r3, r3, #0xc
        ; adding 0xe5803060 str         r3, [r0, #0x60]
SetV4     v1, 0x0 (0, 0.000000)
        ; adding 0xe3a01000 mov         r1, #0x0
SUSPEND
        ; adding 0xe5d0301c ldrb        r3, [r0, #0x1c]
        ; adding 0xe3530000 cmp         r3, #0x0
=================
Epilogue
        ; adding 0xe5903028 ldr         r3, [r0, #0x28]
        ; adding 0xe5031004 str         r1, [r3, #-0x4]
        ; adding 0xe5901020 ldr         r1, [r0, #0x20]
        ; adding 0xe2811010 add         r1, r1, #0x10
        ; adding 0xe5801020 str         r1, [r0, #0x20]
=================
        ; adding 0x08bd4000 ldmiaeq     sp!, {lr}
        ; adding 0x0a000000 beq          0x0
=================
Epilogue
=================
        ; adding 0xe8bd8000 ldmia       sp!, {pc}
New section: 4, 9, (pos: 5)
---------------
        ; adding 0xe92d4000 stmdb       sp!, {lr}
        ; adding 0xe5903028 ldr         r3, [r0, #0x28]
  v1073741828 (first written to: 1, written to at all: 1, score: 4, native: 12)
        ; adding 0xe5131004 ldr         r1, [r3, #-0x4]
  v1 (first written to: 0, written to at all: 0, score: 1, native: 1)
---------------
CMPIi     v1, 0x7530 (30000, 0.000000)
        ; adding 0xe59f3000 ldr         r3, [pc]
        ; adding 0xe1510003 cmp         r1, r3
        ; adding 0x03a0c000 moveq       r12, #0x0
        ; adding 0xb3e0c001 mvnlt       r12, #0x1
        ; adding 0xc3a0c001 movgt       r12, #0x1
TS
        ; adding 0xe1b0c00c movs        r12, r12
        ; adding 0x43a0c001 movmi       r12, #0x1
        ; adding 0x53a0c000 movpl       r12, #0x0
ClrHi
        ; adding 0xe20cc0ff and         r12, r12, #0xff
JZ        0x5 (5, 0.000000)
        ; adding 0xe1b0c00c movs        r12, r12
        ; adding 0xe5903020 ldr         r3, [r0, #0x20]
        ; adding 0x02833014 addeq       r3, r3, #0x14
=================
Epilogue
        ; adding 0xe5904028 ldr         r4, [r0, #0x28]
        ; adding 0xe580c030 str         r12, [r0, #0x30]
        ; adding 0xe283301c add         r3, r3, #0x1c
        ; adding 0xe5803020 str         r3, [r0, #0x20]
=================
        ; adding 0xe8bd4010 ldmia       sp!, {r4, lr}
        ; adding 0xea000000 b            0x0
        ; adding 0x00007530 andeq       r7, r0, r0
        ; adding 0x0a000410 beq          0x410
New section: 11, 14, (pos: 11)
---------------
        ; adding 0xe92d4000 stmdb       sp!, {lr}
        ; adding 0xe590c028 ldr         r12, [r0, #0x28]
        ; adding 0xe51c1004 ldr         r1, [r12, #-0x4]
  v1 (first written to: 0, written to at all: 0, score: 2, native: 1)
        ; adding 0xe59c2000 ldr         r2, [r12]
  v0 (first written to: 0, written to at all: 1, score: 2, native: 2)
---------------
ADDi      v0, v0, v1
        ; adding 0xe0822001 add         r2, r2, r1
IncVi     v1
        ; adding 0xe2811001 add         r1, r1, #0x1
JMP       0xfffffff4 (-12, 1.#QNAN0)
=================
Epilogue
        ; adding 0xe590c028 ldr         r12, [r0, #0x28]
        ; adding 0xe50c1004 str         r1, [r12, #-0x4]
        ; adding 0xe58c2000 str         r2, [r12]
        ; adding 0xe590c020 ldr         r12, [r0, #0x20]
        ; adding 0xe24cc01c sub         r12, r12, #0x1c
        ; adding 0xe580c020 str         r12, [r0, #0x20]
=================
        ; adding 0xe8bd4000 ldmia       sp!, {lr}
        ; adding 0xeaffffff b            0xffffff
        ; adding 0xea000404 b            0x404
New section: 16, 18, (pos: 16)
---------------
        ; adding 0xe92d4000 stmdb       sp!, {lr}
        ; adding 0xe5901028 ldr         r1, [r0, #0x28]
  v1073741828 (first written to: 1, written to at all: 1, score: 1, native: 12)
        ; adding 0xe5912000 ldr         r2, [r1]
  v0 (first written to: 0, written to at all: 0, score: 1, native: 2)
---------------
CpyVtoR4  v0
        ; adding 0xe1a0c002 mov         r12, r2
RET
=================
Epilogue
        ; adding 0xe5901028 ldr         r1, [r0, #0x28]
        ; adding 0xe580c030 str         r12, [r0, #0x30]
        ; adding 0xe590c020 ldr         r12, [r0, #0x20]
        ; adding 0xe28cc004 add         r12, r12, #0x4
        ; adding 0xe580c020 str         r12, [r0, #0x20]
=================
        ; adding 0xe8bd8000 ldmia       sp!, {pc}
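
The CMPIi, TS, ClrHi, JZ run in the second section is the loop condition, and it may be easier to follow written out as plain C++. This is only a sketch read off the native code emitted in the log above, not the AngelScript VM source; the function names are hypothetical, and the 30000 constant is the one from the CMPIi line.

```cpp
#include <cassert>
#include <cstdint>

// CMPIi: compare a variable with an immediate and leave a negative, zero or
// positive value in the register (moveq / mvnlt / movgt in the log).
inline std::int32_t cmp_imm(std::int32_t var, std::int32_t imm) {
    return var == imm ? 0 : (var < imm ? -1 : 1);
}

// TS: replace the register with 1 if it is negative, else 0 (movmi / movpl).
inline std::int32_t test_sign(std::int32_t reg) {
    return reg < 0 ? 1 : 0;
}

// ClrHi: keep only the low byte of the register (and r12, r12, #0xff).
inline std::int32_t clear_high(std::int32_t reg) {
    return reg & 0xff;
}

// JZ: the jump is taken when the register is zero (addeq on the bytecode pointer).
inline bool jz_taken(std::int32_t reg) {
    return reg == 0;
}

// The whole chain for the variable v1 compared against 30000.
inline bool jz_taken_for(std::int32_t v1) {
    return jz_taken(clear_high(test_sign(cmp_imm(v1, 30000))));
}
```

With this reading the JZ is taken once v1 reaches 30000, which presumably leaves the loop, and falls through into the ADDi/IncVi section otherwise. One detail of the emitted code: "mvnlt r12, #0x1" produces -2 rather than -1, which makes no difference here because TS only looks at the sign.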
This is fantastic, watching it come together! I just wish it was for x86 so my project could leverage it. Any plans to make an x86 version?

