AngelScript x86 asm Virtual Machine

16 comments, last by WitchLord 19 years, 6 months ago
Hi,

Here are some performance tests that I did today. I don't have real scripts to test with, but I wanted to test it anyway, so here they are.

This is with for(i=0; i<100; i++) (all TestFrameWork tests):

-----------------------------------------------------------------
Timings at 10:09:57 Wednesday, October 06, 2004
-----------------------------------------------------------------
AngelScript X86 ASM Virtual Machine
-----------------------------------------------------------------
ExecuteNext() took 12.35 milli secs (called 6,100 times)
ExecuteNext() took 12.08 milli secs (called 6,100 times)
ExecuteNext() took 12.23 milli secs (called 6,100 times)

-----------------------------------------------------------------
Timings at 10:11:45 Wednesday, October 06, 2004
-----------------------------------------------------------------
AngelScript Normal Virtual Machine
-----------------------------------------------------------------
ExecuteNext() took 37.14 milli secs (called 78,200 times)
ExecuteNext() took 38.21 milli secs (called 78,200 times)
ExecuteNext() took 38.74 milli secs (called 78,200 times)

This is with for(i=0; i<100; i++) (all TestFrameWork tests, plus testexecutescript in STEP_INTO mode):

-----------------------------------------------------------------
Timings at 10:34:55 Wednesday, October 06, 2004
-----------------------------------------------------------------
AngelScript X86 ASM Virtual Machine with STEP_INTO flag
-----------------------------------------------------------------
ExecuteNext() took 136.43 milli secs (called 6,200 times)
ExecuteNext() took 142.61 milli secs (called 6,200 times)

-----------------------------------------------------------------
Timings at 10:31:31 Wednesday, October 06, 2004
-----------------------------------------------------------------
AngelScript Normal Virtual Machine with STEP_INTO flag
-----------------------------------------------------------------
ExecuteNext() took 244.61 milli secs (called 83,600 times)
ExecuteNext() took 236.91 milli secs (called 83,600 times)

This is with the following loop in testexecutescript:

    for(i=0; i<1000; i++)
    {
        ctx->Prepare(main_function);
        ctx->Execute(); // TestAsmVm.as
    }

The TestAsmVm.as script is:

    int i;
    int n;

    void main()
    {
        i = 1;
        n = 1;
        o_stream(i, n); // output the values

        if(i == n)
        {
            i = 2;
            o_stream(i, n);
        }
        else
        {
            i = 3;
            o_stream(i, n);
        }

        if(i == 4)
        {
            i = 0;
            o_stream(i, n);
        }
        else
        {
            i = 1;
            o_stream(i, n);
        }
    }

-----------------------------------------------------------------
Timings at 11:33:40 Wednesday, October 06, 2004
-----------------------------------------------------------------
AngelScript X86 ASM Virtual Machine
-----------------------------------------------------------------
ExecuteNext() took 1.29 milli secs (called 1,002 times)
ExecuteNext() took 1.29 milli secs (called 1,002 times)
ExecuteNext() took 1.28 milli secs (called 1,002 times)

-----------------------------------------------------------------
Timings at 11:31:45 Wednesday, October 06, 2004
-----------------------------------------------------------------
AngelScript Normal Virtual Machine
-----------------------------------------------------------------
ExecuteNext() took 181.56 milli secs (called 54,011 times)
ExecuteNext() took 116.85 milli secs (called 54,011 times)
ExecuteNext() took 151.52 milli secs (called 54,011 times)

Look at this last test. This is closer to what I was expecting. Also compare how many times ExecuteNext() is called: the savings from the function call overhead come on top of these numbers, because the timing was measured inside ExecuteNext() itself.

In the first test the performance gain is only about 200% to 220% (roughly 3x faster). Keep in mind, though, that most of the TestFrameWork tests are single-line scripts. In the third test, a longer script is executed 1000 times in a row, which is closer to a real game running a script every frame, and there you see a big gain in speed.

The x86 asm VM is implemented in AS 1.8.2beta1. I replaced the plain VM with mine and made no extra optimizations to the bytecode or to the bytecode generator/compiler. That is why I expect the performance to increase greatly once this x86 VM is implemented in the latest AS. That is hard, though, because the library is being heavily optimized with each release and is now focused on multithreading. I don't want to focus on multithreading, because I expect to optimize the x86 VM a little more and gain about 10% more speed. To do that I want to use the EDI register to hold the bytecode_ptr, and with multithreading this is not possible if several scripts run in parallel, even on a single-processor machine.

I think it would be better if WitchLord and I could synchronize this VM with the current release or WIP, because that would be easier than trying to catch up with the latest release when releases are only a few days apart.

Best Regards,
Lioric
WOW! These are certainly impressive results. I look forward to seeing exactly how this was implemented. It will also be interesting to see how the performance changes for the other performance tests that I have.

What does it mean that you call ExecuteNext() only 6,100 times in the optimized version, as opposed to 78,200 times in the normal version? I understand that you are putting a loop inside ExecuteNext(), right? I hope you exit this loop at the right places, where execution should be allowed to be suspended.

I will release version 1.10.0 WIP 1 today (I hope), but if you send me the code I will incorporate it in 1.10.0 WIP 2. I don't think it should be that difficult, as the VM doesn't really change that much between versions. I should be able to update your implementation with the new bytecodes. When I release 1.10.0 WIP 2 you can look over the code again to see if you can find more optimizations.









AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

Hi

ExecuteNext() is called only a few times because I changed the way the VM works. The original was a switch-based VM. Mine uses a relocation table that is created the first time the script is compiled, and I don't see a reason why ExecuteNext() has to be called for every instruction.
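Lioric's VM is hand-written x86 assembly, and its code is not shown in the thread. The following portable C++ sketch only illustrates the general idea described above. Instead of decoding every bytecode through a `switch` each time it runs, the opcodes are resolved once into a table of handler addresses, and execution then goes straight through those addresses. The opcodes, handlers, and function names are hypothetical. Calling through function pointers in a loop is only an approximation of the asm version, which jumps directly from one code location to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy stack machine (not AngelScript's real bytecodes).
enum Op : std::uint8_t { PUSH1, ADD, DUP, HALT };

// "Switch VM": every instruction is decoded again each time it executes.
int RunSwitch(const std::vector<Op>& code) {
    std::vector<int> stack;
    for (std::size_t pc = 0;; ++pc) {
        switch (code[pc]) {
            case PUSH1: stack.push_back(1); break;
            case ADD: { int b = stack.back(); stack.pop_back(); stack.back() += b; break; }
            case DUP: { int top = stack.back(); stack.push_back(top); break; }
            case HALT: return stack.back();
        }
    }
}

// Pre-resolved form: each opcode is replaced by the address of its handler.
struct Machine { std::vector<int> stack; bool running = true; };
using Handler = void (*)(Machine&);

void DoPush1(Machine& m) { m.stack.push_back(1); }
void DoAdd(Machine& m) { int b = m.stack.back(); m.stack.pop_back(); m.stack.back() += b; }
void DoDup(Machine& m) { int top = m.stack.back(); m.stack.push_back(top); }
void DoHalt(Machine& m) { m.running = false; }

// Done once after compilation, in the spirit of building the relocation table.
std::vector<Handler> Resolve(const std::vector<Op>& code) {
    static const Handler table[] = {DoPush1, DoAdd, DoDup, DoHalt};
    std::vector<Handler> resolved;
    resolved.reserve(code.size());
    for (Op op : code) {
        assert(op <= HALT);
        resolved.push_back(table[op]);
    }
    return resolved;
}

// No decoding at run time: each step calls straight through the stored address.
int RunResolved(const std::vector<Handler>& code) {
    Machine m;
    for (std::size_t pc = 0; m.running; ++pc) code[pc](m);
    return m.stack.back();
}
```

For a program such as `{PUSH1, DUP, ADD, DUP, ADD, HALT}`, both functions return 4. The only difference is where the decoding work happens.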

Execution can still be suspended in the normal way.

I'm not using loops in ExecuteNext(). It's one big function with no loops, and even the pop and push functions were removed.

The library has to be compiled with the USE_ASM_VM define, and it can be switched back to the normal VM with USE_NORMAL_VM. However, code compiled with either VM can only be executed by that same VM.
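A build-time switch like the one Lioric describes could look roughly like this. USE_ASM_VM and USE_NORMAL_VM come from the post. Everything else is a hypothetical sketch: VmKind, CompiledCode, CanExecute, and the choice to fall back to the normal VM when neither define is given. It shows how a library might refuse to run code produced for the other VM instead of crashing.

```cpp
#include <cassert>
#include <cstdint>

#if defined(USE_ASM_VM) && defined(USE_NORMAL_VM)
#error "Define only one of USE_ASM_VM and USE_NORMAL_VM"
#endif

enum class VmKind : std::uint8_t { Normal = 1, Asm = 2 };

// Which VM this build of the library contains. Assumption of this sketch:
// the normal VM is the default when neither define is given.
#if defined(USE_ASM_VM)
constexpr VmKind kBuiltVm = VmKind::Asm;
#else
constexpr VmKind kBuiltVm = VmKind::Normal;
#endif

// Hypothetical tag stored next to compiled code, recording which VM produced it.
struct CompiledCode {
    VmKind producedBy;
};

// Code compiled for one VM cannot run on the other, so check before executing.
bool CanExecute(const CompiledCode& code) {
    return code.producedBy == kBuiltVm;
}
```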

About script compilation: I like the idea of having debug and release compile modes for scripts. A debug compilation would add a few extra bytecodes for line numbers and so on (although line numbers were removed in the latest release), and we could create a database of the local variables and their types (for interpretation and display). The release version could then be optimized further.
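Lioric's idea of separate debug and release compile modes can be sketched as a code emitter that adds line-number markers and local-variable records only in debug mode. The instruction set, the LocalVarInfo record, and all names here are hypothetical. AngelScript's real compiler is not shown in the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction format, for illustration only.
enum class Op { LINE, PUSH, RET };
struct Instr { Op op; int arg; };

// Debug-only metadata: enough to interpret and display a local variable.
struct LocalVarInfo { std::string name; std::string type; int stackOffset; };

struct Emitter {
    bool debug;                        // debug or release compile mode
    std::vector<Instr> code;
    std::vector<LocalVarInfo> locals;  // stays empty in release mode

    // Called at the start of every statement.
    void Statement(int line) {
        if (debug) code.push_back({Op::LINE, line});  // extra instruction, debug only
    }
    void Emit(Op op, int arg = 0) { code.push_back({op, arg}); }
    void DeclareLocal(const std::string& name, const std::string& type, int offset) {
        if (debug) locals.push_back({name, type, offset});
    }
};
```

For the same two statements, the release emitter produces two instructions and the debug emitter produces four. That difference is the cost of the "few extra bytecodes" in debug mode.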

Lioric
I'm very curious to know how this implementation works. When can you send me the code?

I was planning on building the database with variable types for a future version. It's something that is needed to serialize script states, which is something I've decided to support in the future. Once that is complete it shouldn't be too much work to give the application the possibility to peek into the context stacks and even change it.

The line numbers have been removed from the bytecode, but they can be looked up when needed, which is done when you set an exception.
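WitchLord doesn't say how the lookup is implemented. One common way to keep line numbers out of the bytecode and still recover them for an exception is a side table of (bytecode offset, line) pairs, sorted by offset and searched with a binary search. The sketch below shows that general technique. LineEntry and LineForOffset are hypothetical names, not AngelScript's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// One entry per statement: where its code starts and which source line it came from.
// The table lives outside the bytecode, so normal execution never touches it.
struct LineEntry { std::size_t offset; int line; };

// 'table' must be sorted by offset (it is, if entries are recorded while emitting code).
int LineForOffset(const std::vector<LineEntry>& table, std::size_t offset) {
    assert(!table.empty());
    // Find the first statement that starts after 'offset', then step back one.
    auto it = std::upper_bound(table.begin(), table.end(), offset,
        [](std::size_t off, const LineEntry& e) { return off < e.offset; });
    if (it == table.begin()) return table.front().line;  // before the first statement
    return std::prev(it)->line;
}
```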


I was wondering how difficult it would be to make both versions bytecode-compatible. I'm asking because storing the bytecode would be useless if it might be executed by a different (incompatible) VM.

What are the main differences in bytecode?
I'm really impressed by these results. Is the AsmVM hand-crafted, or JIT'd using something like Softwire? Good work, Lioric :)
Gyrbo:

The bytecode that is saved with SaveByteCode() should be compatible with either VM. I don't know if Lioric's code already is, but if it isn't I will make the necessary changes to guarantee this.


Hi

The saved compiled script is not compatible with the other VM. The normal VM is bytecode-based, but for performance the x86 VM removes the bytecodes after the optimization phase. It doesn't use bytecodes anymore. It works more like normal execution, where the pointer to the next instruction is updated (like the EIP register), so I don't have to decode a bytecode and "switch" to the correct code.
Because of this design, I want to update my code to keep the code_ptr in the EDI register (it currently uses a DWORD*). But this is a problem with multithreading if you want several scripts to run in parallel.
I prefer to optimize for a single thread, with a define that compiles the library in a multithreaded version where these optimizations are disabled for thread safety.
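The trade-off Lioric describes can be sketched with a hypothetical AS_VM_SINGLE_THREAD define. When it is set, the current code pointer lives in one process-wide variable, the portable analogue of reserving EDI for it. Otherwise each context keeps its own pointer. The define and the Context, Prepare, and Step names are assumptions made for illustration. A global variable is not the same thing as a pinned register; the sketch only shows which variant is safe.

```cpp
#include <cassert>

// Hypothetical switch: define AS_VM_SINGLE_THREAD for the non-reentrant variant.
struct Context {
#if !defined(AS_VM_SINGLE_THREAD)
    const int* codePtr = nullptr;  // per-context: contexts can run independently
#endif
};

#if defined(AS_VM_SINGLE_THREAD)
// One shared code pointer for the whole process, like keeping it in EDI.
static const int* g_codePtr = nullptr;
inline const int*& CodePtr(Context&) { return g_codePtr; }
#else
inline const int*& CodePtr(Context& ctx) { return ctx.codePtr; }
#endif

void Prepare(Context& ctx, const int* code) { CodePtr(ctx) = code; }

// Reads the current code word and advances the pointer, like fetching an instruction.
int Step(Context& ctx) {
    assert(CodePtr(ctx) != nullptr);
    return *CodePtr(ctx)++;
}
```

In the default build, two prepared contexts can be stepped alternately. With AS_VM_SINGLE_THREAD defined, the second Prepare overwrites the shared pointer, and the first context would read the second context's code. That is the same problem a pinned register causes when several scripts run interleaved, even on one processor.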

Now that you mention compatibility: the saved code won't work with a library that has been modified significantly from the one it was saved with. This is not a problem for me, and I don't see where you would run into trouble with it. If you update the library in your project, recompiling the script will do the job. For example, you could add a silent check that compares the library version number or checksum with the one stored in the compiled script, and if they differ, recompile the script or alert the user.
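The silent version or checksum test suggested above could be sketched as follows. The header layout, the use of 32-bit FNV-1a as the checksum, and the DecideLoad name are assumptions of this sketch; the thread does not specify any of them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 32-bit FNV-1a, used here as a simple checksum of a build identifier.
std::uint32_t Fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical header written in front of saved, VM-specific compiled code.
struct SavedHeader {
    std::uint32_t libraryVersion;
    std::uint32_t buildChecksum;
};

enum class LoadAction { UseSaved, Recompile };

// Use the saved code only if it was produced by this exact library build;
// otherwise recompile from source (or alert the user).
LoadAction DecideLoad(const SavedHeader& saved, std::uint32_t currentVersion,
                      const std::string& currentBuildId) {
    if (saved.libraryVersion == currentVersion &&
        saved.buildChecksum == Fnv1a(currentBuildId)) {
        return LoadAction::UseSaved;
    }
    return LoadAction::Recompile;
}
```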

I haven't seen application projects where you periodically update a DLL or library that is part of the application. Most of the time you update the main executable, because you settle on stable libraries before you release your project. Releases that require the libraries to be updated are major releases, and the user needs to know that, because old files have to be re-saved or recompiled.

The relocation table that stores the code locations is calculated on the first run, so a modified library "might" give different code locations.

Keep in mind that different VMs give you more options, either performance or compatibility. It all depends on your needs.
For example, if your compiler doesn't support asm, you can compile the normal but slower VM.

@evolutional:
The virtual machine doesn't use any external library. It doesn't change AS in the sense that AS is still stack-based (the next step is to make AS register-based).

@WitchLord:
I'm sending the code right now.

Lioric
Lioric, it seems that you are basically doing a JIT compilation, in that you change the bytecode to use direct machine code instead of interpreting bytecodes. If that is so, it is exactly what I had in mind when thinking about the future implementation.

It's a simple matter to make the saved bytecode compatible with another instance of the same application. The library just has to keep the original bytecode and save that. After the bytecode is loaded, the same optimization process is run again.
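WitchLord's approach could look like this minimal sketch: keep the original bytecode for saving, and run the optimization again after loading. Function pointers stand in for the process-specific code addresses the x86 VM produces. Module, Optimize, Load, Save, and Run are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct State { int acc = 0; };
using Handler = void (*)(State&);

void OpInc(State& s) { ++s.acc; }
void OpDouble(State& s) { s.acc *= 2; }

using Bytecode = std::vector<std::uint8_t>;  // portable form: safe to save
using Resolved = std::vector<Handler>;       // addresses valid only in this process

struct Module {
    Bytecode original;  // kept only so the module can be saved later
    Resolved resolved;  // what the fast VM actually executes
};

// Stand-in for the optimization pass: map opcodes to handler addresses.
Resolved Optimize(const Bytecode& bc) {
    static const Handler table[] = {OpInc, OpDouble};
    Resolved out;
    for (std::uint8_t op : bc) {
        assert(op < 2);
        out.push_back(table[op]);
    }
    return out;
}

Bytecode Save(const Module& m) { return m.original; }                  // never save addresses
Module Load(const Bytecode& bc) { return Module{bc, Optimize(bc)}; }   // re-optimize after loading

int Run(const Module& m) {
    State s;
    for (Handler h : m.resolved) h(s);
    return s.acc;
}
```

With the bytecode `{0, 0, 1, 0}` (increment, increment, double, increment), Run returns 5 both for the original module and for one reloaded from the saved bytecode.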

Although I agree that single-threaded applications are much easier to optimize for, I do need to keep the library compatible with multithreaded applications. I would prefer not to have to disable this excellent optimization for multithreaded applications.

I look forward to seeing your implementation and comparing the performance with version 1.9.2 using the same tests I did before.


Hi

I tried sending the code, but the mail server isn't working here. I will try again later when I have more time (this weekend). But yes, it's easy to implement compatibility with the other VM: save the code un-relocated (unresolved), and resolve the symbols before execution.

As for execution on different libraries or new versions, we can store the relocation offsets (symbols) relative to the segments they belong to. The script can then run on any library version, as long as the VM itself is not modified.
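Lioric doesn't give a format for these relocation records. The minimal sketch below assumes that each relocation names the code slot to patch, the segment its target lives in, and the target's offset from that segment's start. Absolute addresses are converted to offsets before saving and rebuilt from the current segment bases after loading. Relocation, Unrelocate, and Relocate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// "Code slot 'slot' must hold an address 'offset' bytes into segment 'segment'."
struct Relocation {
    std::size_t slot;
    std::size_t segment;
    std::uintptr_t offset;
};

// Before saving: turn absolute addresses into segment-relative relocations.
// 'patched' lists (slot, segment) pairs that currently hold absolute addresses.
std::vector<Relocation> Unrelocate(std::vector<std::uintptr_t>& code,
                                   const std::vector<std::pair<std::size_t, std::size_t>>& patched,
                                   const std::vector<std::uintptr_t>& segmentBases) {
    std::vector<Relocation> relocs;
    for (const auto& p : patched) {
        std::uintptr_t base = segmentBases[p.second];
        assert(code[p.first] >= base);
        relocs.push_back({p.first, p.second, code[p.first] - base});
        code[p.first] = 0;  // the absolute address means nothing in another build
    }
    return relocs;
}

// After loading, possibly into a different library build: patch from the new bases.
void Relocate(std::vector<std::uintptr_t>& code, const std::vector<Relocation>& relocs,
              const std::vector<std::uintptr_t>& segmentBases) {
    for (const Relocation& r : relocs) {
        code[r.slot] = segmentBases[r.segment] + r.offset;
    }
}
```

This matches the caveat in the post: the offsets point into the VM's own code, so they stay valid only while the VM itself is unchanged.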

Lioric

