# QtScript vs AngelScript

## Recommended Posts

Since AngelScript was intended to use in some Qt project, I wrote this performance test to compare QtScript and AngelScript: https://bitbucket.org/FreeSlave/qtscript-vs-angelscript/overview

On my machine (Intel(R) Core(TM)2 Duo CPU     E4600  @ 2.40GHz) Qt script runs 10 times faster (this is minimum, often the number is higher) than AngelScript using this test.

The test itself is just multiplication of many random matrices 4x4. Matrices are represented by arrays (scriptarray add-on in case of AngelScript) in column-major form.

Code looks equivalent for both scripts, except maybe for part where 'preresult' is assigned to 'result' variable, but I also tried to force to make copy of array in QtScript using slice(), which still did not reduce the distance much.

I also compiled angelscript library with -O2, but it did not help. Maybe I missed something or scriptarray is not good enough. I expected better performance due to static typing.

##### Share on other sites
I can't look at your code right now but did you try with the jit on Angelscript?

##### Share on other sites

Array access in Angelscript is fairly slow, as each access is a function call. I believe Angelscript recently added fixed array access which would be a good fit for matrices, and should run much faster.

##### Share on other sites
Not only is each access done through a function call, the CScriptArray is also a generic implementation to support any type and thus has extra overhead to deal with the type identification.

Direct array access without a function call is not yet in the library. The difficulty with this is to guarantee safety, i.e. proper bounds checking for the array access. For a dynamic array it would most likely require a function call anyway just to check that the index is within the bounds.

If your intention is to work with matrices in the script you should really register your own matrix class with the proper operator overloads, e.g. matrix multiplication, etc. This would eliminate pretty much all of the overhead of using generic types, and function calls for each element access.

Still, I'm pleased to see this performance comparison (even though AngelScript lost ). I will take it as a challenge and work on improving the performance further.

##### Share on other sites

QtScript uses JavaScriptCore which is a JIT engine.

##### Share on other sites

Most likely their JIT-compiler also has in-depth knowledge of the various qt container types so it can build native code for accessing the array elements directly instead of making function calls all the time.

##### Share on other sites

For performance comparison with math operations, you should definitely use the excellent AngelScript JIT developed by BlindMind Studios. We have released real time audio processing software that uses AngelScript, and the JIT was really a huge improvement in performance. It makes both math operations and function calls much faster.

The main bottleneck will still be the function calls for arrays (if I remember well, it's really the function call that costs a lot, not the fact that the container is generic, nor the implementation of the "At" accessor), but it should be less noticeable. I think there is already a thread about this here.

By the way, I am thinking out loud, but wouldn't there be a way to implement ultra-fast built-in arrays for simple types (double, float, ints etc.) using native code? Maybe this could be an option to override the generic array add-on by another "fast" one that works only for simple types? Knowing the data structure, it could even still check boundaries and yet not require a function call. What do you think Andreas?

##### Share on other sites

Built-in dynamic arrays is not something I'll do. I have the arrays as add-ons to allow the application developer to customize it anyway it wants. If I implement built-in arrays for some types that would be defeated, since the application developer would no longer be able to customize the array (without modifying the library at least).

However AngelScript already supports template specializations, so the application can register for example an optimized implementation of an array of floats which would avoid all the runtime type checks. It still wouldn't get away from the function calls for each array access.

To remove the function calls I need to think of a way for the application to tell the VM where it can verify the size of the array, and how to access the elements. It can definitely be done, but I need to think about an appropriate interface for it.

##### Share on other sites

I definitely agree with you Andreas, and what you propose (finding a way for the app to define direct access to elements and length) would be a great improvement for performance with code that does a bit of math.

##### Share on other sites

When the CScriptArray binds itself with the engine, can there be a new call that tells the engine the offset of the start of the data relative to an instance of the object (CScriptArray)? If that call is made, then the VM will expect N bytes to be prepended to the data.. say 1 byte for the size of each element (1, 2, 4 or 8?) and several bytes for the # of elements currently in the array.

There would be some overhead in maintaining this '# of elements' value each time data is added/removed, but those are function-calls anyway.

For a CScriptArray instance which had made this binding call, the VM can compile accesses into something similar to the dot notation for accessing a struct member. Given the (CScriptArray) instance address, first offset by a # of bytes to get to the data. From there read the size of each element and the # of elements (and seek to the actual start of the data). Then offset by index * size_of_each_element. Then get the data?

##### Share on other sites

On a high level, that is exactly the idea on how to make the element access without a function call.

The difficult part is on how to do this without making hard-coded assumptions on the layout of the registered array type. I don't want to make a solution that would only work if you implement the array type precisely like the CScriptArray. :)

##### Share on other sites

assumptions on the layout of the registered array type

I was going to say that an array is, by definition, a contiguous buffer... but I suppose you could be thinking of something like a ring vector/circular buffer, where the given index is a logical one and needs to be transformed by some logic?

If you maintain the existing functionality and only apply the optimization when the type has explicitly stated (via a binding function call) that it has/desires this behavior, shouldn't that be OK?

##### Share on other sites

Of course. I'm not going to remove the existing opIndex method call. The direct access would only be an alternative way of providing the access to elements through the index operator.

##### Share on other sites

Probably I'll do something similar to what I did for supporting initialization lists, i.e. provide a sort of meta-language for explaining to the compiler how to access elements. Perhaps something like this:

// dynamic templated array (e.g. CScriptArray)
engine->RegisterObjectIndexOp("array<T>", "length=offset(8),uint;element=offset(16),deref,sizeof(T),T");

// fixed array (e.g. matrix4x4)
engine->RegisterObjectIndexOp("matrix4x4", "length=16;element=offset(0),,4,float");


With this the compiler would be able to understand that when accessing an element of the dynamic array it will find the length of the array at offset 8 from the object pointer, and to access the elements it should first offset the pointer with 16, then dereference the pointer, then offset with the index times the size of type T, and finally the accessed type will be T.

For the fixed array, it would know that the length is always 16, and that the element is accessed by directly offsetting the object pointer with the index times 4 and the accessed type will be a float.

Edited by Andreas Jonsson

##### Share on other sites

This would be awesome: elements accessed thru this specific opIndex would just be accessed like class members in fact, right?

Edited by gjl

##### Share on other sites

The syntax in the script would be identical to the current opIndex operator. Behind the scenes the compiler would generate the bytecode to access the elements without making a function call.

##### Share on other sites

I implemented a special bytecode instruction for calling registered class methods with signature 'type &obj::func(int)' in revision 2147. With this there are almost no runtime decisions that has to be made when making the function call, so the overhead is greatly reduced.

In my tests, a script that accesses array members 20 million times took about 1.2 seconds without this new bytecode instruction, and with it it was reduced to about 0.4 seconds.

In comparison, the same logic implemented directly in C++, takes about 0.04 seconds. So the script is approximately 10 times slower than C++ now. With the use of a JIT compiler this difference should be reduced even further (though I haven't tried it).

---

I did experiment with implementing the opIndex call as direct access by inlining the call as bytecode instructions, and the performance was about the same as with the new bytecode instruction. For now I've decided to put this option on hold, as the effort to get it to work is too great compared to the benefits it will have.

##### Share on other sites

Thanks! In order to validate the performance improvement with a JIT I guess this requires the new instruction to be implemented as part of the JIT, right?

##### Share on other sites

Correct. The current version of the JIT will not recognize the new byte code instruction, thus it will fallback to using the VM.

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
627749
• Total Posts
2978912

• 10
• 10
• 21
• 14
• 14