array performance / direct memory access?

Started by
28 comments, last by WitchLord 10 years, 1 month ago

After playing quite a bit with AngelScript lately, I have noticed that accessing array elements is pretty costly. I am trying to write some data processing algorithms in AngelScript, and it is pretty fast, except when accessing array elements. For data-intensive algorithms, that's unfortunately not ideal :-(.

I first modified the array add-on to speed up element access, but the add-on code is not the bottleneck. After some profiling, it appears that most of the time is spent in the code that performs the native function call, not in the function itself.

So I am wondering if there would be a way to register a specific data type in the engine that would be accessed directly in memory, without going through a function call. Something like a buffer that could be shared between the C++ host and the script, but with an operator [] built directly into the engine that returns the appropriate location in memory without a function call. It would have to be smart enough to return the right type and size too (I think it would be enough if it were limited to simple types such as int, float, double, etc.). It could be pretty close to JavaScript's ArrayBuffer, I guess, but with data type safety.
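A minimal host-side sketch of the proposed typed buffer, in C++. The class name TypedBufferView is hypothetical and not part of AngelScript; it only illustrates that the element lookup the engine would need to inline is "base address plus index times element size", with a compile-time restriction to simple numeric types.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical typed view over a host-owned buffer (not an AngelScript type).
// Element access is plain address arithmetic: base + index * sizeof(T).
template <typename T>
class TypedBufferView
{
    // Restrict to simple types such as int, float, double.
    static_assert(std::is_arithmetic<T>::value, "only simple numeric types");

public:
    TypedBufferView(T* base, std::size_t count) : base_(base), count_(count) {}

    // No bounds check: this mirrors the raw access being proposed.
    T& operator[](std::size_t index) { return base_[index]; }
    const T& operator[](std::size_t index) const { return base_[index]; }

    std::size_t length() const { return count_; }

private:
    T* base_;           // start of the buffer shared with the host
    std::size_t count_; // number of elements of type T
};
```

Writes through the view land directly in the host's memory, so C++ and the script would see the same data without copying.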

I know it's not the main purpose of the engine (and it would actually break the data access safety), but it could be a great add-on for data-intensive apps. What do you think?


It is something I've wished for since the first time I heard of AngelScript, but I'm afraid it will probably never be implemented. It goes totally against script safety. Also, most people will probably object that it's not really something a scripting language needs, as heavy number-crunching on large data isn't a typical application for scripting (you would normally write such code in a native language like C++).

Though, regarding the security concerns, one could counter that a script is only able to access something that the host has previously registered.

So if there is a notion of "start address" and "max length" and some minimum bounds-checking in a "memory block", then actually it should be perfectly safe to allow a script to index into "some memory block". It would be, as usual, the host application writer's responsibility not to register memory regions that he doesn't want read/written by scripts.
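The "start address plus max length" idea can be sketched in C++ as follows. MemoryBlock and its read/write members are hypothetical names, not an AngelScript API; the sketch only shows that a single range check per access is enough to keep a script inside the region the host registered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical memory block registered by the host: a start address plus a
// maximum length in bytes. Every typed access is checked against that range.
class MemoryBlock
{
public:
    MemoryBlock(void* start, std::size_t maxLength)
        : start_(static_cast<unsigned char*>(start)), maxLength_(maxLength) {}

    template <typename T>
    T read(std::size_t byteOffset) const
    {
        static_assert(std::is_trivially_copyable<T>::value, "plain data only");
        check(byteOffset, sizeof(T));
        T value;
        std::memcpy(&value, start_ + byteOffset, sizeof(T)); // safe even if unaligned
        return value;
    }

    template <typename T>
    void write(std::size_t byteOffset, const T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value, "plain data only");
        check(byteOffset, sizeof(T));
        std::memcpy(start_ + byteOffset, &value, sizeof(T));
    }

private:
    // Reject any access whose bytes would not all lie in [start, start + maxLength).
    // Written this way to avoid overflow in byteOffset + size.
    void check(std::size_t byteOffset, std::size_t size) const
    {
        if (size > maxLength_ || byteOffset > maxLength_ - size)
            throw std::out_of_range("access outside registered memory block");
    }

    unsigned char* start_;
    std::size_t maxLength_;
};
```

How an out-of-range access is reported to the script (here a C++ exception) is a design choice left open by the thread.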

I agree with you, but there are some cases when this can be pretty useful, when the end user is actually the one writing the algorithms.

I'd be glad to contribute and help with such an implementation; however, I am wondering if it is even feasible without breaking compatibility and without too much work (I guess it would require at least a new instruction for the VM). I must admit I am still fairly new to AngelScript :-)

You might be better off embedding Python in your app - check it out, it's simple.

Holy crap I started a blog - http://unobvious.typepad.com/


Python is a nice language for many things, but it would not work here due to its dynamic nature and its syntax. So I'd prefer to stick to AngelScript :-)

I'm not sure if this will be of any use to you. I wanted to deal with some structures that would be defined at runtime. Below is an example of what I did.

A real structure that is defined at compile time like this:

  struct foo
  {
      int a;
      double b;
      int c;
  };

and used like this in C code:

  foo bar;
  bar.a = 1;
  bar.b = 1.23;
  bar.c = 2;

The same fields could be stored in a packed char array of size 16 (4 + 8 + 4). Note that this packed layout is not the one the C++ compiler gives struct foo, which normally pads b to offset 8:

  char bar[16];

which, like the struct, could be used in C code like this (memcpy avoids the type-punned pointer casts, and b at offset 4 is not 8-byte aligned):

  #include <cstring>

  int a = 1;       std::memcpy(bar + 0,  &a, sizeof a);  // a at offset 0
  double b = 1.23; std::memcpy(bar + 4,  &b, sizeof b);  // b at offset 4
  int c = 2;       std::memcpy(bar + 12, &c, sizeof c);  // c at offset 12

You can register a value type "foo" with the script engine like this:

  // Define the type to the script engine; 16 is the size of our dynamic structure
  r = engine->RegisterObjectType("foo", 16, asOBJ_VALUE | asOBJ_POD); assert( r >= 0 );
  // The last argument is the byte offset of each property; for a real C++ struct
  // this would normally be asOFFSET(foo, a), etc.
  r = engine->RegisterObjectProperty("foo", "int a", 0); assert( r >= 0 );
  r = engine->RegisterObjectProperty("foo", "double b", 4); assert( r >= 0 );
  r = engine->RegisterObjectProperty("foo", "int c", 12); assert( r >= 0 );
  // Now register a global variable in the script and point it at the bar char array
  r = engine->RegisterGlobalProperty("foo bar", bar); assert( r >= 0 );

This is then used in the script like this:

  Print("bar.a =" + bar.a + "\n");
  Print("bar.b =" + bar.b + "\n");
  Print("bar.c =" + bar.c + "\n");
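For a structure that is really defined at runtime, the offsets have to be computed rather than hard-coded. A minimal sketch, assuming a hypothetical FieldDesc list and the same packed (unpadded) layout as the 16-byte example; a layout with natural alignment would give different offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field of a runtime-defined structure.
struct FieldDesc
{
    std::string decl;   // declaration for RegisterObjectProperty, e.g. "int a"
    std::size_t size;   // size of the field in bytes
    std::size_t offset; // filled in by computePackedLayout
};

// Assigns packed offsets in declaration order (no padding) and returns the
// total size of the structure.
std::size_t computePackedLayout(std::vector<FieldDesc>& fields)
{
    std::size_t offset = 0;
    for (FieldDesc& f : fields)
    {
        f.offset = offset;
        offset += f.size;
    }
    return offset;
}
```

With fields "int a" (4 bytes), "double b" (8) and "int c" (4) this yields offsets 0, 4 and 12 and a total of 16. Each decl/offset pair could then be passed to engine->RegisterObjectProperty("foo", decl.c_str(), offset), and the total to RegisterObjectType.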

Maybe you could use one property to hold the index, size and type, and then have C++ code copy the requested indexed value into another property? Because it's a char array, you can tell AngelScript that it is whatever type you want. This doesn't work with normal array syntax, but it might be faster since it is just offsetting a pointer (it might also be slower; I don't know). There might be some other tricks you can play with this to get it to work. It is unsafe, but it worked for my application of a runtime-defined structure.

If this doesn't work, there is a project at CERN being developed for exactly what you are trying to accomplish: an interactive C++ interpreter built on LLVM. The project is called Cling and can be found here: http://root.cern.ch/drupal/content/cling .

Tony

Thanks for the reply. I am not sure it would actually help here, since the size of the array is not known when compiling the script.

What the scripts do is just perform loops over large arrays (whose size may vary), such as:


for(uint i=0,count=myArray.length;i<count;i++)
{
   double item=myArray[i];
   
   // perform math processing on item
   // ...

   // store back to array
   myArray[i]=item;
}

A lot of time is spent in both the reads and the writes, because the [] operator is a native function call. It would be nice if I could define a special operator on a special buffer array type that directly returns a location in memory instead of going through a function call. From my understanding of the scripting engine, I guess this would require a new instruction, but I am not sure (yet) how it would be implemented.
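A simplified C++ model of the difference being described; it is not AngelScript's actual native-call machinery. The first loop fetches each element through an opaque accessor function (the hypothetical vectorAt), roughly as a registered opIndex is reached, so every iteration makes two calls; the second loop computes the address inline. In the script VM each such call also pays the engine's native-call overhead, which is what the profiling pointed at.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accessor reached through a function pointer, standing in for opIndex.
using IndexFn = double* (*)(void* container, unsigned index);

double* vectorAt(void* container, unsigned index)
{
    return &(*static_cast<std::vector<double>*>(container))[index];
}

// Two accessor calls per iteration: one for the read, one for the write-back.
void scaleViaCalls(std::vector<double>& v, IndexFn at, double factor)
{
    for (unsigned i = 0, count = static_cast<unsigned>(v.size()); i < count; i++)
    {
        double item = *at(&v, i);
        item *= factor;
        *at(&v, i) = item;
    }
}

// Direct access: the element address is base + i, with no call at all.
void scaleDirect(std::vector<double>& v, double factor)
{
    double* base = v.data();
    for (std::size_t i = 0, count = v.size(); i < count; i++)
    {
        double item = base[i];
        item *= factor;
        base[i] = item;
    }
}
```

Both loops produce the same result; only the path to each element differs.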

I have also already looked at Cling, but it's not designed for embedding, so it would not work. I definitely want to use AngelScript here, as it has lots of advantages compared to all the other solutions I have tried or studied (Python, PHP, compiled C++, JavaScript, Lua and many others).

This could be done without any new bytecode instructions in the vm.

What you need is some way to tell the compiler to produce the bytecode for accessing the buffer memory directly, like a class property, instead of calling the opIndex method.

I think this is something that can be done with minimal impact on the library, and it can be made optional (turned off by default) so it won't sacrifice safety for those who don't want it (similar to asEP_ALLOW_UNSAFE_REFERENCES).

If you implement this I'll gladly incorporate it into the repository as an official feature.

Regards,
Andreas

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

Thanks. I'd be glad to add this feature, indeed. However I am not sure where and how to start :-).

I think Andreas is referring to the AngelScript index operator [].

If you look at the registration code for the script array add-on, you will find these methods registered under the name opIndex. They call the functions ScriptArrayAt_xxx().

  r = engine->RegisterObjectMethod("array", "T &opIndex(uint)", asFUNCTION(ScriptArrayAt_Generic), asCALL_GENERIC); assert( r >= 0 );
  r = engine->RegisterObjectMethod("array", "const T &opIndex(uint) const", asFUNCTION(ScriptArrayAt_Generic), asCALL_GENERIC); assert( r >= 0 );

I think if you search the file as_compiler.cpp for the text "opIndex", you could perhaps modify those functions to do the direct memory access Andreas described.

Also, you will find this in the documentation, which might help you out as well:

Index operators

  op    opfunc
  []    opIndex

When the expression a[i] is compiled, the compiler will rewrite it as a.opIndex(i) and compile that instead.

The index operator can also be formed similarly to property accessors. The get accessor should then be named get_opIndex and have one parameter for the indexing. The set accessor should be named set_opIndex and have two parameters, the first is for the indexing, and the second for the new value.


  class MyObj
  {
    float get_opIndex(int idx) const       { return 0; }
    void set_opIndex(int idx, float value) { }
  }

When the expression a[i] is used to retrieve the value, the compiler will rewrite it as a.get_opIndex(i). When the expression is used to set the value, as in a[i] = expr, the compiler will rewrite it as a.set_opIndex(i, expr).
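The rewrite rules quoted above can be mirrored in C++: the explicit get_opIndex/set_opIndex calls stand for what the compiler would generate from reads and writes of a[i]. The backing vector and the copyElement helper are illustrative additions, not part of the documentation example.

```cpp
#include <cassert>
#include <vector>

// C++ analogue of the MyObj example, with a backing store added for illustration.
class MyObj
{
public:
    float get_opIndex(int idx) const { return values_[idx]; }
    void set_opIndex(int idx, float value) { values_[idx] = value; }

private:
    std::vector<float> values_ = std::vector<float>(8, 0.0f);
};

// Script:  float x = a[from];   becomes   float x = a.get_opIndex(from);
// Script:  a[to] = x;           becomes   a.set_opIndex(to, x);
float copyElement(MyObj& a, int from, int to)
{
    float x = a.get_opIndex(from); // read:  a[from]
    a.set_opIndex(to, x);          // write: a[to] = x
    return x;
}
```

In this form every element access is still a method call, so by itself it does not remove the overhead discussed earlier in the thread.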

Tony

