
Alignment requirements


30 replies to this topic

#1 Starfox   Members   -  Reputation: 501

Posted 27 November 2013 - 04:11 PM

I'm having trouble registering a float3 class with a float3(float x, float y, float z) constructor as a scoped ref type. Registering asBEHAVE_CONSTRUCT fails when the type flags are asOBJ_REF | asOBJ_SCOPED, and adding asOBJ_APP_CLASS makes the object type registration itself fail. How should I do that?

 

If possible, I'd like to cast a vote for the alignment feature request described at http://www.gamedev.net/topic/606270-memory-aligned-objects/?gopid=4835684 - aligned vector types exist in every single game engine that I know of, and that option would significantly improve the ease of binding.


Holy crap I started a blog - http://unobvious.typepad.com/

#2 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 27 November 2013 - 06:19 PM

You should use asBEHAVE_FACTORY for asOBJ_SCOPED objects. Consider coupling this with a memory pool to avoid doing dynamic allocations for each instance.

 

You'll also need to register the asBEHAVE_RELEASE behaviour to free the memory allocated by the factory, but not the asBEHAVE_ADDREF behaviour, as there is no reference counting with asOBJ_SCOPED objects.

 

Manual: Registering scoped reference types
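
A minimal sketch of what such a factory and release pair could look like for the float3 from the question. The function names Float3Factory and Float3Release are made up for illustration, and the registration calls are shown only as comments that follow the pattern in the manual page above; they assume `engine` is a valid asIScriptEngine pointer and omit error checking.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Application type from the question above.
struct float3 {
    float x, y, z;
    float3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Factory (hypothetical name): allocates one instance and constructs it in place.
float3* Float3Factory(float x, float y, float z) {
    void* mem = std::malloc(sizeof(float3));
    if (mem == nullptr) return nullptr;
    return new (mem) float3(x, y, z);
}

// Release (hypothetical name): destroys the instance and frees its memory.
// For a scoped type this runs when the script variable goes out of scope.
void Float3Release(float3* obj) {
    assert(obj != nullptr);
    obj->~float3();
    std::free(obj);
}

// Registration sketch (not compiled here, needs angelscript.h):
//   engine->RegisterObjectType("float3", 0, asOBJ_REF | asOBJ_SCOPED);
//   engine->RegisterObjectBehaviour("float3", asBEHAVE_FACTORY,
//       "float3 @f(float, float, float)",
//       asFUNCTION(Float3Factory), asCALL_CDECL);
//   engine->RegisterObjectBehaviour("float3", asBEHAVE_RELEASE,
//       "void f()", asFUNCTION(Float3Release), asCALL_CDECL_OBJLAST);
```

Swapping std::malloc and std::free for a pool allocator inside these two functions is where the memory pool suggestion would plug in.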

 

 

 

I agree that the alignment feature might make things a lot easier, but it is a feature that will require quite a bit of work to get right, not least updating the assembly code for the native calling conventions to support this type too. Needless to say, it will take a while before I get to start implementing this feature, though it is always in the back of my mind.

 

You may however want to consider whether it is really wise to expose the aligned vector class to the script. Aligned vectors are great for math heavy calculations, but when mixed with other computation they have quite a bit of overhead, as the loading and unloading of the SIMD registers will be performed even for trivial calculations. An aligned vector also uses 33% more memory (16 bytes instead of 12) than you usually need for a vector3 structure.

 

This is why for example DirectXMath has two separate vector classes. One for heavy duty math work (XMVECTOR), and one for normal work and storage (XMFLOAT3).
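
The 33% figure can be checked with a small sketch. The two struct names below are hypothetical stand-ins, not the DirectXMath types, and the numbers assume a 4-byte float.

```cpp
#include <cstddef>

// Plain storage layout: three packed floats.
struct PackedFloat3 {
    float x, y, z;
};

// SIMD-friendly layout: same members, but forced to 16-byte alignment,
// so the compiler pads the size up to a multiple of 16.
struct alignas(16) AlignedFloat3 {
    float x, y, z;
};

static_assert(sizeof(PackedFloat3) == 12, "three 4-byte floats");
static_assert(sizeof(AlignedFloat3) == 16, "padded up to the alignment");

// Extra memory in percent, rounded down: (16 - 12) * 100 / 12.
constexpr std::size_t OverheadPercent() {
    return (sizeof(AlignedFloat3) - sizeof(PackedFloat3)) * 100
           / sizeof(PackedFloat3);
}
```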


AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

#3 Starfox   Members   -  Reputation: 501

Posted 27 November 2013 - 06:28 PM

Math-heavy scripts are of course a bad idea, but for simple back-and-forth passing of parameters, implementing a whole new class just for unaligned storage is overkill. If you accept contributions, we could have someone do the necessary work if you can provide some outline assistance.


#4 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 28 November 2013 - 01:19 PM

I haven't made a complete mapping of all the changes that would be needed, but these are the high level changes:

 

  • The script context must make sure to always allocate the local stack memory buffer on 16byte aligned boundaries (asCContext::ReserveStackSpace)
  • The engine must make sure to always allocate the memory for the script objects on 16byte aligned boundaries (asCScriptEngine::CallAlloc)
  • The application needs to pass a new flag when registering types that require 16byte alignment, e.g. asOBJ_APP_ALIGN16 (asCScriptEngine::RegisterObjectType)
  • The script object type must make sure to align member properties of these types correctly (asCObjectType::AddPropertyToClass)
  • Script global properties must allocate memory on 16byte boundaries if holding these types (asCGlobalProperty::AllocateMemory)
  • The script compiler must make sure to allocate the local variables on 16byte boundaries (asCCompiler::AllocateVariable)
  • The script compiler must add pad bytes on the stack for all function calls to guarantee that the stack position is 16byte aligned on entry in the called function (asCCompiler)
  • The bytecode serializer must be capable of adjusting these pad bytes to guarantee platform independent saved bytecode. Remember that the registered type may not be 16byte aligned on all platforms (asCWriter & asCReader) 
  • The bytecode serializer must also be prepared to adjust the position of the local variables according to the need for 16byte alignment (asCWriter & asCReader)
  • The code for the native calling conventions must be adjusted for all platforms that should support 16byte aligned types (as_callfunc...)
  • When the context needs to grow the local stack memory it must copy the function arguments so that the stack entry position is 16byte aligned (asCContext::CallScriptFunction)
  • When the context is prepared for a new call, it must set the initial stack position so the stack entry position is 16byte aligned (asCContext::Prepare)

 

There may be some other changes needed as well. If I remember anything else later I'll add to this list.

 

The bullets in red are the complex changes. The other changes should be quite trivial.
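
Several items in the list above (pad bytes on the stack, positions of local variables) come down to rounding an offset up to the next aligned boundary. A minimal sketch of that arithmetic, with the helper names AlignUp and PadBytes chosen here for illustration and not taken from the AngelScript source:

```cpp
#include <cstddef>

// Rounds `value` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Number of pad bytes needed so that `offset + pad` is a multiple
// of `alignment`.
constexpr std::size_t PadBytes(std::size_t offset, std::size_t alignment) {
    return AlignUp(offset, alignment) - offset;
}

static_assert(AlignUp(0, 16) == 0, "already aligned");
static_assert(AlignUp(20, 16) == 32, "rounded up to the next boundary");
static_assert(PadBytes(20, 16) == 12, "12 pad bytes reach offset 32");
```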


#5 Starfox   Members   -  Reputation: 501

Posted 28 November 2013 - 05:33 PM

Sounds good. I'll dive into the code as soon as I can. 

 

One thing I noticed though: the alignment errors only happen in release builds, not in debug builds. Is it possible that this is due to behavioral changes in malloc()? If so, can those changes be simulated for the release build? That sounds like a lower-effort fix.


#6 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 28 November 2013 - 06:01 PM

I don't think it is because of anything malloc does. More likely the debug build uses different instructions to load the SIMD registers, so the __m128 types don't require alignment. This is however something that would be in your application, and not in AngelScript.

 

You can set custom memory functions with asSetGlobalMemoryFunctions(). With this the application can use memory routines that are guaranteed to always return 16byte aligned memory to the script library. You probably don't want to use 16byte aligned allocations for everything though, as it will waste a lot of memory when the allocations are smaller than 16bytes.

 

This gave me an idea. The code in as_memory.h can perhaps be enhanced to have a new macro for allocating 16byte aligned memory, e.g. asNEW16 and asNEWARRAY16. This macro can then call a new userAlloc16 global function. The pieces of code I mentioned above that need to guarantee 16byte aligned memory would then only have to call these macros instead of the existing ones to allocate the memory. 
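
As a rough sketch of what an aligned allocation routine could look like, here is a portable version that over-allocates with malloc() and stores the raw pointer just below the aligned block. The names AllocAligned, FreeAligned, Alloc16 and Free16 are hypothetical; they are not the proposed userAlloc16 or anything in as_memory.h. The two wrappers only assume that asSetGlobalMemoryFunctions() takes an allocation function of the form void *(size_t) and a free function of the form void (void *).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns `size` bytes aligned to `alignment` (a power of two),
// or nullptr on failure.
void* AllocAligned(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Keep the slot for the saved raw pointer itself properly aligned.
    if (alignment < alignof(void*)) alignment = alignof(void*);

    // Room for the payload, the worst-case adjustment and the raw pointer.
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;

    std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

    // Remember the raw pointer so FreeAligned can hand it back to free().
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Frees a block returned by AllocAligned; nullptr is ignored.
void FreeAligned(void* ptr) {
    if (ptr != nullptr) std::free(static_cast<void**>(ptr)[-1]);
}

// Wrappers shaped like the callbacks for asSetGlobalMemoryFunctions(),
// e.g. asSetGlobalMemoryFunctions(Alloc16, Free16) (not compiled here).
void* Alloc16(std::size_t size) { return AllocAligned(size, 16); }
void Free16(void* ptr) { FreeAligned(ptr); }
```

The price is the waste mentioned above: every block carries up to `alignment` extra bytes plus one pointer, which is why routing only the allocations that need it through such a function is attractive.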


#7 Starfox   Members   -  Reputation: 501

Posted 28 November 2013 - 06:17 PM

FWIW, they currently both call userAlloc, which defaults to malloc(), which guarantees 16 byte alignment on OS X, so that's probably not it. Something else about the debug environment is causing variables created on the AS stack to always have 16-byte aligned addresses. If only I knew what it was, I could change the code so that it is guaranteed to behave the way it happens to behave by accident under the debug environment...


#8 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 28 November 2013 - 07:55 PM

There is nothing in the AngelScript code to guarantee 16byte aligned addresses of local variables at the moment. Even if malloc() is guaranteed to return properly aligned memory buffers, the local variables in the script will be packed at 4byte boundaries.
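
To illustrate the point, here is a small sketch with a stand-in for a script stack: the buffer itself is 16-byte aligned, but variables sit in 4-byte slots, so only every fourth slot lands on a 16-byte boundary. FakeStack and the helper names are hypothetical and unrelated to the real asCContext layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when `ptr` is a multiple of `alignment` bytes.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Stand-in for a script stack: an aligned buffer of 4-byte slots.
struct alignas(16) FakeStack {
    std::uint32_t slots[16];
};

// Whether a variable starting at the given slot would be safe for an
// aligned 16-byte SIMD load.
inline bool SlotIs16ByteAligned(const FakeStack& stack, std::size_t slot) {
    assert(slot < 16);
    return IsAligned(&stack.slots[slot], 16);
}
```

With this layout, slots 0, 4, 8 and 12 pass the check and all others fail, regardless of how well aligned the buffer returned by malloc() is.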


#9 Starfox   Members   -  Reputation: 501

Posted 04 December 2013 - 10:13 PM

I haven't made a complete mapping of all the changes that would be needed, but these are the high level changes:

 

  • The script context must make sure to always allocate the local stack memory buffer on 16byte aligned boundaries (asCContext::ReserveStackSpace)

 

I think I implemented that part, check out commits https://bitbucket.org/sherief/angelscript/commits/b9bca19ffa001ce628d106adc95669c666af4efb and https://bitbucket.org/sherief/angelscript/commits/c8070156e0141c28fad7e782f28947431834919a and let me know what you think.


Edited by Starfox, 04 December 2013 - 10:14 PM.

#10 Starfox   Members   -  Reputation: 501

Posted 04 December 2013 - 10:27 PM

  • The script object type must make sure to align member properties of these types correctly (asCObjectType::AddPropertyToClass)

 

asCObjectType::AddPropertyToClass() now supports specifying an alignment parameter - added in https://bitbucket.org/sherief/angelscript/commits/962189ef1a752736e9e9c480d10b2e14d0e4e685
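
For readers who have not opened the commit, a rough sketch of what alignment-aware property placement involves. This is not the code from the linked patch; ObjectLayout and its members are hypothetical, and the default alignment of 4 bytes is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper that lays out properties one after another.
class ObjectLayout {
public:
    // Places a property of the given size and power-of-two alignment
    // after the properties added so far and returns its byte offset.
    std::size_t AddProperty(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t offset = (size_ + alignment - 1) & ~(alignment - 1);
        size_ = offset + size;
        if (alignment > alignment_) alignment_ = alignment;
        return offset;
    }

    // Total size, rounded up so that arrays of the object stay aligned.
    std::size_t Size() const {
        return (size_ + alignment_ - 1) & ~(alignment_ - 1);
    }

    // Strictest alignment seen; the object needs at least this much.
    std::size_t Alignment() const { return alignment_; }

private:
    std::size_t size_ = 0;
    std::size_t alignment_ = 4;  // assumed default alignment
};
```

Adding an int, a 16-byte aligned vector and another int yields offsets 0, 16 and 32, a total size of 48 and an object alignment of 16.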


#11 Starfox   Members   -  Reputation: 501

Posted 04 December 2013 - 10:40 PM

  • The application needs to pass a new flag when registering types that require 16byte alignment, e.g. asOBJ_APP_ALIGN16 (asCScriptEngine::RegisterObjectType)

 

I didn't add a flag for 16-byte alignment (since we'll need more than just 16 byte aligned variables soon); instead, I added a new optional parameter to RegisterObjectType:

 

https://bitbucket.org/sherief/angelscript/commits/a6dd89ef276d09825c15a929dac6c6d2055d0cd6


#12 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 05 December 2013 - 07:24 AM

 

  • I didn't add a flag for 16-byte alignment (since we'll need more than just 16 byte aligned variables soon), I added a new optional parameter to RegisterObjectType:  

 

 

What other alignment requirements are you in need of?

 

 

I'll review all above patches as soon as I can and provide my feedback.


#13 Starfox   Members   -  Reputation: 501

Posted 05 December 2013 - 09:31 AM

What other alignment requirements are you in need of?

 

32, for AVX. For future compatibility I want the changes to be able to support any alignment.


#14 Starfox   Members   -  Reputation: 501

Posted 05 December 2013 - 10:49 AM

  • The engine must make sure to always allocate the memory for the script objects on 16byte aligned boundaries (asCScriptEngine::CallAlloc)

 

CallAlloc() relies on userAlloc. Should its signature be changed to add an alignment parameter? I can do this, and I'd also add a simple wrapper for the default malloc() / free() use case.


#15 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 05 December 2013 - 11:35 AM

I don't think AVX will require 32 byte alignment. Do you have any reference that says it will? In fact, the Wikipedia entry on AVX says the memory alignment requirement of SIMD instructions may be relaxed, so it is quite possible that not even 16byte alignment will be needed (unfortunately, no reference was given for this).

 

But, feel free to make the code generic to support any alignment requirement.

 

As for CallAlloc(): not all memory allocations need to be aligned, and you definitely do not want to force alignment of small allocations, as it may waste a lot of memory. I think you need to add a secondary method, e.g. CallAllocAligned(). This new method can optionally take an argument with the desired alignment (though, for performance, it may be better just to hardcode it to 16). The code that allocates memory must know whether the memory needs to be aligned or not.


#16 Starfox   Members   -  Reputation: 501

Posted 06 December 2013 - 11:28 AM

vmovaps requires 32-byte alignment. We also see a performance benefit from aligning matrices on 32 or 64 byte boundaries due to cache behavior.
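
On the application side, types with these requirements are typically declared along the following lines. This is a sketch with hypothetical type names, assuming a 4-byte float; it shows the C++ declarations only, not how AngelScript would handle them.

```cpp
#include <cstddef>

// Eight floats: the width of one 256-bit AVX register.
struct alignas(32) Float8 {
    float v[8];
};

// A 4x4 float matrix sized and aligned to a 64-byte cache line.
struct alignas(64) Matrix4 {
    float m[16];
};

static_assert(alignof(Float8) == 32, "suitable for aligned 256-bit loads");
static_assert(sizeof(Matrix4) == 64, "exactly one 64-byte cache line");
```

alignas covers static and automatic storage. Memory obtained from plain malloc() is not guaranteed to honour it, which is where the script engine's allocation path comes in.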


#17 quarnster   Members   -  Reputation: 266

Posted 06 December 2013 - 02:17 PM

For a reference see Intel® Advanced Vector Extensions Programming Reference:

 

Table 2-4. Instructions Requiring Explicitly Aligned Memory
Require 16-byte alignment    Require 32-byte alignment
(V)MOVDQA xmm, m128          VMOVDQA ymm, m256
(V)MOVDQA m128, xmm          VMOVDQA m256, ymm
(V)MOVAPS xmm, m128          VMOVAPS ymm, m256
(V)MOVAPS m128, xmm          VMOVAPS m256, ymm
(V)MOVAPD xmm, m128          VMOVAPD ymm, m256
(V)MOVAPD m128, xmm          VMOVAPD m256, ymm
(V)MOVNTPS m128, xmm         VMOVNTPS m256, ymm
(V)MOVNTPD m128, xmm         VMOVNTPD m256, ymm
(V)MOVNTDQ m128, xmm         VMOVNTDQ m256, ymm
(V)MOVNTDQA xmm, m128        VMOVNTDQA ymm, m256
 
Table 2-5. Instructions Not Requiring Explicit Memory Alignment
(V)MOVDQU xmm, m128
(V)MOVDQU m128, m128
(V)MOVUPS xmm, m128
(V)MOVUPS m128, xmm
(V)MOVUPD xmm, m128
(V)MOVUPD m128, xmm
VMOVDQU ymm, m256
VMOVDQU m256, ymm
VMOVUPS ymm, m256
VMOVUPS m256, ymm
VMOVUPD ymm, m256
VMOVUPD m256, ymm

 

In http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf, we can read in section 3.6.4 that:

Misaligned data access can incur significant performance penalties. This is particularly true for cache line 
splits. The size of a cache line is 64 bytes in the Pentium 4 and other recent Intel processors, including 
processors based on Intel Core microarchitecture. 
 
An access to data unaligned on 64-byte boundary leads to two memory accesses and requires several 
µops to be executed (instead of one). Accesses that span 64-byte boundaries are likely to incur a large 
performance penalty, the cost of each stall generally are greater on machines with longer pipelines.
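
The cache line split described in the quoted passage can be expressed as a small check. This is a sketch: the 64-byte line size is taken from the quote, and the names kCacheLineSize and SpansCacheLines are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Cache line size in bytes, as stated in the quoted manual text.
constexpr std::size_t kCacheLineSize = 64;

// True when the byte range [address, address + size) touches more than
// one cache line. `size` must be non-zero.
constexpr bool SpansCacheLines(std::uintptr_t address, std::size_t size) {
    return address / kCacheLineSize
           != (address + size - 1) / kCacheLineSize;
}

static_assert(!SpansCacheLines(32, 32), "bytes 32..63 fit in one line");
static_assert(SpansCacheLines(48, 32), "bytes 48..79 cross the boundary");
```

A 32-byte value aligned to 32 bytes can never split a line, which is one reason the larger alignments pay off for matrices.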


Edited by quarnster, 06 December 2013 - 02:27 PM.


#18 Starfox   Members   -  Reputation: 501

Posted 06 December 2013 - 07:30 PM

^^ what he said.


#19 Andreas Jonsson   Moderators   -  Reputation: 3061

Posted 06 December 2013 - 07:42 PM

:)

 

The changes you've done so far seem to be fine.

 

Are you testing against the test_feature app in the svn? It's the best way to make sure you don't accidentally break anything.


#20 Starfox   Members   -  Reputation: 501

Posted 06 December 2013 - 07:48 PM

As for CallAlloc(): not all memory allocations need to be aligned, and you definitely do not want to force alignment of small allocations, as it may waste a lot of memory. I think you need to add a secondary method, e.g. CallAllocAligned(). This new method can optionally take an argument with the desired alignment (though, for performance, it may be better just to hardcode it to 16). The code that allocates memory must know whether the memory needs to be aligned or not.

 

Some types do need natural alignment though, and the default allocator (which wraps malloc()) already aligns to the largest supported type alignment on the platform, since malloc() is guaranteed to do that. On OS X, my primary platform, malloc() is guaranteed to return 16 byte aligned data. On almost every platform where doubles must be aligned, malloc() is required by the spec to return a pointer that is at least as aligned as a double (8 bytes).

 

Also, the patches add an alignment value to the type id info that defaults to 4, and that is what gets passed to the allocators when allocating memory for a type, so for existing code the behavior should be exactly the same as before.
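
A sketch of how a per-type alignment with a default of 4 could drive the choice between the plain and the aligned allocation path. TypeInfo and NeedsAlignedAlloc are hypothetical names and do not come from the patches.

```cpp
#include <cstddef>

// Hypothetical per-type record; alignment defaults to 4 bytes.
struct TypeInfo {
    std::size_t size;
    std::size_t alignment;

    constexpr explicit TypeInfo(std::size_t size_, std::size_t alignment_ = 4)
        : size(size_), alignment(alignment_) {}
};

// True when the guarantee of the plain allocator (for example 8 or 16
// bytes for malloc()) is too weak for this type, so an aligned
// allocation path would be needed.
constexpr bool NeedsAlignedAlloc(const TypeInfo& type,
                                 std::size_t allocatorAlignment) {
    return type.alignment > allocatorAlignment;
}

static_assert(TypeInfo(12).alignment == 4, "existing types keep the default");
static_assert(!NeedsAlignedAlloc(TypeInfo(12), 8), "plain path is enough");
static_assert(NeedsAlignedAlloc(TypeInfo(32, 32), 16), "AVX type needs more");
```

Types registered without an explicit alignment keep taking the plain path, which matches the claim that existing code sees no change.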





