Starfox

Alignment requirements

30 posts in this topic

I'm having trouble registering a float3 class with a float3(float x, float y, float z) constructor as a scoped ref type. Calls specifying asBEHAVE_CONSTRUCT fail when the flags are asOBJ_REF | asOBJ_SCOPED, and adding asOBJ_APP_CLASS makes the object type registration itself fail. How should I register it?

 

If possible, I'd like to cast a vote for the alignment feature request described at http://www.gamedev.net/topic/606270-memory-aligned-objects/?gopid=4835684 - aligned vector types exist in every single game engine that I know of, and that option would significantly improve the ease of binding.


You should use asBEHAVE_FACTORY for asOBJ_SCOPED objects. Consider coupling this with a memory pool to avoid doing dynamic allocations for each instance.

 

You'll also need to register the asBEHAVE_RELEASE behaviour to free the memory allocated with the factory, but not the asBEHAVE_ADDREF behaviour, as there is no reference counting with asOBJ_SCOPED objects.

 

Manual: Registering scoped reference types
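
As a rough sketch of the advice above, the application-side factory and release functions for a scoped float3 might look like the following. The float3 struct, the names Float3Factory and Float3Release, and the use of plain new/delete instead of a memory pool are assumptions made for illustration. The registration calls are shown only as comments because they need angelscript.h and an engine instance; they follow the pattern in the manual page on scoped reference types.

```cpp
#include <cassert>

// Hypothetical application-side vector type (name and layout assumed).
struct float3 {
    float x, y, z;
};

// Factory: called for asBEHAVE_FACTORY; returns a heap-allocated instance.
// A memory pool could replace new/delete here to avoid per-instance allocations.
float3* Float3Factory(float x, float y, float z) {
    return new float3{x, y, z};
}

// Release: called for asBEHAVE_RELEASE when the script variable goes out of scope.
void Float3Release(float3* self) {
    assert(self != nullptr);
    delete self;
}

// Registration (requires angelscript.h and an asIScriptEngine* engine):
//   engine->RegisterObjectType("float3", 0, asOBJ_REF | asOBJ_SCOPED);
//   engine->RegisterObjectBehaviour("float3", asBEHAVE_FACTORY,
//       "float3 @f(float, float, float)",
//       asFUNCTION(Float3Factory), asCALL_CDECL);
//   engine->RegisterObjectBehaviour("float3", asBEHAVE_RELEASE, "void f()",
//       asFUNCTION(Float3Release), asCALL_CDECL_OBJLAST);
```

No asBEHAVE_ADDREF registration appears because scoped types are not reference counted.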

 

 

 

I agree that the alignment feature might make things a lot easier, but it is a feature that will require quite a bit of work to get right, not least updating the assembly code for the native calling conventions to support this type too. Needless to say, it will take a while before I get to start implementing this feature, though it is always in the back of my mind.

 

You may however want to consider if it is really wise to expose the aligned vector class to the script. They are great for math heavy calculations, but when mixed with other computation they have quite a bit of overhead as the loading and unloading of the SIMD registers will be performed even for trivial calculations. It will also use up 33% more memory than you usually need for a vector3 structure.

 

This is why for example DirectXMath has two separate vector classes. One for heavy duty math work (XMVECTOR), and one for normal work and storage (XMFLOAT3).
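
The trade-off can be illustrated with two small types. This is only a sketch: Float3Storage and Float3Aligned are hypothetical names, not DirectXMath or AngelScript types, and the sizes assume a 4-byte float.

```cpp
#include <cstddef>

// Storage type: three packed floats, 12 bytes, 4-byte alignment.
struct Float3Storage {
    float x, y, z;
};

// Compute type: aligned so it can be loaded with aligned SIMD moves.
// The alignment pads the struct from 12 to 16 bytes.
struct alignas(16) Float3Aligned {
    float x, y, z;
};

static_assert(sizeof(Float3Storage) == 12, "packed storage is three floats");
static_assert(sizeof(Float3Aligned) == 16, "alignment pads the type to 16 bytes");
static_assert(alignof(Float3Aligned) == 16, "compute type is 16-byte aligned");

// Extra memory per element in percent: (16 - 12) * 100 / 12, truncated to 33.
constexpr std::size_t kOverheadPercent =
    (sizeof(Float3Aligned) - sizeof(Float3Storage)) * 100 / sizeof(Float3Storage);
```

The 4 bytes of padding per element are where the 33% figure comes from.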


Math-heavy scripts are of course a bad idea, but for simply passing parameters back and forth, implementing a whole new class just for unaligned storage is overkill. If you accept contributions, we could have someone do the necessary work, provided you can give some outline assistance.


I haven't made a complete mapping of all the changes that would be needed, but these are the high level changes:

 

  • The script context must make sure to always allocate the local stack memory buffer on 16byte aligned boundaries (asCContext::ReserveStackSpace)
  • The engine must make sure to always allocate the memory for the script objects on 16byte aligned boundaries (asCScriptEngine::CallAlloc)
  • The application needs to specify a new flag when registering types that require 16byte alignment, e.g. asOBJ_APP_ALIGN16 (asCScriptEngine::RegisterObjectType)
  • The script object type must make sure to align member properties of these types correctly (asCObjectType::AddPropertyToClass)
  • Script global properties must allocate memory on 16byte boundaries if holding these types (asCGlobalProperty::AllocateMemory)
  • The script compiler must make sure to allocate the local variables on 16byte boundaries (asCCompiler::AllocateVariable)
  • The script compiler must add pad bytes on the stack for all function calls to guarantee that the stack position is 16byte aligned on entry in the called function (asCCompiler)
  • The bytecode serializer must be capable of adjusting these pad bytes to guarantee platform independent saved bytecode. Remember that the registered type may not be 16byte aligned on all platforms (asCWriter & asCReader) 
  • The bytecode serializer must also be prepared to adjust the position of the local variables according to the need for 16byte alignment (asCWriter & asCReader)
  • The code for the native calling conventions must be adjusted for all platforms that should support 16byte aligned types (as_callfunc...)
  • When the context needs to grow the local stack memory it must copy the function arguments so that the stack entry position is 16byte aligned (asCContext::CallScriptFunction)
  • When the context is prepared for a new call, it must set the initial stack position so the stack entry position is 16byte aligned (asCContext::Prepare)

 

There may be some other changes needed as well. If I remember anything else later I'll add to this list.

 

The bullets in red are the complex changes. The other changes should be quite trivial.
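
Several of the items above (aligned member properties, aligned local variables, pad bytes before calls) come down to the same arithmetic: rounding an offset up to the next aligned boundary. A minimal sketch of that arithmetic follows. The helper names AlignUp, PadBytes and PlaceVariable are hypothetical and are not taken from the AngelScript source; the sketch also assumes the base address of the buffer is itself suitably aligned.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t AlignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Number of pad bytes needed so that `offset` lands on an aligned boundary.
constexpr std::size_t PadBytes(std::size_t offset, std::size_t alignment) {
    return AlignUp(offset, alignment) - offset;
}

// Hypothetical layout helper: places a variable of `size` bytes at the next
// suitably aligned offset, advances `cursor` past it, and returns the offset.
inline std::size_t PlaceVariable(std::size_t& cursor, std::size_t size,
                                 std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t offset = AlignUp(cursor, alignment);
    cursor = offset + size;
    return offset;
}
```

For example, a 4-byte int followed by a 16-byte aligned vector ends up at offsets 0 and 16, leaving 12 pad bytes in between.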


Sounds good. I'll dive into the code as soon as I can. 

 

One thing I noticed though: the alignment errors only happen in release builds, not in debug builds. Is it possible that this is due to behavioral changes in malloc()? If so, can those changes be simulated for the release build? That sounds like a lower-effort fix.


I don't think it is because of anything malloc does. More likely the debug build uses different instructions to load the SIMD registers, so the __m128 type doesn't require alignment. This is however something that would be in your application, and not in AngelScript.

 

You can set custom memory functions with asSetGlobalMemoryFunctions(). With this the application can use memory routines that are guaranteed to always return 16byte aligned memory to the script library. You probably don't want to use 16byte aligned allocations for everything though, as it will waste a lot of memory when the allocations are smaller than 16 bytes.
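
A pair of such routines could be sketched as below, assuming nothing beyond malloc() and free(). AlignedAlloc16 and AlignedFree16 are hypothetical names. The approach over-allocates and stores the original pointer just in front of the aligned block, which also shows where the memory overhead for small allocations comes from. The call that installs the routines is shown as a comment because it needs angelscript.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Returns 16-byte aligned memory by over-allocating with malloc() and storing
// the original pointer immediately before the aligned block.
void* AlignedAlloc16(std::size_t size) {
    const std::size_t alignment = 16;
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;
    // Leave room for the bookkeeping pointer, then round up to the boundary.
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    assert(aligned % alignment == 0);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Frees a block returned by AlignedAlloc16 using the stored original pointer.
void AlignedFree16(void* ptr) {
    if (ptr == nullptr) return;
    std::free(static_cast<void**>(ptr)[-1]);
}

// Installation (requires angelscript.h), done before the engine is created:
//   asSetGlobalMemoryFunctions(AlignedAlloc16, AlignedFree16);
```

Each allocation costs up to 24 extra bytes on a 64-bit platform, which is why applying this to every small allocation is wasteful.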

 

This gave me an idea. The code in as_memory.h can perhaps be enhanced to have a new macro for allocating 16byte aligned memory, e.g. asNEW16 and asNEWARRAY16. This macro can then call a new userAlloc16 global function. The pieces of code I mentioned above that need to guarantee 16byte aligned memory would then only have to call these macros instead of the existing ones to allocate the memory. 


FWIW, they currently both call userAlloc, which defaults to malloc(), which guarantees 16-byte alignment on OS X, so that's probably not it. Something else about the debug environment is causing variables created on the AS stack to always have 16-byte aligned addresses. If only I knew what it was, I could change things so that the behavior that happens by accident under the debug environment is guaranteed...


There is nothing in the AngelScript code to guarantee 16byte aligned addresses of local variables at the moment. Even if malloc() is guaranteed to return properly aligned memory buffers, the local variables in the script will be packed at 4byte boundaries.


I haven't made a complete mapping of all the changes that would be needed, but these are the high level changes:

 

  • The script context must make sure to always allocate the local stack memory buffer on 16byte aligned boundaries (asCContext::ReserveStackSpace)

 

I think I implemented that part. Check out commits https://bitbucket.org/sherief/angelscript/commits/b9bca19ffa001ce628d106adc95669c666af4efb and https://bitbucket.org/sherief/angelscript/commits/c8070156e0141c28fad7e782f28947431834919a, and let me know what you think.

Edited by Starfox
  • The application needs to inform a new flag when registering types that require 16byte alignment, e.g. asOBJ_APP_ALIGN16 (asCScriptEngine::RegisterObjectType)

 

I didn't add a flag for 16-byte alignment (since we'll need more than just 16-byte aligned variables soon). Instead, I added a new optional parameter to RegisterObjectType:

 

https://bitbucket.org/sherief/angelscript/commits/a6dd89ef276d09825c15a929dac6c6d2055d0cd6


 

  • I didn't add a flag for 16-byte alignment (since we'll need more than just 16 byte aligned variables soon), I added a new optional parameter to RegisterObjectType:  

 

 

What other alignment requirements are you in need of?

 

 

I'll review all above patches as soon as I can and provide my feedback.


What other alignment requirements are you in need of?

 

32, for AVX. For future compatibility I want the changes to be able to support any alignment.

  • The engine must make sure to always allocate the memory for the script objects on 16byte aligned boundaries (asCScriptEngine::CallAlloc)

 

CallAlloc() relies on userAlloc. Should the signature be changed to add an alignment parameter? I can do this, and I'd also add a simple wrapper for the default malloc() / free() use case.


I don't think AVX will require 32-byte alignment. Do you have any reference that says it will? In fact, the Wikipedia entry on AVX says the memory alignment requirement of SIMD instructions may be relaxed, so it is quite possible that not even 16byte alignment will be needed. (Unfortunately, no reference for this was given.)

 

But, feel free to make the code generic to support any alignment requirement.

 

As for CallAlloc(). Not all memory allocations need to be aligned, and you definitely do not want to force alignment of small allocations as it may waste a lot of memory. I think you need to add a secondary method, e.g. CallAllocAligned(). This new method can optionally take an argument with the desired alignment (though, for performance, it may be better just to hardcode it to 16). The code that allocates memory must know if the memory needs to be aligned or not.
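
One way to picture the split between a default path and an aligned path is the sketch below. It is not AngelScript code: TypeInfo, AllocForType and FreeForType are hypothetical names, the default alignment of 4 is an assumption, and the aligned path relies on the C++17 aligned forms of operator new and operator delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical type descriptor holding only what the sketch needs.
struct TypeInfo {
    std::size_t size;
    std::size_t alignment = 4;  // assumed default for types with no special needs
};

// True when malloc() cannot be relied on to satisfy `alignment`.
inline bool NeedsAlignedPath(std::size_t alignment) {
    return alignment > alignof(std::max_align_t);
}

// Allocates through the plain path unless the type asks for more alignment
// than malloc() guarantees. Returns nullptr on failure in both cases.
void* AllocForType(const TypeInfo& t) {
    assert(t.alignment != 0 && (t.alignment & (t.alignment - 1)) == 0);
    if (NeedsAlignedPath(t.alignment)) {
        return ::operator new(t.size, static_cast<std::align_val_t>(t.alignment),
                              std::nothrow);
    }
    return std::malloc(t.size);
}

// Frees through the same path that AllocForType used for this type.
void FreeForType(void* p, const TypeInfo& t) {
    if (NeedsAlignedPath(t.alignment)) {
        ::operator delete(p, static_cast<std::align_val_t>(t.alignment));
    } else {
        std::free(p);
    }
}
```

Small types keep using the ordinary allocator, so only types that request a larger alignment pay for it.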


vmovaps with ymm operands requires 32-byte alignment. We also see a performance benefit from aligning matrices on 32- or 64-byte boundaries due to cache behavior.


For a reference see Intel® Advanced Vector Extensions Programming Reference:

 

Table 2-4. Instructions Requiring Explicitly Aligned Memory

Require 16-byte alignment    Require 32-byte alignment
(V)MOVDQA xmm, m128          VMOVDQA ymm, m256
(V)MOVDQA m128, xmm          VMOVDQA m256, ymm
(V)MOVAPS xmm, m128          VMOVAPS ymm, m256
(V)MOVAPS m128, xmm          VMOVAPS m256, ymm
(V)MOVAPD xmm, m128          VMOVAPD ymm, m256
(V)MOVAPD m128, xmm          VMOVAPD m256, ymm
(V)MOVNTPS m128, xmm         VMOVNTPS m256, ymm
(V)MOVNTPD m128, xmm         VMOVNTPD m256, ymm
(V)MOVNTDQ m128, xmm         VMOVNTDQ m256, ymm
(V)MOVNTDQA xmm, m128        VMOVNTDQA ymm, m256
 
Table 2-5. Instructions Not Requiring Explicit Memory Alignment

(V)MOVDQU xmm, m128
(V)MOVDQU m128, xmm
(V)MOVUPS xmm, m128
(V)MOVUPS m128, xmm
(V)MOVUPD xmm, m128
(V)MOVUPD m128, xmm
VMOVDQU ymm, m256
VMOVDQU m256, ymm
VMOVUPS ymm, m256
VMOVUPS m256, ymm
VMOVUPD ymm, m256
VMOVUPD m256, ymm

 

In http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf, we can read in section 3.6.4 that:

Misaligned data access can incur significant performance penalties. This is particularly true for cache line 
splits. The size of a cache line is 64 bytes in the Pentium 4 and other recent Intel processors, including 
processors based on Intel Core microarchitecture. 
 
An access to data unaligned on 64-byte boundary leads to two memory accesses and requires several 
µops to be executed (instead of one). Accesses that span 64-byte boundaries are likely to incur a large 
performance penalty, the cost of each stall generally are greater on machines with longer pipelines.
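
Whether a given access splits a cache line can be checked with a little arithmetic. The sketch below assumes the 64-byte line size from the quoted text; SpansCacheLine is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Cache line size in bytes, taken from the manual text quoted above.
constexpr std::size_t kCacheLineSize = 64;

// True when an access of `size` bytes starting at `address` touches two
// different cache lines, i.e. its first and last byte fall in different lines.
constexpr bool SpansCacheLine(std::uintptr_t address, std::size_t size) {
    return size != 0 &&
           (address / kCacheLineSize) !=
               ((address + size - 1) / kCacheLineSize);
}
```

A 32-byte load at offset 32 stays inside one line, while the same load at offset 48 covers bytes 48 to 79 and splits across two lines. A 32-byte value aligned to 32 bytes can never split.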

Edited by quarnster

:)

 

The changes you've done so far seem to be fine.

 

Are you testing against the test_feature app in the svn? It's the best way to make sure you don't accidentally break anything.


As for CallAlloc(). Not all memory allocations need to be aligned, and you definitely do not want to force alignment of small allocations as it may waste a lot of memory. I think you need to add a secondary method, e.g. CallAllocAligned(). This new method can optionally take an argument with the desired alignment (though, for performance, it may be better just to hardcode it to 16). The code that allocates memory must know if the memory needs to be aligned or not.

 

Some types do need natural alignment though, and the default allocator (which wraps malloc()) already aligns to the largest supported type alignment on the platform, since malloc() is guaranteed to do that. On OS X, my primary platform, malloc() is guaranteed to return 16-byte aligned data. On almost every platform, malloc() is required by the spec to return a pointer that is at least as aligned as a double (8 bytes).

 

Also, the patches add an alignment type to the type id info that defaults to 4, and that's what'll get passed to allocators when allocating memory for a type - so in existing code the behavior of the changes should be exactly identical.


:)

 

The changes you've done so far seem to be fine.

 

Are you testing against the test_feature app in the svn? It's the best way to make sure you don't accidentally break anything.

 

Not really, I can't get the test harness to work on my system. We test internally, but our test isn't as comprehensive. Would it be too much to ask you to test my patches? Sorry.
