
SSE with dynamically allocated objects / Sony Vector Math Library

This topic is 3491 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi, I'm trying to use the Sony vector math library (SSE Array of Structures version). I haven't really used SSE before, and I was thinking I could just plug and play with the Sony library, but I guess that was naive. I currently get runtime errors due to incorrect alignment of the members of dynamically allocated objects. For example, the Vector3 class is implemented as a __m128, and the assignment operator compiles to a pair of MOVAPS instructions, which require the memory operand to be aligned on a 16-byte boundary. For stack-allocated objects everything seems to work fine, but if I try to dynamically allocate an object with a Vector3 member, the Vector3 will not be aligned correctly and MOVAPS will crash. What I've read elsewhere suggests that I need to overload operator new for such objects so as to make use of _aligned_malloc, but I can't help but think I'm missing something. If anyone has used the Sony library or something similar in the past, any input would be appreciated.
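
To confirm that alignment really is the culprit before changing any allocation code, a small check along these lines might help. This is only a sketch: `is_aligned` is a hypothetical helper, not something the Sony library provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the Sony library): returns true when
// the address p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Something like `assert(is_aligned(&obj->pos, 16));` placed right after the allocation should then fail at the allocation site instead of crashing later inside a MOVAPS.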

Memory must be 16-byte aligned whenever the SSE code uses aligned loads and stores such as MOVAPS. Using _aligned_malloc() is one way to get aligned heap memory. I used a similar SIMD library, and every data structure that was used in conjunction with the SIMD functionality needed to be aligned. That included all variables being passed in and all variables being returned.

One way to ensure 16-byte aligned values is to use compiler-specific attributes. This is the one I used with the MSVC compiler:


__declspec(align(16)) int foo;

__declspec(align(16)) class Bar
{
};
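
For compilers other than MSVC, the standard C++11 `alignas` specifier expresses the same request. A minimal sketch, assuming a C++11 compiler; the contents of `Bar` and the helper function are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Standard C++11 spelling of the same alignment request.
// The member is a placeholder; the original Bar is empty.
struct alignas(16) Bar
{
    float data[4];
};

static_assert(alignof(Bar) == 16, "Bar should require 16-byte alignment");
static_assert(sizeof(Bar) % 16 == 0, "size is always a multiple of alignment");

// Illustrative check: a stack instance of Bar lands on a 16-byte boundary.
inline bool stack_instance_is_aligned()
{
    Bar b{};
    return reinterpret_cast<std::uintptr_t>(&b) % 16 == 0;
}
```

As with `__declspec(align(16))`, this covers static and stack storage. Whether plain `new` honours it depends on the compiler and language version, so heap allocations still need separate attention.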

Hi

I'm so happy right now; I've been trying to do exactly the same thing for about two weeks. I got everything working (I needed to overload operator new for Vector3 so the pointer would be aligned), but dynamically created objects with a Vector3 member produce the exceptions you mentioned. If it helps, I'll post my new operator new:


// operator new for Vector3: the default one does not guarantee 16-byte alignment
inline void * operator new(size_t Size)
{
    return _aligned_malloc(Size, 16);
}
// Matching operator delete: memory from _aligned_malloc must be released with _aligned_free
inline void operator delete(void * pointer)
{
    _aligned_free(pointer);
}

with that

Vector3* test = new Vector3(5.0f,5.0f,5.0f);

works like a charm.

In a C++ channel on EFnet (#c++) people suggested using
#pragma pack(16)
but I haven't had any luck with that either.

Quote:
Original post by FluxCapacitor
Thanks; I tried using __declspec(align(16)), but from what I gather that only works for stack-allocated objects.


Hmm, I believe you might be right.

Quote:
Original post by RealMarkP
Would this properly unwind classes that are created using your implementation of delete?


Yes. The poster above hasn't provided overloaded new[] / delete[] though, so you might want to add those before allocating arrays...

Found a Solution:

#pragma pack(16)

in front of every class Definition that uses a Vector3 or inherits one and

#pragma pack()

after it.

I'm currently testing whether this solves all of the problems...

Quote:
Original post by Roritharr
Found a Solution:


That's not a solution. #pragma pack(16) only sets an upper limit on how strictly members are aligned inside the struct; it does nothing about where the struct itself ends up in memory. So, this code is (probably - you might be lucky) going to blow up:


struct Particle {
    Vector3 pos;
    Vector3 force;
    Vector3 vel;
};
Particle v;


sizeof(Particle) will return 48, but it would have done that with a default packing of 8 or 4 or 2 or 1 as well. There is no guarantee, however, that v will have been allocated on a 16-byte boundary, which is what you actually want.

__declspec(align(16)) is required to make the above work....
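
The difference between packing and alignment can be checked at compile time. The sketch below is only an illustration: `PlainVec3` is a made-up stand-in for Vector3 without any alignment attribute, and `alignas(16)` is used as the portable equivalent of `__declspec(align(16))`.

```cpp
#include <cassert>
#include <cstddef>

// Made-up stand-in for Vector3: four floats, no alignment attribute.
struct PlainVec3 { float x, y, z, w; };

// pack(16) only caps member alignment at 16; it never raises it.
#pragma pack(push, 16)
struct PackedParticle {
    PlainVec3 pos;
    PlainVec3 force;
    PlainVec3 vel;
};
#pragma pack(pop)

// An explicit alignment request is what changes the requirement.
struct alignas(16) AlignedParticle {
    PlainVec3 pos;
    PlainVec3 force;
    PlainVec3 vel;
};

static_assert(sizeof(PackedParticle) == 48, "three 16-byte vectors");
static_assert(alignof(PackedParticle) == alignof(float),
              "pack(16) did not raise the alignment");
static_assert(alignof(AlignedParticle) == 16,
              "alignas(16) did raise the alignment");
```

Both structs have the same size, but only the second one asks the compiler for a 16-byte boundary.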

So, the correct solution is this....


__declspec(align(16))
struct Particle {
    Vector3 pos;
    Vector3 force;
    Vector3 vel;

    inline void * operator new(size_t Size)
    {
        return _aligned_malloc(Size, 16);
    }
    inline void operator delete(void * pointer)
    {
        _aligned_free(pointer);
    }
    inline void * operator new[](size_t Size)
    {
        return _aligned_malloc(Size, 16);
    }
    inline void operator delete[](void * pointer)
    {
        _aligned_free(pointer);
    }
};


That will solve all of your alignment issues apart from usage in std::vector etc. For STL, you'll need to provide your own custom allocator, or alternatively roll your own templated containers...
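
A custom allocator for this purpose could look roughly like the following. Treat it as a sketch under assumptions, not a tested drop-in: `AlignedAllocator` is a hypothetical name, the code assumes C++11, and outside MSVC it falls back to `posix_memalign`, which is an assumption about the target platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc / _aligned_free
#endif

// Hypothetical minimal allocator that hands out Alignment-byte-aligned blocks.
template <typename T, std::size_t Alignment = 16>
struct AlignedAllocator
{
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n)
    {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
#if defined(_MSC_VER)
        p = _aligned_malloc(n * sizeof(T), Alignment);
#else
        // Assumes a POSIX platform when not building with MSVC.
        if (::posix_memalign(&p, Alignment, n * sizeof(T)) != 0) p = nullptr;
#endif
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept
    {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        std::free(p);
#endif
    }
};

// Stateless allocator: all instances are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }
```

With that in place, a container would be declared as `std::vector<Particle, AlignedAllocator<Particle> >`. Every element stays aligned only if sizeof(Particle) is a multiple of 16, which is the case for the 48-byte struct above.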

btw, the __declspec(align(16)) flag will also pad structures when needed, i.e.


struct Blah {
    char foo;
    // under Visual C++ this will issue a warning that
    // the struct has been padded due to __declspec(align(16))
    Vector3 bar;
};

sizeof(Blah) == 32


However, Blah itself will not be aligned when you allocate it, so you'll need to add __declspec(align(16)) to it and overload the new/delete operators as I showed before....
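
To avoid repeating the four operators in every class, one option might be a small base class that any struct holding SSE members can inherit from. This is a sketch under assumptions: `Aligned16` and `Vec4` are hypothetical names (`Vec4` stands in for Vector3), `alignas(16)` replaces `__declspec(align(16))`, and the non-MSVC branch assumes `posix_memalign` is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc / _aligned_free
#endif

// Hypothetical mixin: derived classes get 16-byte-aligned new/delete.
struct Aligned16
{
    static void* operator new(std::size_t size)      { return allocate(size); }
    static void* operator new[](std::size_t size)    { return allocate(size); }
    static void  operator delete(void* p) noexcept   { release(p); }
    static void  operator delete[](void* p) noexcept { release(p); }

private:
    static void* allocate(std::size_t size)
    {
        void* p = nullptr;
#if defined(_MSC_VER)
        p = _aligned_malloc(size, 16);
#else
        // Assumes a POSIX platform when not building with MSVC.
        if (::posix_memalign(&p, 16, size) != 0) p = nullptr;
#endif
        if (!p) throw std::bad_alloc();
        return p;
    }

    static void release(void* p) noexcept
    {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        std::free(p);
#endif
    }
};

// Stand-in for Vector3: 16 bytes with a 16-byte alignment requirement.
struct alignas(16) Vec4 { float v[4]; };

struct Blah : Aligned16
{
    char foo;   // followed by 15 bytes of padding
    Vec4 bar;
};

static_assert(sizeof(Blah) == 32, "padded as described above");
static_assert(alignof(Blah) == 16, "inherits the alignment of its member");
```

`new Blah` and `new Blah[n]` then go through the aligned allocation path, and the matching deletes release the memory with the matching function.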

