SSE with dynamically allocated objects / Sony Vector Math Library

8 comments, last by RobTheBloke 15 years, 11 months ago
Hi, I'm trying to use the Sony vector math library (SSE Array of Structures version). I haven't really used SSE before, and I was thinking I could just plug and play with the Sony library, but I guess that was naive. I currently get runtime errors due to incorrect alignment of the members of dynamically allocated objects. For example, the Vector3 class is implemented as a __m128, and the assignment operator compiles to a pair of MOVAPS instructions, which require their memory operands to be aligned on a 16-byte boundary. For stack-allocated objects everything seems to work fine, but if I try to dynamically allocate an object with a Vector3 member, the Vector3 will not be aligned correctly and MOVAPS will crash. What I've read elsewhere suggests that I need to overload operator new for such objects so as to make use of _aligned_malloc, but I can't help but think I'm missing something. If anyone has used the Sony library or something similar in the past, any input would be appreciated.
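One way to confirm this diagnosis is to check the address of the vector member before any aligned store touches it. The following is only a sketch under assumptions: `is_aligned`, `Vec3Like`, `Body` and `check_body` are hypothetical names, and `Vec3Like` is a plain stand-in for a 16-byte SIMD type, not the Sony library's Vector3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p lies on an `alignment`-byte boundary.
// `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical stand-in for a 16-byte SIMD vector type.
struct alignas(16) Vec3Like { float x, y, z, w; };

// Hypothetical object that owns a vector member.
struct Body {
    int id;
    Vec3Like position;
};

// Debug-time guard: call before code that performs aligned
// loads/stores on the member.
inline void check_body(const Body& b) {
    assert(is_aligned(&b.position, 16) && "Body::position is not 16-byte aligned");
}
```

Calling `check_body(*ptr)` right after a `new Body` turns a hard-to-read MOVAPS fault into an assertion that names the offending member.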
Memory must be 16-byte aligned whenever you use the aligned SIMD load/store instructions such as MOVAPS. Using _aligned_malloc() is one way to ensure aligned variables. I used a similar SIMD library, and any data structure that was used in conjunction with the SIMD functionality needed to be aligned. That included all variables being passed in and all variables being returned.

One way to ensure 16-byte aligned values is to use some compiler-specific directives. This is the one I used with the MSVC compiler:

__declspec(align(16)) int foo;
__declspec(align(16)) class Bar{};
------------Anything prior to 9am should be illegal.
Thanks; I tried using __declspec(align(16)), but from what I gather that only works for stack-allocated objects.
Hi

I'm so happy right now; I've been trying to do exactly the same thing for about two weeks. I got everything working (I needed to overload operator new for Vector3 so the pointer would be aligned), but dynamically created objects with a Vector3 member produce the exceptions you mentioned. In case it helps, here is my new operator new:

	// My operator for creating a new pointer, because the normal one sucks
	inline void * operator new(size_t Size)
	{
		return _aligned_malloc(Size,16);
	}
	// My operator for deleting these pointers
	inline void operator delete(void * pointer)
	{
		_aligned_free(pointer);
	}

with that
	Vector3* test = new Vector3(5.0f,5.0f,5.0f); 

works like a charm.
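One gap in overloads like these is that _aligned_malloc can return a null pointer, while operator new is expected to throw on failure. A hedged sketch of a stricter variant follows; `aligned_alloc16`, `aligned_free16` and `Vec3Like` are hypothetical names, `Vec3Like` is only a stand-in for the real Vector3, and the non-MSVC branch assumes a POSIX system that provides posix_memalign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical helper: 16-byte aligned allocation that throws on failure.
inline void* aligned_alloc16(std::size_t size) {
    if (size == 0) size = 1;  // always return a unique, freeable pointer
#if defined(_MSC_VER)
    void* p = _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    if (posix_memalign(&p, 16, size) != 0) p = nullptr;
#endif
    if (!p) throw std::bad_alloc();
    return p;
}

// Hypothetical helper: releases memory obtained from aligned_alloc16.
inline void aligned_free16(void* p) noexcept {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

// Stand-in for a 16-byte SIMD vector type with class-level new/delete.
struct alignas(16) Vec3Like {
    float x, y, z, w;

    static void* operator new(std::size_t size)   { return aligned_alloc16(size); }
    static void  operator delete(void* p) noexcept { aligned_free16(p); }
    static void* operator new[](std::size_t size) { return aligned_alloc16(size); }
    static void  operator delete[](void* p) noexcept { aligned_free16(p); }
};
```

Note that these overloads only cover `new Vec3Like`. A different class that merely contains such a member still goes through its own (or the global) operator new.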

In a C++ channel on EFnet (#c++), people suggested using
#pragma pack(16)
but I haven't had any luck with that either.
Quote:Original post by FluxCapacitor
Thanks; I tried using __declspec(align(16)), but from what I gather that only works for stack-allocated objects.


Hmm, I believe you might be right.
Quote:Original post by Roritharr
_aligned_free(pointer);


Would this properly unwind classes that are created using your implementation of delete?
It's supposed to, and I haven't witnessed any problems with it, so I simply like to believe it does...
Quote:Original post by RealMarkP
Would this properly unwind classes that are created using your implementation of delete?


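Yes. A delete expression runs the destructor first and only then calls operator delete to release the memory. The poster above hasn't provided overloaded new[] / delete[] though, so you might want to do that before allocating arrays...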
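The order of events can be checked with a small experiment. This is an illustrative sketch only: `Tracked`, `event_log` and `run_once` are hypothetical names, and plain malloc/free stand in for _aligned_malloc/_aligned_free so that the example stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Records the order in which the hooks below are called.
inline std::string& event_log() {
    static std::string log;
    return log;
}

struct Tracked {
    ~Tracked() { event_log() += "dtor;"; }

    static void* operator new(std::size_t size) {
        event_log() += "new;";
        void* p = std::malloc(size);  // stand-in for _aligned_malloc
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) noexcept {
        event_log() += "delete;";
        std::free(p);                 // stand-in for _aligned_free
    }
};

// Allocates and destroys one object, returning the recorded sequence.
inline std::string run_once() {
    event_log().clear();
    Tracked* t = new Tracked;
    delete t;
    return event_log();
}
```

The destructor entry appears before the operator delete entry, so a custom operator delete only ever sees raw memory whose object has already been destroyed.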
Found a Solution:

#pragma pack(16)

in front of every class definition that uses a Vector3 or inherits from one, and

#pragma pack()

after it.

I'm currently testing whether this solves all of the problems...
Quote:Original post by Roritharr
Found a Solution:


That's not a solution. #pragma pack only caps the alignment of the struct's members (here at 16 bytes); it never raises the alignment of the struct itself. So this code is (probably, though you might be lucky) going to blow up:

struct Particle {
  Vector3 pos;
  Vector3 force;
  Vector3 vel;
};
Particle v;


sizeof(Particle) will return 48, but it would have done that with a default packing of 8, 4, 2 or 1 as well. There is no guarantee, however, that v will be placed on a 16-byte boundary, which is what you actually want.
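The difference between packing and alignment can be checked at compile time. The sketch below makes some assumptions: `Vec3Plain`, `PackedParticle` and `AlignedParticle` are hypothetical names, `Vec3Plain` is four plain floats rather than a real `__m128`, and standard `alignas` is used as the portable counterpart of `__declspec(align(16))`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte payload with only the natural alignment of float.
struct Vec3Plain { float x, y, z, w; };

// pack(16) caps member alignment at 16; it cannot raise anything.
#pragma pack(push, 16)
struct PackedParticle {
    Vec3Plain pos, force, vel;
};
#pragma pack(pop)

// An explicit alignment request does raise the struct's alignment.
struct alignas(16) AlignedParticle {
    Vec3Plain pos, force, vel;
};

static_assert(sizeof(PackedParticle) == 48, "three 16-byte members");
static_assert(alignof(PackedParticle) == alignof(float),
              "pack(16) left the alignment at that of float");
static_assert(sizeof(AlignedParticle) == 48, "same size");
static_assert(alignof(AlignedParticle) == 16, "alignment raised to 16");

// Runtime confirmation for an automatic (stack) object.
inline void check_stack_particle() {
    AlignedParticle p{};
    assert(reinterpret_cast<std::uintptr_t>(&p) % 16 == 0);
}
```

Both structs are 48 bytes, yet only the explicitly aligned one is guaranteed to start on a 16-byte boundary.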

__declspec(align(16)) is required to make the above work....

So, the correct solution is this....

__declspec(align(16)) struct Particle {
  Vector3 pos;
  Vector3 force;
  Vector3 vel;

  inline void * operator new(size_t Size)
  {
    return _aligned_malloc(Size,16);
  }
  inline void operator delete(void * pointer)
  {
    _aligned_free(pointer);
  }
  inline void * operator new[](size_t Size)
  {
    return _aligned_malloc(Size,16);
  }
  inline void operator delete[](void * pointer)
  {
    _aligned_free(pointer);
  }
};


That will solve all of your alignment issues apart from usage in std::vector etc. For STL, you'll need to provide your own custom allocator, or alternatively roll your own templated containers...
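For the STL case, a minimal C++11-style allocator might look like the sketch below. It is not taken from the Sony library or from any post in this thread: `AlignedAllocator` and `Vec3Like` are hypothetical names, and the non-MSVC branch assumes a POSIX system with posix_memalign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

template <class T, std::size_t Alignment = 16>
struct AlignedAllocator {
    using value_type = T;

    // Explicit rebind: the default one cannot cope with the
    // non-type Alignment parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        std::size_t bytes = n * sizeof(T);
        if (bytes == 0) bytes = Alignment;
#if defined(_MSC_VER)
        void* p = _aligned_malloc(bytes, Alignment);
#else
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, bytes) != 0) p = nullptr;
#endif
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        free(p);
#endif
    }
};

// Stateless allocator: all instances are interchangeable.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Hypothetical stand-in for a 16-byte SIMD vector type.
struct alignas(16) Vec3Like { float x, y, z, w; };
```

A container would then be declared as `std::vector<Vec3Like, AlignedAllocator<Vec3Like>>`. Because the element size is a multiple of 16, every element in the buffer stays aligned once the first one is.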

btw, the __declspec(align(16)) flag will also pad structures when needed, i.e.

struct Blah {
  char foo;
  // under Visual C++ this will issue a warning that
  // the struct has been padded due to __declspec(align(16))
  Vector3 bar;
};
sizeof(Blah) == 32


However, Blah itself will not be aligned when you allocate it, so you'll need to add __declspec(align(16)) to it and overload the new/delete operators as I showed you before...
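The padding can be made visible with a few compile-time checks. As before, this is a sketch under assumptions: `Vec3Like` is a hypothetical stand-in for Vector3, and standard `alignas` is used in place of `__declspec(align(16))`.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 16-byte, 16-byte-aligned vector type.
struct alignas(16) Vec3Like { float x, y, z, w; };

struct Blah {
    char foo;      // 1 byte, followed by 15 bytes of padding
    Vec3Like bar;  // must start on a 16-byte offset
};

static_assert(offsetof(Blah, bar) == 16, "bar is pushed to offset 16");
static_assert(sizeof(Blah) == 32, "1 + 15 padding + 16");
static_assert(alignof(Blah) == 16, "alignment requirement follows the member");
```

The alignment requirement reported by `alignof` describes what the type needs. Whether a particular heap allocation honours it still depends on the allocation function being used.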
