
# SSE with dynamically allocated objects / Sony Vector Math Library


## Recommended Posts

Hi, I'm trying to use the Sony vector math library (SSE Array of Structures version). I haven't really used SSE before, and I was thinking I could just plug and play with the Sony library, but I guess that was naive. I currently get runtime errors due to incorrect alignment of the members of dynamically allocated objects. For example, the Vector3 class is implemented as a __m128, and the assignment operator compiles to a pair of MOVAPS instructions, which require their memory operands to be aligned on a 16-byte boundary. For stack-allocated objects everything seems to work fine, but if I try to dynamically allocate an object with a Vector3 member, the Vector3 will not be aligned correctly and MOVAPS will crash. What I've read elsewhere suggests that I need to overload operator new for such objects so as to make use of _aligned_malloc, but I can't help but think I'm missing something. If anyone has used the Sony library or something similar in the past, any input would be appreciated.

##### Share on other sites
All memory must be aligned when dealing with any sort of SIMD functionality that uses aligned loads and stores. Using _aligned_malloc() is one way to ensure aligned variables. I used a similar SIMD library, and any sort of data structure that was used in conjunction with the SIMD functionality needed to be aligned. That included all variables being passed in and all variables being returned.

One way to ensure 16-byte aligned values is to use some compiler-specific commands. This is the one I used with the MSVC compiler:

    __declspec(align(16)) int foo;
    __declspec(align(16)) class Bar {};

##### Share on other sites
Thanks; I tried using __declspec(align(16)), but from what I gather that only works for stack-allocated objects.
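
For anyone following along, the quickest way to see the difference is to test the address itself. This is only a minimal sketch in standard C++11: `alignas(16)` stands in for `__declspec(align(16))`, and `Vec4` and `is_aligned16` are hypothetical names, not part of the Sony library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the library's Vector3 (which wraps a __m128).
// alignas(16) is the standard C++11 spelling of __declspec(align(16)).
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// True if p sits on a 16-byte boundary, which is what MOVAPS needs.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Objects with automatic storage get the alignment the type asks for.
inline void check_stack_object() {
    Vec4 v = {1.0f, 2.0f, 3.0f, 0.0f};
    assert(is_aligned16(&v));
    (void)v;
}
```

Running the same `is_aligned16` check on a pointer returned by a plain `new` is the interesting case: before C++17, the global `operator new` is not required to honor alignments stricter than the default, so the result there depends on the allocator rather than on the attribute.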

##### Share on other sites
Hi

I'm so happy right now; I've been trying to do exactly the same thing for about two weeks. I got everything working (I needed to overload operator new for Vector3 so the pointer would be aligned), but dynamically created objects with a Vector3 member produce the exceptions you mentioned. If it helps, I'll post my new operator new:

    // My operator for creating a new pointer, because the normal one sucks
    inline void* operator new(size_t Size)
    {
        return _aligned_malloc(Size, 16);
    }

    // My operator for deleting these pointers
    inline void operator delete(void* pointer)
    {
        _aligned_free(pointer);
    }

with that
	Vector3* test = new Vector3(5.0f,5.0f,5.0f);

works like a charm.

In a C++ channel on EFnet (#c++), people suggested using
#pragma pack(16)
but I haven't had any luck with that either.

##### Share on other sites
Quote:
 Original post by FluxCapacitor
 Thanks; I tried using __declspec(align(16)), but from what I gather that only works for stack-allocated objects.

Hmm, I believe you might be right.

##### Share on other sites
Quote:
 Original post by Roritharr
 _aligned_free(pointer);

Would this properly unwind classes that are created using your implementation of delete?

##### Share on other sites
It's supposed to, and I haven't witnessed any problems with it, so I simply like to believe it does...

##### Share on other sites
Quote:
 Original post by RealMarkP
 Would this properly unwind classes that are created using your implementation of delete?

Yes. The delete expression runs the destructor first and only then calls operator delete to release the memory. The poster above hasn't provided an overloaded new[] / delete[] though, so you might want to do that before allocating arrays...
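
To make that concrete, here is a rough sketch of all four overloads together with a destructor counter. It is not the Sony library's code: `Vec4`, `aligned_alloc16` and `aligned_free16` are hypothetical names, and the `#if` branch that uses `posix_memalign` is only there so the sketch also builds outside MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical helpers: 16-byte aligned allocation on MSVC and POSIX.
inline void* aligned_alloc16(std::size_t size) {
    void* p = nullptr;
#if defined(_MSC_VER)
    p = _aligned_malloc(size, 16);
#else
    if (posix_memalign(&p, 16, size) != 0) p = nullptr;
#endif
    if (!p) throw std::bad_alloc();
    return p;
}

inline void aligned_free16(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

// Hypothetical stand-in for Vector3.
struct alignas(16) Vec4 {
    float x, y, z, w;
    static int destroyed;  // counts destructor calls, for the demo only

    Vec4() : x(0), y(0), z(0), w(0) {}
    ~Vec4() { ++destroyed; }

    static void* operator new(std::size_t size)   { return aligned_alloc16(size); }
    static void  operator delete(void* p)         { aligned_free16(p); }
    static void* operator new[](std::size_t size) { return aligned_alloc16(size); }
    static void  operator delete[](void* p)       { aligned_free16(p); }
};
int Vec4::destroyed = 0;

// Returns how many destructors ran: one for the single object, three for the array.
inline int destructor_calls_demo() {
    Vec4::destroyed = 0;
    Vec4* one = new Vec4;      // calls Vec4::operator new
    assert(reinterpret_cast<std::uintptr_t>(one) % 16 == 0);
    delete one;                // runs ~Vec4(), then Vec4::operator delete
    Vec4* many = new Vec4[3];  // calls Vec4::operator new[]
    delete[] many;             // runs ~Vec4() three times, then operator delete[]
    return Vec4::destroyed;
}
```

The overloaded operators only deal with raw memory. Construction and destruction stay the job of the new and delete expressions, which is why the destructors still run.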

##### Share on other sites
Found a Solution:

#pragma pack(16)

in front of every class Definition that uses a Vector3 or inherits one and

#pragma pack()

after it.

I'm currently testing whether this solves all of the problems...

##### Share on other sites
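Quote:
 Original post by Roritharr
 Found a Solution:

That's not a solution. #pragma pack(16) only caps the alignment of the members inside a struct at 16 bytes; it never raises the alignment of anything, and it has no say in where the struct itself ends up in memory. So, this code is (probably - you might be lucky) going to blow up: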

    struct Particle {
      Vector3 pos;
      Vector3 force;
      Vector3 vel;
    };

    Particle v;

sizeof(Particle) will return 48, but then it would have done that with a default packing of 8 or 4 or 2 or 1. There is no guarantee, however, that v will have been allocated on a 16-byte boundary, which is what you actually want.

__declspec(align(16)) is required to make the above work....

So, the correct solution is this....

    __declspec(align(16)) struct Particle {
      Vector3 pos;
      Vector3 force;
      Vector3 vel;

      inline void * operator new(size_t Size)
      {
        return _aligned_malloc(Size,16);
      }
      inline void operator delete(void * pointer)
      {
        _aligned_free(pointer);
      }
      inline void * operator new[](size_t Size)
      {
        return _aligned_malloc(Size,16);
      }
      inline void operator delete[](void * pointer)
      {
        _aligned_free(pointer);
      }
    };

That will solve all of your alignment issues apart from usage in std::vector etc. For STL, you'll need to provide your own custom allocator, or alternatively roll your own templated containers...
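
As a rough idea of what such a custom allocator could look like, here is a minimal C++11-style sketch. `AlignedAllocator` and `fill_particles` are hypothetical names, the `Particle` here uses plain float arrays in place of Vector3, and the `posix_memalign` branch exists only so the sketch also builds outside MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical minimal allocator that hands out Alignment-aligned blocks.
template <typename T, std::size_t Alignment = 16>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
#if defined(_MSC_VER)
        p = _aligned_malloc(n * sizeof(T), Alignment);
#else
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) p = nullptr;
#endif
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
#if defined(_MSC_VER)
        _aligned_free(p);
#else
        free(p);
#endif
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Stand-in for the Particle struct above, with float[4] in place of Vector3.
struct alignas(16) Particle {
    float pos[4];
    float force[4];
    float vel[4];
};

// Grows a vector through several reallocations and checks the storage each time.
inline std::size_t fill_particles() {
    std::vector<Particle, AlignedAllocator<Particle> > particles;
    for (int i = 0; i < 100; ++i) {
        particles.push_back(Particle());
        assert(reinterpret_cast<std::uintptr_t>(particles.data()) % 16 == 0);
    }
    return particles.size();
}
```

The container is then declared as `std::vector<Particle, AlignedAllocator<Particle> >`, and every reallocation goes through `allocate`, so the element storage stays on a 16-byte boundary.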

btw, the __declspec(align(16)) flag will also pad structures when needed, i.e.

    struct Blah {
      char foo;
      // under Visual C++ this will issue a warning that
      // the struct has been padded due to __declspec(align(16))
      Vector3 bar;
    };
    // sizeof(Blah) == 32

However, Blah itself will not be aligned when you allocate it dynamically, so you'll need to add __declspec(align(16)) to it and overload the new/delete operators as I showed you before...
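
The padding can be checked at compile time. This is a small sketch in standard C++11, again with a hypothetical `Vec4` and `alignas(16)` standing in for Vector3 and `__declspec(align(16))`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Vector3.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

struct Blah {
    char foo;  // 1 byte, followed by 15 bytes of padding
    Vec4 bar;  // has to start on a 16-byte offset
};

static_assert(sizeof(Vec4) == 16, "Vec4 is four floats");
static_assert(offsetof(Blah, bar) == 16, "bar is pushed to the next 16-byte offset");
static_assert(sizeof(Blah) == 32, "1 + 15 padding + 16");
static_assert(alignof(Blah) == 16, "the struct picks up its strictest member alignment");

// A stack instance gets that alignment automatically.
inline void check_blah_on_stack() {
    Blah b = {};
    assert(reinterpret_cast<std::uintptr_t>(&b.bar) % 16 == 0);
    (void)b;
}
```

In standard C++, at least, the containing struct already reports 16-byte alignment through its member, so stack and static instances are placed correctly. The part that is still missing for heap objects is an allocator that respects it, which is what the overloaded new/delete operators provide.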