SSE - Alignment of __m128 Type - how?

21 comments, last by GameDev.net 18 years ago
I'm using gcc 3.31 and there seem to be some problems with the alignment of the __m128 type. As far as I know, the type should be aligned to a 16-byte boundary by default, but sometimes this just isn't true. If I use the type as a class variable and the object isn't correctly aligned, it doesn't work. But there has to be another way than aligning each and every object (and each object that has these objects as members, and so on). Is there some way to do it by default, or does using the __m128 type (e.g. in a vector class that's used everywhere in my program) mean a lot of pain aligning all the classes by hand? Greetings and thanks, Roga
I don't remember exactly how to use it, but you have to play with the __aligned__ attribute. I believe it is something like this:

typedef __m128 __aligned_m128 __attribute__ ((__aligned__(16)));

More on the type attribute syntax.
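For comparison, standard C++11 spells the same request with `alignas`. The sketch below assumes a C++11 compiler and uses hypothetical struct names; both spellings raise the type's alignment, which `alignof` reports at compile time.

```cpp
// Hypothetical type: four floats with the 16-byte alignment that SSE aligned loads need.
struct alignas(16) AlignedFloats {
    float v[4];
};

// GCC-specific spelling of the same request, as in the typedef above.
struct GccAlignedFloats {
    float v[4];
} __attribute__((aligned(16)));

static_assert(alignof(AlignedFloats) == 16, "alignas(16) raises the alignment");
static_assert(alignof(GccAlignedFloats) == 16, "aligned(16) raises the alignment");
```

Note that this only controls where the compiler places such objects; memory you get from a plain allocator is another story, as the rest of the thread shows.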

Regards,
Thanks, but I already tried that and it doesn't help. As far as I know, the __m128 type should be aligned by default anyway. The problem seems to be that the objects that have __m128 member variables are not correctly aligned...

typedef float __m128 __attribute__ (( vector_size(16) ));
or maybe
typedef float __m128 __attribute__ (( vector_size(16), aligned(16) ));
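The `vector_size` attribute is the GCC (and Clang) vector extension that GCC's own `<xmmintrin.h>` uses to define `__m128`. A minimal sketch with a hypothetical type name `v4sf`, showing that such a type is 16 bytes, naturally 16-byte aligned, and supports element-wise arithmetic:

```cpp
// Hypothetical name; GCC/Clang vector extension of four floats.
typedef float v4sf __attribute__((vector_size(16)));

static_assert(sizeof(v4sf) == 16, "four floats");
static_assert(alignof(v4sf) == 16, "vector types are aligned to their size");

// Element-wise addition; the compiler can emit a single addps for this.
inline v4sf add(v4sf a, v4sf b)
{
    return a + b;
}
```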


If your objects contain such values, the alignment of the object will be at least the alignment of those values. The compiler will take care of this. Check whether your memory manager also respects the 16-byte alignment.
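To illustrate the propagation rule, here is a small sketch with hypothetical classes: a `__m128` member makes its class 16-byte aligned, and that alignment carries outward to any class containing it, with padding inserted so the member lands on a boundary. This fixes only the layout; whether the object's base address is actually aligned still depends on where its storage comes from.

```cpp
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical classes: alignment propagates outward through members.
struct Vec4 {
    __m128 v;
};

struct Particle {
    char tag;       // needs only 1-byte alignment on its own
    Vec4 position;  // forces 16-byte alignment of Particle
    Vec4 velocity;
};

static_assert(alignof(Vec4) == 16, "a __m128 member makes the class 16-byte aligned");
static_assert(alignof(Particle) == 16, "and so does any class containing such a member");
static_assert(offsetof(Particle, position) % 16 == 0, "the member is padded onto a boundary");
```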
Hi, thanks for your reply, but that still doesn't work. I looked around the net and all I found was that I'm not the only one with this problem. I put assertions in all my routines that use an __m128, in the form of

assert(reinterpret_cast<std::uintptr_t>(m_pData) % 16 == 0);

But this always breaks sooner or later. The data simply isn't aligned as long as it is a class member (or the class that contains it is a member of some other class, and so on...)
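A pointer-to-integer cast for this kind of check should go through `std::uintptr_t`, since `int` is too narrow to hold a pointer on 64-bit targets (GCC rejects the cast there). A minimal sketch of a reusable check; the helper and macro names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p lies on an N-byte boundary (N a power of two).
template <std::uintptr_t N = 16>
inline bool is_aligned(const void* p)
{
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    return (reinterpret_cast<std::uintptr_t>(p) & (N - 1)) == 0;
}

// Drop-in replacement for the assertion used in the routines above.
#define ASSERT_ALIGNED16(p) assert(is_aligned<16>(p))
```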

Yet there are many people who use these types. Do all of you align your vectors by hand (allocate memory dynamically and then place the __m128 inside it...)? Wouldn't that be a huge waste of time, considering that it has to be done each time a vector is constructed?
OK, that last post was a little bit blurry. To clear things up: I have implemented a 3D vector class. The data for each vector is held in a union that looks like this:

union {
    __m128 m_sseValues;
    float  m_pData[4];
    struct {
        float x;
        float y;
        float z;
        float w;
    };
};


What I want is for the __m128 member to always be aligned the way it's supposed to be. I check that via

assert(reinterpret_cast<std::uintptr_t>(m_pData) % 16 == 0);


But sooner or later this assertion fails. Can anyone tell me why, and how to fix it? It can't be that hard, can it? If it helps, I'm using gcc 3.31.
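For reference, a minimal sketch of a vector class built around an `__m128`, in the spirit of the union above. The class and method names are hypothetical. It reads the lanes back with an unaligned store instead of through a `float` union member, which keeps the code within ISO C++ (GCC documents union type punning as supported, but other compilers need not).

```cpp
#include <xmmintrin.h>

// Hypothetical 4-component vector wrapping an __m128. The class inherits the
// 16-byte alignment of its member; the SSE intrinsics operate on it directly.
class Vec4
{
public:
    // _mm_set_ps takes the highest lane first, so x ends up in lane 0.
    Vec4(float x, float y, float z, float w)
        : m_sseValues(_mm_set_ps(w, z, y, x)) {}

    explicit Vec4(__m128 v) : m_sseValues(v) {}

    Vec4 operator+(const Vec4& rhs) const
    {
        return Vec4(_mm_add_ps(m_sseValues, rhs.m_sseValues));
    }

    // Copies the lanes out with an unaligned store, so 'out' may live anywhere.
    void store(float out[4]) const
    {
        _mm_storeu_ps(out, m_sseValues);
    }

private:
    __m128 m_sseValues;
};
```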
Variables with proper alignment attributes that are created on the stack should be aligned automagically: the compiler adds extra assembly instructions right after entering any function that uses such variables, as in:
void f(void)
{
    int a;
    my_aligned_type v;
    bool b;
    //...
}
The whole stack frame becomes aligned right after entering the function, once and for all.
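As a small illustration of the stack case, the hypothetical function below stages two arrays of any alignment through `alignas(16)` locals. The compiler guarantees those locals' alignment (either by realigning the frame or by relying on the platform ABI's stack alignment), so the aligned intrinsics `_mm_load_ps` and `_mm_store_ps`, which require 16-byte addresses, are safe on them.

```cpp
#include <xmmintrin.h>

// Hypothetical example: dot product of two float[4] arrays of any alignment,
// staged through 16-byte aligned locals so the aligned SSE intrinsics apply.
inline float dot4(const float* a, const float* b)
{
    alignas(16) float la[4] = { a[0], a[1], a[2], a[3] };  // local: aligned by the compiler
    alignas(16) float lb[4] = { b[0], b[1], b[2], b[3] };
    __m128 prod = _mm_mul_ps(_mm_load_ps(la), _mm_load_ps(lb));  // requires 16-byte alignment
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, prod);  // so does the aligned store
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```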

But variables created on the heap do not have this property. If you call
new my_aligned_type[10]
the plain operator new doesn't care about alignment, and by default returns a memory chunk aligned to 8 bytes (of course, this may vary between heap implementations). So there's a good chance of getting unaligned memory.

I handled it by overloading operator new for such classes. You would also need a 16-byte aligning allocator for standard containers, AFAIK.
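A minimal sketch of such an allocator, assuming a C++11 compiler and `_mm_malloc`/`_mm_free` (declared alongside the SSE intrinsics by GCC, Clang and MSVC). The allocator, wrapper struct and alias names are hypothetical. The explicit `rebind` is needed because the non-type `Align` parameter prevents `std::allocator_traits` from rebinding automatically. The `__m128` is wrapped in a struct because GCC warns about the vector type's attributes being ignored when it is used directly as a template argument.

```cpp
#include <cstddef>
#include <new>
#include <vector>
#include <xmmintrin.h>

// Hypothetical allocator handing out Align-byte aligned storage.
template <class T, std::size_t Align = 16>
struct AlignedAllocator
{
    using value_type = T;
    template <class U> struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept {}
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n)
    {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_alloc();  // n * sizeof(T) would overflow
        void* p = _mm_malloc(n * sizeof(T), Align);
        if (!p)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Hypothetical element type and a vector whose buffer is always 16-byte aligned.
struct Vec4 { __m128 v; };
using Vec4Vector = std::vector<Vec4, AlignedAllocator<Vec4>>;
```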
Thank you, I already feared that... But what I don't get is that all the example libraries (e.g. the fast SSE matrix code on Gamasutra) use the __m128 type without any thought about additional alignment. Does that mean they can't use their matrices inside dynamically allocated objects?

class A {
public:
    A() { myMatrix.doSomethingWithSSE(); }
    ~A() {}
public:
    Matrix4 myMatrix;
};


// Within main
Matrix4 myMatrix;                   // This is OK
Matrix4 *pMatrix = new Matrix4();   // This can blow up...
A myA;                              // This is OK
A *pA = new A();                    // This will also blow up



Is that right? And if so, does any of you have the time to point me to a small sample where this is done right? (I guess I have to take care of the alignment within the constructor...)

Thanks, Roga
Yes, that is correct.
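On compilers that implement C++17 this particular failure goes away: a new-expression for a type whose alignment exceeds `__STDCPP_DEFAULT_NEW_ALIGNMENT__` calls the aligned overload `operator new(std::size_t, std::align_val_t)`, and for types that are not over-aligned the default `operator new` already guarantees their alignment. A minimal check, assuming a C++17 compiler and hypothetical `Matrix4`/`A` layouts modelled on the example above:

```cpp
#include <cstdint>
#include <memory>
#include <xmmintrin.h>

// Hypothetical matrix: four SSE rows, so alignof(Matrix4) == 16.
struct Matrix4 {
    __m128 rows[4];
};

// Hypothetical owner, as in the example above.
struct A {
    Matrix4 myMatrix;
};

// True if p lies on a 16-byte boundary.
inline bool is_aligned_16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Under C++17 rules both heap allocations respect alignof(T).
inline bool heap_objects_aligned()
{
    std::unique_ptr<Matrix4> m(new Matrix4());
    std::unique_ptr<A> a(new A());
    return is_aligned_16(m.get()) && is_aligned_16(&a->myMatrix);
}
```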

The D3DX math library had an overloaded operator new for its matrix type that did the alignment by hand.
But in the latest DirectX SDKs the entire math library lives in DLLs rather than in the headers, so if you'd like to examine their implementation, you'd have to get an older version (about 2 years old, I guess).

Btw, this is my implementation, based on theirs:
#include <cstddef>
#include <cstdint>
#include <new>

typedef unsigned char BYTE;   // normally provided by <windows.h>

class CMemSpaceA16
{
public:
   // new. Dual with ReleaseMemSpaceA16().
   static BYTE* GetMemSpaceA16(std::size_t size)
   {
      // Over-allocate by 16 bytes; ::new throws std::bad_alloc on failure.
      BYTE *p = ::new BYTE[size + 16];
      // Advance 1..16 bytes to the next 16-byte boundary. At least one byte is
      // always skipped, so the offset can be stored just before the block.
      BYTE offset = (BYTE)(16 - (reinterpret_cast<std::uintptr_t>(p) & 15));
      p += offset;
      p[-1] = offset;
      return p;
   }

   // delete. Dual with GetMemSpaceA16().
   static void ReleaseMemSpaceA16(BYTE* p)
   {
      if (p) {
         p -= p[-1];   // step back to the pointer ::new returned
         ::delete [] p;
      }
   }
};

//---------------------------------------------------------------------
//   I16Aligned - 16 byte aligned classes
//   just public inherit from this and job done ;)
//---------------------------------------------------------------------
class I16Aligned
{
public:
   void* operator new(std::size_t size)
   { return CMemSpaceA16::GetMemSpaceA16(size); }
   void* operator new[](std::size_t size)
   { return CMemSpaceA16::GetMemSpaceA16(size); }
   void operator delete(void* p)
   { CMemSpaceA16::ReleaseMemSpaceA16(static_cast<BYTE*>(p)); }
   void operator delete[](void* p)
   { CMemSpaceA16::ReleaseMemSpaceA16(static_cast<BYTE*>(p)); }

   // Placement versions (fake: they just pass the pointer through).
   void* operator new(std::size_t /*size*/, void* ptrPlacement)     { return ptrPlacement; }
   void  operator delete(void* /*ptr*/, void* /*ptrPlacement*/)     {}
   void* operator new[](std::size_t /*size*/, void* ptrPlacement)   { return ptrPlacement; }
   void  operator delete[](void* /*ptr*/, void* /*ptrPlacement*/)   {}
};


IIRC, there are other ways involving aligned versions of malloc, but I didn't look into them, so I won't give you any details.
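One such function pair is `_mm_malloc`/`_mm_free`, which GCC, Clang and MSVC declare alongside the SSE intrinsics (GCC via `<mm_malloc.h>`, which `<xmmintrin.h>` includes); POSIX `posix_memalign` and C++17 `std::aligned_alloc` are alternatives. A minimal sketch with hypothetical helper names that allocates, fills and frees an array of `__m128`:

```cpp
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical helper: 'count' __m128 values on a 16-byte boundary, or NULL on failure.
inline __m128* alloc_m128_array(std::size_t count)
{
    return static_cast<__m128*>(_mm_malloc(count * sizeof(__m128), 16));
}

// Memory from _mm_malloc must be released with _mm_free, not free() or delete.
inline void free_m128_array(__m128* p)
{
    _mm_free(p);
}

// Fills every element with the same value using aligned stores.
inline void fill_m128_array(__m128* p, std::size_t count, float value)
{
    const __m128 v = _mm_set1_ps(value);
    for (std::size_t i = 0; i < count; ++i)
        _mm_store_ps(reinterpret_cast<float*>(p + i), v);
}
```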
Quote: Original post by Rogalon
But what I don't get is that all the example libraries (e.g. the fast SSE matrix code on Gamasutra) use the __m128 type without any thought about additional alignment. Does that mean they can't use their matrices inside dynamically allocated objects?


You can overload operator new and provide your own implementation. For instance, you can return aligned memory for requests of 16 bytes or more.
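A minimal sketch of that idea as a replacement of the global allocation functions, assuming a POSIX system (`posix_memalign`) and a C++11 or later compiler. It is simplified: it throws `std::bad_alloc` directly instead of looping over the installed new-handler, and it affects every allocation in the program. Because memory from both `malloc` and `posix_memalign` may be released with `free`, a single `operator delete` covers both paths; the default array and sized forms forward to these two functions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Replaces the global operator new: requests of 16 bytes or more
// (anything large enough to hold an __m128) come back 16-byte aligned.
void* operator new(std::size_t size)
{
    void* p = nullptr;
    if (size >= 16) {
        if (posix_memalign(&p, 16, size) != 0)
            p = nullptr;
    } else {
        p = std::malloc(size == 0 ? 1 : size);  // a zero-size request still needs a unique pointer
    }
    if (!p)
        throw std::bad_alloc();
    return p;
}

// Matching replacement: both allocation paths above are released with free().
void operator delete(void* p) noexcept
{
    std::free(p);
}
```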

This topic is closed to new replies.
