SIMD still need help

Started by
30 comments, last by codehunter13 18 years, 11 months ago
Hello all, I've started to optimise my matrix class with SIMD, but I've run into a problem: I get an access violation when accessing _L1. It is declared like this:

__declspec( align( 16 ) ) union 
{
    struct 
    {
	__m128 _L1, _L2, _L3, _L4;
    };
    struct 
    {
	float	_11, _12, _13, _14;
	float	_21, _22, _23, _24;
	float	_31, _32, _33, _34;
	float	_41, _42, _43, _44;
    };
};




I've noticed that the _L1 variable starts at 0x011cead8, which is not aligned to 16 bytes. I thought the __declspec( align( 16 ) ) would take care of that, but apparently not. I'm using Visual Studio .NET 2003. Could anyone help me? Thanks in advance. [Edited by - codehunter13 on May 12, 2005 5:19:49 AM]
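A quick way to confirm this diagnosis is to test the low four bits of the address. The sketch below is plain C++; the helper names (isAligned16Address, isAligned16, requireAligned16) are hypothetical and not part of the original class. For the address reported above, 0x011cead8 & 0xF is 8, so the object sits on an 8-byte boundary but not on a 16-byte one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the original post): true when the numeric
// address is a multiple of 16.
inline bool isAligned16Address(std::uintptr_t address)
{
    return (address & 0xF) == 0;
}

// Same check for a real pointer, e.g. isAligned16(&matrix._L1).
inline bool isAligned16(const void* p)
{
    return isAligned16Address(reinterpret_cast<std::uintptr_t>(p));
}

// Debug-build guard that could be called at the top of a SIMD routine.
inline void requireAligned16(const void* p)
{
    assert(isAligned16(p) && "pointer is not 16-byte aligned");
}
```

Calling requireAligned16(this) at the start of rotationMatrix would show whether the object itself is misplaced before any __m128 member is touched.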
Are you using some type of CPU that complains if you try to access a 32-bit variable that is not aligned properly?
On x86 processors alignment doesn't really matter (except for speed purposes).
Post some code.
This is the function that fails:
void cMatrix::rotationMatrix(const float radsX, const float radsY, const float radsZ)
{
    cMatrix x, y, z, res;
    x.rotateXMatrix(radsX);
    y.rotateYMatrix(radsY);
    z.rotateZMatrix(radsZ);
    res = z*x*y;
    _L1 = res._L1;
    _L2 = res._L2;
    _L3 = res._L3;
    _L4 = res._L4;
}


I've got an AMD 2000 XP processor.
Post the full data declaration too.
namespace ML
{
    class cMatrix;
    class cVector;
    class cVector3;

    class cMatrix
    {
    public:
        __declspec( align( 16 ) ) union
        {
            struct
            {
                __m128 _L1, _L2, _L3, _L4;
            };
            struct
            {
                float _11, _12, _13, _14;
                float _21, _22, _23, _24;
                float _31, _32, _33, _34;
                float _41, _42, _43, _44;
            };
        };

        // Constructors
        cMatrix() {}
        cMatrix(const cMatrix &m) : _L1(m._L1), _L2(m._L2), _L3(m._L3), _L4(m._L4) {}
        cMatrix(float _11, float _12, float _13, float _14,
                float _21, float _22, float _23, float _24,
                float _31, float _32, float _33, float _34,
                float _41, float _42, float _43, float _44);

        float& operator() (int i, int j)
        {
            assert((0<=i) && (i<=3) && (0<=j) && (j<=3));
            return *(((float *)&_11) + (i<<2)+j);
        }
        F32vec4& operator() (int i)
        {
            assert((0<=i) && (i<=3));
            return *(((F32vec4 *)&_11) + i);
        }
        F32vec4& operator[] (int i)
        {
            assert((0<=i) && (i<=3));
            return *(((F32vec4 *)&_11) + i);
        }
        F32vec4& operator[] (int i) const
        {
            assert((0<=i) && (i<=3));
            return *(((F32vec4 *)&_11) + i);
        }
        cMatrix& operator= (const cMatrix &a)
        {
            _L1 = a._L1; _L2 = a._L2; _L3 = a._L3; _L4 = a._L4;
            return *this;
        }

        friend cMatrix operator * (const cMatrix&, const cMatrix&);
        friend cMatrix operator + (const cMatrix&, const cMatrix&);
        friend cMatrix operator - (const cMatrix&, const cMatrix&);
        friend cMatrix operator + (const cMatrix&);
        friend cMatrix operator - (const cMatrix&);
        friend cMatrix operator * (const cMatrix&, const float);
        friend cMatrix operator * (const float, const cMatrix&);

        cMatrix & operator *= (const cMatrix &);
        cMatrix & operator *= (const float);
        cMatrix & operator += (const cMatrix &);
        cMatrix & operator -= (const cMatrix &);

        // Other Constructors:
        void zeroMatrix();
        void identityMatrix();
        void translateMatrix(const float dx, const float dy, const float dz);
        void scaleMatrix(const float a, const float b, const float c);
        void scaleMatrix(const float a);
        void rotationMatrix(const float radsX, const float radsY, const float radsZ);
        void rotateXMatrix(const float rads);
        void rotateYMatrix(const float rads);
        void rotateZMatrix(const float rads);
    };
Hmm...
I don't really understand this code:
res = z*x*y;
_L1 = res._L1;


I mean, I don't really know C++, but in C that doesn't make much sense. Unless this is a feature of C++ I am not familiar with, what exactly are you trying to accomplish in those two lines of code?
res = z*x*y; => calculates the rotation matrix and stores it in res.

_L1 = res._L1; => copies the first four floats of the res matrix into the first four floats of the current object.
I still don't understand it.
Comment out the line res = z*x*y; (the one that calculates the rotation matrix and stores it in res) and see if it still crashes.
Quote: Original post by Raduprv
Hmm...
I don't really understand this code:
res = z*x*y;
_L1 = res._L1;

I mean, I don't really know C++, but in C that doesn't make much sense. Unless this is a feature of C++ I am not familiar with, what exactly are you trying to accomplish in those two lines of code?
That multiplies the matrices z, x and y, and then copies row 1 of the result into this (the following three lines copy the remaining rows of the result into this).
He could also have done:
cMatrix x, y, z;
x.rotateXMatrix(radsX);
y.rotateYMatrix(radsY);
z.rotateZMatrix(radsZ);
*this = z*x*y;

Also, alignment does matter for SIMD instructions, since aligned SSE loads and stores such as movaps fault on data that is not 16-byte aligned.
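To illustrate the difference, here is a minimal sketch using the standard SSE intrinsics from xmmintrin.h. The function name addOneUnaligned is made up for this example. _mm_loadu_ps and _mm_storeu_ps accept any address, whereas _mm_load_ps and _mm_store_ps require a 16-byte-aligned one. This is a way to test the alignment theory, not a fix: the unaligned forms were noticeably slower on CPUs of that era.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_storeu_ps

// Hypothetical example function: adds 1.0f to the four floats starting at
// src and writes the results to dst. Neither pointer needs to be 16-byte
// aligned because only the unaligned load/store intrinsics are used.
inline void addOneUnaligned(const float* src, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    const __m128 v   = _mm_loadu_ps(src);     // unaligned load: any address is fine
    const __m128 one = _mm_set1_ps(1.0f);     // broadcast 1.0f to all four lanes
    _mm_storeu_ps(dst, _mm_add_ps(v, one));   // unaligned store
}
```

If swapping the aligned accesses for these unaligned ones makes the crash disappear, the object really is sitting at a misaligned address.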


codehunter13: Have you tried that __declspec with other variable types? It seems to work for me.
EDIT: Apparently the __m128 data type is supposed to be aligned to 16 bytes anyway...
Edit 2: Clicky
I've already tried that, and it always crashes when accessing _L1.
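One possibility the thread does not rule out is that the failing cMatrix lives on the heap. The posted code never shows how the object is created, so this is only an assumption, but an alignment attribute on a type or member does not change what the general-purpose allocator returns, and an address ending in 8 is what an 8-byte-aligned heap block would look like. A minimal sketch of a class-level operator new that hands out 16-byte-aligned storage follows. AlignedMatrix is a hypothetical stand-in for cMatrix, and it uses the standard alignas rather than __declspec so that it is portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for cMatrix: 16 floats that must start on a
// 16-byte boundary even when the object is created with new.
struct AlignedMatrix
{
    alignas(16) float m[16];

    static void* operator new(std::size_t size)
    {
        // Over-allocate: the object, one saved pointer, and 15 bytes of slack.
        void* raw = std::malloc(size + sizeof(void*) + 15);
        if (!raw) throw std::bad_alloc();

        // Skip past the slot for the saved pointer, then round up to 16.
        std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (start + 15) & ~static_cast<std::uintptr_t>(15);
        assert(aligned % 16 == 0);

        // Remember the original block just below the aligned address.
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    static void operator delete(void* p) noexcept
    {
        // Recover and free the original block saved by operator new.
        if (p) std::free(static_cast<void**>(p)[-1]);
    }
};
```

With this in place, new AlignedMatrix always returns a 16-byte-aligned address. Array forms (new[] and delete[]) would need the same treatment, and on MSVC the manual arithmetic could be replaced with _aligned_malloc and _aligned_free.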

