
SIMD still need help


Hello all, I've started to optimise my matrix class with SIMD, but I've run into a problem: I get an access violation when accessing _L1. It is declared like this:
__declspec( align( 16 ) ) union
{
    struct
    {
        __m128 _L1, _L2, _L3, _L4;
    };
    struct
    {
        float _11, _12, _13, _14;
        float _21, _22, _23, _24;
        float _31, _32, _33, _34;
        float _41, _42, _43, _44;
    };
};

I've noticed that the _L1 variable starts at 0x011cead8, which is not aligned to 16 bytes. I thought __declspec( align( 16 ) ) would take care of that, but apparently not. I'm using Visual Studio .NET 2003. Could anyone help me? Thanks in advance. [Edited by - codehunter13 on May 12, 2005 5:19:49 AM]
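A quick way to confirm this kind of diagnosis is to test the low bits of the address. The sketch below is illustrative only: the helper names `is_aligned_address`, `is_aligned_ptr` and `misalignment` are made up for this example, and it uses modern C++ headers that VS.NET 2003 may not provide. For the address quoted above, 0x011cead8, the low four bits are 8, so the object sits 8 bytes past a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, not part of the original cMatrix code.

// Returns true when `address` is a multiple of `alignment`.
// `alignment` must be a power of two (16 for aligned SSE loads and stores).
inline bool is_aligned_address(std::uintptr_t address, std::size_t alignment)
{
    return (address & (alignment - 1)) == 0;
}

// Pointer version, convenient inside an assert at the top of a SIMD routine.
inline bool is_aligned_ptr(const void* p, std::size_t alignment)
{
    return is_aligned_address(reinterpret_cast<std::uintptr_t>(p), alignment);
}

// Number of bytes by which `address` overshoots the previous aligned boundary.
inline std::size_t misalignment(std::uintptr_t address, std::size_t alignment)
{
    return static_cast<std::size_t>(address & (alignment - 1));
}
```

Something like `assert(is_aligned_ptr(this, 16));` at the start of a member function such as rotationMatrix would turn the access violation into an assertion that points at the offending object.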

Are you using some type of CPU that complains if you try to access a 32-bit variable that is not aligned properly?
On x86 processors alignment doesn't really matter for ordinary scalar accesses (except for speed).
Post some code.

This is the function that fails:

void cMatrix::rotationMatrix(const float radsX, const float radsY, const float radsZ)
{
    cMatrix x, y, z, res;
    x.rotateXMatrix(radsX);
    y.rotateYMatrix(radsY);
    z.rotateZMatrix(radsZ);
    res = z*x*y;

    _L1 = res._L1;
    _L2 = res._L2;
    _L3 = res._L3;
    _L4 = res._L4;
}

I've got an AMD 2000 XP processor.


#include <cassert>
#include <xmmintrin.h>  // __m128
#include <fvec.h>       // F32vec4

namespace ML
{
    class cMatrix;
    class cVector;
    class cVector3;

    class cMatrix {
    public:
        // Two views of the same 64 bytes: four SSE rows or sixteen floats.
        __declspec( align( 16 ) ) union {
            struct {
                __m128 _L1, _L2, _L3, _L4;
            };
            struct {
                float _11, _12, _13, _14;
                float _21, _22, _23, _24;
                float _31, _32, _33, _34;
                float _41, _42, _43, _44;
            };
        };

        // Constructors
        cMatrix() {}
        cMatrix(const cMatrix &m) : _L1(m._L1), _L2(m._L2), _L3(m._L3), _L4(m._L4) {}
        cMatrix(float _11, float _12, float _13, float _14,
                float _21, float _22, float _23, float _24,
                float _31, float _32, float _33, float _34,
                float _41, float _42, float _43, float _44);

        // Element access, row-major: element (i, j) lives at offset 4*i + j.
        float& operator() (int i, int j) {
            assert((0<=i) && (i<=3) && (0<=j) && (j<=3));
            return *(((float *)&_11) + (i<<2)+j);
        }
        // Row access.
        F32vec4& operator() (int i) {
            assert((0<=i) && (i<=3));
            return *(((F32vec4 *)&_11) + i);
        }
        F32vec4& operator[] (int i) {
            assert((0<=i) && (i<=3));
            return *(((F32vec4 *)&_11) + i);
        }
        // The const overload returns a const reference so it cannot be used
        // to modify a const matrix.
        const F32vec4& operator[] (int i) const {
            assert((0<=i) && (i<=3));
            return *(((const F32vec4 *)&_11) + i);
        }

        cMatrix& operator= (const cMatrix &a) {
            _L1 = a._L1; _L2 = a._L2; _L3 = a._L3; _L4 = a._L4;
            return *this;
        }

        friend cMatrix operator * (const cMatrix&, const cMatrix&);
        friend cMatrix operator + (const cMatrix&, const cMatrix&);
        friend cMatrix operator - (const cMatrix&, const cMatrix&);
        friend cMatrix operator + (const cMatrix&);
        friend cMatrix operator - (const cMatrix&);
        friend cMatrix operator * (const cMatrix&, const float);
        friend cMatrix operator * (const float, const cMatrix&);

        cMatrix & operator *= (const cMatrix &);
        cMatrix & operator *= (const float);
        cMatrix & operator += (const cMatrix &);
        cMatrix & operator -= (const cMatrix &);

        // Initializers (overwrite this matrix in place):
        void zeroMatrix();
        void identityMatrix();
        void translateMatrix(const float dx, const float dy, const float dz);
        void scaleMatrix(const float a, const float b, const float c);
        void scaleMatrix(const float a);
        void rotationMatrix(const float radsX, const float radsY, const float radsZ);
        void rotateXMatrix(const float rads);
        void rotateYMatrix(const float rads);
        void rotateZMatrix(const float rads);
    };
} // namespace ML

Hmm...
I don't really understand this code:
		
res = z*x*y;
_L1 = res._L1;


I mean, I don't really know C++, but in C that doesn't make much sense. Unless this is a feature of C++ I am not familiar with, what exactly are you trying to accomplish in those 2 lines of code?

res = z*x*y; => calculates the rotation matrix and stores it in res.

_L1 = res._L1; => copies the first four floats of the res matrix into the first four floats of the current object.

I still don't understand it.
Comment out the line res = z*x*y; (the one that calculates the rotation matrix and stores it in res).
See if it still crashes.

Quote:
Original post by Raduprv
Hmm...
I don't really understand this code:
		
res = z*x*y;
_L1 = res._L1;


I mean, I don't really know C++, but in C that doesn't make much sense. Unless this is a feature of C++ I am not familiar with, what exactly are you trying to accomplish in those 2 lines of code?

That multiplies the matrices z, x and y, and then copies line 1 of the result into this (the following 3 lines copy the rest of the result into this).
He could also have done:

cMatrix x,y,z,res;
x.rotateXMatrix(radsX);
y.rotateYMatrix(radsY);
z.rotateZMatrix(radsZ);
*this = z*x*y;




Also, alignment does matter for SIMD instructions: the aligned SSE loads and stores (such as movaps) raise an exception on data that isn't 16-byte aligned.
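To make the distinction concrete, here is a minimal sketch of the two SSE load intrinsics. `_mm_load_ps` requires a 16-byte-aligned pointer, while `_mm_loadu_ps` accepts any address at some possible cost in speed. The function names `sum4_aligned` and `sum4_unaligned` are invented for this example, `alignas` needs a C++11 compiler (on VS.NET 2003 the equivalent would be `__declspec(align(16))`), and the code assumes an x86 target with SSE.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_loadu_ps, _mm_store_ps

// Hypothetical helper: sums four floats starting at `p`.
// `p` may have any alignment because _mm_loadu_ps makes no alignment assumption.
inline float sum4_unaligned(const float* p)
{
    __m128 v = _mm_loadu_ps(p);
    alignas(16) float out[4];
    _mm_store_ps(out, v);  // `out` is 16-byte aligned, so the aligned store is safe
    return out[0] + out[1] + out[2] + out[3];
}

// Hypothetical helper: same sum, but `p` MUST be 16-byte aligned.
// Passing a misaligned pointer here is the kind of access that faults.
inline float sum4_aligned(const float* p)
{
    __m128 v = _mm_load_ps(p);
    alignas(16) float out[4];
    _mm_store_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

With an `alignas(16) float data[8]`, calling `sum4_aligned(data)` and `sum4_aligned(data + 4)` is fine, whereas `data + 1` is only safe to pass to `sum4_unaligned`.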


codehunter13: Have you tried that __declspec with other variable types? It seems to work for me.
EDIT: Apparently the __m128 data type is supposed to be aligned to 16 bytes anyway...
Edit 2: Clicky
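The thread does not establish where the misaligned cMatrix actually lives. One case worth ruling out, as an assumption rather than a diagnosis, is an instance allocated on the heap: a plain `new` or `malloc` in a 32-bit build of that era typically guarantees only 8-byte alignment, regardless of the alignment declared on the type. The sketch below shows one way to get 16-byte-aligned heap storage with `_mm_malloc` and `_mm_free`. `Matrix4`, `make_aligned_matrix`, `destroy_aligned_matrix` and `on_16_byte_boundary` are hypothetical names standing in for the real class, and the code assumes an x86 target with SSE.

```cpp
#include <cassert>
#include <cstdint>
#include <new>          // placement new, std::bad_alloc
#include <xmmintrin.h>  // __m128, _mm_malloc, _mm_free

// Hypothetical stand-in for cMatrix: four SSE rows, 64 bytes in total.
struct Matrix4 {
    __m128 rows[4];
};

// True when `p` sits on a 16-byte boundary.
inline bool on_16_byte_boundary(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Allocates a zero-initialized Matrix4 in 16-byte-aligned heap storage.
inline Matrix4* make_aligned_matrix()
{
    void* raw = _mm_malloc(sizeof(Matrix4), 16);
    if (!raw) {
        throw std::bad_alloc();
    }
    return new (raw) Matrix4();  // construct in place in the aligned block
}

// Destroys and releases a matrix obtained from make_aligned_matrix.
inline void destroy_aligned_matrix(Matrix4* m)
{
    if (m) {
        m->~Matrix4();
        _mm_free(m);  // memory from _mm_malloc must be released with _mm_free
    }
}
```

The same idea can be folded into the class by overloading its `operator new` and `operator delete`, so that callers keep writing `new cMatrix` unchanged.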
