fread

fast matrix4x4 and vec4 multiply. SSE?


fread    240
Hi guys, I'm looking for a fast 4x4 matrix by vec4/vec3 multiply. It could be SSE code or something similar. Would this bring a big speedup to my C++ code? Can anyone help me? Thanks in advance.

Zaoshi Kaba    8434

It depends on how many such operations you perform. I managed to achieve an 11x speedup in one particular case, which involved matrix multiplication plus other code.

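	// note: _mm_permute_ps needs AVX and _mm_fmadd_ps needs FMA3 (both in <immintrin.h>);
	// on plain SSE use _mm_shuffle_ps(v, v, imm) and _mm_add_ps(_mm_mul_ps(a, b), c) instead
	// broadcast each component of vector4 to all four lanes
	__m128 vec_x = _mm_permute_ps(vector4, 0x00);
	__m128 vec_y = _mm_permute_ps(vector4, 0x55);
	__m128 vec_z = _mm_permute_ps(vector4, 0xAA);
	__m128 vec_w = _mm_permute_ps(vector4, 0xFF);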

	// assume mat4_1, mat4_2, mat4_3, mat4_4 are the matrix's components:
	// its rows if you use row vectors (v * M), its columns if you use column vectors (M * v)
	__m128 res0 = _mm_mul_ps(vec_x, mat4_1);
	__m128 res1 = _mm_fmadd_ps(vec_y, mat4_2, res0);
	__m128 res2 = _mm_fmadd_ps(vec_z, mat4_3, res1);
	__m128 res3 = _mm_fmadd_ps(vec_w, mat4_4, res2);
	// res3 is the transformed vector4, so return it.

	// for a vector3 direction vec_w is zero, so drop the last fmadd altogether;
	// for a point (w = 1) just add mat4_4 to res2 without the multiply
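
In case AVX/FMA is not available, here is a minimal sketch of the same idea using only plain SSE intrinsics. The names transform_sse and transform_sse_arrays are made up for illustration, and the array wrapper assumes the four matrix components are stored one after another as 16 floats.

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Hypothetical helper (names are illustrative, not from the post).
// Computes v.x*m0 + v.y*m1 + v.z*m2 + v.w*m3, so m0..m3 are the rows of M
// for row vectors (v * M), or the columns of M for column vectors (M * v).
inline __m128 transform_sse(__m128 v, __m128 m0, __m128 m1, __m128 m2, __m128 m3)
{
    // Broadcast each component of v to all four lanes with a plain SSE shuffle.
    __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)); // same as 0x00
    __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)); // same as 0x55
    __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)); // same as 0xAA
    __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)); // same as 0xFF

    // Separate multiply and add instead of a fused multiply-add.
    __m128 r = _mm_mul_ps(x, m0);
    r = _mm_add_ps(r, _mm_mul_ps(y, m1));
    r = _mm_add_ps(r, _mm_mul_ps(z, m2));
    r = _mm_add_ps(r, _mm_mul_ps(w, m3));
    return r;
}

// Convenience wrapper over plain float arrays: m holds 16 floats (four
// components of four floats each), v holds 4 floats, out receives 4 floats.
// Unaligned loads/stores are used so no 16-byte alignment is assumed.
inline void transform_sse_arrays(const float* m, const float* v, float* out)
{
    __m128 r = transform_sse(_mm_loadu_ps(v),
                             _mm_loadu_ps(m),
                             _mm_loadu_ps(m + 4),
                             _mm_loadu_ps(m + 8),
                             _mm_loadu_ps(m + 12));
    _mm_storeu_ps(out, r);
}
```

If your data is 16-byte aligned you can swap in _mm_load_ps / _mm_store_ps. Whether this beats what the compiler already generates depends on your workload, so measure it.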

JohnnyCode    1046

In the first place, you should work with interleaved arrays, both for the matrix and for the vector, and cache the parameters.

 

	void transform4x3mat(const float* mat, const float* vec, float* res)
	{
		// cache vec values in case res points to the same vector
		float x = vec[0];
		float y = vec[1];
		float z = vec[2];
		// w is most likely always 1, so the 4th column is added without multiplying by w

		res[0] = x * mat[0]  + y * mat[1]  + z * mat[2]  + mat[3];
		res[1] = x * mat[4]  + y * mat[5]  + z * mat[6]  + mat[7];
		res[2] = x * mat[8]  + y * mat[9]  + z * mat[10] + mat[11];
		// for a projection matrix, compute this 4th component of the result; otherwise set it straight to 1
		res[3] = x * mat[12] + y * mat[13] + z * mat[14] + mat[15];
	}

 

This function does a column-vector transformation (M * v), assuming a row-major layout of the matrix in memory.
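
To show what working on packed arrays could look like, here is a rough sketch that applies the same row-major arithmetic to a whole array of points stored as x,y,z triples. transformPoints is a made-up name, and it assumes w is 1 for every point and that you do not need the 4th result component (so no projection matrix).

```cpp
#include <cstddef>

// Hypothetical helper (not from the post): transforms `count` points stored
// as packed x,y,z triples by a row-major 4x4 matrix, treating w as 1.
// Only the first three rows of the matrix are used.
// In-place use (dst == src) is safe because each point's inputs are cached
// before its outputs are written.
void transformPoints(const float* mat, const float* src, float* dst, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        // cache the inputs of this point
        const float x = src[3 * i + 0];
        const float y = src[3 * i + 1];
        const float z = src[3 * i + 2];

        // the 4th column (translation) is added directly since w == 1
        dst[3 * i + 0] = x * mat[0] + y * mat[1] + z * mat[2]  + mat[3];
        dst[3 * i + 1] = x * mat[4] + y * mat[5] + z * mat[6]  + mat[7];
        dst[3 * i + 2] = x * mat[8] + y * mat[9] + z * mat[10] + mat[11];
    }
}
```

For example, with a matrix that only translates by (10, 20, 30), i.e. {1,0,0,10, 0,1,0,20, 0,0,1,30, 0,0,0,1}, the point (1, 2, 3) comes out as (11, 22, 33). A simple loop like this is also something the compiler has a chance to auto-vectorize.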

Edited by JohnnyCode
