
## fast matrix4x4 and vec4 multiply. SSE?

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

3 replies to this topic


Posted 09 June 2014 - 03:01 AM

Hi guys, I'm looking for a fast 4x4 matrix by vec4/vec3 multiply. It could be SSE code or something similar. Would this bring a big speedup to my C++ code? Can anyone help me? Thanks in advance.

### #2Zaoshi Kaba  Members


Posted 09 June 2014 - 09:11 AM

It depends on how many such operations you perform. I managed to achieve an 11x speedup in one particular case, which involved matrix multiplication plus other code.

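    // Note: _mm_permute_ps needs AVX and _mm_fmadd_ps needs FMA (both in <immintrin.h>).
    // Broadcast each component of the vector across all four lanes.
    __m128 vec_x = _mm_permute_ps(vector4, 0x00);
    __m128 vec_y = _mm_permute_ps(vector4, 0x55);
    __m128 vec_z = _mm_permute_ps(vector4, 0xAA);
    __m128 vec_w = _mm_permute_ps(vector4, 0xFF);

    // assume mat4_1, mat4_2, mat4_3, mat4_4 are the matrix's components
    // (rows if you compute row vector * matrix, columns if you compute matrix * column vector)
    __m128 res0 = _mm_mul_ps(vec_x, mat4_1);
    __m128 res1 = _mm_fmadd_ps(vec_y, mat4_2, res0);
    __m128 res2 = _mm_fmadd_ps(vec_z, mat4_3, res1);
    __m128 res3 = _mm_fmadd_ps(vec_w, mat4_4, res2);
    // return res3; because it's the transformed vector4.

    // for a vector3 direction (w = 0), mat4_4 and vec_w contribute nothing, so remove them altogether;
    // for a point (w = 1), add mat4_4 to res2 instead of multiplying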
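For CPUs without AVX or FMA, the same broadcast-and-accumulate idea can be written with plain SSE instructions. The following is only a sketch under stated assumptions: the helper name `mulMat4Vec4` is made up for illustration, the matrix is assumed to be 16 contiguous floats stored column by column, and the product computed is matrix * column vector.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper (not from the thread): computes M * v.
// Assumption: cols points at 16 floats laid out as column 0, column 1, column 2, column 3.
inline __m128 mulMat4Vec4(const float* cols, __m128 v)
{
    // Broadcast each component of v across all four lanes.
    const __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));

    // Unaligned loads, so the caller does not have to guarantee 16-byte alignment.
    const __m128 c0 = _mm_loadu_ps(cols + 0);
    const __m128 c1 = _mm_loadu_ps(cols + 4);
    const __m128 c2 = _mm_loadu_ps(cols + 8);
    const __m128 c3 = _mm_loadu_ps(cols + 12);

    // result = x*c0 + y*c1 + z*c2 + w*c3, using separate multiply and add instead of FMA.
    __m128 r = _mm_mul_ps(x, c0);
    r = _mm_add_ps(r, _mm_mul_ps(y, c1));
    r = _mm_add_ps(r, _mm_mul_ps(z, c2));
    r = _mm_add_ps(r, _mm_mul_ps(w, c3));
    return r;
}
```

A caller would build the vector with `_mm_setr_ps(x, y, z, w)` and read the result back with `_mm_storeu_ps`. If the data is known to be 16-byte aligned, `_mm_load_ps` could replace the unaligned loads.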


### #3JohnnyCode  Members


Posted 09 June 2014 - 11:23 AM

First of all, you should work with interleaved arrays for both the matrix and the vector, and cache the parameters.

    // Transforms vec by a 4x4 matrix stored row by row; res may point to the same memory as vec.
    void transform4x3mat(const float* mat, const float* vec, float* res)
    {
        // cache vec values in case res points to the same vector
        const float x = vec[0];
        const float y = vec[1];
        const float z = vec[2];
        // w is most likely always 1, so you do not need to multiply the 4th column by w at all

        res[0] = x * mat[0]  + y * mat[1]  + z * mat[2]  + mat[3];
        res[1] = x * mat[4]  + y * mat[5]  + z * mat[6]  + mat[7];
        res[2] = x * mat[8]  + y * mat[9]  + z * mat[10] + mat[11];
        // for a projection matrix, compute this 4th component of the result; otherwise set it straight to 1
        res[3] = x * mat[12] + y * mat[13] + z * mat[14] + mat[15];
    }

This function performs a column-vector transformation (result = matrix * vector), assuming the matrix is laid out row by row in memory.
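To make the layout assumption concrete, here is a small sketch of the same scalar approach applied to a translation. The names `Mat4`, `Vec3`, `transformPoint` and `makeTranslation` are hypothetical and only for illustration; the sketch assumes row-by-row storage, a column vector, and an implicit w of 1, so the translation sits in elements 3, 7 and 11.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;  // assumption: row-by-row storage, element = m[row * 4 + col]
using Vec3 = std::array<float, 3>;

// Hypothetical helper: transforms a point with implicit w = 1 by an affine matrix.
// The fourth row is skipped because it is assumed to be (0, 0, 0, 1).
inline Vec3 transformPoint(const Mat4& m, const Vec3& p)
{
    return {
        p[0] * m[0] + p[1] * m[1] + p[2] * m[2]  + m[3],
        p[0] * m[4] + p[1] * m[5] + p[2] * m[6]  + m[7],
        p[0] * m[8] + p[1] * m[9] + p[2] * m[10] + m[11]
    };
}

// Hypothetical helper: builds a translation matrix in the layout assumed above.
inline Mat4 makeTranslation(float tx, float ty, float tz)
{
    return {
        1.0f, 0.0f, 0.0f, tx,
        0.0f, 1.0f, 0.0f, ty,
        0.0f, 0.0f, 1.0f, tz,
        0.0f, 0.0f, 0.0f, 1.0f
    };
}
```

With these definitions, `transformPoint(makeTranslation(10, 20, 30), {1.0f, 2.0f, 3.0f})` moves the point by the translation. If the matrix were stored column by column instead, the translation would live in elements 12, 13 and 14 and the indices above would need to be transposed.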

Edited by JohnnyCode, 09 June 2014 - 11:24 AM.


Posted 09 June 2014 - 01:47 PM

JohnnyCode's got the right idea. Brandon Jones' glMatrix unrolls all the code for things like that because WebGL needs all the speed it can get.
