fast matrix4x4 and vec4 multiply. SSE?
Crossbones+ - Reputation: 7141
Posted 09 June 2014 - 09:11 AM
It depends on how many such operations you perform. I managed to achieve an 11x speedup in one particular case, which involved matrix multiplication plus other code.
// needs AVX (_mm_permute_ps) and FMA (_mm_fmadd_ps); include <immintrin.h>
__m128 vec_x = _mm_permute_ps(vector4, 0x00); // broadcast x to all four lanes
__m128 vec_y = _mm_permute_ps(vector4, 0x55); // broadcast y
__m128 vec_z = _mm_permute_ps(vector4, 0xAA); // broadcast z
__m128 vec_w = _mm_permute_ps(vector4, 0xFF); // broadcast w
// assume mat4_1, mat4_2, mat4_3, mat4_4 are the matrix's components (I think rows)
__m128 res0 = _mm_mul_ps(vec_x, mat4_1);
__m128 res1 = _mm_fmadd_ps(vec_y, mat4_2, res0);
__m128 res2 = _mm_fmadd_ps(vec_z, mat4_3, res1);
__m128 res3 = _mm_fmadd_ps(vec_w, mat4_4, res2);
// return res3; because it's the transformed vector4.
// for a vector3, mat4_4 and vec_w are just zero, so remove them altogether
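Since the question asks about SSE specifically, here is a minimal sketch of the same idea restricted to plain SSE: `_mm_shuffle_ps` stands in for `_mm_permute_ps`, and a separate multiply and add stand in for `_mm_fmadd_ps`. The function name `transform_vec4_sse` and the assumption that the matrix is 16 consecutive floats are mine, not from the post above. Whether the result is v * M or M * v depends on whether each group of four floats is a row or a column of the matrix.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_shuffle_ps, _mm_mul_ps, _mm_add_ps */

/* Hypothetical helper: out = x*g0 + y*g1 + z*g2 + w*g3,
   where g0..g3 are the four consecutive groups of 4 floats in m
   and (x, y, z, w) are the components of v.
   Unaligned loads/stores are used so no alignment is assumed. */
void transform_vec4_sse(const float *m, const float *v, float *out)
{
    __m128 vec = _mm_loadu_ps(v);
    __m128 g0  = _mm_loadu_ps(m + 0);
    __m128 g1  = _mm_loadu_ps(m + 4);
    __m128 g2  = _mm_loadu_ps(m + 8);
    __m128 g3  = _mm_loadu_ps(m + 12);

    /* Broadcast each component of the vector to all four lanes. */
    __m128 x = _mm_shuffle_ps(vec, vec, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 y = _mm_shuffle_ps(vec, vec, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 z = _mm_shuffle_ps(vec, vec, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 w = _mm_shuffle_ps(vec, vec, _MM_SHUFFLE(3, 3, 3, 3));

    /* Accumulate the four scaled groups (multiply + add instead of FMA). */
    __m128 res = _mm_mul_ps(x, g0);
    res = _mm_add_ps(res, _mm_mul_ps(y, g1));
    res = _mm_add_ps(res, _mm_mul_ps(z, g2));
    res = _mm_add_ps(res, _mm_mul_ps(w, g3));

    _mm_storeu_ps(out, res);
}
```

If the data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could replace the unaligned variants. In a hot loop the four matrix loads would normally be hoisted out so they happen once per matrix rather than once per vector.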
Members - Reputation: 977
Posted 09 June 2014 - 11:23 AM
First of all, you should work with interleaved arrays, both for the matrix and for the vector, and cache the parameters.
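void transform4x3mat(float* mat, float* vec, float* res)
{
    // cache vec values in case res points to the same vector
    float x = *vec, y = *(vec+1), z = *(vec+2);
    float w = 1.0f; // most likely always 1, so you do not need to multiply the 4th column by w at all
    *res     = x*(*mat)      + y*(*(mat+1))  + z*(*(mat+2))  + (*(mat+3));
    *(res+1) = x*(*(mat+4))  + y*(*(mat+5))  + z*(*(mat+6))  + (*(mat+7));
    *(res+2) = x*(*(mat+8))  + y*(*(mat+9))  + z*(*(mat+10)) + (*(mat+11));
    *(res+3) = x*(*(mat+12)) + y*(*(mat+13)) + z*(*(mat+14)) + (*(mat+15)); // for a projection matrix, compute this 4th component of the result; otherwise set it straight to 1
}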
This function does a column-vector transformation (M * v), assuming the matrix is stored row by row in memory.
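To make that layout convention concrete, here is a small sketch using a translation matrix. With the matrix stored row by row and the vector treated as a column with w = 1, the translation ends up in elements 3, 7 and 11. The names `make_translation` and `transform_point` are hypothetical and written with array indexing for readability; they are an illustration of the convention, not the code from the post.

```c
/* Hypothetical helper: fills mat (16 floats, stored row by row)
   with a translation by (tx, ty, tz). */
void make_translation(float *mat, float tx, float ty, float tz)
{
    int i;
    for (i = 0; i < 16; ++i)
        mat[i] = (i % 5 == 0) ? 1.0f : 0.0f;  /* identity: 1 on the diagonal */
    mat[3]  = tx;  /* last element of row 0 */
    mat[7]  = ty;  /* last element of row 1 */
    mat[11] = tz;  /* last element of row 2 */
}

/* res = M * (x, y, z, 1), with M stored row by row.
   The inputs are copied to locals first, so res may point to vec. */
void transform_point(const float *mat, const float *vec, float *res)
{
    float x = vec[0], y = vec[1], z = vec[2];
    res[0] = x * mat[0]  + y * mat[1]  + z * mat[2]  + mat[3];
    res[1] = x * mat[4]  + y * mat[5]  + z * mat[6]  + mat[7];
    res[2] = x * mat[8]  + y * mat[9]  + z * mat[10] + mat[11];
    res[3] = x * mat[12] + y * mat[13] + z * mat[14] + mat[15];
}
```

Translating the point (1, 2, 3) by (10, 20, 30) in place, with `transform_point(m, p, p)`, should leave p equal to (11, 22, 33, 1). If the translation were stored in elements 12, 13 and 14 instead, the same function would need the transposed indexing.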
Edited by JohnnyCode, 09 June 2014 - 11:24 AM.