
fast matrix4x4 and vec4 multiply. SSE?


3 replies to this topic

#1 fread   Members   -  Reputation: 240


Posted 09 June 2014 - 03:01 AM

Hi guys, I'm looking for a fast matrix4x4 by vec4/vec3 multiply. It might be SSE code or something like that. Would this bring a big speedup to my C++ code? Can anyone help me? Thanks in advance.


#2 Zaoshi Kaba   Crossbones+   -  Reputation: 4431


Posted 09 June 2014 - 09:11 AM

It depends on how many such operations you perform. I managed to achieve an 11x speedup in one particular case, which had matrix multiplication plus other code.

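	// broadcast each component of vector4 across all four lanes
	// (_mm_permute_ps needs AVX; on plain SSE, _mm_shuffle_ps(vector4, vector4, imm) with the same immediates does the same job)
	__m128 vec_x = _mm_permute_ps(vector4, 0x00);
	__m128 vec_y = _mm_permute_ps(vector4, 0x55);
	__m128 vec_z = _mm_permute_ps(vector4, 0xAA);
	__m128 vec_w = _mm_permute_ps(vector4, 0xFF);

	// assume mat4_1, mat4_2, mat4_3, mat4_4 are the matrix's components:
	// its columns if you compute M * v (column vectors), its rows if you compute v * M (row vectors)
	// (_mm_fmadd_ps needs FMA3; without it, use _mm_add_ps(_mm_mul_ps(a, b), c))
	__m128 res0 = _mm_mul_ps(vec_x, mat4_1);
	__m128 res1 = _mm_fmadd_ps(vec_y, mat4_2, res0);
	__m128 res2 = _mm_fmadd_ps(vec_z, mat4_3, res1);
	__m128 res3 = _mm_fmadd_ps(vec_w, mat4_4, res2);
	// return res3; because it's the transformed vector4.

	// for a vector3 direction (w = 0) the vec_w * mat4_4 term is just zero, so remove it altogether;
	// for a vector3 point (w = 1) add mat4_4 itself to the sum instead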
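
For readers without AVX or FMA hardware, the same idea can be sketched with SSE1 intrinsics only. This is a minimal sketch, not Zaoshi Kaba's code: the function name `transform_vec4_sse` is hypothetical, and it assumes the matrix is stored as 16 contiguous floats in column-major order and that the product wanted is M * v.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE1 intrinsics

// Hypothetical helper (not from the thread): computes M * v.
// cols: 16 floats, column-major (column 0 first). vec, out: 4 floats each.
inline void transform_vec4_sse(const float* cols, const float* vec, float* out)
{
    assert(cols && vec && out);

    // Load the vector first, so that out may alias vec.
    const __m128 v = _mm_loadu_ps(vec);

    // Broadcast each component across all four lanes.
    const __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));

    // result = x*col0 + y*col1 + z*col2 + w*col3
    __m128 r = _mm_mul_ps(x, _mm_loadu_ps(cols + 0));
    r = _mm_add_ps(r, _mm_mul_ps(y, _mm_loadu_ps(cols + 4)));
    r = _mm_add_ps(r, _mm_mul_ps(z, _mm_loadu_ps(cols + 8)));
    r = _mm_add_ps(r, _mm_mul_ps(w, _mm_loadu_ps(cols + 12)));

    _mm_storeu_ps(out, r);
}
```

The unaligned loads and stores keep the sketch safe for arbitrary float arrays; with 16-byte aligned data, `_mm_load_ps` and `_mm_store_ps` could be used instead. Whether any of this beats what the compiler generates for scalar code depends on the workload, so it is worth measuring.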


#3 JohnnyCode   Members   -  Reputation: 271


Posted 09 June 2014 - 11:23 AM

In the first place, you should work with interleaved arrays, both for the matrix and for the vector. And cache the parameters.

 

void transform4x3mat(const float* mat, const float* vec, float* res)
{
	// cache vec values in case res points to the same vector
	float x = vec[0];
	float y = vec[1];
	float z = vec[2];
	// w is most likely always 1, so the 4th column is added as-is, with no multiply by w

	res[0] = x*mat[0]  + y*mat[1]  + z*mat[2]  + mat[3];
	res[1] = x*mat[4]  + y*mat[5]  + z*mat[6]  + mat[7];
	res[2] = x*mat[8]  + y*mat[9]  + z*mat[10] + mat[11];
	res[3] = x*mat[12] + y*mat[13] + z*mat[14] + mat[15]; // needed for a projection matrix (included here); otherwise just set this 4th component of the result to 1
}

 

This function does a column-vector transformation (M * v), assuming a row-major layout of the matrix in memory.
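
To make the layout and the aliasing remark concrete, here is a small sketch of the same arithmetic written as a loop and called in place. The name `transform_point_row_major` is hypothetical, and the sketch assumes w = 1 and a row-major matrix, as in the post above.

```cpp
#include <cassert>

// Hypothetical name; same arithmetic as the function above, with w fixed at 1.
// mat: 16 floats, row-major. vec: reads 3 floats (x, y, z). res: writes 4 floats.
inline void transform_point_row_major(const float* mat, const float* vec, float* res)
{
    assert(mat && vec && res);

    // Copy the inputs first so that res may point at the same storage as vec.
    const float x = vec[0];
    const float y = vec[1];
    const float z = vec[2];

    for (int row = 0; row < 4; ++row) {
        const float* m = mat + 4 * row;  // start of this row in memory
        res[row] = x * m[0] + y * m[1] + z * m[2] + m[3];
    }
}
```

With a row-major translation matrix `{1,0,0,10, 0,1,0,20, 0,0,1,30, 0,0,0,1}` and a point `{1,2,3,1}`, calling the function with the same array as both `vec` and `res` yields `{11,22,33,1}`, because x, y and z were copied before any result was written. The fixed-count loop is used here for brevity; optimizing compilers commonly unroll it, but that is worth checking in the generated code rather than assuming.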


Edited by JohnnyCode, 09 June 2014 - 11:24 AM.


#4 cadjunkie   Members   -  Reputation: 1323


Posted 09 June 2014 - 01:47 PM

JohnnyCode's got the right idea. Brandon Jones' glMatrix unrolls all the code for things like that because WebGL needs all the speed it can get.








