
samoth

Posted 30 May 2013 - 06:59 AM

Unless you have a lot of experience and invest a lot of time, GLM is as good as or better than anything you can write, performance-wise. It's likely better than anything I could write in finite time, anyway.

 

That, and it just works, and it "looks" the same as GLSL, which is a big plus.
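For illustration, here is a minimal sketch of that GLSL-like look in C++. The glm types and functions (glm::vec3, glm::dot, glm::normalize, glm::max) are real GLM API; the lambert function and its parameter names are just made up for the example:

#include <glm/glm.hpp>   // header-only, nothing to link

// Diffuse lighting term, written the same way you would write it in GLSL:
//     max(dot(normalize(n), normalize(l)), 0.0)
float lambert(const glm::vec3& n, const glm::vec3& l)
{
    return glm::max(glm::dot(glm::normalize(n), glm::normalize(l)), 0.0f);
}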

 

As for SSE implementations, GLM has them, at least in some places, but they don't make much of a difference anyway. SSE is good for whacking through long streams of structure-of-arrays (SoA) data, but it is pretty useless for doing a few dozen dot products, multiplying 3-4 matrices, or for any data you normally have (because array-of-structures, AoS, is the natural layout, not SoA). You have to work really hard to make SSE truly useful outside of a contrived, artificial example; see the sketch below.
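To make the SoA/AoS point concrete, here is a rough sketch (my own types and function names, not GLM's): scalar code on the natural AoS layout, versus SSE intrinsics chewing through SoA streams four dot products at a time.

#include <xmmintrin.h>   // SSE1 intrinsics

struct Vec3 { float x, y, z; };   // AoS: the layout you naturally have

// AoS: one dot product at a time. SSE buys little here, because the
// components of ONE vector share a register and need a horizontal sum.
float dot_aos(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// SoA: each component in its own contiguous stream. Now SSE computes
// four INDEPENDENT dot products per iteration, with no horizontal adds.
void dot_soa(const float* ax, const float* ay, const float* az,
             const float* bx, const float* by, const float* bz,
             float* out, int n)   // assumes n is a multiple of 4
{
    for (int i = 0; i < n; i += 4)
    {
        __m128 r = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(_mm_loadu_ps(ax + i), _mm_loadu_ps(bx + i)),
                       _mm_mul_ps(_mm_loadu_ps(ay + i), _mm_loadu_ps(by + i))),
            _mm_mul_ps(_mm_loadu_ps(az + i), _mm_loadu_ps(bz + i)));
        _mm_storeu_ps(out + i, r);
    }
}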

Special applications like audio/video codecs are of course an exception, but that is not surprising: streaming through long, uniform data is what SSE was made for, after all.

 

If your SSE-optimized dot product saves 1-2 cycles compared to a plain C implementation (compiled with optimizations turned on), you can consider yourself a happy man. For perspective: one mispredicted branch costs 7-8 times as much.
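For what it's worth, this is roughly what such a hand-rolled SSE dot product looks like for a single 4-component vector (SSE1 only; my own function name, not GLM's). The multiply is one instruction, but summing the four lanes takes a shuffle/add chain, which is exactly where the theoretical win evaporates:

#include <xmmintrin.h>

float dot4_sse(const float* a, const float* b)   // two 4-float vectors
{
    __m128 m = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 s = _mm_add_ps(m, _mm_movehl_ps(m, m));    // lanes: x+z, y+w, ...
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));       // (x+z) + (y+w)
    return _mm_cvtss_f32(s);                          // extract lane 0
}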

 

If your quaternion/vector multiply saves 5-6 clock cycles, you're lucky. So even if you calculate a thousand of them per frame (for skeletal animation, or whatever), that's 5,000 clocks saved. Big deal. If that's an issue for you, then don't ever dare call a D3D function, let alone touch the disk.
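To put numbers on it: assuming a 3 GHz core, 5,000 clocks is about 5,000 / 3,000,000,000 s, or roughly 1.7 microseconds, while a frame at 60 fps gives you about 16.7 milliseconds. The saving is on the order of 0.01% of your frame budget.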

