DarkRonin

Matrix Calculation Efficiency


Hi Guys,

 

At present, I send the World, View, and Projection (W, V, P) matrices to the shader, where they are multiplied per vertex to position the geometry.

 

Would it be more efficient to pre-multiply these on the CPU and then pass the result to the shader?
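For what it's worth, the CPU-side pre-multiply is just one extra matrix product per draw call. A minimal sketch of the idea, using a hypothetical row-major `Mat4` type rather than any particular math library (DirectXMath's `XMMatrixMultiply` would play the same role):

```cpp
#include <array>

// Hypothetical 4x4 row-major matrix; a stand-in for your math library's type.
struct Mat4 {
    std::array<float, 16> m{};
};

// Plain matrix product: out = a * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[r * 4 + k] * b.m[k * 4 + c];
            out.m[r * 4 + c] = sum;
        }
    return out;
}

// Per draw call: fold World * View * Projection into one matrix once,
// instead of multiplying by W, V and P separately for every vertex.
Mat4 buildWVP(const Mat4& world, const Mat4& view, const Mat4& proj) {
    return mul(mul(world, view), proj);
}
```

The vertex shader then does a single `mul(position, gWVP)` instead of three matrix products per vertex.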

 

Thanks in advance :)


Do not prematurely optimize; you might end up having to switch to the other method later. Profile and test: that is what will make the best determination. There are very, very few steadfast rules about this stuff; it is highly dependent on what your code is doing, the data you're pumping through the CPU/GPU, and so on.


It's my premature optimisation that is allowing me to render so much in the first place.

 

I was just wondering what the normal practice was.


Simple answer: yes. Doing the multiplication once ahead of time, in order to avoid doing it hundreds of thousands of times (once per vertex), is obviously a good idea.

 

However, there are cases where uploading a single WVP matrix introduces its own problems.

For example, let's say we have a scene with 1000 static objects and a moving camera.

Each frame, we have to calculate VP = V*P, then perform 1000 WVP = W * VP multiplications, and upload the 1000 resulting WVP matrices to the GPU.

If instead we sent W and VP to the GPU separately, we could pre-upload the 1000 W matrices once in advance and then upload only a single VP matrix per frame. That means the CPU does roughly 1000x less matrix/upload work in the second scheme, but the GPU does one extra matrix multiplication per vertex drawn.

 

The right choice there would depend on the exact size of the CPU/GPU costs incurred/saved, and how close to your GPU/CPU processing budgets you are.

Yes, multiplying once outside the shader is the way to go. If it's something static like rendering landscape, then definitely. It's a bit more tricky if it's your game entities; in that case you need to weigh instancing for translation and orientation of objects against updating the matrix on the fly each draw call.

For static geometry, yes. For dynamic objects in low numbers, yes. It gets murkier when you start dealing with a lot of objects.

Thanks guys!

In my case just about all of the geometry will be pre-transformed in my 3D package, so there won't be any additional rotations, scaling, etc. to do either.

Thanks for the advice.


Yes.

 

And no, no, no, no, no: this is not premature optimization, it's engineering for efficiency. They're not the same thing, and don't listen to anyone who tells you otherwise.


I have a similar question about fine-grained performance measurement:

 

Imagine I have two loops in a Geometry Shader with known compile-time constant bounds:

for (int x = 0; x < 4; ++x) {
    for (int y = 0; y < 3; ++y) {
        DoStuff();
    }
}

In release mode this code gives me "Approximately 22 instruction slots used" (the Visual Studio shader compiler outputs this info).

 

If I place [unroll] before each loop, I get "Approximately 89 instruction slots used".

 

Right now I can measure time in NSight's "Events" window with nanosecond precision, and I can't see any performance difference between the two shaders.

Is there a way to measure the difference more precisely?

 

The question is similar because measuring the performance difference of such optimizations (2 matrices vs. 1, unroll vs. no unroll) requires a tool that can actually resolve the difference.

Edited by Happy SDE


If you can't see any perf difference it might just be because you're bottlenecked elsewhere; e.g. you might be CPU-bound.

If you can't see any perf difference it might just be because you're bottlenecked elsewhere; e.g. you might be CPU-bound.

No, I am not CPU-bound at all.

This code calculates 4 shadow maps in one pass, which is faster than 4 separate calls (I can see the difference in NSight because it is significant: a 50-200% win depending on quality settings).

This is a macro-optimization.

 

But passing [unroll], or 1 vs. 2 matrices, is a micro-optimization, which might still gain me something.

And with the tools I am aware of, I can't detect it. =(

 

One option is to count instructions.

But as I understand it:

1. Each instruction has its own cost, so just summing them up is not a good idea.

2. NSight's measurements on the same scene, with the same shader, vary by about 0.2% between passes.

So I keep searching for a tool that will let me measure the performance of micro-optimizations.

 

The main reason: find (and measure) a good practice once, then apply it elsewhere, without unnecessary code bloat based on unmeasured speculation.
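A general trick when a difference sits below the measurement noise (the ~0.2% pass-to-pass variance mentioned above) is to amplify the workload: execute the suspect path many times back-to-back, time the whole batch, and divide. On the GPU the same idea applies between two timestamp queries by issuing the pass N times. A CPU-side sketch of the pattern (`averageNanos` is a hypothetical helper, purely illustrative):

```cpp
#include <chrono>

// Time `fn` over `iterations` back-to-back runs and return the average
// cost per run in nanoseconds. Amplifying the workload pushes a tiny
// per-run difference above the timer's noise floor.
template <typename F>
double averageNanos(F&& fn, int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        fn();
    const auto stop = clock::now();
    const auto total =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
            .count();
    return static_cast<double>(total) / iterations;
}
```

In a real benchmark you would also need to keep the compiler from optimizing the measured work away, e.g. by accumulating results into a `volatile` sink.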

Edited by Happy SDE
