Estimating how much time GPU work will take

Started by
3 comments, last by Hodgman 6 years, 6 months ago

So I was reading the presentation Practical DirectX 12 - Programming Model and Hardware Capabilities again and finally decided to tackle proper command list submission.  Things mentioned in the document regarding this subject:

Aim for (per-frame):
● 15-30 Command Lists
● 5-10 ‘ExecuteCommandLists’ calls

Each ‘ExecuteCommandLists’ has a fixed CPU overhead
● Underneath this call triggers a flush
● So batch up command lists

Try to put at least 200μs of GPU work in each ‘ExecuteCommandLists’, preferably 500μs

Small calls to ‘ExecuteCommandLists’ complete faster than the OS scheduler can submit new ones

OS takes ~60μs to schedule upcoming work

So basically I want to estimate how long my draw calls take.  Benchmarking for a particular piece of hardware seems impractical.  So given stats such as primitive count, pixel count (approximately how many screen-space pixels the call will be rendered to), and some precomputed metric of shader ALU complexity (like the number of ALU ops), do you think that I can get a reasonable estimate of how much time a draw call will take?

What do you do to take this into account?

What about other things like transitions?  I can only think of actual measurement in this case.
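To make the idea concrete, here is a rough C++ sketch of the kind of estimate described above. Everything in it is an assumption for illustration: the struct names, the function `estimateDrawMicroseconds`, and especially the throughput figures in `GpuRates`, which are placeholders that would have to be calibrated by measuring real hardware. It simply assumes a draw is limited by whichever of geometry or pixel shading is slower, and ignores bandwidth, overdraw, caches, and state changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-GPU throughput figures. These are placeholders that must be
// calibrated by measurement; they are not real hardware numbers.
struct GpuRates {
    double primitivesPerMicrosecond;  // geometry throughput
    double aluOpsPerMicrosecond;      // aggregate shader ALU throughput
};

// Stats gathered for one draw call, as proposed in the post.
struct DrawStats {
    std::uint64_t primitiveCount;
    std::uint64_t pixelCount;      // approximate covered screen-space pixels
    std::uint32_t aluOpsPerPixel;  // precomputed shader complexity metric
};

// Crude estimate: the draw takes as long as its slower stage.
// Ignores memory bandwidth, overdraw, cache behaviour, and state changes.
inline double estimateDrawMicroseconds(const DrawStats& d, const GpuRates& r) {
    assert(r.primitivesPerMicrosecond > 0.0 && r.aluOpsPerMicrosecond > 0.0);
    const double geometryUs =
        static_cast<double>(d.primitiveCount) / r.primitivesPerMicrosecond;
    const double shadingUs =
        static_cast<double>(d.pixelCount) * d.aluOpsPerPixel / r.aluOpsPerMicrosecond;
    return std::max(geometryUs, shadingUs);
}
```

A model like this can only rank draws roughly; whether it is accurate enough to decide where to split submissions is exactly the open question.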

-potential energy is easily made kinetic-

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

If you've got 16ms of work per frame and it's evenly split over 30 command lists, that's an average 500μs per list (so maybe some are 200μs while others are 800μs), which fits with their advice to aim for at least 200-500μs per list :)
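Spelled out as a tiny C++ helper (the function name is just for illustration), the arithmetic is frame time divided by list count. With exactly 16ms and 30 lists it comes to about 533μs per list, which the figure above rounds to 500μs; with 15 lists it is about 1067μs.

```cpp
#include <cassert>

// Average GPU time per command list, in microseconds, assuming the frame's
// GPU work is split evenly across the lists.
inline double averageListMicroseconds(double gpuFrameMs, int commandListCount) {
    assert(commandListCount > 0);
    return gpuFrameMs * 1000.0 / commandListCount;
}
```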

2 hours ago, Infinisearch said:

Benchmarking for a particular piece of hardware seems impractical.

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!
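If you do go the benchmarking route on D3D12, the usual tool is timestamp queries: write a timestamp before and after the work with `EndQuery` using `D3D12_QUERY_TYPE_TIMESTAMP`, resolve them with `ResolveQueryData`, and get the tick rate from `ID3D12CommandQueue::GetTimestampFrequency`. The sketch below covers only the final conversion step, so it needs no D3D12 headers; the function name is an illustrative assumption.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of GPU timestamps (in ticks) to elapsed microseconds.
// ticksPerSecond is the value reported by the queue's timestamp frequency.
inline double ticksToMicroseconds(std::uint64_t beginTicks,
                                  std::uint64_t endTicks,
                                  std::uint64_t ticksPerSecond) {
    assert(ticksPerSecond > 0 && endTicks >= beginTicks);
    return static_cast<double>(endTicks - beginTicks) * 1e6 /
           static_cast<double>(ticksPerSecond);
}
```

Averaging the result over many frames per command list gives you real numbers to compare against the 200-500μs guideline.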

12 hours ago, Hodgman said:

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

It says "you should aim for", but I'm not sure.  If I were to guess, putting all your work into one command list could leave the GPU idle while you're batching up work on the CPU.  That implies multiple calls to ExecuteCommandLists (ECL), which implies multiple command lists.  So it most likely means no more than approximately 15-30 command lists, but more than one.
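One way to act on that, sketched in C++ under the assumption that you already have some per-list GPU time estimate (measured or modelled): greedily group consecutive command lists until each group reaches the 200μs floor from the slides, then issue one ExecuteCommandLists call per group. The function name and the choice to fold leftover work into the last group are my own illustrative choices, not something from the presentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Groups consecutive command lists (given as estimated GPU microseconds) into
// submissions so that each submission holds at least minBatchUs of work.
// Returns how many command lists go into each submission, in order.
inline std::vector<std::size_t> planSubmissions(const std::vector<double>& listUs,
                                                double minBatchUs) {
    assert(minBatchUs >= 0.0);
    std::vector<std::size_t> batches;
    double pendingUs = 0.0;
    std::size_t pendingCount = 0;
    for (double us : listUs) {
        pendingUs += us;
        ++pendingCount;
        if (pendingUs >= minBatchUs) {  // enough work: close this submission
            batches.push_back(pendingCount);
            pendingUs = 0.0;
            pendingCount = 0;
        }
    }
    // Leftover lists that never reached the threshold: append them to the last
    // submission, or submit them alone if they are the only work this frame.
    if (pendingCount > 0) {
        if (batches.empty()) {
            batches.push_back(pendingCount);
        } else {
            batches.back() += pendingCount;
        }
    }
    return batches;
}
```

Each returned count maps onto the `NumCommandLists` argument of one ExecuteCommandLists call, with the pointer advanced past the lists already submitted. Order is preserved, so dependencies between lists are unaffected.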

12 hours ago, Hodgman said:

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

This doesn't seem like the right way to do it to me.  But more importantly, let's say your game runs at 60 fps at X resolution on current top-end hardware and you want to support higher frame rates at the same resolution on future hardware.  If you don't batch up enough work, your GPU is going to idle and you won't reach peak frame rates.

-potential energy is easily made kinetic-

8 hours ago, Infinisearch said:

But if I were to guess, putting all your work into one command list could leave the GPU idle while you're batching up work on the CPU

You can inspect this kind of thing with GPUView :) 

This is from D3D11, not 12, but it's the screenshot I have at hand and it gives the same idea. Here I'm launching maybe three or so command lists per frame, and they queue up beautifully. My process's device context always contains about one full frame's worth of work and the actual HW queue is always saturated. The important thing is to submit largish blocks of work in a regular rhythm, and to keep your CPU frame-time below your GPU frame-time.

[Image: GPUView screenshot, XF8d4Vl.png]

If your CPU frame-time and GPU frame-time are perfectly matched, then you'll be able to fully utilize both. Otherwise, you'll be fully utilizing one while the other idles. Typically games / gamers choose to be GPU-bound, which means the CPU will idle a little bit each frame waiting for the GPU to catch up, which allows the GPU to be busy 100% of the time.
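As a back-of-the-envelope C++ sketch of that relationship: if you assume the CPU and GPU are perfectly pipelined (a simplification; real frames have sync points and bubbles), the steady-state frame time is the larger of the two, and each processor's utilization is its own time divided by that. The struct and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>

struct Utilization {
    double frameMs;  // steady-state frame time
    double cpu;      // fraction of the frame the CPU is busy, 0..1
    double gpu;      // fraction of the frame the GPU is busy, 0..1
};

// Assumes CPU and GPU work overlap perfectly, so the slower one sets the pace.
inline Utilization steadyStateUtilization(double cpuFrameMs, double gpuFrameMs) {
    assert(cpuFrameMs > 0.0 && gpuFrameMs > 0.0);
    const double frameMs = std::max(cpuFrameMs, gpuFrameMs);
    return {frameMs, cpuFrameMs / frameMs, gpuFrameMs / frameMs};
}
```

For example, 12ms of CPU work against 16ms of GPU work gives a 16ms frame with the GPU fully busy and the CPU busy 75% of the time, which is the GPU-bound case described above.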

8 hours ago, Infinisearch said:

This doesn't seem like the right way to do it to me.  But more importantly, let's say your game runs at 60 fps at X resolution on current top-end hardware and you want to support higher frame rates at the same resolution on future hardware.  If you don't batch up enough work, your GPU is going to idle and you won't reach peak frame rates.

If a future PC improves CPU performance and GPU performance in equal amounts, then your game will run with pretty much the same utilization percentages as now, but at a higher framerate.
If a future PC improves GPU performance only, then yeah, the GPU will idle while the CPU maxes out. 
If a future PC improves CPU performance only, then the CPU will idle while the GPU maxes out.

From a gamedev business perspective, though:
Is optimizing your game to lower your minimum HW requirements going to result in more sales?
Is optimizing for future PCs that don't exist yet going to result in more sales?
In either case, will those sales make more money than the cost of the optimization work?

If your minimum HW requirements are an Xbox One, then yeah, supporting it is going to result in a lot more sales = money.
Seeing that most games sell with a massive peak of sales in the first three months, followed by a very short tail (most games do not continue selling significant numbers for years after launch), optimizing for PCs that don't exist yet is a waste of money.
Also, if your game does continue to sell for years to come, you can always release optimization patches in the future, when you've actually got future HW to test against :D
