DX12 estimating how much time gpu work will take

Infinisearch

So I was reading the presentation Practical DirectX 12 - Programming Model and Hardware Capabilities again and finally decided to tackle proper command list submission. These are the points the presentation makes on this subject:

● Aim for (per frame): 15-30 command lists, 5-10 ExecuteCommandLists calls.

● Each ExecuteCommandLists call has a fixed CPU overhead, and underneath it triggers a flush, so batch up command lists (see the sketch after this list).

● Try to put at least 200μs of GPU work in each ExecuteCommandLists call, preferably 500μs.

● Small calls to ExecuteCommandLists complete faster than the OS scheduler can submit new ones; the OS takes ~60μs to schedule upcoming work.
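In code, the batching advice mostly amounts to collecting recorded command lists and handing them to a single ExecuteCommandLists call. A minimal sketch of that (the function and variable names are mine, not from the presentation):

#include <d3d12.h>
#include <vector>

void SubmitBatched(ID3D12CommandQueue* queue,
                   const std::vector<ID3D12GraphicsCommandList*>& recordedLists)
{
    // ExecuteCommandLists takes an array, so one call can submit many lists.
    // Each call has a fixed CPU cost and triggers a flush underneath, so
    // prefer one call with N lists over N calls with one list each.
    std::vector<ID3D12CommandList*> lists(recordedLists.begin(),
                                          recordedLists.end());
    queue->ExecuteCommandLists(static_cast<UINT>(lists.size()), lists.data());
}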

So basically I want to estimate how long my draw calls take. Benchmarking for a particular piece of hardware seems impractical. So, given stats like primitive count, pixel count (approximately how many screen-space pixels the call will be rendered to), and some precomputed metric of shader ALU complexity (like the number of ALU ops), do you think I can get a reasonable estimate of how much time a draw call will take?
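To make that concrete, the kind of estimator I have in mind looks something like this. To be clear, this is only a sketch: every weight is made up and would need calibration against real timings on each GPU tier, which is exactly the part I'm unsure about.

#include <cstdint>

// Hypothetical per-draw cost heuristic, NOT a validated model: every weight
// below is a placeholder that would need calibrating against real timings.
struct DrawStats
{
    std::uint64_t primitiveCount; // triangles submitted
    std::uint64_t pixelCount;     // estimated covered screen-space pixels
    std::uint32_t shaderAluOps;   // precomputed ALU op count for the shader
};

// Returns an estimated GPU cost in microseconds.
double EstimateDrawMicroseconds(const DrawStats& s)
{
    const double fixedOverheadUs = 2.0;     // made up: per-draw setup cost
    const double usPerPrimitive  = 5.0e-5;  // made up: primitive throughput
    const double usPerPixelAluOp = 2.0e-7;  // made up: pixel ALU throughput

    return fixedOverheadUs
         + s.primitiveCount * usPerPrimitive
         + static_cast<double>(s.pixelCount) * s.shaderAluOps * usPerPixelAluOp;
}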

What do you do to take this into account?

What about other things, like resource transitions? For those I can only think of actual measurement.
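From what I can tell, timestamp queries are the mechanism D3D12 provides for actual measurement. Here's a sketch of how I understand them (creation of the query heap and readback buffer omitted):

#include <d3d12.h>

// Sketch of timing a region with timestamp queries. Assumes queryHeap was
// created with D3D12_QUERY_HEAP_TYPE_TIMESTAMP (two queries) and that
// readbackBuffer is a buffer on a READBACK heap holding two UINT64s.
void TimeRegion(ID3D12GraphicsCommandList* cmdList,
                ID3D12QueryHeap* queryHeap,
                ID3D12Resource* readbackBuffer)
{
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0); // start tick
    // ... record the work to measure here (e.g. the transition barriers) ...
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1); // end tick

    // Copy both ticks into CPU-readable memory.
    cmdList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP,
                              0, 2, readbackBuffer, 0);
    // After the GPU finishes: map readbackBuffer, subtract the two ticks,
    // and divide by the frequency reported by
    // ID3D12CommandQueue::GetTimestampFrequency() to get seconds.
}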

Hodgman
Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

If you've got 16ms of work per frame and it's evenly split over 30 command lists, that's an average of roughly 500μs per list (so maybe some are 200μs while others are 800μs), which fits with their advice to aim for at least 200-500μs per list :)

2 hours ago, Infinisearch said:

Benchmarking for a particular piece of hardware seems impractical.

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

Infinisearch
12 hours ago, Hodgman said:

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

It says "you should aim for" but I'm not sure.  But if I were to guess, puttting all your work into one command list could leave the GPU idle while you're batch up work on the CPU.  So that implies multiple calls to ECLs which implies multiple command lists.  So it most likely means no more than approx 15-30 command lists but more than one.

12 hours ago, Hodgman said:

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at resolution X on current top-end hardware, and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

Hodgman
8 hours ago, Infinisearch said:

If I were to guess, putting all your work into one command list could leave the GPU idle while you're batching up work on the CPU

You can inspect this kind of thing with GPUView :) 

This is from D3D11, not 12, but it's the screenshot I have at hand and it illustrates the same idea... In this capture I'm launching maybe three or so command lists per frame, and they queue up beautifully. My process's device context always contains about one full frame's worth of work, and the actual HW queue is always saturated. The important thing is to submit largish blocks of work in a regular rhythm, and to keep your CPU frame-time below your GPU frame-time.

[GPUView capture: the process's device context holds about one frame of pending work while the hardware queue stays saturated]

If your CPU frame-time and GPU frame-time are perfectly matched, then you'll be able to fully utilize both... Otherwise, you'll be fully utilizing one while the other idles. Typically games / gamers choose to be GPU-bound, which means the CPU idles a little each frame waiting for the GPU to catch up, which allows the GPU to be busy 100% of the time.
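The usual way to get that is to let the CPU run a couple of frames ahead and only block on a fence when it gets too far ahead. A condensed sketch, assuming the fence and event were created elsewhere:

#include <windows.h>
#include <d3d12.h>

const UINT kMaxFramesInFlight = 2;

void EndFrame(ID3D12CommandQueue* queue, ID3D12Fence* fence,
              HANDLE fenceEvent, UINT64& nextFenceValue)
{
    // Signal completion of this frame on the GPU timeline.
    queue->Signal(fence, nextFenceValue);

    // If the GPU has fallen more than kMaxFramesInFlight behind, block the
    // CPU until the oldest in-flight frame is done.
    if (nextFenceValue >= kMaxFramesInFlight &&
        fence->GetCompletedValue() < nextFenceValue - kMaxFramesInFlight)
    {
        fence->SetEventOnCompletion(nextFenceValue - kMaxFramesInFlight,
                                    fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
    ++nextFenceValue;
}

With kMaxFramesInFlight = 2 the CPU can be recording frame N while the GPU works on frames N-1 and N-2, which is what keeps the HW queue saturated in that capture.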

8 hours ago, Infinisearch said:

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at resolution X on current top-end hardware, and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

If a future PC improves CPU performance and GPU performance in equal amounts, then your game will run with pretty much the same utilization percentages as now, but at a higher framerate.
If a future PC improves GPU performance only, then yeah, the GPU will idle while the CPU maxes out. 
If a future PC improves CPU performance only, then the CPU will idle while the GPU maxes out.

From a gamedev business perspective though --
Is optimizing your game to lower your minimum HW requirements going to result in more sales?
Is optimizing for future PC's that don't exist yet going to result in more sales?
In either case, will those sales make more money than the cost of the optimization work?

If your minimum HW requirements are an Xbox One, then yeah, supporting it is going to result in a lot more sales = money.
Seeing that most games sell with a massive peak of sales in the first three months, followed by a very short tail (most games do not continue selling significant numbers for years after launch), optimizing for PC's that don't exist yet is a waste of money.
Also, if your game does continue to sell for years to come, you can always release optimization patches in the future, when you've actually got future HW to test against :D
