
DX12: estimating how much time GPU work will take


So I was reading the presentation Practical DirectX 12 - Programming Model and Hardware Capabilities again and finally decided to tackle proper command list submission.  Things mentioned in the document regarding this subject:

● Aim for (per frame): 15-30 command lists, 5-10 ‘ExecuteCommandLists’ calls
● Each ‘ExecuteCommandLists’ has a fixed CPU overhead; underneath, this call triggers a flush, so batch up command lists
● Try to put at least 200μs of GPU work in each ‘ExecuteCommandLists’, preferably 500μs
● Small calls to ‘ExecuteCommandLists’ complete faster than the OS scheduler can submit new ones
● The OS takes ~60μs to schedule upcoming work
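
To make sure I'm reading the "batch up command lists" advice correctly: ExecuteCommandLists takes an array, so instead of one call per list you hand it several closed lists at once. A minimal sketch of my understanding (the pass/list names are just placeholders from an imaginary frame):

// Each list was recorded (possibly on its own thread) and Close()d already.
ID3D12CommandList* const lists[] = { shadowPassList, gbufferPassList, lightingPassList };

// One submission amortizes the fixed CPU overhead / implicit flush across all
// three lists, and ideally contains at least 200-500us of GPU work.
commandQueue->ExecuteCommandLists(_countof(lists), lists);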

So basically I want to estimate how long my draw calls take. Benchmarking for each particular piece of hardware seems impractical. So given stats like primitive count, pixel count (approximately how many screen-space pixels the call will be rendered to), and some precomputed metric of shader ALU complexity (like the number of ALU ops), do you think I can get a reasonable estimate of how much time a draw call will take?
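
To be concrete, the kind of heuristic I have in mind is something like the sketch below; every weight in it is invented and would presumably need fitting per GPU family, which is exactly the part I'm unsure about:

#include <cstdint>

// Hypothetical per-draw cost model: all constants are made-up placeholders.
struct DrawStats
{
    uint64_t primitiveCount;  // primitives submitted by the draw
    uint64_t coveredPixels;   // rough estimate of shaded screen-space pixels
    uint32_t shaderAluOps;    // precomputed ALU-op count for the bound shaders
};

// Returns a rough estimate of GPU time in microseconds.
double EstimateDrawMicroseconds(const DrawStats& s)
{
    const double usPerPrimitive  = 0.00002;   // invented vertex/primitive weight
    const double usPerPixelAluOp = 0.000001;  // invented per-pixel, per-ALU-op weight
    const double fixedDrawUs     = 1.0;       // invented fixed per-draw overhead

    return fixedDrawUs
         + s.primitiveCount * usPerPrimitive
         + double(s.coveredPixels) * s.shaderAluOps * usPerPixelAluOp;
}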

What do you do to take this into account?

What about other things like resource transitions? I can only think of actual measurement in that case.
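
For the measurement route I'm assuming timestamp queries are the tool, something like the rough bracket below (query-heap creation, the readback mapping, and fencing are omitted, and the variable names are placeholders):

// Bracket a stretch of work (barriers, draws, ...) with two timestamps.
// queryHeap was created with D3D12_QUERY_HEAP_TYPE_TIMESTAMP.
commandList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);  // begin tick
// ... ResourceBarrier() calls, draws, etc. ...
commandList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);  // end tick

// Copy both ticks into a readback buffer, to be mapped once the GPU has passed this point.
commandList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0, 2, readbackBuffer, 0);

// Later, after waiting on a fence and mapping readbackBuffer into beginTick/endTick:
UINT64 ticksPerSecond = 0;
commandQueue->GetTimestampFrequency(&ticksPerSecond);
double elapsedMs = double(endTick - beginTick) / double(ticksPerSecond) * 1000.0;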


Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

If you've got 16ms of work per frame and it's evenly split over 30 command lists, that's an average 500μs per list (so maybe some are 200μs while others are 800μs), which fits with their advice to aim for at least 200-500μs per list :)

2 hours ago, Infinisearch said:

Benchmarking for a particular piece of hardware seems impractical.

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

12 hours ago, Hodgman said:

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

It says "you should aim for" but I'm not sure.  But if I were to guess, puttting all your work into one command list could leave the GPU idle while you're batch up work on the CPU.  So that implies multiple calls to ECLs which implies multiple command lists.  So it most likely means no more than approx 15-30 command lists but more than one.
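
So roughly something like the sketch below per frame, rather than one big submit at the end; the pass names and the Record* helpers are purely illustrative:

// Purely illustrative frame: a few ExecuteCommandLists calls so the GPU can start
// on the early passes while the CPU is still recording the later ones. Each
// submission should still contain a decent chunk (200-500+ us) of GPU work.
ID3D12CommandList* early[] = { depthPrepassList, shadowPassList };   // already Close()d
queue->ExecuteCommandLists(_countof(early), early);                  // GPU starts here

RecordMainPasses(gbufferList, lightingList);                         // CPU keeps recording
ID3D12CommandList* middle[] = { gbufferList, lightingList };
queue->ExecuteCommandLists(_countof(middle), middle);

RecordPostAndUI(postProcessList);
ID3D12CommandList* late[] = { postProcessList };
queue->ExecuteCommandLists(_countof(late), late);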

12 hours ago, Hodgman said:

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at X resolution on current top-end hardware and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

8 hours ago, Infinisearch said:

If I were to guess, putting all your work into one command list could leave the GPU idle while you batch up work on the CPU

You can inspect this kind of thing with GPUView :) 

This is from D3D11, not 12, but it's the screenshot I have at hand and it gives the same idea... In this I'm launching maybe three or so command lists per frame, and they queue up beautifully. My process's device context always contains about one full frame's worth of work and the actual HW queue is always saturated. The important thing is to submit large'ish blocks of work in a regular rhythm, and to keep your CPU frame-time below your GPU frame-time.

[Screenshot XF8d4Vl.png: GPUView capture showing the process's device context holding about one frame of work while the hardware queue stays saturated]

If your CPU frame-time and GPU-frame time are perfectly matched, then you'll be able to fully utilize both... Otherwise, you'll be fully utilizing one, while the other one idles. Typically games / gamers choose to be GPU-bound, which means the CPU will idle a little bit each frame waiting for the GPU to catch up, which allows the GPU to be busy 100% of the time.
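
In D3D12 the usual way to get that rhythm is a fence that stops the CPU from running more than a frame or two ahead; a bare-bones sketch (the names are placeholders, not from the capture above):

// Keep the CPU at most kMaxFramesInFlight frames ahead of the GPU so the HW queue
// stays fed without the CPU running away.
const UINT64 kMaxFramesInFlight = 2;

// At the end of frame N's submissions:
++fenceValue;
commandQueue->Signal(frameFence, fenceValue);

// Before recording frame N+1:
if (frameFence->GetCompletedValue() + kMaxFramesInFlight < fenceValue)
{
    frameFence->SetEventOnCompletion(fenceValue - kMaxFramesInFlight, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}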

8 hours ago, Infinisearch said:

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at X resolution on current top-end hardware and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

If a future PC improves CPU performance and GPU performance in equal amounts, then your game will run with pretty much the same utilization percentages as now, but at a higher framerate.
If a future PC improves GPU performance only, then yeah, the GPU will idle while the CPU maxes out. 
If a future PC improves CPU performance only, then the CPU will idle while the GPU maxes out.

From a gamedev business perspective though --
Is optimizing your game to lower your minimum HW requirements going to result in more sales?
Is optimizing for future PC's that don't exist yet going to result in more sales?
In either case, will those sales make more money than the cost of the optimization work?

If your minimum HW requirements are an Xbox One, then yeah, supporting it is going to result in a lot more sales = money.
Seeing that most games sell with a massive peak of sales in the first three months, followed by a very short tail (most games do not continue selling significant numbers for years after launch), optimizing for PC's that don't exist yet is a waste of money.
Also, if your game does continue to sell for years to come, you can always release optimization patches in the future, when you've actually got future HW to test against :D

