Infinisearch

DX12 estimating how much time gpu work will take

So I was reading the presentation "Practical DirectX 12 - Programming Model and Hardware Capabilities" again and finally decided to tackle proper command list submission. Here is what the document says on this subject:

Aim for (per frame):
● 15-30 command lists
● 5-10 ‘ExecuteCommandLists’ calls

Each ‘ExecuteCommandLists’ call has a fixed CPU overhead:
● Underneath, this call triggers a flush
● So batch up command lists

Try to put at least 200μs of GPU work in each ‘ExecuteCommandLists’, preferably 500μs.

Small calls to ‘ExecuteCommandLists’ complete faster than the OS scheduler can submit new ones.

OS takes ~60μs to schedule upcoming work.

So basically I want to estimate how long my draw calls take. Benchmarking for a particular piece of hardware seems impractical. So, given stats such as primitive count, pixel count (approximately how many screen-space pixels the call will be rendered to), and some precomputed metric of shader ALU complexity (like the number of ALU ops), do you think I can get a reasonable estimate of how long a draw call will take?

What do you do to take this into account?

What about other things like transitions?  I can only think of actual measurement in this case.
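
To make the estimation idea concrete, here is a minimal sketch of such a cost model. Everything in it is an assumption for illustration: the `GpuProfile` throughput figures are hypothetical placeholders that would have to be calibrated by measurement, and the model assumes geometry and pixel work overlap so the slower stage dominates. It ignores bandwidth, state changes and transitions entirely.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical per-GPU throughput figures. These are placeholders to be
// calibrated by measurement, not real hardware data.
struct GpuProfile {
    double trianglesPerMicrosecond;  // assumed peak primitive rate
    double aluOpsPerMicrosecond;     // assumed peak shader ALU throughput
    double fixedOverheadUs;          // assumed fixed cost per draw call
};

// The three stats mentioned in the post.
struct DrawStats {
    std::uint64_t primitiveCount;
    std::uint64_t coveredPixels;   // approximate screen-space pixels shaded
    std::uint32_t aluOpsPerPixel;  // precomputed shader complexity metric
};

// Assumes geometry and pixel work run concurrently, so the slower of the
// two bounds the draw; the fixed overhead is added on top.
inline double EstimateDrawMicroseconds(const DrawStats& d, const GpuProfile& g) {
    const double geometryUs =
        static_cast<double>(d.primitiveCount) / g.trianglesPerMicrosecond;
    const double pixelUs =
        static_cast<double>(d.coveredPixels) * d.aluOpsPerPixel / g.aluOpsPerMicrosecond;
    return g.fixedOverheadUs + std::max(geometryUs, pixelUs);
}
```

Summing `EstimateDrawMicroseconds` over the draws in a command list would give a rough per-list figure to compare against the 200μs target. Whether the result is accurate enough to be useful is exactly the open question here.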

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

If you've got 16ms of work per frame and it's evenly split over 30 command lists, that's an average 500μs per list (so maybe some are 200μs while others are 800μs), which fits with their advice to aim for at least 200-500μs per list :)
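
As a rough sketch of that arithmetic (16ms / 30 lists ≈ 533μs per list), plus one possible way to group lists into `ExecuteCommandLists` submissions so that each carries at least a target amount of estimated GPU work. The function names and the greedy grouping policy are hypothetical, not something from the presentation:

```cpp
#include <cstddef>
#include <vector>

// Average GPU time per command list for a given frame budget.
inline double AverageMicrosecondsPerList(double frameMs, std::size_t listCount) {
    return frameMs * 1000.0 / static_cast<double>(listCount);
}

// Greedily groups consecutive command lists so that each submission carries
// at least minBatchUs of estimated GPU work. Returns how many lists go into
// each submission. A too-small tail is merged into the previous batch.
inline std::vector<std::size_t> GroupIntoSubmits(
    const std::vector<double>& listEstimatesUs, double minBatchUs) {
    std::vector<std::size_t> batchSizes;
    double accumulatedUs = 0.0;
    std::size_t count = 0;
    for (double us : listEstimatesUs) {
        accumulatedUs += us;
        ++count;
        if (accumulatedUs >= minBatchUs) {
            batchSizes.push_back(count);
            accumulatedUs = 0.0;
            count = 0;
        }
    }
    if (count > 0) {
        if (batchSizes.empty()) {
            batchSizes.push_back(count);  // whole frame is below the target
        } else {
            batchSizes.back() += count;   // fold the small tail into the last batch
        }
    }
    return batchSizes;
}
```

For example, lists estimated at 100, 150, 300, 50 and 60μs with a 200μs target would be grouped as two submissions of 2 and 3 lists.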

2 hours ago, Infinisearch said:

Benchmarking for a particular piece of hardware seems impractical.

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!
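
If you go the benchmarking route, the measurement itself is mostly a unit conversion. The sketch below assumes the two tick values come from D3D12 timestamp queries (`EndQuery` with `D3D12_QUERY_TYPE_TIMESTAMP`, read back via `ResolveQueryData`) and the frequency from `ID3D12CommandQueue::GetTimestampFrequency`. The helper names are hypothetical, and the query setup is left out so the snippet stays self-contained:

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of GPU timestamp ticks into microseconds.
// ticksPerSecond is the value the queue reports as its timestamp frequency.
inline double TicksToMicroseconds(std::uint64_t beginTicks,
                                  std::uint64_t endTicks,
                                  std::uint64_t ticksPerSecond) {
    assert(ticksPerSecond > 0 && endTicks >= beginTicks);
    return static_cast<double>(endTicks - beginTicks) * 1e6 /
           static_cast<double>(ticksPerSecond);
}

// Checks a measured submission against the presentation's minimum of 200us.
inline bool MeetsBatchTarget(double measuredUs, double targetUs = 200.0) {
    return measuredUs >= targetUs;
}
```

Wrapping each submission in a begin/end timestamp pair on the reference machine would then tell you directly which ones fall under the target.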

12 hours ago, Hodgman said:

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

It says "you should aim for", but I'm not sure. If I were to guess, putting all your work into one command list could leave the GPU idle while you're batching up work on the CPU. That implies multiple calls to ExecuteCommandLists, which in turn implies multiple command lists. So it most likely means no more than approximately 15-30 command lists, but more than one.

12 hours ago, Hodgman said:

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at X resolution on current top-end hardware and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

8 hours ago, Infinisearch said:

But if I were to guess, putting all your work into one command list could leave the GPU idle while you're batching up work on the CPU

You can inspect this kind of thing with GPUView :) 

This is from D3D11, not 12, but it's the screenshot I have at hand and it gives the same idea... In this I'm launching maybe three or so command lists per frame, and they queue up beautifully. My process's device context always contains about one full frame's worth of work and the actual HW queue is always saturated. The important thing is to submit large'ish blocks of work in a regular rhythm, and to keep your CPU frame-time below your GPU frame-time.

[Image: GPUView screenshot of the D3D11 capture described above (XF8d4Vl.png)]

If your CPU frame-time and GPU frame-time are perfectly matched, then you'll be able to fully utilize both... Otherwise, you'll be fully utilizing one while the other idles. Typically games / gamers choose to be GPU-bound, which means the CPU will idle a little bit each frame waiting for the GPU to catch up, which allows the GPU to be busy 100% of the time.

8 hours ago, Infinisearch said:

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at X resolution on current top-end hardware and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

If a future PC improves CPU performance and GPU performance in equal amounts, then your game will run with pretty much the same utilization percentages as now, but at a higher framerate.
If a future PC improves GPU performance only, then yeah, the GPU will idle while the CPU maxes out. 
If a future PC improves CPU performance only, then the CPU will idle while the GPU maxes out.
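
A toy model of those three cases, assuming CPU and GPU work are perfectly pipelined so the slower of the two sets the frame time. The struct and function names are made up for illustration, and real frames will have sync points that this ignores:

```cpp
#include <algorithm>

struct FrameUtilization {
    double frameMs;  // resulting frame time
    double cpuBusy;  // fraction of the frame the CPU is working (0..1)
    double gpuBusy;  // fraction of the frame the GPU is working (0..1)
};

// Assumes ideal pipelining: the frame takes as long as the slower processor,
// and the faster one idles for the remainder.
inline FrameUtilization SimulateFrame(double cpuMs, double gpuMs) {
    const double frameMs = std::max(cpuMs, gpuMs);
    return {frameMs, cpuMs / frameMs, gpuMs / frameMs};
}
```

With 12ms of CPU work and 16ms of GPU work, the frame takes 16ms with the CPU 75% busy. Halve both and you get 8ms at the same 75%. Halve only the GPU time and the frame is capped at 12ms with the GPU about 67% busy.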

From a gamedev business perspective though --
Is optimizing your game to lower your minimum HW requirements going to result in more sales?
Is optimizing for future PCs that don't exist yet going to result in more sales?
In either case, will those sales make more money than the cost of the optimization work?

If your minimum HW requirements are an Xbox One, then yeah, supporting it is going to result in a lot more sales = money.
Seeing that most games sell with a massive peak of sales in the first three months, followed by a very short tail (most games do not continue selling significant numbers for years after launch), optimizing for PCs that don't exist yet is a waste of money.
Also, if your game does continue to sell for years to come, you can always release optimization patches in the future, when you've actually got future HW to test against :D

