DX12 Estimating how much time GPU work will take

Recommended Posts

So I was reading the presentation Practical DirectX 12 - Programming Model and Hardware Capabilities again and finally decided to tackle proper command list submission.  Things mentioned in the document regarding this subject:

• Aim for (per frame): 15-30 command lists, 5-10 'ExecuteCommandLists' calls.

• Each 'ExecuteCommandLists' call has a fixed CPU overhead. Underneath, this call triggers a flush, so batch up command lists.

• Try to put at least 200μs of GPU work in each 'ExecuteCommandLists', preferably 500μs.

• Small calls to 'ExecuteCommandLists' complete faster than the OS scheduler can submit new ones.

• The OS takes ~60μs to schedule upcoming work.
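
A minimal sketch of what that batching advice might look like in practice (the function and variable names here are mine, not from the presentation): gather the command lists you have already recorded and closed, then hand them to the queue in a single call.

#include <d3d12.h>
#include <vector>

// Submit a batch of already-recorded, closed command lists in one
// ExecuteCommandLists call, so the fixed per-call CPU/OS cost is paid
// once for the whole batch instead of once per list.
void SubmitBatched(ID3D12CommandQueue* queue,
                   const std::vector<ID3D12CommandList*>& recordedLists)
{
    if (!recordedLists.empty())
    {
        queue->ExecuteCommandLists(static_cast<UINT>(recordedLists.size()),
                                   recordedLists.data());
    }
}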

So basically I want to estimate how long my draw calls take. Benchmarking for a particular piece of hardware seems impractical. So given the primitive count, the pixel count (approximately how many screen-space pixels the call will be rendered to), and some precomputed metric for shader ALU complexity (like the number of ALU ops), do you think I can get a reasonable estimate of how much time a draw call will take?
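
The kind of estimate I have in mind could be expressed as a simple linear model. This is purely a sketch of the idea; the structure, names, and coefficients below are made up and would have to be calibrated against real timings on each GPU.

#include <cstdint>

// Hypothetical per-draw statistics, as described above.
struct DrawStats
{
    uint64_t primitiveCount;   // primitives submitted by the draw
    uint64_t estimatedPixels;  // rough screen-space coverage
    uint32_t aluOpsPerPixel;   // precomputed shader complexity metric
};

// Made-up linear cost model: time grows with primitives and with
// (pixels * ALU ops). The coefficients are placeholders that would need
// to be fitted from actual GPU measurements per hardware tier.
double EstimateDrawMicroseconds(const DrawStats& s)
{
    const double usPerPrimitive  = 0.00002;     // assumed calibration constant
    const double usPerPixelAluOp = 0.0000005;   // assumed calibration constant

    return s.primitiveCount * usPerPrimitive
         + static_cast<double>(s.estimatedPixels) * s.aluOpsPerPixel * usPerPixelAluOp;
}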

What do you do to take this into account?

What about other things like transitions?  I can only think of actual measurement in this case.
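
For the actual-measurement route, D3D12 timestamp queries are the usual tool: bracket the work (a draw, a batch of transitions) with two timestamps and convert the tick delta using the queue's timestamp frequency. A minimal sketch, with the readback buffer creation, the resolve, and the fence synchronization assumed rather than shown:

#include <d3d12.h>
#include <cstdint>

// Create a two-slot timestamp query heap and record a timestamp before
// and after the GPU work to be measured.
void RecordTimestamps(ID3D12Device* device,
                      ID3D12GraphicsCommandList* cmdList,
                      ID3D12QueryHeap** outHeap)
{
    D3D12_QUERY_HEAP_DESC desc = {};
    desc.Type  = D3D12_QUERY_HEAP_TYPE_TIMESTAMP;
    desc.Count = 2;
    device->CreateQueryHeap(&desc, IID_PPV_ARGS(outHeap));

    cmdList->EndQuery(*outHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);
    // ... record the GPU work to measure here ...
    cmdList->EndQuery(*outHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);

    // Then ResolveQueryData(...) the two 64-bit ticks into a readback
    // buffer and read them back once the GPU has finished (not shown).
}

// Convert a tick delta into microseconds using the queue's frequency.
double TicksToMicroseconds(ID3D12CommandQueue* queue,
                           uint64_t begin, uint64_t end)
{
    uint64_t ticksPerSecond = 0;
    queue->GetTimestampFrequency(&ticksPerSecond);
    return (end - begin) * 1e6 / static_cast<double>(ticksPerSecond);
}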


Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

If you've got 16ms of work per frame and it's evenly split over 30 command lists, that's an average of roughly 500μs per list (so maybe some are 200μs while others are 800μs), which fits with their advice to aim for at least 200-500μs per list :)

2 hours ago, Infinisearch said:

Benchmarking for a particular piece of hardware seems impractical.

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

12 hours ago, Hodgman said:

Is that "You should have at least 15 but no more than 30 command lists", or is it "You should have no more than approx 15-30 command lists"? I've typically always used very few command lists when possible, even just one per frame on many games!

It says "you should aim for" but I'm not sure.  But if I were to guess, puttting all your work into one command list could leave the GPU idle while you're batch up work on the CPU.  So that implies multiple calls to ECLs which implies multiple command lists.  So it most likely means no more than approx 15-30 command lists but more than one.

12 hours ago, Hodgman said:

Benchmark for your high-spec hardware, and then you know that for low-spec hardware the times will be bigger, and for higher-spec hardware you don't care because performance won't be an issue for users that have better hardware!

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at X resolution on current top-end hardware, and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

8 hours ago, Infinisearch said:

If I were to guess, putting all your work into one command list could leave the GPU idle while you're batching up work on the CPU

You can inspect this kind of thing with GPUView :) 

This is from D3D11, not 12, but it's the screenshot I have at hand and it gives the same idea... In this I'm launching maybe three or so command lists per frame, and they queue up beautifully. My process's device context always contains about one full frame's worth of work and the actual HW queue is always saturated. The important thing is to submit large'ish blocks of work in a regular rhythm, and to keep your CPU frame-time below your GPU frame-time.

[GPUView screenshot: the process's device context stays about one frame's worth of work deep, and the hardware queue stays saturated across frames.]

If your CPU frame-time and GPU-frame time are perfectly matched, then you'll be able to fully utilize both... Otherwise, you'll be fully utilizing one, while the other one idles. Typically games / gamers choose to be GPU-bound, which means the CPU will idle a little bit each frame waiting for the GPU to catch up, which allows the GPU to be busy 100% of the time.
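
The usual way to get that steady rhythm while staying GPU-bound is fence-based frame pacing: let the CPU run at most a couple of frames ahead and block only when it gets too far in front. A rough sketch with assumed names (fence and event creation omitted):

#include <windows.h>
#include <d3d12.h>
#include <cstdint>

const uint32_t kMaxFramesInFlight = 2;   // how far the CPU may run ahead

struct FramePacer
{
    ID3D12Fence* fence = nullptr;   // assumed created via device->CreateFence(0, ...)
    HANDLE fenceEvent  = nullptr;   // assumed created via CreateEvent(...)
    uint64_t submittedFrame = 0;

    // Call once per frame, after submitting the frame's command lists.
    void EndFrame(ID3D12CommandQueue* queue)
    {
        ++submittedFrame;
        queue->Signal(fence, submittedFrame);   // mark this frame's GPU work

        // If the GPU has fallen more than kMaxFramesInFlight frames behind,
        // wait here so the CPU doesn't run further ahead.
        if (submittedFrame > kMaxFramesInFlight &&
            fence->GetCompletedValue() < submittedFrame - kMaxFramesInFlight)
        {
            fence->SetEventOnCompletion(submittedFrame - kMaxFramesInFlight,
                                        fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }
    }
};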

8 hours ago, Infinisearch said:

This doesn't seem like the right way to do it to me. But more importantly, let's say your game runs at 60fps at X resolution on current top-end hardware, and you want to support higher frame rates at the same resolution on future hardware. If you don't batch up enough work, your GPU is going to idle and you won't reach peak framerates.

If a future PC improves CPU performance and GPU performance in equal amounts, then your game will run with pretty much the same utilization percentages as now, but at a higher framerate.
If a future PC improves GPU performance only, then yeah, the GPU will idle while the CPU maxes out. 
If a future PC improves CPU performance only, then the CPU will idle while the GPU maxes out.

From a gamedev business perspective though --
Is optimizing your game to lower your minimum HW requirements going to result in more sales?
Is optimizing for future PCs that don't exist yet going to result in more sales?
In either case, will those sales make more money than the cost of the optimization work?

If your minimum HW requirements are an Xbox One, then yeah, supporting it is going to result in a lot more sales = money.
Seeing that most games sell with a massive peak of sales in the first three months, followed by a very short tail (most games do not continue selling significant numbers for years after launch), optimizing for PCs that don't exist yet is a waste of money.
Also, if your game does continue to sell for years to come, you can always release optimization patches in the future, when you've actually got future HW to test against :D

