  • Similar Content

    • By Jason Smith
      While working on a project using D3D12, I was getting an exception thrown while trying to get a D3D12_CPU_DESCRIPTOR_HANDLE. The project is written in plain C, so it uses the COBJMACROS. The following application reproduces the problem from the project.
      #define COBJMACROS
      #pragma warning(push, 3)
      #include <Windows.h>
      #include <d3d12.h>
      #include <dxgi1_4.h>
      #pragma warning(pop)

      IDXGIFactory4 *factory;
      ID3D12Device *device;
      ID3D12DescriptorHeap *rtv_heap;

      int WINAPI wWinMain(HINSTANCE hinst, HINSTANCE pinst, PWSTR cline, int cshow)
      {
          (hinst), (pinst), (cline), (cshow);  /* silence unused-parameter warnings */

          HRESULT hr = CreateDXGIFactory1(&IID_IDXGIFactory4, (void **)&factory);
          hr = D3D12CreateDevice(0, D3D_FEATURE_LEVEL_11_0, &IID_ID3D12Device, (void **)&device);

          D3D12_DESCRIPTOR_HEAP_DESC desc;
          desc.NumDescriptors = 1;
          desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
          desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
          desc.NodeMask = 0;
          hr = ID3D12Device_CreateDescriptorHeap(device, &desc, &IID_ID3D12DescriptorHeap, (void **)&rtv_heap);

          /* The exception is thrown by this call. */
          D3D12_CPU_DESCRIPTOR_HANDLE rtv = ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart(rtv_heap);
          (rtv);
          return 0;
      }
      The call to ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart throws an exception. Stepping into the disassembly for ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart shows that the error occurs on the instruction
      mov  qword ptr [rdx],rax
      which seems odd since rdx doesn't appear to be used. Any help would be greatly appreciated. Thank you.
       
    • By lubbe75
      As far as I understand, there is no real random or noise function in HLSL.
      I have a big water polygon, and I'd like to fake water wave normals in my pixel shader. I know it's not efficient and the standard way is really to use a pre-calculated noise texture, but anyway...
      Does anyone have any quick and dirty HLSL shader code that fakes water normals, and that doesn't look too repetitious? 
    • By turanszkij
      Hi,
      I finally managed to get the DX11-emulating Vulkan device working, but everything is flipped vertically now because Vulkan has a different clip space. What are the best practices out there to keep these implementations consistent? I tried using a vertically flipped viewport, and while it works on an Nvidia 1050, the Vulkan debug layer is throwing error messages that this is not supported in the spec, so it might not work on others. There is also the possibility to flip the clip-space position Y coordinate before writing out from the vertex shader, but that requires changing and recompiling every shader. I could also bake it into the camera projection matrices, though I want to avoid that because then I need to track down everywhere in the engine where I upload matrices... Any chance of an easy extension or something? If not, I will probably go with changing the vertex shaders.
    • By NikiTo
      Some people say "discard" has no positive effect on optimization. Other people say it will at least spare the texture fetches.
       
      if (color.A < 0.1f)
      {
          //discard;
          clip(-1);
      }
      // tons of reads of textures following here
      // and loops too
      Some people say that "discard" will only mask out the output of the pixel shader, while still evaluating all the statements after the "discard" instruction.

      From MSDN:
      discard: Do not output the result of the current pixel.
      clip: Discards the current pixel.

      As usual it is unclear, but it suggests that "clip" could discard the whole pixel (maybe stopping execution too).

      I think that, at least for thermal and energy-consumption reasons, the GPU should not evaluate the statements after "discard", but some people on the internet say the GPU computes the statements anyway. What I am more worried about are the texture fetches after discard/clip.

      (What if, after the discard, I have an expensive branch whose decision makes the neighboring pixels that took the approved cheap branch stall for nothing? This is crazy.)
    • By NikiTo
      I have a problem. My shaders are huge, in the sense that they have a lot of code inside. Many of my pixels should be completely discarded. I could use a comparison and discard at the very beginning of the shader, but as far as I understand, the discard statement does not save workload at all, as the pixel has to stall until its long, huge neighbor shaders complete.
      Initially I wanted to use the stencil to discard pixels before the execution flow enters the shader, even before the GPU distributes/allocates resources for the shader, avoiding stalls of the pixel shader execution flow, because I assumed that depth/stencil discards pixels before the pixel shader. But I see now that it happens inside the very last Output Merger stage. It seems extremely inefficient to render a little mirror that way in a scene with a big viewport. Why did they put the stencil test in the Output Merger anyway? Handling of the stencil is so limited compared to other resources. Do people use stencil functionality at all for games, or do they prefer discard/clip?

      Will the GPU stall the pixel if I issue a discard at the very beginning of the pixel shader, or will the GPU already start using the freed-up resources to render another pixel?!



       

DX12 Multithreaded Rendering Architecture


Recommended Posts

I have two questions.

 

As I understand it, you generally want to keep your GPU a frame or two behind your CPU.  While your GPU is rendering a frame, the CPU is generating the draw calls for the next frame, so that there isn't a bottleneck between the two.

 

1) How does this work in practical terms?  Is the CPU side, "generating the draw calls", just building the command lists with calls to DrawIndexedInstanced and the like?  And then to actually perform the rendering, the GPU side, you call ExecuteCommandLists?
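In code, I imagine the split looks something like this (a minimal sketch; the allocator, list, pipeline state and queue are assumed to be created elsewhere):

    #include <d3d12.h>

    void RecordAndSubmit(ID3D12CommandAllocator *allocator,
                         ID3D12GraphicsCommandList *cmdList,
                         ID3D12PipelineState *pso,
                         ID3D12CommandQueue *queue,
                         UINT indexCount)
    {
        // "Generating the draw calls": this only writes commands into CPU memory.
        allocator->Reset();
        cmdList->Reset(allocator, pso);
        cmdList->DrawIndexedInstanced(indexCount, 1, 0, 0, 0);
        cmdList->Close();

        // "Performing the rendering": submission hands the finished list to the GPU.
        ID3D12CommandList *lists[] = { cmdList };
        queue->ExecuteCommandLists(1, lists);
    }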

 

2) In terms of multi-threaded rendering, is that a misnomer?  Are the other threads just generating draw calls, with the main rendering thread being the only thing that actually calls ExecuteCommandLists?  Or can you simultaneously render to various textures, and then your main rendering thread uses them to generate a frame for the screen?

1) Basically, yes. You'll want to build everything up front (command lists, update any buffers from the CPU side, set up copy operations) before pushing them to the GPU to execute, allowing the GPU to chew on that data while you set up the next things for it to render. You can overlap things of course, so it doesn't have to be a case of [generate everything][push everything to GPU]; you could execute draw commands as things finish up. So you could generate, say, all the shadow map command lists, then push those to the GPU before generating the colour passes. (In fact you could dedicate one 'task' to pushing while at the same time starting to generate the colour pass lists.)
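To make that shadow-then-colour overlap concrete, a rough sketch (RecordColourPass is a stand-in for whatever fills a colour list):

    #include <d3d12.h>
    #include <vector>

    void RecordColourPass(ID3D12GraphicsCommandList *list); // stand-in, defined elsewhere

    void SubmitShadowThenColour(ID3D12CommandQueue *queue,
                                ID3D12CommandList *const *shadowLists, UINT shadowCount,
                                ID3D12GraphicsCommandList *const *colourLists, UINT colourCount)
    {
        // The shadow lists are already closed: the GPU starts on them now...
        queue->ExecuteCommandLists(shadowCount, shadowLists);

        // ...while the CPU records the colour pass in the meantime.
        std::vector<ID3D12CommandList *> colour;
        for (UINT i = 0; i < colourCount; ++i)
        {
            RecordColourPass(colourLists[i]);
            colour.push_back(colourLists[i]);
        }
        queue->ExecuteCommandLists(colourCount, colour.data());
    }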

2) Yes and no.
Generally it's accepted to mean the first bit: that draw calls are generated across multiple threads and queued as work by a single thread (or task) to ensure correct ordering.
That said, if you can keep your dependencies in order then there is nothing stopping you queuing work from multiple threads, although I'd have to check the thread safety of the various command queues to see what locks/protection you might need.

However, your 'render to various textures' question brings up a second part: the GPU is itself highly threaded, so even if you have one thread pushing execute commands, the GPU can have multiple commands in flight at once (dependencies allowing). So regardless of what method you use to queue work to the device, it can be doing multiple things at the same time.


1) How does this work in practical terms? Is the CPU side, "generating the draw calls", just building the command lists with calls to DrawIndexedInstanced and the like? And then to actually perform the rendering, the GPU side, you call ExecuteCommandLists?

That's not exactly how it works. When you call ExecuteCommandLists(), it's more the equivalent of what draw() was previously on an immediate context. It's like a very efficient draw() (because supposedly the hard work has been done already).

What happens next is that the driver/OS will queue those calls (draw and ExecuteCommandLists), and the GPU processes them in the order they've been received. That's the queue you're concerned about.
So you don't really have to do anything to make the GPU run behind the CPU. It is already behind it, and the more GPU-bound you are, the further behind the CPU it's going to be (if you are totally CPU-bound, then the GPU is zero steps behind the CPU).
What you can do is take measures to limit how far ahead the CPU can go, by waiting on the CPU side for the GPU to advance past a certain point before submitting commands again (you can use fences for example). There are two reasons you'd want to limit how far ahead the CPU is. One is to save memory: you have to keep things in memory for as long as the GPU can use them, so for buffers, the higher the number of commands in flight, the bigger the buffers have to be. (We're not talking about vsync and double buffering here, where the sync is between GPU and screen, but about constant buffer updates, dynamic vertex data, render states and so on, where the sync is between CPU and GPU.) The other is to limit latency: if you record commands too early, the player will see the result of their actions a long time after they've performed them.
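For example, a minimal sketch of that kind of fence-based throttling, assuming at most two frames in flight (the fence, event and frame counter live elsewhere):

    #include <Windows.h>
    #include <d3d12.h>

    const UINT64 kMaxFramesInFlight = 2;

    void EndFrame(ID3D12CommandQueue *queue, ID3D12Fence *fence,
                  HANDLE fenceEvent, UINT64 &submittedFrame)
    {
        // Mark the end of this frame's work on the GPU timeline.
        ++submittedFrame;
        queue->Signal(fence, submittedFrame);

        // Block only when the GPU is more than kMaxFramesInFlight frames behind.
        if (submittedFrame >= kMaxFramesInFlight)
        {
            const UINT64 mustBeDone = submittedFrame - kMaxFramesInFlight + 1;
            if (fence->GetCompletedValue() < mustBeDone)
            {
                fence->SetEventOnCompletion(mustBeDone, fenceEvent);
                WaitForSingleObject(fenceEvent, INFINITE);
            }
        }
    }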

TL;DR: what I wanted to say is that ExecuteCommandLists() is not a message for the GPU to go ahead immediately. There are more queues involved, and that's what is meant by letting the GPU run behind the CPU.

2) In terms of multi-threaded rendering, is that a misnomer? Are the other threads just generating draw calls, with the main rendering thread being the only thing that actually calls ExecuteCommandLists? Or can you simultaneously render to various textures, and then your main rendering thread uses them to generate a frame for the screen?

I don't think anybody (who knows how things work) actually thinks that. Your GPU mostly accepts things in a serial manner (while being able to process them in a massively parallel way). Note: in D3D12 you are also able to submit work to separate engines, which will begin and end things on separate queues, but they target the same processing units.

BUT the "multithreaded" word refers, of course, to the CPU-side building of commands. Building those commands is expensive; the submitting part is less expensive. So if you factor out the building, you can reduce the serial cost to mostly "submitting" and do the building in parallel (the multithreaded part). On older APIs like D3D9 and D3D10 (discounting multithreaded D3D11, which was supposed to work more like D3D12 does today but didn't, for a variety of reasons), the building and the submitting were basically one and the same, and because the submitting had to happen in a serialized fashion, you couldn't get much advantage from using multiple CPU cores for building the commands (you could get some, but let's not go there).
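As an illustration, a sketch of that build-in-parallel/submit-serially split (std::thread used for clarity and RecordChunk as a hypothetical helper; a real engine would use a task system):

    #include <d3d12.h>
    #include <thread>
    #include <vector>

    void RecordChunk(ID3D12GraphicsCommandList *list, size_t chunk); // hypothetical

    void BuildAndSubmit(ID3D12CommandQueue *queue,
                        std::vector<ID3D12CommandAllocator *> &allocators,
                        std::vector<ID3D12GraphicsCommandList *> &lists,
                        ID3D12PipelineState *pso)
    {
        // Building is the expensive part, so spread it across threads.
        // One allocator and one list per thread; neither is thread safe.
        std::vector<std::thread> workers;
        for (size_t i = 0; i < lists.size(); ++i)
        {
            workers.emplace_back([&, i] {
                allocators[i]->Reset();
                lists[i]->Reset(allocators[i], pso);
                RecordChunk(lists[i], i);
                lists[i]->Close();
            });
        }
        for (std::thread &w : workers)
            w.join();

        // Submitting stays serial and cheap: one call, in a well-defined order.
        std::vector<ID3D12CommandList *> raw(lists.begin(), lists.end());
        queue->ExecuteCommandLists((UINT)raw.size(), raw.data());
    }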


The way I am handling that in my code base is that I have one thread for GraphicsSubmission, one thread for ComputeSubmission, and N threads for command list build-up. I created two CommandQueueList classes, which are thread safe, so adding and removing command lists from multiple threads will not clash. The graphics/compute threads just spin and check whether their command queue has any command lists to execute. Doing it like that, you can keep filling up new command lists without having to wait on the GPU to execute them.
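A minimal sketch of what such a thread-safe CommandQueueList could look like (illustrative and simplified, not the actual implementation):

    #include <d3d12.h>
    #include <mutex>
    #include <queue>

    // Builder threads push closed command lists; the submission thread drains them.
    class CommandQueueList
    {
    public:
        void Push(ID3D12CommandList *list)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lists_.push(list);
        }
        bool TryPop(ID3D12CommandList *&out)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (lists_.empty())
                return false;
            out = lists_.front();
            lists_.pop();
            return true;
        }
    private:
        std::mutex mutex_;
        std::queue<ID3D12CommandList *> lists_;
    };

    // The graphics submission thread spins, executing whatever has been queued.
    void GraphicsSubmissionThread(CommandQueueList &pending, ID3D12CommandQueue *queue)
    {
        for (;;)
        {
            ID3D12CommandList *list = nullptr;
            while (pending.TryPop(list))
                queue->ExecuteCommandLists(1, &list);
        }
    }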
