[DX12] Fences and swap chain Present


Recommended Posts

Hi,

 

I am looking at Microsoft's most basic DX12 samples and I was wondering about the way they use fences to force a wait after the call to "Present".

Isn't "Present" already doing that for us?

 

I mean, if not, why do we have to pass in a command queue when we create the swap chain?

// ...

// Present the frame.
ThrowIfFailed(m_swapChain->Present(0, 0));

MoveToNextFrame();

// ...
void D3D12Fullscreen::MoveToNextFrame()
{
	// Schedule a Signal command in the queue.
	const UINT64 currentFenceValue = m_fenceValues[m_frameIndex];
	ThrowIfFailed(m_commandQueue->Signal(m_fence.Get(), currentFenceValue));

	// Update the frame index.
	m_frameIndex = m_swapChain->GetCurrentBackBufferIndex();

	// If the next frame is not ready to be rendered yet, wait until it is ready.
	if (m_fence->GetCompletedValue() < m_fenceValues[m_frameIndex])
	{
		ThrowIfFailed(m_fence->SetEventOnCompletion(m_fenceValues[m_frameIndex], m_fenceEvent));
		WaitForSingleObjectEx(m_fenceEvent, INFINITE, FALSE);
	}

	// Set the fence value for the next frame.
	m_fenceValues[m_frameIndex] = currentFenceValue + 1;
}

Maybe I missed something, but I placed a breakpoint inside the "if (m_fence->GetCompletedValue() < m_fenceValues[m_frameIndex])" branch and it was never hit, presumably because "Present" had already blocked until a buffer was ready.

 

Thanks & Cheers,

 

Shnoutz


OK, I was wrong: the fence is required. If I remove it I get a ton of error messages.

 

But my question still stands: why do we need the fence AND have to give the swap chain a command queue?
 

ThrowIfFailed(factory->CreateSwapChainForHwnd(
	m_commandQueue.Get(),		// Swap chain needs the queue so that it can force a flush on it.
	Win32Application::GetHwnd(),
	&swapChainDesc,
	nullptr,
	nullptr,
	&swapChain
	));


The fence is for your command lists: you can't reset/reuse a command list while it's still sitting in a command queue, because that would corrupt it by the time the GPU gets to it.

 

Your swap chain's command queue, which consumes your command lists, is throttled by presentation. Your command lists are presumably writing to a swap chain texture (or else you're not going to see anything), and that texture might still be in use, or already written to and waiting to be used.

 

Therefore, the fences aren't really for the swap chain; they're for your command lists, which just happen to be synchronized by the swap chain when they use that command queue, because they need access to the swap chain's texture resource.

 

If you're using a different queue, one not associated with a swap chain, it wouldn't get synchronized by presents. You'd still need a fence of some sort, though, assuming you're reusing command lists.

 

Note that you can avoid using a fence with a waitable swap chain: if a previous frame's texture is accessible again, it stands to reason that the command list that wrote to it has also finished.
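For reference, the waitable variant is requested at swap chain creation time. A rough sketch building on the CreateSwapChainForHwnd call from earlier in the thread (error handling trimmed, and the latency value of 2 is an arbitrary choice):

```
// Request the frame-latency waitable object when creating the swap chain.
swapChainDesc.Flags |= DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
ThrowIfFailed(factory->CreateSwapChainForHwnd(
	m_commandQueue.Get(), Win32Application::GetHwnd(),
	&swapChainDesc, nullptr, nullptr, &swapChain));

ComPtr<IDXGISwapChain2> swapChain2;
ThrowIfFailed(swapChain.As(&swapChain2));
ThrowIfFailed(swapChain2->SetMaximumFrameLatency(2));
HANDLE frameLatencyWaitable = swapChain2->GetFrameLatencyWaitableObject();

// Then, at the top of each frame, before touching the back buffer:
WaitForSingleObjectEx(frameLatencyWaitable, INFINITE, FALSE);
```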


Thanks for the answer :)

I am mostly curious about that last point you mentioned:

"Note that you can avoid the usage of a fence with a waitable swap chain because if a previous frame's texture is accessible, it should stand to reason that the command list operating on it is also finished."

Why does the swap chain need to be waitable for this to work?


They're actually really cool. https://software.intel.com/en-us/articles/sample-application-for-direct3d-12-flip-model-swap-chains

 

Conceptually, the waitable object can be thought of as a semaphore which is initialized to the maximum frame latency and signaled whenever a present is removed from the present queue. If an application waits for the semaphore to be signaled before rendering, then the present queue is not full (so Present will not block), and the latency is eliminated.

 

You're sacrificing some throughput, because you could theoretically be writing a command list while you're instead waiting for the next present to be ready; on the other hand, you're not generating a super early frame from player input that's going to sit around for a while. And if you know the next present has no chance of blocking, the command list writing to its texture must be finished.

Instead of blocking, you could alternatively do some compute task that's not dependent on user input and doesn't need to write to the swap chain.


If you use a waitable swap chain in place of fences, you'll be limited to half the screen refresh rate. Unless you are uncapping FPS (which I believe would give you tearing, though I haven't done it myself so I can't speak from experience), Present waits for the screen to refresh before displaying the next frame. Once you finish waiting for Present, the screen starts refreshing again, but Present has no new frame to flip to since you are still building it. So one refresh cycle displays the frame, and the next one fills it in. If you don't mind FPS being half the refresh rate (capped at 30 FPS on a 60 Hz monitor), that's not a problem, but you are wasting a lot of time you could be using to build the next frame (I guess it's only wasted if you have a lot of GPU work to do for that frame).

 

I just read all of what Dingleberry said and realized he already mentioned what I said above. I'll leave this here because it explains it in a little more detail, though.

Edited by iedoc


You won't necessarily be halving your frame rate; you just have less wiggle room to absorb spikes.

 

https://software.intel.com/sites/default/files/managed/cd/fb/1_gamemode.png

 

The yellow blocks are where you're blocking on WaitForSingleObject, or alternatively doing other work that isn't related to using the buffer at the top of the same column. But there's still a perfect stream of presented frames, because in the above example there are three swap chain buffers.
