Member Since 10 Jan 2011
Offline Last Active Oct 01 2016 04:18 PM

Posts I've Made

In Topic: [D3D12] Debug validation weirdness

05 July 2016 - 09:35 PM

It turns out it's the UAV barrier that the debug layer is complaining about. It only seems to happen if I issue a UAV barrier on a command list without first transitioning the resource to the unordered access state in that same command list, which seems wrong.

In Topic: [D3D12] How to correctly update constant buffers in different scenarios.

13 June 2016 - 09:57 PM

A thing I'm struggling with right now is how to handle mapping of resources across multiple command lists.



// Thread 1

ConstData cd;
cd.data = 1;

ptr = consBuffer.Map();
memcpy(ptr, &cd, sizeof(cd));


cd.data = 2;

ptr = consBuffer.Map(); // same buffer, overwritten with the new value
memcpy(ptr, &cd, sizeof(cd));


// Thread 2

CL2->draw(obj3); // uses the constant buffer being written in thread 1

// Submission thread (submission order):

CL1 -> CL2

One approach I've seen is to cache changes to the resource in a command list-local cache, and then update the contents of the buffers when the command lists are serialized to the submission thread.
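
A minimal sketch of that caching idea, in plain C++ with no D3D12 calls. ConstBuffer, RecordedList, PendingWrite and FlushInOrder are hypothetical names made up for illustration; the byte vector stands in for mapped upload memory. It only shows the CPU side: writes are copied into a per-list cache at record time and applied in submission order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a persistently mapped constant buffer.
struct ConstBuffer {
    std::vector<std::uint8_t> bytes;
    explicit ConstBuffer(std::size_t size) : bytes(size, 0) {}
};

// One deferred write: destination, offset, and a private copy of the data.
struct PendingWrite {
    ConstBuffer* target;
    std::size_t offset;
    std::vector<std::uint8_t> data;
};

// Hypothetical command list wrapper: records writes instead of applying them.
struct RecordedList {
    std::vector<PendingWrite> writes;

    void UpdateConstants(ConstBuffer& cb, std::size_t offset,
                         const void* src, std::size_t size) {
        const auto* p = static_cast<const std::uint8_t*>(src);
        writes.push_back({&cb, offset, std::vector<std::uint8_t>(p, p + size)});
    }
};

// Submission thread: apply each list's cached writes in submission order.
void FlushInOrder(const std::vector<RecordedList*>& lists) {
    for (const RecordedList* list : lists) {
        for (const PendingWrite& w : list->writes) {
            std::memcpy(w.target->bytes.data() + w.offset,
                        w.data.data(), w.data.size());
        }
        // A real renderer would submit this list to the queue here.
    }
}
```

Note the limitation: the GPU reads the memory later than the CPU flush, so two writes to the same location in one frame would still collide. In practice each write would probably need its own region of an upload ring; this sketch does not attempt that.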

In Topic: [D3D12] Command list submission ordering

26 May 2016 - 11:59 AM

Great, that's what I expected. I feel like a lot of documents assume you know that and gloss over it.


On that note, when building a task graph, it seems wise to statically bake out your high-level render passes on your queue submission thread, batch up all the command lists in each pass (e.g. wait until all your Z-prepass lists come in), and then submit in dependency-sorted order (e.g. wait to submit the g-buffer group until the Z-prepass group has been submitted).
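
Here is a rough sketch of that batching scheme, assuming the passes are already stored in dependency-sorted order and that OnListReady is only ever called from the submission thread. Pass, PassSubmitter and the string "handles" are hypothetical; the log stands in for the actual queue submission.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of a statically baked render pass.
struct Pass {
    std::string name;
    std::size_t expectedLists;       // command lists this pass will produce
    std::vector<std::string> ready;  // lists received so far (names stand in for handles)
};

class PassSubmitter {
public:
    // 'passes' must already be in dependency-sorted order.
    explicit PassSubmitter(std::vector<Pass> passes) : passes_(std::move(passes)) {}

    // Called when a worker hands over a finished command list.
    void OnListReady(std::size_t passIndex, const std::string& list) {
        passes_[passIndex].ready.push_back(list);
        Drain();
    }

    const std::vector<std::string>& SubmissionLog() const { return log_; }

private:
    // Submit every complete pass, stopping at the first incomplete one.
    void Drain() {
        while (next_ < passes_.size() &&
               passes_[next_].ready.size() == passes_[next_].expectedLists) {
            for (const std::string& l : passes_[next_].ready) {
                log_.push_back(l);  // a real renderer would submit the batch here
            }
            ++next_;
        }
    }

    std::vector<Pass> passes_;
    std::vector<std::string> log_;
    std::size_t next_ = 0;
};
```

With a two-list Z-prepass and a one-list g-buffer pass, a g-buffer list that arrives early is held back until both Z-prepass lists are in.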

In Topic: Descriptor binding: DX12 and Vulkan

25 May 2016 - 05:39 PM

I've been thinking more about this, and I've come to realize some things.


I did some investigation into how some real workloads are handling the root signature. I found that a vast majority of what I saw have a structure similar to this:


DX12 style binding slots:


For bucketed scene draws:


0: Some push constants

1: per draw constant buffer

2: per pass constant buffer

3: per material constant buffer

4: A list of SRVs


For various post processing jobs:


0+ constant buffers

simple table of UAVs

simple table of SRVs


I didn't find any use cases where different regions of the same descriptor table were used for different stuff... for the most part it seems a simple list of SRVs / UAVs is enough.


I also realized that Vulkan has the strong notion of a render pass, and that UAVs could be factored into render passes as outputs (which are then transitioned to SRVs).


To me, it seems like having constant buffer binding slots, a way to bind a list of SRVs to the draw call, and a way to bind a list of UAVs to a render pass is enough to support most scenarios.


With regards to list allocation, it seems like descriptor layouts are going to be bounded by the application. Like you said, Witek902, you could just create a free list pool for descriptors and orphan them on update into a recycle queue. Static descriptor sets just get allocated once and held.


For DX12, you could model that same technique by allocating fixed size pieces out of a descriptor heap, or use some sort of buddy allocator. With the descriptor heap approach it becomes a bit weirder because it seems the ideal use case scenario is to keep the same heap bound for the whole frame.
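
To make the free list plus recycle queue idea concrete, here is a small sketch that carves a heap of descriptor indices into fixed-size chunks. DescriptorSlotPool, kInvalidSlot and the fence values are all hypothetical; it deals only in indices and assumes chunks are retired with non-decreasing fence values.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

constexpr std::uint32_t kInvalidSlot = 0xFFFFFFFFu;

// Hands out fixed-size chunks of descriptor indices from one heap.
class DescriptorSlotPool {
public:
    DescriptorSlotPool(std::uint32_t heapSize, std::uint32_t chunkSize) {
        const std::uint32_t count = heapSize / chunkSize;
        // Push in reverse so the lowest chunk is handed out first.
        for (std::uint32_t i = count; i > 0; --i) {
            free_.push_back((i - 1) * chunkSize);
        }
    }

    // Returns the first descriptor index of a free chunk, or kInvalidSlot.
    std::uint32_t Allocate() {
        if (free_.empty()) return kInvalidSlot;
        const std::uint32_t chunk = free_.back();
        free_.pop_back();
        return chunk;
    }

    // Orphan a chunk: the GPU may still read it until fenceValue completes.
    void Retire(std::uint32_t chunk, std::uint64_t fenceValue) {
        retired_.push_back({chunk, fenceValue});
    }

    // Move chunks whose fence has completed back onto the free list.
    void Reclaim(std::uint64_t completedFence) {
        while (!retired_.empty() && retired_.front().fence <= completedFence) {
            free_.push_back(retired_.front().chunk);
            retired_.pop_front();
        }
    }

private:
    struct Retired {
        std::uint32_t chunk;
        std::uint64_t fence;
    };
    std::vector<std::uint32_t> free_;
    std::deque<Retired> retired_;
};
```

Because every chunk comes from the same index range, one heap could stay bound for the whole frame while individual chunks are recycled underneath it.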


I also read in Gpu Fast Paths that using dynamic constant buffers eats up 4 registers of the USER-DATA memory devoted to the pipeline layout. Apparently using a push constant to offset into a big table is more performant (I'm not sure how portable this is to platforms like mobile). 


Anyway, just some thoughts.

In Topic: D3D12 / Vulkan Synchronization Primitives

26 April 2016 - 09:54 PM

Don't you have the option of using one fence per frame on both APIs? (i.e. just fence the final vkQueueSubmit for a frame)


I believe NVIDIA explicitly calls out that you should avoid batch submitting your entire frame at the very end. I believe the idea is to keep the hardware busy with ~5 vkQueueSubmit / ExecuteCommandLists calls per frame?


That said, I guess there are several ways you can pipeline your frame.
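
As an illustration of combining the two points above (several submits per frame, but only one fence per frame), here is a toy model in plain C++. FakeQueue, FrameFences and kFramesInFlight are hypothetical and simulate the queue on the CPU; no D3D12 or Vulkan calls are involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

constexpr std::size_t kFramesInFlight = 2;

// Toy queue: counts submits and "completes" signalled fence values in order.
struct FakeQueue {
    std::uint64_t completed = 0;        // last fence value the "GPU" reached
    std::deque<std::uint64_t> signals;  // fence values not yet reached
    int submits = 0;

    void Submit() { ++submits; }  // stands in for one submit call
    void Signal(std::uint64_t value) { signals.push_back(value); }
    void GpuAdvance() {           // pretend the GPU finished one frame
        if (!signals.empty()) {
            completed = signals.front();
            signals.pop_front();
        }
    }
};

// Tracks one fence value per in-flight frame slot.
struct FrameFences {
    std::array<std::uint64_t, kFramesInFlight> slotFence{};
    std::uint64_t nextValue = 1;
    std::size_t slot = 0;

    // True when the slot about to be reused is no longer in flight.
    bool SlotIsFree(const FakeQueue& q) const {
        return q.completed >= slotFence[slot];
    }

    // Submit the frame in several batches, but signal only after the last.
    void SubmitFrame(FakeQueue& q, int batches) {
        for (int i = 0; i < batches; ++i) q.Submit();
        q.Signal(nextValue);
        slotFence[slot] = nextValue++;
        slot = (slot + 1) % kFramesInFlight;
    }
};
```

The CPU would check SlotIsFree (or block on the fence) before reusing per-frame resources, regardless of how many batches each frame was split into.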