

Topics I've Started

D3D12 / Vulkan Synchronization Primitives

26 April 2016 - 02:20 PM

Vulkan and DX12 have very similar APIs, but the way they handle their synchronization primitives seems to differ in fundamental design.


Now, both Vulkan and DirectX 12 have resource barriers, so I'm going to ignore those.


DirectX 12 uses fences with explicit values that are expected to increase monotonically. In the simplest case, consider the fencing around the swap chain present. I can see two ways to implement it:


1) You create one fence per swap buffer. At the end of frame N you signal fence N % SwapBufferCount and then wait on fence (N + 1) % SwapBufferCount.

2) You create 1 fence. At the end of each frame you increment the fence and save off the value. You then wait for the fence to reach the value for frame (N + 1) % SwapBufferCount.


In general, it seems like the "timestamp" approach to fencing is powerful. For instance, I can have a page allocator that retires pages with a fence value and then wait for the fence to reach that point before recycling the page. It seems like creating one fence per command list submission would be expensive (maybe not? how lightweight are fences?).
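As an aside, the retire-and-recycle logic can be sketched without any GPU API at all. Below is a minimal C++ sketch where a plain counter stands in for the D3D12 fence; the `TimestampFence` and `PageAllocator` names and the integer page handles are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Minimal sketch of timestamp-based page recycling. The "fence" here is just
// a monotonically increasing counter; in D3D12 it would be an ID3D12Fence.
struct TimestampFence {
    uint64_t lastSignaled = 0;   // last value signaled on the queue
    uint64_t completed = 0;      // what GetCompletedValue would report
};

struct PageAllocator {
    struct Retired { int page; uint64_t fenceValue; };
    std::deque<Retired> retired;   // pages waiting for the GPU, oldest first
    std::vector<int> freePages;    // pages safe to hand out again

    // Retire a page at a fence value; it is not reusable until the
    // fence reaches that value.
    void Retire(int page, uint64_t fenceValue) {
        retired.push_back({page, fenceValue});
    }

    // Recycle every page whose fence value the GPU has already passed.
    void Collect(const TimestampFence& fence) {
        while (!retired.empty() && retired.front().fenceValue <= fence.completed) {
            freePages.push_back(retired.front().page);
            retired.pop_front();
        }
    }
};
```

In a real backend, `completed` would come from polling the fence, and Retire would record the value signaled with the submission that last used the page.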


Now compare this with Vulkan.


Vulkan has the notion of fences, semaphores, and events. They are explained in detail here. All of these primitives are binary: each is signaled once and stays signaled until you reset it. I'm less familiar with how to use these kinds of primitives, because you can't take the timestamp approach like you can with DX12 fences.


For instance, to do the page allocator in Vulkan, the fence is the correct primitive to use because it involves synchronizing the state of the hardware queue with the host (i.e. to know when a retired page can be recycled).


In order to do this, I now have to create one fence for each vkQueueSubmit call, and the page allocator receives a fence handle instead of a timestamp.


It seems to me like the DirectX-style fence is more flexible, as I would imagine that internally the Vulkan fence uses the same underlying primitive as the DirectX fence to track signaling. In short, the DirectX timestamp-based fencing seems to let you use fewer fence objects overall.


My primary concern is designing a common backend between Vulkan and DX12. It seems like the wiser course of action is to support the Vulkan-style binary fences, because they can be implemented with DX12 fences. My concern is whether I will lose performance by creating one fence per ExecuteCommandLists call versus a single fence overall in DirectX.
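For what it's worth, the layering argument can be made concrete: a binary, Vulkan-style fence can be emulated on top of a single monotonic counter by capturing the counter value at signal time. A sketch with plain integers standing in for the D3D12 fence (all names invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

// A single monotonic counter, standing in for one ID3D12Fence shared by
// the whole backend.
struct MonotonicFence {
    uint64_t nextValue = 1;   // next value to signal with
    uint64_t completed = 0;   // simulates GetCompletedValue
};

// A Vulkan-style binary fence layered on the monotonic fence: "signal"
// reserves the next counter value, and the fence reads as signaled once
// the counter has passed that value.
struct BinaryFence {
    uint64_t waitValue = 0;   // 0 means "unsignaled / reset"

    // Called at submit time: reserve the timestamp this fence maps to.
    void SignalOn(MonotonicFence& f) { waitValue = f.nextValue++; }

    bool IsSignaled(const MonotonicFence& f) const {
        return waitValue != 0 && f.completed >= waitValue;
    }

    void Reset() { waitValue = 0; }
};
```

If this layering is valid, the binary abstraction costs only a captured integer per submission on the D3D12 side, rather than a distinct kernel object.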


For those who understand the underlying hardware and APIs more deeply than I do, I would appreciate some insight into these design decisions.



[D3D12] ID3D12Resource::Map on a default resource

26 April 2016 - 12:29 PM

It seems like it's valid usage; the docs don't seem to mention it either way. However, isn't the point of a DEFAULT resource that it's not visible to the host? You would have to stage the data through an upload heap instead.

Resource barrier pre and post states

24 April 2016 - 01:08 PM

Resource barriers expect a pre-state as well as a post-state. Does that pre-state have to match the current state of the resource in the driver? Or can you have "don't care" states where you are just transitioning from an unknown state to something new?


For instance, in my frame I might have a transition after the buffer flip that goes from "present" to "render target" mode, and vice versa at the end of the frame.


But what if I create the resource in some other initial state (like COMMON)? Do I have to explicitly transition it before I hit the main render loop, or can it go from a common "uninitialized" state to the render target state without issue?
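Concretely, here is how I imagine the bookkeeping, as a hypothetical CPU-side tracker; the enum and class names are invented and are not D3D12 types. The rule I believe applies is that the declared pre-state must match the resource's actual state, with mismatches caught by the debug layer rather than the runtime:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical per-resource state tracker. It mirrors what an app (or the
// debug layer) would check: the 'before' state named in a transition
// barrier must equal the state the resource is actually in.
enum class State { Common, Present, RenderTarget };

struct StateTracker {
    std::unordered_map<std::string, State> states;

    void Create(const std::string& name, State initial) {
        states[name] = initial;
    }

    // Mirrors a transition barrier: declared 'before' must match reality.
    void Transition(const std::string& name, State before, State after) {
        if (states.at(name) != before)
            throw std::runtime_error("pre-state mismatch");
        states[name] = after;
    }
};
```

Under this model, a resource created in COMMON would need an explicit COMMON to RENDER_TARGET transition before the per-frame PRESENT/RENDER_TARGET cycle can begin.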



[DX12] Constant Buffer Packing

07 February 2016 - 03:01 AM

Hey all,


I'm trying to wrap my head around some weird behavior I'm seeing in my compute shader. It looks to be related to packing and the nature of float4 vectors on GPUs, but it's super unintuitive.


Basically, I have a constant buffer (HLSL 5.1).

static const int SampleCount = 128;

struct OffsetData
{
    float2 Samples[SampleCount];
};

ConstantBuffer<OffsetData> offsetData : register(b1);

In C++, I have a similar layout:


struct OffsetData
{
    static const size_t SampleCount = 128;

    // CoCSizeMax and Lerp are defined elsewhere in my code.
    void Compute(float angle, float width, float height)
    {
        const float CoCMultiplier = CoCSizeMax * 0.05f;
        float x = 0.5f * CoCMultiplier * cosf(angle) * (height / width);
        float y = 0.5f * CoCMultiplier * sinf(angle);

        for (size_t i = 0; i < SampleCount; ++i)
        {
            float t = static_cast<float>(i) / (SampleCount - 1);
            samples[i][0] = Lerp(-x, x, t);
            samples[i][1] = Lerp(-y, y, t);
        }
    }

    float samples[SampleCount][2];
};


I then have a shader that renders the contents of the constant buffer to the screen. Basically, I map the current uv from [0, 1] -> [0, SampleCount - 1] and then return the contents of the constant buffer as the buffer color.


I get really weird results:


If I change everything to use floats (i.e. OffsetData.Samples is an array of SampleCount floats), I get this:


[Attached image: float1.PNG]


This is indexing the constant buffer from 0 to SampleCount - 1. It's basically skipping through the buffer in increments of 4.


For float2:


[Attached image: float2.PNG]


Float3 (this one looks really weird, I don't even understand what happened):


[Attached image: float3.PNG]


And finally, the "correct" one where everything uses float4's:


[Attached image: float4.PNG]


Naturally, it seems like there's something inherent to float4s going on. But it doesn't make sense to me: I should be able to index an array of floats, right? What am I missing?
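For reference, here is one theory I'm testing: HLSL's constant buffer packing gives every array element its own 16-byte register, regardless of the element's size. A small sketch of the offset math (a simplification that ignores structs and matrices):

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element i of an array of N-component float vectors in an
// HLSL cbuffer under the theory above: each array element is placed at the
// start of its own 16-byte register, regardless of element size.
// (Simplified: scalar/vector arrays only; ignores structs and matrices.)
constexpr size_t CbufferArrayElementOffset(size_t i, size_t /*components*/) {
    return i * 16;
}

// By contrast, tightly packed C++ float2 data advances 8 bytes per element.
constexpr size_t TightFloat2Offset(size_t i) {
    return i * sizeof(float) * 2;
}
```

If this rule is right, element 1 of a float array lands at byte 16, i.e. float index 4, which matches the "increments of 4" skipping above; only float4 arrays line up with a tightly packed C++ layout, so for float2 you would either pad each element to a float4 on the CPU side or pack two float2s per float4 and index accordingly.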

[D3D12] Command Allocator / Command List usage

23 January 2016 - 06:04 PM

Hey all,


In the MSDN docs, they describe the relationship between ID3D12CommandAllocator and ID3D12CommandList.


Immediately after being created, command lists are in the recording state. You can also re-use an existing command list by calling ID3D12GraphicsCommandList::Reset, which also leaves the command list in the recording state. Unlike ID3D12CommandAllocator::Reset, you can call Reset while the command list is still being executed. A typical pattern is to submit a command list and then immediately reset it to reuse the allocated memory for another command list. Note that only one command list associated with each command allocator may be in a recording state at one time.


I understand that a command allocator is the heap from which commands in a command list are allocated. My assumption is that this is a growing heap with a high-water mark that will remain at a certain size to avoid further allocations (this must be the case, since you don't specify a size at creation time).


If true, it makes sense that the command allocator is resident physical memory into which the command list records. In the samples, it appears as though one command allocator is created for each command list. This makes sense; however, according to the docs, a command allocator can be used with any command list so long as only one of them is in the recording state at a time.


Now, the part that confuses me is that it's okay to reset the command list and reuse it immediately, but it's not okay to reset the command allocator until the command list is finished on the GPU.


Command List Reset: I would venture to guess this preserves the contents of the original commands within the command allocator and starts a fresh recording?


Command Allocator Reset: It seems as though this is literally destroying the contents of the heap, which may have command list data active on the GPU.


My big question is this: How does the memory ownership work between the command allocator and the command list? Is the allocator doing implicit double (or more) buffering on a command list reset? What's the actual difference between ID3D12GraphicsCommandList::Reset and ID3D12CommandAllocator::Reset?
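To frame the question, here is the pattern I'm considering: one allocator per frame in flight, each reset only once a fence shows the GPU has consumed that frame's lists. A minimal sketch with a plain counter standing in for the fence (the names are invented for illustration):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch: round-robin command allocators, one per frame in flight.
// An allocator's memory may only be reclaimed (i.e. the allocator reset)
// once the GPU has passed the fence value recorded at submission time.
constexpr int kFramesInFlight = 2;

struct FrameAllocators {
    // Fence value recorded when each slot's list was submitted; 0 = unused.
    std::array<uint64_t, kFramesInFlight> submittedFence{};
    uint64_t gpuCompleted = 0;   // stands in for the fence's completed value

    void Submit(int slot, uint64_t fenceValue) {
        submittedFence[slot] = fenceValue;
    }

    // True if the allocator in this slot is safe to reset.
    bool CanReset(int slot) const {
        return submittedFence[slot] <= gpuCompleted;
    }
};
```

Under this pattern, the command list itself can be reset immediately after submission (it just starts recording into a different allocator), while each allocator waits out its fence.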