

Member Since 10 Jan 2011

#5292977 Descriptor binding: DX12 and Vulkan

Posted by ZBethel on 22 May 2016 - 09:24 PM

I've been reading up on how the resource binding methods work in Vulkan and DX12. I'm trying to figure out how best to design an API that abstracts the two with respect to binding descriptors to the pipeline. Naturally, the two APIs are similar, but I'm finding that they treat descriptor binding differently in subtle ways.


Disclaimer: Skip to the bottom if you have a deep understanding of this already and just care about my specific question.




In DirectX, you define a "root signature". It can have root constants (DX12's equivalent of push constants), inline root descriptor binding points, or descriptor table binding points. It also defines static samplers on the signature itself. A descriptor table is a contiguous block of descriptors within a descriptor heap. Binding a table means giving the pipeline the handle of the table's first descriptor in the heap. Tables can hold either CBV/SRV/UAV descriptors or SAMPLER descriptors. You cannot mix the two within a single heap, and therefore not within a single table. Descriptor tables are also organized into ranges, where each range defines one or more descriptors of a SINGLE type.
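Here is a rough C++ model of the root signature layout described above. All types are hypothetical stand-ins that only mirror the concepts. They are not the real `D3D12_ROOT_PARAMETER` / `D3D12_DESCRIPTOR_RANGE` structs. The validity check encodes two rules: a table must not mix sampler ranges with CBV/SRV/UAV ranges, and inline root descriptors cannot be samplers.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical model of a D3D12 root signature (not the real D3D12 structs).
enum class RangeType { SRV, UAV, CBV, Sampler };

struct DescriptorRange {
    RangeType type;        // every descriptor in a range has this single type
    uint32_t count;        // number of descriptors in the range
    uint32_t baseRegister; // first shader register covered by the range
};

struct RootConstants   { uint32_t num32BitValues; uint32_t shaderRegister; };
struct RootDescriptor  { RangeType type; uint32_t shaderRegister; }; // inline CBV/SRV/UAV
struct DescriptorTable { std::vector<DescriptorRange> ranges; };

using RootParameter = std::variant<RootConstants, RootDescriptor, DescriptorTable>;

struct StaticSampler { uint32_t shaderRegister; };

struct RootSignatureDesc {
    std::vector<RootParameter> parameters;
    std::vector<StaticSampler> staticSamplers; // baked into the signature itself
};

// A table points into exactly one heap, and a heap holds either CBV/SRV/UAV
// descriptors or samplers, so a table must not mix the two.
bool TableIsValid(const DescriptorTable& table) {
    bool hasSampler = false, hasView = false;
    for (const DescriptorRange& r : table.ranges) {
        if (r.type == RangeType::Sampler) hasSampler = true;
        else hasView = true;
    }
    return !(hasSampler && hasView);
}

bool RootSignatureIsValid(const RootSignatureDesc& desc) {
    for (const RootParameter& p : desc.parameters) {
        if (auto* t = std::get_if<DescriptorTable>(&p)) {
            if (!TableIsValid(*t)) return false;
        } else if (auto* d = std::get_if<RootDescriptor>(&p)) {
            if (d->type == RangeType::Sampler) return false; // root descriptors are CBV/SRV/UAV only
        }
    }
    return true;
}
```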


Root Signature Example:



Descriptor Heap indirection:




In Vulkan, you define a "pipeline layout". It can have push constants and "descriptor set" binding points. You cannot define inline descriptor binding points. Each descriptor set layout can define its own static (immutable) samplers. A descriptor set is a first-class object in Vulkan. Like a table, it has one or more ranges, each of a SINGLE descriptor type.
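A D3D12 table can't mix samplers with CBV/SRV/UAV descriptors. One way a wrapper could lower a Vulkan-style set layout onto D3D12 is therefore to split each set into at most two tables. This is a sketch under that assumption. `SetBinding`, `LowerToD3D12`, and the other names are hypothetical, and immutable samplers are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical frontend types; not the Vulkan or D3D12 structs themselves.
enum class DescType { SampledImage, StorageImage, UniformBuffer, StorageBuffer, Sampler };

struct SetBinding {
    uint32_t binding; // binding slot within the set
    DescType type;    // a single descriptor type per binding
    uint32_t count;   // array size of the binding
};

struct SetLayout { std::vector<SetBinding> bindings; };

// One Vulkan-style set lowered onto D3D12: since a table cannot mix samplers
// with CBV/SRV/UAV descriptors, a set may need two tables.
struct LoweredSet {
    std::vector<SetBinding> viewTable;    // CBV/SRV/UAV ranges
    std::vector<SetBinding> samplerTable; // sampler ranges
    uint32_t TableCount() const {
        return (viewTable.empty() ? 0u : 1u) + (samplerTable.empty() ? 0u : 1u);
    }
};

LoweredSet LowerToD3D12(const SetLayout& layout) {
    LoweredSet out;
    for (const SetBinding& b : layout.bindings) {
        if (b.type == DescType::Sampler) out.samplerTable.push_back(b);
        else out.viewTable.push_back(b);
    }
    return out;
}
```

This also means a root signature built from N sets can need up to 2N table parameters, which is worth budgeting for.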




Descriptor Sets:




Now, an interesting pattern I'm seeing is that the two APIs provide descriptor versioning for completely different things. In DirectX, you can version descriptors implicitly within the command list using root descriptor bindings. This allows you to do things like specify a custom offset for a constant buffer view. In Vulkan, there is an explicit VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC descriptor type that lets you supply the buffer offset at bind time, so the offset is versioned in the command buffer. See the image below:







Okay, so I'm really just looking for advice on how to organize binding points for an API that wraps these two models.


My current tentative approach is to provide an API for creating buffers and images, and then explicit UAV/SRV/CBV/RTV/DSV views into those objects. The resulting view is an opaque, typeless handle on the frontend that can map to descriptors on DirectX 12 or some staging resource in Vulkan for building descriptor sets.
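One possible shape for that opaque, typeless handle is an index plus a generation counter into a backend-owned pool. That way the frontend never sees descriptors or image views directly. This is a hypothetical sketch: `ViewHandle` and `ViewPool` are made-up names, and the split of 24 index bits and 8 generation bits is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opaque view handle: low 24 bits index a backend slot, high 8 bits
// hold a generation counter so stale handles to recycled slots can be detected.
struct ViewHandle { uint32_t value = 0; };

// BackendView could be a D3D12 CPU descriptor handle, or whatever staging data
// the Vulkan backend needs to build descriptor sets later.
template <typename BackendView>
class ViewPool {
public:
    ViewHandle Create(const BackendView& view) {
        uint32_t index;
        if (!freeList_.empty()) {
            index = freeList_.back();
            freeList_.pop_back();
        } else {
            index = static_cast<uint32_t>(slots_.size());
            assert(index < (1u << 24) && "out of handle indices");
            slots_.push_back(Slot{});
        }
        Slot& s = slots_[index];
        s.view = view;
        s.alive = true;
        return ViewHandle{(static_cast<uint32_t>(s.generation) << 24) | index};
    }

    void Destroy(ViewHandle h) {
        Slot& s = slots_[Index(h)];
        assert(s.alive && s.generation == Generation(h) && "double destroy or stale handle");
        s.alive = false;
        ++s.generation; // invalidates outstanding handles (wraps after 256 reuses)
        freeList_.push_back(Index(h));
    }

    // Returns nullptr for stale or destroyed handles.
    const BackendView* Resolve(ViewHandle h) const {
        uint32_t i = Index(h);
        if (i >= slots_.size()) return nullptr;
        const Slot& s = slots_[i];
        if (!s.alive || s.generation != Generation(h)) return nullptr;
        return &s.view;
    }

private:
    struct Slot { BackendView view{}; uint8_t generation = 0; bool alive = false; };
    static uint32_t Index(ViewHandle h) { return h.value & 0xFFFFFFu; }
    static uint8_t Generation(ViewHandle h) { return static_cast<uint8_t>(h.value >> 24); }
    std::vector<Slot> slots_;
    std::vector<uint32_t> freeList_;
};
```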


I think I want to provide an explicit "ResourceSet" object that defines 1..N ranges of views similar to how both the descriptor set and descriptor table models work. I expect that I would make sampler binding a separate API that does its own thing for the two backends. I would really like to treat these ResourceSet objects similar to constant buffers, except that I'm just writing view handles into it.
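A ResourceSet along those lines could be little more than a layout of typed ranges plus a flat array of view handles. You write handles into it much like writing constants into a buffer. This is a hypothetical sketch. The dirty flag stands in for whatever the backend does to bake descriptors before the next bind: copying into a heap on D3D12, or `vkUpdateDescriptorSets` on Vulkan.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical ResourceSet: 1..N typed ranges backed by a flat handle array.
enum class ViewType { SRV, UAV, CBV };

struct RangeDesc { ViewType type; uint32_t count; };

using ViewHandle = uint32_t; // stand-in for an opaque frontend view handle

class ResourceSet {
public:
    explicit ResourceSet(std::vector<RangeDesc> ranges) : ranges_(std::move(ranges)) {
        uint32_t total = 0;
        for (const RangeDesc& r : ranges_) {
            offsets_.push_back(total); // first flat slot of this range
            total += r.count;
        }
        handles_.assign(total, 0);
    }

    // Writes a view handle into element `element` of range `range`.
    void Write(uint32_t range, uint32_t element, ViewHandle view) {
        assert(range < ranges_.size() && element < ranges_[range].count);
        handles_[offsets_[range] + element] = view;
        dirty_ = true; // backend must re-bake descriptors before the next bind
    }

    const std::vector<ViewHandle>& Handles() const { return handles_; }
    bool Dirty() const { return dirty_; }
    void ClearDirty() { dirty_ = false; }

private:
    std::vector<RangeDesc> ranges_;
    std::vector<uint32_t> offsets_;
    std::vector<ViewHandle> handles_;
    bool dirty_ = false;
};
```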


I need to figure out how to handle versioning of updates to these descriptor sets. In the simplest case, I treat them as fully static. This maps well to both DX12 and Vulkan because I can simply allocate space in a descriptor heap or create a descriptor set, write the descriptors to it, and I'm done.


Handling dynamic updates becomes complicated for both APIs, and this is the crux of where I'm struggling right now.


Both APIs let me push constants, so that's not really a problem. However, DirectX allows you to version descriptors directly in the command list, while Vulkan allows you to use dynamic offsets into buffers. It seems like this is chiefly for CBVs.
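For a CBV fast path, both dynamic offsets and root CBVs end up needing an aligned offset into a big per-frame buffer. Here is a minimal sketch, assuming one linear allocator per frame in flight. The returned offset could be passed as a Vulkan dynamic offset, or added to the buffer's GPU virtual address for a root CBV. The 256 in the usage matches D3D12's constant buffer placement alignment (`D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT`). On Vulkan you would use the device's `minUniformBufferOffsetAlignment` limit instead. `FrameConstantAllocator` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Rounds value up to a power-of-two alignment.
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical per-frame linear allocator over one large constant/uniform buffer.
class FrameConstantAllocator {
public:
    FrameConstantAllocator(uint64_t capacity, uint64_t alignment)
        : capacity_(capacity), alignment_(alignment) {}

    // Returns an aligned byte offset, or nullopt if this frame's space is used up.
    std::optional<uint64_t> Allocate(uint64_t size) {
        uint64_t offset = AlignUp(head_, alignment_);
        if (offset + size > capacity_) return std::nullopt;
        head_ = offset + size;
        return offset;
    }

    // Only safe once the GPU has finished the frame that used this buffer.
    void ResetForNewFrame() { head_ = 0; }

private:
    uint64_t capacity_;
    uint64_t alignment_;
    uint64_t head_ = 0;
};
```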


So it seems like if I want to do something like have a descriptor set with 3 CBVs and then use dynamic offsets, I have to explicitly version the entire table in DirectX by allocating some new space in the heap and spilling descriptors to it.
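Spilling a table in DirectX can be modeled as copy-on-write into a per-frame linear region of the shader-visible heap. Every version gets fresh space, so draws that were already recorded keep pointing at the old contents. In this simplified sketch the heap is a plain array of integers. In real D3D12 the copy would be something like `ID3D12Device::CopyDescriptorsSimple`. `FrameDescriptorAllocator` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Descriptor = uint64_t; // stand-in for a CPU-side descriptor

// Hypothetical per-frame region of a shader-visible descriptor heap.
class FrameDescriptorAllocator {
public:
    explicit FrameDescriptorAllocator(uint32_t capacity) : heap_(capacity) {}

    // Copies the whole table into fresh space and returns the base index to bind.
    uint32_t SpillTable(const std::vector<Descriptor>& table) {
        assert(head_ + table.size() <= heap_.size() && "frame heap region exhausted");
        uint32_t base = head_;
        for (std::size_t i = 0; i < table.size(); ++i) heap_[base + i] = table[i];
        head_ += static_cast<uint32_t>(table.size());
        return base;
    }

    // Call once the GPU has finished the frame that used this region.
    void Reset() { head_ = 0; }

    Descriptor At(uint32_t index) const { return heap_[index]; }

private:
    std::vector<Descriptor> heap_;
    uint32_t head_ = 0;
};
```

The cost is that changing one CBV re-copies all three, which is why a root-descriptor fast path for frequently changing constants is attractive on D3D12.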


On the other hand, since Vulkan doesn't really have the notion of root descriptors, I'd have to create multiple descriptor set objects and version those if I want to bind a single dynamic UAV.


Either way, it seems like the preferred model is to build static descriptor sets but provide some fast path for constant buffers, and that's the direction I think I'm going to head in.


Anyway, does this sound like a sane approach? Have you guys found better ways to abstract these two binding models?


Side question: How do you version descriptor sets in Vulkan? Do you just have to pool descriptor sets for the frame and spill when updates occur?



#5288813 D3D12 / Vulkan Synchronization Primitives

Posted by ZBethel on 26 April 2016 - 02:20 PM

Vulkan and DX12 have very similar APIs, but the way they handle synchronization primitives seems to differ fundamentally in design.


Now, both Vulkan and DirectX 12 have resource barriers, so I'm going to ignore those.


DirectX 12 uses fences with explicit values that are expected to monotonically increase. In the simplest case, you have the swap chain present barrier. I can see two ways to implement fencing in this case:


1) You create SwapBufferCount fences. At the end of frame N you signal fence N % SwapBufferCount and then wait on fence (N + 1) % SwapBufferCount.

2) You create 1 fence. At the end of each frame you increment the fence and save off the value. You then wait for the fence to reach the value for frame (N + 1) % SwapBufferCount.
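Approach 2 can be sketched with a simulated fence. `TimelineFence` below only stands in for an `ID3D12Fence`; the real calls would be `ID3D12CommandQueue::Signal` and `ID3D12Fence::GetCompletedValue`. `FramePacer` is a hypothetical helper.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simulated D3D12-style fence: a monotonically increasing value written by the
// "GPU" and polled by the CPU.
struct TimelineFence {
    std::atomic<uint64_t> completed{0};
    uint64_t GetCompletedValue() const { return completed.load(); }
    void GpuSignal(uint64_t value) { completed.store(value); } // stands in for the queue
};

// Approach 2: one fence, one saved fence value per swap-chain buffer.
template <std::size_t SwapBufferCount>
class FramePacer {
public:
    // End of frame `frame`: bump the value, remember it for this buffer slot,
    // and return it so the queue can be asked to signal it.
    uint64_t EndFrame(uint64_t frame) {
        uint64_t value = ++nextValue_;
        frameValues_[frame % SwapBufferCount] = value;
        return value;
    }

    // Before starting `frame`, the fence must reach the value saved for its slot.
    bool CanBeginFrame(const TimelineFence& fence, uint64_t frame) const {
        return fence.GetCompletedValue() >= frameValues_[frame % SwapBufferCount];
    }

private:
    uint64_t nextValue_ = 0;
    std::array<uint64_t, SwapBufferCount> frameValues_{};
};
```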


In general, it seems like the "timestamp" approach to fencing is powerful. For instance, I can have a page allocator that retires pages with a fence value and then wait for the fence to reach that point before recycling the page. It seems like creating one fence per command list submission would be expensive (maybe not? how lightweight are fences?).
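A page allocator keyed on fence values could look roughly like this. It assumes pages are retired in submission order, so fence values arrive monotonically and the retired list stays sorted. `PageAllocator` and `Page` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Page { int id; };

// Hypothetical page recycler keyed on fence values rather than fence objects.
class PageAllocator {
public:
    // The page may be reused once the fence reaches `fenceValue`.
    void Retire(Page page, uint64_t fenceValue) {
        retired_.push_back({fenceValue, page}); // assumes increasing fence values
    }

    // Moves every page whose fence value has completed back to the free list.
    void Recycle(uint64_t completedValue) {
        while (!retired_.empty() && retired_.front().fenceValue <= completedValue) {
            free_.push_back(retired_.front().page);
            retired_.pop_front();
        }
    }

    bool TryAcquire(Page& out) {
        if (free_.empty()) return false;
        out = free_.back();
        free_.pop_back();
        return true;
    }

    std::size_t FreeCount() const { return free_.size(); }

private:
    struct Retired { uint64_t fenceValue; Page page; };
    std::deque<Retired> retired_;
    std::vector<Page> free_;
};
```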


Now compare this with Vulkan.


Vulkan has the notion of fences, semaphores, and events. They are explained in detail here. All of these primitives are binary: each one is signaled once and stays signaled until it is reset. I'm less familiar with how to use these kinds of primitives, because you can't take the timestamp approach like you can with DX12 fences.


For instance, to do the page allocator in Vulkan, the fence is the correct primitive to use because it involves synchronizing the state of the hardware queue with the host (i.e. to know when a retired page can be recycled).


In order to do this, I now have to create 1 fence for each vkQueueSubmit call, and the page allocator receives a fence handle instead of a timestamp.


It seems to me like the DirectX-style fence is more flexible, as I would imagine that internally the Vulkan fence is using the same underlying primitive as the DirectX fence to track signaling. In short, it seems like the DirectX timestamp-based fencing allows you to use fewer fence objects overall.


My primary concern is a common backend between Vulkan and DX12. It seems like the wiser course of action is to support Vulkan-style binary fences, because they can be implemented with DX12 fences. My concern is whether I will lose performance by creating 1 fence per ExecuteCommandLists call instead of 1 overall in DirectX.
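Here is a sketch of one way to do that; it is not a measured answer to the performance question. Vulkan-style binary fences can be emulated on top of a single D3D12-style timeline. Each binary fence only records the timeline value it waits for, so a submission costs a Signal on the shared fence rather than a new fence object. All names are hypothetical, and in real code `completed` would come from `ID3D12Fence::GetCompletedValue`.

```cpp
#include <cassert>
#include <cstdint>

// Shared timeline standing in for one ID3D12Fence plus its next signal value.
struct Timeline {
    uint64_t nextValue = 0; // last value handed out for a Signal
    uint64_t completed = 0; // value the GPU has reached
};

// Hypothetical binary fence emulated as "the timeline value I'm waiting for".
class BinaryFence {
public:
    // Arms the fence for a submission; the queue would Signal(fence, target_).
    void ArmForSubmit(Timeline& timeline) { target_ = ++timeline.nextValue; }

    bool IsSignaled(const Timeline& timeline) const {
        return target_ != 0 && timeline.completed >= target_;
    }

    void Reset() { target_ = 0; } // unsignaled until armed again

private:
    uint64_t target_ = 0; // 0 means "not armed"
};
```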


For those who understand the underlying hardware and API's deeper than me, I would appreciate some insight into these design decisions.



#5288486 Resource barrier pre and post states

Posted by ZBethel on 24 April 2016 - 01:22 PM

Heh, I believe I've just answered my own question through the MSDN docs...


At any given time, a subresource is in exactly one state (determined by a set of flags). The application must ensure that the states are matched when making a sequence of ResourceBarrier calls. In other words, the before and after states in consecutive calls to ResourceBarrier must agree.
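That rule is easy to enforce in a CPU-side tracker. In this hypothetical sketch, states are a plain enum for brevity, whereas `D3D12_RESOURCE_STATES` is really a set of bit flags. Subresources the tracker hasn't seen yet start in an assumed initial state.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

enum class State : uint32_t { Common, RenderTarget, PixelShaderResource, CopyDest };

struct Barrier { uint32_t subresource; State before; State after; };

// Hypothetical tracker: a barrier's "before" must equal the state left by the previous one.
class StateTracker {
public:
    explicit StateTracker(State initial) : initial_(initial) {}

    // Returns false (and changes nothing) if the before state doesn't match.
    bool Apply(const Barrier& b) {
        State& current = StateOf(b.subresource);
        if (current != b.before) return false;
        current = b.after;
        return true;
    }

    State Current(uint32_t subresource) { return StateOf(subresource); }

private:
    State& StateOf(uint32_t sub) {
        return states_.try_emplace(sub, initial_).first->second;
    }
    State initial_;
    std::unordered_map<uint32_t, State> states_;
};
```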


I also found this video, which answers how to handle state tracking with multiple threads:



#5272411 [D3D12] Command Allocator / Command List usage

Posted by ZBethel on 23 January 2016 - 06:04 PM

Hey all,


In the MSDN docs, they describe the relationship between ID3D12CommandAllocator and ID3D12CommandList.


Immediately after being created, command lists are in the recording state. You can also re-use an existing command list by calling ID3D12GraphicsCommandList::Reset, which also leaves the command list in the recording state. Unlike ID3D12CommandAllocator::Reset, you can call Reset while the command list is still being executed. A typical pattern is to submit a command list and then immediately reset it to reuse the allocated memory for another command list. Note that only one command list associated with each command allocator may be in a recording state at one time.


I understand that a command allocator is the heap for which commands in a command list are allocated. My assumption is that this is a sort of growing heap with a water mark that will remain at a certain size to avoid further allocations (this must be the case, since you don't define a size for the thing at creation time).
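The "growing heap with a water mark" guess can be written down as a chunked arena whose Reset only rewinds the write position. To be clear, this is a hypothetical mental model of the assumption above. It is not how any driver is known to implement `ID3D12CommandAllocator`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model: command memory grows in chunks and is kept across resets.
class CommandArena {
public:
    explicit CommandArena(std::size_t chunkSize) : chunkSize_(chunkSize) {}

    // Hands out space for recorded commands; grows by a whole chunk only when
    // the chunks kept from earlier use are exhausted.
    void* Allocate(std::size_t bytes) {
        assert(bytes <= chunkSize_);
        bool needChunk = active_ == chunks_.size() || offset_ + bytes > chunkSize_;
        if (needChunk) {
            if (active_ < chunks_.size()) ++active_;  // current chunk is full
            if (active_ == chunks_.size())            // no spare chunk: grow
                chunks_.push_back(std::make_unique<std::byte[]>(chunkSize_));
            offset_ = 0;
        }
        void* p = chunks_[active_].get() + offset_;
        offset_ += bytes;
        return p;
    }

    // Rewinds without freeing (the "water mark" stays). In the D3D12 analogy
    // this is only safe once the GPU no longer reads these commands.
    void Reset() { active_ = 0; offset_ = 0; }

    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::vector<std::unique_ptr<std::byte[]>> chunks_;
    std::size_t active_ = 0;
    std::size_t offset_ = 0;
};
```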


If true, it makes sense that a command allocator is resident physical memory that the command list records into. In the samples, it appears as though one command allocator is created for each command list. This makes sense; however, according to the docs, a command allocator can be used with any command list, so long as only one of them is in the recording state at a time.


Now, the part that confuses me is that it's okay to reset the command list and reuse it immediately, but it's not okay to reset the command allocator until the command list is finished on the GPU.


Command List Reset: I would venture to guess this preserves the contents of the original commands within the command allocator and starts a fresh record?


Command Allocator Reset: It seems as though this is literally destroying the contents of the heap, which may have command list data active on the GPU.


My big question is this: How does memory ownership work between the command allocator and the command list? Is the allocator doing implicit double (or more?) buffering on a command list reset? What's the actual difference between ID3D12GraphicsCommandList::Reset and ID3D12CommandAllocator::Reset?



#5176052 Software rendering tutorials/examples

Posted by ZBethel on 25 August 2014 - 01:30 PM

I wrote a fully featured software rasterizer as part of my senior project at school.




It's not the most elegant codebase, but you might find it useful.


#5004964 What can software rasterizers be used for today?

Posted by ZBethel on 28 November 2012 - 08:09 AM

They have a paper somewhere that outlines some of the details, but I believe they do very simplistic rasterization to a 320x280 depth buffer (or something like that). It's heavily vectorized. If you own a Core i7 with AVX, I believe it can do 8-wide vector operations, which would speed up something like that considerably. I've considered writing an occlusion library that uses AVX and SSE instructions. I don't think anyone's really used AVX much in production yet (from my very limited viewpoint).
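The core of such an occlusion rasterizer is small. The sketch below is a scalar, hypothetical version, not DICE's implementation. It rasterizes occluder triangles with edge functions into a tiny depth buffer, using each triangle's farthest depth so the result stays conservative. It then tests an occludee's screen rectangle against that buffer. A real implementation would vectorize the inner loop, for example 8 pixels per AVX instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Screen-space position in pixels; z is depth, smaller is closer.
struct Vec3 { float x, y, z; };

class DepthBuffer {
public:
    DepthBuffer(int w, int h) : w_(w), h_(h), depth_(w * h, 1.0f) {}

    // Half-space rasterization with one constant (farthest) depth per triangle.
    void RasterizeOccluder(Vec3 a, Vec3 b, Vec3 c) {
        float z = std::max({a.z, b.z, c.z});
        int minX = std::max(0, static_cast<int>(std::min({a.x, b.x, c.x})));
        int maxX = std::min(w_ - 1, static_cast<int>(std::max({a.x, b.x, c.x})));
        int minY = std::max(0, static_cast<int>(std::min({a.y, b.y, c.y})));
        int maxY = std::min(h_ - 1, static_cast<int>(std::max({a.y, b.y, c.y})));
        for (int y = minY; y <= maxY; ++y) {
            for (int x = minX; x <= maxX; ++x) {
                float px = x + 0.5f, py = y + 0.5f; // sample at pixel center
                float e0 = Edge(a, b, px, py), e1 = Edge(b, c, px, py), e2 = Edge(c, a, px, py);
                bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
                if (inside) {
                    float& d = depth_[y * w_ + x];
                    d = std::min(d, z);
                }
            }
        }
    }

    // Visible if any covered pixel is farther than the occludee's nearest depth.
    bool IsVisible(int x0, int y0, int x1, int y1, float nearestZ) const {
        for (int y = std::max(0, y0); y <= std::min(h_ - 1, y1); ++y)
            for (int x = std::max(0, x0); x <= std::min(w_ - 1, x1); ++x)
                if (depth_[y * w_ + x] > nearestZ) return true;
        return false;
    }

private:
    // Signed area test: which side of edge a->b the point (px, py) lies on.
    static float Edge(Vec3 a, Vec3 b, float px, float py) {
        return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
    }
    int w_, h_;
    std::vector<float> depth_;
};
```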

#5004957 What can software rasterizers be used for today?

Posted by ZBethel on 28 November 2012 - 07:55 AM

I wrote one. It's great fun! :)

There isn't much use for them nowadays, to be honest. However, I think that as CPUs become more vectorized and parallel, there's potential to see a comeback, but as a supplement to the GPU rather than a replacement. For instance, DICE utilizes software rasterization to aid in occlusion culling, because reading back data from the GPU takes a while.

#4973416 HDR + Tonemapping + Skylight = Fail?

Posted by ZBethel on 26 August 2012 - 12:39 AM

Which tonemapper are you using? I had a lot of trouble getting it to look right for my project, but the Uncharted 2 filmic mapping method ended up looking great. The cause could be either the tone mapping algorithm itself or just incorrect light values in your scene. From your screenshots, it looks like the tone mapper itself isn't set up right.
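For reference, here is the Uncharted 2 filmic curve as John Hable published it, with his constants and an exposure bias of 2.0. Treat the exposure bias and white point as starting values to tune per scene. The curve operates per channel on linear HDR color, before gamma correction.

```cpp
#include <cassert>
#include <cmath>

namespace filmic {
constexpr float A = 0.15f; // shoulder strength
constexpr float B = 0.50f; // linear strength
constexpr float C = 0.10f; // linear angle
constexpr float D = 0.20f; // toe strength
constexpr float E = 0.02f; // toe numerator
constexpr float F = 0.30f; // toe denominator
constexpr float W = 11.2f; // linear white point

inline float Curve(float x) {
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F;
}

// Maps one linear HDR channel to [0, 1]; values at W / exposureBias map to 1.
inline float Tonemap(float linear, float exposureBias = 2.0f) {
    return Curve(exposureBias * linear) / Curve(W);
}
} // namespace filmic
```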

#4962368 Tile based Lighting Question

Posted by ZBethel on 23 July 2012 - 03:14 PM

Without seeing your code, I can't really help much, but I think your intuition about updating left->right/top->down is on the right track. Fortunately, since you can easily reproduce the bug, you should be able to quickly set a breakpoint and see exactly what's going on with the lighting at that moment. I'd encourage you to try that, because you might find that it's a simple off-by-one sort of issue. Or it could be an edge case in your algorithm that you didn't realize.

#4962320 Few question about DirectX 11

Posted by ZBethel on 23 July 2012 - 12:30 PM

1) That entirely depends on your primitive topology. http://msdn.microsof...4(v=vs.85).aspx
Assuming you're referring to an indexed triangle list, which is the most sensitive to vertex ordering (in terms of vertex cache performance), the answer is no. As long as your indices are in a cache-friendly order, the vertex buffer can be in any order. That's why index buffer optimization matters. If you're sending in indices like (1, 1049, 2, 510, 16, 53), the vertex cache isn't going to do you much good.

2) Like I said above, the vertex cache only helps you if you're taking advantage of temporal locality with your indexing. Put more simply, make sure your index buffer reuses the same vertices as much as possible before moving on in the buffer. It doesn't matter what your object rendering order is. Although the GPU will cull triangles eventually, it's not until late in the pipeline. It's much better to use coarse-grained methods like frustum culling for entire objects.

3) According to the MSDN docs, using multiple streams is a bit slower, in the usual case, but it can be a huge advantage in certain situations. For example, if you're rendering a depth pass for a shadow map or for deferred rendering, you don't want to pass in the entire interleaved vertex to a shader that only cares about the position. Using multiple streams in this situation allows you to just bind the position stream and use that, saving tons of bandwidth.

4) I use PIX, which is OK. I haven't delved much into the vendor-specific tools yet.
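To see the effect of index ordering from points 1 and 2, you can simulate a FIFO post-transform cache and compute the average cache miss ratio (ACMR), the number of vertices shaded per triangle. Real hardware caches differ from a plain FIFO, so the cache sizes here are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO vertex cache over an indexed triangle list and returns ACMR.
double ComputeACMR(const std::vector<uint32_t>& indices, std::size_t cacheSize = 16) {
    std::deque<uint32_t> cache;
    std::size_t misses = 0;
    for (uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) == cache.end()) {
            ++misses; // vertex shader would run for this index
            cache.push_back(index);
            if (cache.size() > cacheSize) cache.pop_front();
        }
    }
    std::size_t triangles = indices.size() / 3;
    return triangles == 0 ? 0.0 : double(misses) / double(triangles);
}
```

Two triangles sharing an edge give an ACMR of 2.0 with a tiny cache. Inserting an unrelated triangle between them evicts the shared vertices and raises it.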

#4961820 AUTO_PTR issue.

Posted by ZBethel on 21 July 2012 - 07:28 PM

You could use std::shared_ptr instead?
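For context, `std::auto_ptr` was deprecated in C++11 and removed in C++17. `std::shared_ptr` fits shared ownership and copies safely into containers. `std::unique_ptr` is the direct replacement when ownership is exclusive. Here is a small sketch with a placeholder `Texture` type:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Texture { int id; }; // placeholder type for illustration

// Sole ownership: unlike auto_ptr, transfer must be spelled out with std::move.
std::unique_ptr<Texture> LoadTexture(int id) {
    return std::make_unique<Texture>(Texture{id});
}

// Shared ownership: every copy in the container keeps the texture alive.
std::vector<std::shared_ptr<Texture>> ShareTexture(const std::shared_ptr<Texture>& tex, int copies) {
    return std::vector<std::shared_ptr<Texture>>(static_cast<std::size_t>(copies), tex);
}
```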

#4961783 Rendering a space skybox

Posted by ZBethel on 21 July 2012 - 04:48 PM

You could try rendering the stars and other bodies as point sprites and just use the skybox for things like nebula haze. That's what EVE Online does and it looks beautiful. One clear benefit of this technique is that you can have your stars pulse and glow.
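Here is a hypothetical sketch of the star side. Directions are drawn uniformly over the unit sphere (uniform z plus uniform azimuth) so they can be drawn as point sprites at infinity. Each star also gets a random phase so its brightness can pulse. The brightness parameters are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Star { float x, y, z; float phase; };

// Uniform z in [-1, 1] and uniform azimuth give uniformly distributed directions.
std::vector<Star> GenerateStars(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> u(-1.0f, 1.0f);
    std::uniform_real_distribution<float> angle(0.0f, 6.2831853f);
    std::vector<Star> stars;
    stars.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        float z = u(rng);
        float phi = angle(rng);
        float r = std::sqrt(std::max(0.0f, 1.0f - z * z));
        stars.push_back({r * std::cos(phi), r * std::sin(phi), z, angle(rng)});
    }
    return stars;
}

// Brightness oscillates between base - amplitude and base + amplitude.
float StarBrightness(const Star& s, float timeSeconds, float base = 0.8f,
                     float amplitude = 0.2f, float frequency = 1.5f) {
    return base + amplitude * std::sin(frequency * timeSeconds + s.phase);
}
```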

#4961763 Hovercraft Physics

Posted by ZBethel on 21 July 2012 - 02:42 PM

The term you're looking for is spring damping.

http://www.myphysicslab.com/spring1.html (more useful, IMO)

Basically, you need to apply a damping force, proportional to the hovercraft's vertical velocity and acting like friction, against the oscillating spring force that you're applying to the hovercraft.
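In code, the damped spring is one extra term proportional to vertical velocity. This is a hypothetical 1D sketch using semi-implicit Euler, and the stiffness and damping values in the usage are arbitrary. With damping, the craft settles at `restHeight - mass * gravity / k`; without it, the craft keeps bobbing.

```cpp
#include <cassert>
#include <cmath>

struct Hover {
    float height;   // height above ground (m)
    float velocity; // vertical velocity (m/s)
};

// One semi-implicit Euler step. k = spring stiffness, c = damping coefficient.
void Step(Hover& h, float dt, float mass, float restHeight, float k, float c,
          float gravity = 9.81f) {
    float spring = k * (restHeight - h.height); // pushes toward rest height
    float damping = -c * h.velocity;            // opposes motion
    float accel = (spring + damping) / mass - gravity;
    h.velocity += accel * dt;
    h.height += h.velocity * dt;
}
```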

#4960755 Register renderables every frame?

Posted by ZBethel on 18 July 2012 - 07:28 PM

Is adding a bunch of objects to a list every frame really going to cause performance issues? You don't need to allocate space for the array every frame, you could have an array that grows (like a variant of std::vector, or just use that). Besides, you could have one renderable for a set of instanced meshes if you're going to have tons of similar objects onscreen.
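Here is a sketch of the "array that grows" idea with `std::vector`. `clear()` destroys the elements but leaves `capacity()` unchanged, so after the first few frames, refilling the list normally doesn't allocate. `RenderQueue` and `Renderable` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Renderable { int meshId; int materialId; };

// Hypothetical per-frame render list that reuses its storage.
class RenderQueue {
public:
    void BeginFrame() { items_.clear(); } // size -> 0, capacity kept
    void Submit(const Renderable& r) { items_.push_back(r); }
    std::size_t Size() const { return items_.size(); }
    std::size_t Capacity() const { return items_.capacity(); }

private:
    std::vector<Renderable> items_;
};
```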

#4957428 Abstracting Draw Code: Am I on the right path?

Posted by ZBethel on 09 July 2012 - 03:55 PM

I guess it doesn't really matter how you implement the visual/logic aspects of your game, it's just a good idea to keep them separate. Since your example is a simple tic-tac-toe game, having a render function in the game engine that uses the data from your grid seems to me like a good solution. What you don't want is a render function on your grid class. That adds bloat to your class and defeats the purpose of an object oriented design.

In terms of larger scale projects, the big thing right now is entity/component systems. Basically, your entity is just a placeholder or id. Your components are the data, and you have separate "system" classes that perform logic on the component data. That logic can be anything: translating input into character movement, physics, rendering, etc. The components can communicate between each other, either by shared variables within the entity, or by just keeping references to each other (i.e. a render component needs a transform to use. The transform could be another component).
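Here is a deliberately tiny, hypothetical illustration of that split. The entity is only an id, components are plain data keyed by entity, and a system runs over the entities that have the components it needs. Real frameworks such as Artemis are considerably more elaborate.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Entity = uint32_t;

struct Transform { float x = 0, y = 0; };
struct Velocity  { float dx = 0, dy = 0; };

// Component storage: one map per component type.
struct World {
    Entity next = 0;
    std::unordered_map<Entity, Transform> transforms;
    std::unordered_map<Entity, Velocity> velocities;
    Entity Create() { return next++; }
};

// Movement system: only touches entities with both a Transform and a Velocity.
void MovementSystem(World& world, float dt) {
    for (auto& [entity, velocity] : world.velocities) {
        auto it = world.transforms.find(entity);
        if (it == world.transforms.end()) continue;
        it->second.x += velocity.dx * dt;
        it->second.y += velocity.dy * dt;
    }
}
```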

Take a look at Artemis framework for a good example of this: http://gamadu.com/artemis/

For something like tic-tac-toe, don't go overboard trying to engineer a good design. No solution is perfect. For what it's worth, I think you're on the right track for what you're trying to do.