SoldierOfLight

Members
  • Content count: 229

Community Reputation

2183 Excellent

About SoldierOfLight

  • Rank: Member

Personal Information

  • Interests: Programming
  1. 3D Emulate FAST triple buffering in Direct3D 9

    The solution you're looking for is difficult to build. What you want is that every VSync, you decide which frame to scan out based on whichever frame most recently completed. The way DWM accomplishes this is that every VSync, it wakes up, looks at what's most recently completed, and then copies/composes it into another surface and schedules that to be scanned out on the next VSync. This adds an extra copy and an extra frame of latency.

    Trying to remove the extra frame of latency is possible if you wake up *before* the VSync instead of after, with enough time buffered to schedule the copy and have it complete right before the VSync. As it turns out, this is pretty difficult. Now that we've published some implementation details of Windows Mixed Reality via PresentMon, I can tell you that this is pretty much how it works, and it's very complicated.

    Trying to remove the copy is also very difficult, because now not only do you need to decide what to flip based on what's completed, but you also need to decide what to render to based on when previous rendering completed, which means that you can't get any CPU/GPU parallelism or frame queueing. If you just render to the resources in order, eventually rendering will block because you'll be trying to render to the on-screen surface. Using a copy here prevents this.

    Note that I think NVIDIA does have an implementation of this, called Fast Sync, that they've implemented in hardware and their driver. I don't really have any technical details on how they made it work, but I have to imagine it's pretty complicated as well.
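    For illustration, here's a rough sketch of the copy-based (DWM-style) approach described above, using D3D9Ex. The renderTargets array, the per-frame event queries, and frameCount are assumed scaffolding, not a drop-in implementation:

        // Assumes renderTargets[i] holds frame i's output and queries[i] is a
        // D3DQUERYTYPE_EVENT query issued right after frame i's rendering.
        int PickLatestCompleted(IDirect3DQuery9* queries[], int count, int fallback)
        {
            int latest = fallback;
            for (int i = 0; i < count; ++i)
            {
                // S_OK: the GPU has finished all work up to and including this query.
                if (queries[i]->GetData(nullptr, 0, 0) == S_OK)
                    latest = i; // assumes frames were submitted in index order
            }
            return latest;
        }

        void ComposeAndPresent(IDirect3DDevice9Ex* device,
                               IDirect3DSurface9* renderTargets[],
                               IDirect3DQuery9* queries[],
                               int frameCount)
        {
            device->WaitForVBlank(0); // wake at the VSync (DWM-style: adds a frame of latency)

            int latest = PickLatestCompleted(queries, frameCount, 0);

            IDirect3DSurface9* backBuffer = nullptr;
            device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &backBuffer);
            device->StretchRect(renderTargets[latest], nullptr,
                                backBuffer, nullptr, D3DTEXF_NONE); // the extra copy
            backBuffer->Release();

            device->PresentEx(nullptr, nullptr, nullptr, nullptr, 0); // scans out on the next VSync
        }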
  2. Heh, that's a good point. It's essentially already got the swizzling baked into it so you can read it normally. Don't know why I didn't realize that...
  3. Assuming that you are actually using D3D12 like your tag implies, you can re-order the channels on SRV loads/samples using the Shader4ComponentMapping field of the SRV desc. See D3D12_SHADER_COMPONENT_MAPPING.
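    For example, an SRV that swaps the R and B channels on read might look like this (device, texture, and srvHandle are assumed):

        D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
        srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
        srvDesc.Texture2D.MipLevels = 1;
        srvDesc.Shader4ComponentMapping = D3D12_ENCODE_SHADER_4_COMPONENT_MAPPING(
            D3D12_SHADER_COMPONENT_MAPPING_FROM_MEMORY_COMPONENT_2,  // shader .r reads memory B
            D3D12_SHADER_COMPONENT_MAPPING_FROM_MEMORY_COMPONENT_1,  // shader .g reads memory G
            D3D12_SHADER_COMPONENT_MAPPING_FROM_MEMORY_COMPONENT_0,  // shader .b reads memory R
            D3D12_SHADER_COMPONENT_MAPPING_FROM_MEMORY_COMPONENT_3); // shader .a reads memory A
        device->CreateShaderResourceView(texture, &srvDesc, srvHandle);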
  4. DX11 Frame allocator of constant buffers

    For 3, like Hodgman said, if your event queries are working correctly, you shouldn't need to worry about this value. That said, the maximum frame latency is not 100% accurate, due to several factors. Drivers are able to override this frame latency, both explicitly (as an override if an app never set anything) and implicitly (by deferring the actual present operation until after the Present() API has returned). However, on new drivers and new OSes (Windows 10 Anniversary Update with WDDM 2.1 drivers, at least) using a FLIP_SEQUENTIAL or FLIP_DISCARD swap effect, the maximum frame latency should actually be accurate.

    For 4... maybe. At best, you're getting simpler allocation strategies from the drivers because you're allocating large buffers instead of small ones, and are (maybe) running less code to do it. At worst, you're doing pretty much the exact same thing the driver would do if you were using MAP_WRITE_DISCARD.
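    A minimal sketch of the "event queries working correctly" part, with assumed names (kFrameCount, frameQuery): only reuse a frame slot's constant-buffer memory once the GPU has signalled the query issued at the end of the frame that last used that slot.

        // frameQuery[i] is an ID3D11Query created with D3D11_QUERY_EVENT;
        // the first kFrameCount frames should skip the wait, since no query
        // has been issued for their slot yet.
        static const UINT kFrameCount = 3;
        ID3D11Query* frameQuery[kFrameCount];

        void BeginFrame(ID3D11DeviceContext* context, UINT frameIndex)
        {
            // Block until the GPU has finished the frame that previously used this
            // slot; only then is it safe to overwrite that frame's buffers.
            while (context->GetData(frameQuery[frameIndex], nullptr, 0, 0) == S_FALSE)
                Sleep(0);
        }

        void EndFrame(ID3D11DeviceContext* context, UINT frameIndex)
        {
            context->End(frameQuery[frameIndex]); // completes once the GPU reaches this point
        }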
  5. Yes, CopyDescriptors can do arbitrary scatter/gather. The only requirement is that the total number of descriptors specified in the source ranges and dest ranges must be equal. How they're divided among the ranges is unimportant. You can write into GPU-visible descriptor heaps at will, either via creates or copies, but cannot copy out of them.
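    A small sketch with assumed handles (srvA, srvRangeStart, destStart), gathering 1 + 2 scattered source descriptors into one contiguous range of 3 in a shader-visible heap:

        // Source range sizes must sum to the dest range sizes (1 + 2 == 3 here).
        D3D12_CPU_DESCRIPTOR_HANDLE srcStarts[] = { srvA, srvRangeStart };
        UINT srcSizes[] = { 1, 2 };
        UINT destSize = 3; // destStart points into the shader-visible heap (write-only)
        device->CopyDescriptors(1, &destStart, &destSize,
                                2, srcStarts, srcSizes,
                                D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);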
  6. This most likely will not work. Memory aliasing of textures with preserved contents is not something that is widely supported. What kind of different format are you talking about? Is it just a channel ordering? You can use the SRV channel swizzling functionality to accomplish that. If it's a different data interpretation (i.e. UNORM vs UINT vs FLOAT) you can accomplish that using typeless resources.
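    For the different-data-interpretation case, a small sketch, assuming the texture was created with a TYPELESS format (e.g. DXGI_FORMAT_R32_TYPELESS) and that device, texture, and the two descriptor handles already exist:

        // Same memory, two interpretations: the SRV format decides how it's read.
        D3D12_SHADER_RESOURCE_VIEW_DESC asFloat = {};
        asFloat.Format = DXGI_FORMAT_R32_FLOAT;
        asFloat.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
        asFloat.Texture2D.MipLevels = 1;
        asFloat.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;

        D3D12_SHADER_RESOURCE_VIEW_DESC asUint = asFloat;
        asUint.Format = DXGI_FORMAT_R32_UINT; // same bits, read as uint

        device->CreateShaderResourceView(texture, &asFloat, floatSrvHandle);
        device->CreateShaderResourceView(texture, &asUint,  uintSrvHandle);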
  7. DX12 Resource synchronization

    The only guaranteed synchronization points in the graphics pipeline are that stream output and the output merger (RTV/DSV writes) are guaranteed to happen in sequence (per pixel for the OM) from draw to draw. So if you write to an RTV in draw 1, and then again in draw 2, those render target writes are guaranteed to happen in order. But within the pixel shader itself, there's no guarantee of ordering - unless you use ROVs instead of UAVs. To enforce ordering of UAVs from draw to draw, you can use a UAV barrier. This ensures ALL UAV writes from draw/dispatch 1 are done before draw/dispatch 2 can start, which is more heavyweight than ROVs.
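    For example, a UAV barrier between two dispatches that touch the same resource (commandList, uavResource, and the group counts are assumed):

        commandList->Dispatch(groupsX, groupsY, 1); // writes the UAV

        D3D12_RESOURCE_BARRIER barrier = {};
        barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
        barrier.UAV.pResource = uavResource; // nullptr would cover all UAV accesses
        commandList->ResourceBarrier(1, &barrier);

        commandList->Dispatch(groupsX, groupsY, 1); // now sees all of the first dispatch's UAV writes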
  8. DESCRIPTORS_VOLATILE is the default behavior of root signature 1.0. It means drivers cannot read the descriptors at all until the GPU is actually executing. So I can do something like:

        commandList->SetGraphicsRootDescriptorTable(foo);
        commandList->Close();
        commandQueue->Wait(fence, 1);
        commandQueue->ExecuteCommandLists(&commandList);
        device->CopyDescriptors(foo, bar);
        fence->Signal(1);

    This would be valid, and the GPU would read the updated descriptors when it becomes unblocked. If you did that in root signature 1.1 without the DESCRIPTORS_VOLATILE flag, it would be invalid, because the descriptors changed between the point of recording the command in the command list and the GPU executing those commands. Leaving the flag off is a hint to the driver that it can read the descriptors on the CPU and potentially make optimizations based on that information - whether it embeds the entire contents of the descriptor in the command list or not is a driver implementation detail (and unless it's a buffer, I don't think anyone can do that today).
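    In root signature 1.1 you opt back into the 1.0 behavior per descriptor range; a small sketch (the register, space, and count values are arbitrary):

        D3D12_DESCRIPTOR_RANGE1 range = {};
        range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
        range.NumDescriptors = 4;
        range.BaseShaderRegister = 0;
        range.RegisterSpace = 0;
        range.Flags = D3D12_DESCRIPTOR_RANGE_FLAG_DESCRIPTORS_VOLATILE; // allow edits until GPU execution
        range.OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;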
  9. You know that you can't overwrite the descriptors while they're in use by the GPU, right? There's no automatic driver versioning.
  10. As long as the descriptor is marked VOLATILE, then the guarantee is the hardware will read the contents of the descriptor at execution time. No need to re-apply the same descriptor handle to the command list.
  11. DX12 Split Barrier Question

    Yep, pretty much this. A full barrier is equivalent to a begin+end with nothing in between.
  12. Swap chain creation failed

    In D3D12, you can create a descriptor table containing 1,000,000 UAVs and use data passed to the shader to determine which ones are actually used. That, combined with the fact that they're GPU-side descriptors (which don't need to be initialized until GPU-timeline execution of the command list), means that we can't just inspect which resources are "bound" to the pipeline via UAVs the way we can with render targets, which only use CPU-side descriptors and only have 8 slots. We require the ability to track back buffer references in order to synchronize with the display so that VSync presents work correctly. So, as long as that VSync synchronization is implicit, it precludes UAV access to those resources.
  13. DX12 Split Barrier Question

    There are two things it does for you:

    1. Think of the begin as inserting a signal at the end of the pipeline, and the end as inserting a wait at the beginning of the pipeline. If the signal has already passed by the time the wait executes, the wait is a no-op. If you don't use split barriers, then a barrier that requires that kind of sync means completely (or at least partially) draining the GPU pipeline.

    2. If there's actual work associated with the barrier (e.g. a decompression of some kind), the driver is able to begin that work when the begin is issued and wait for it when the end is issued. If you don't use split barriers, then there's no parallelism to be had between the work you know about and the work associated with the barrier.
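    In code, a split barrier is the same D3D12_RESOURCE_BARRIER recorded twice with the BEGIN_ONLY and END_ONLY flags; the resource and state values here are assumed for illustration:

        D3D12_RESOURCE_BARRIER barrier = {};
        barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
        barrier.Transition.pResource = resource;
        barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
        barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
        barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

        barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY; // the "signal" at the end of the pipe
        commandList->ResourceBarrier(1, &barrier);

        // ... unrelated work that doesn't touch the resource ...

        barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_END_ONLY;   // the "wait" at the front of the pipe
        commandList->ResourceBarrier(1, &barrier);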
  14. Swap chain creation failed

    Correct, BGRA isn't supported as a UAV. I'm pretty sure it's an oversight, but UAV access was only ever enabled for the RGBA channel ordering.
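    If you need to check at runtime whether a given format can be used as a typed UAV, you can query format support; a short sketch with an assumed device pointer:

        D3D12_FEATURE_DATA_FORMAT_SUPPORT support = {};
        support.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
        if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_FORMAT_SUPPORT,
                                                  &support, sizeof(support))))
        {
            // Fall back to an RGBA-ordered format if these bits aren't set.
            bool typedUav  = (support.Support1 & D3D12_FORMAT_SUPPORT1_TYPED_UNORDERED_ACCESS_VIEW) != 0;
            bool typedLoad = (support.Support2 & D3D12_FORMAT_SUPPORT2_UAV_TYPED_LOAD) != 0;
        }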
  15. Not that I know of, unless you want to also hook the DComp APIs and deduce the visual tree so you can determine their relative positions/sizes/Z-order.