
About ZachBethel

  1. DX12 Implicit State Promotion

    I figured I would give an update in case somebody else has similar confusion. I was able to convert most of my resource transition logic to use state promotion and decay. There are some caveats mentioned in the docs whose implications weren't quite clear to me.

    Buffers and read-only images always decay to the common state after an ExecuteCommandLists call, which means you effectively don't need to track them across queues. That's really nice. Likewise, the first access to those resource types is implicitly promoted and does not require a transition, so if you are using a copy queue you don't need to issue any barriers at all.

    However, any subsequent use after that first one within an ExecuteCommandLists scope does require a transition from the implicitly promoted state to the new state. For example, if I first use a buffer as a copy destination, copy to it, and then use it as a vertex buffer within the same command list batch, I have to transition it from CopyDest -> VertexAndConstantBuffer first (see the sketch below). That part wasn't clear to me initially.

    Finally, any image that is written to by the GPU, whether through UAV / RTV / DSV access, does not participate in state promotion / decay unless it has the D3D12_RESOURCE_FLAG_ALLOW_SIMULTANEOUS_ACCESS flag. That makes sense, since the image is likely compressed and must be meticulously tracked. Anyway, like I said, this is all spelled out in the docs, but some of it wasn't clear to me at first.
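    Here is a minimal sketch of that CopyDest -> VertexAndConstantBuffer case. The resource and size parameters are made up for illustration, and it assumes the buffer was created in (or has decayed to) the COMMON state.

      #include <d3d12.h>

      void UploadThenUseAsVertexBuffer(ID3D12GraphicsCommandList* cmdList,
                                       ID3D12Resource* vertexBuffer,   // in D3D12_RESOURCE_STATE_COMMON
                                       ID3D12Resource* uploadBuffer,
                                       UINT64 sizeInBytes)
      {
          // First access in this ExecuteCommandLists scope: COMMON is implicitly
          // promoted to COPY_DEST, so no barrier is needed for the copy itself.
          cmdList->CopyBufferRegion(vertexBuffer, 0, uploadBuffer, 0, sizeInBytes);

          // A later use within the same scope is NOT promoted again: transition from
          // the implicitly promoted COPY_DEST state to the vertex buffer state explicitly.
          D3D12_RESOURCE_BARRIER barrier = {};
          barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
          barrier.Transition.pResource   = vertexBuffer;
          barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
          barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
          barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER;
          cmdList->ResourceBarrier(1, &barrier);

          // ... IASetVertexBuffers / draw ...
          // After the ExecuteCommandLists call completes, the buffer decays back to COMMON.
      }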
  2. DX12 Implicit State Promotion

    Would you guys happen to know whether something like this is available on Vulkan as well? I know the pipeline barrier model is a bit different. I'm trying to determine whether I can forgo some of the barrier hand-off logic for Graphics -> Copy -> Graphics scenarios. My concern would be whether I need to explicitly transition to a copy dest layout.
  3. Hey all, I'm trying to understand implicit state promotion in DirectX 12, as well as its intended use case: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899226(v=vs.85).aspx#implicit_state_transitions

    I'm attempting to use copy queues and finding that there's a lot of bookkeeping I need to do: first "pre-transition" from my graphics / compute read-only state (P-SRV | NP-SRV) to Common, then Common to Copy Dest, perform the copy on the copy command list, transition back to Common, and then find another graphics command list to do the final Common -> (P-SRV | NP-SRV) again.

    With state promotion, it seems I can nix the Common -> Copy Dest and Copy Dest -> Common barriers on the copy queue easily enough, but I'm curious whether I could just keep all of my "read-only" buffers and images in the common state and effectively not perform any barriers at all (see the sketch below). This seems to be encouraged by the docs, but I'm not sure I fully understand the implications. Does this sound right? Thanks.
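    This is roughly the "no barriers" path I have in mind. It assumes the texture sits in (or has decayed to) COMMON between ExecuteCommandLists calls; the names are illustrative only.

      #include <d3d12.h>

      // On the copy queue: the first access promotes COMMON -> COPY_DEST implicitly.
      void RecordStreamingCopy(ID3D12GraphicsCommandList* copyCmdList,
                               const D3D12_TEXTURE_COPY_LOCATION* dst,   // the texture, currently in COMMON
                               const D3D12_TEXTURE_COPY_LOCATION* src)   // upload-heap staging data
      {
          copyCmdList->CopyTextureRegion(dst, 0, 0, 0, src, nullptr);
          // No COPY_DEST -> COMMON barrier here: once this ExecuteCommandLists batch
          // finishes, the texture decays back to COMMON.
      }

      // On the graphics queue afterwards, the first read would promote COMMON to the
      // SRV states (P-SRV | NP-SRV), again without an explicit barrier.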
  4. That's what I thought. The solution I went with is to keep a map of image descriptor hash to resource allocation info, which cut the cost down by 3x (rough sketch below). Thanks!
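    For reference, a minimal sketch of that caching approach. The hash function and the global map are just illustrative; a real version would hash the desc fields individually rather than the raw bytes.

      #include <d3d12.h>
      #include <unordered_map>

      // Illustrative FNV-1a hash over the raw bytes of the desc (padding bytes could
      // add noise; hashing fields individually is safer in production).
      size_t HashResourceDesc(const D3D12_RESOURCE_DESC& desc)
      {
          const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&desc);
          size_t h = 14695981039346656037ull;
          for (size_t i = 0; i < sizeof(desc); ++i)
              h = (h ^ bytes[i]) * 1099511628211ull;
          return h;
      }

      std::unordered_map<size_t, D3D12_RESOURCE_ALLOCATION_INFO> g_allocInfoCache;

      D3D12_RESOURCE_ALLOCATION_INFO GetAllocationInfoCached(ID3D12Device* device,
                                                             const D3D12_RESOURCE_DESC& desc)
      {
          const size_t key = HashResourceDesc(desc);
          auto it = g_allocInfoCache.find(key);
          if (it != g_allocInfoCache.end())
              return it->second;                       // cache hit: skip the driver call

          D3D12_RESOURCE_ALLOCATION_INFO info =
              device->GetResourceAllocationInfo(0 /*node mask*/, 1, &desc);
          g_allocInfoCache.emplace(key, info);
          return info;
      }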
  5. Hey, I'm working on a placed resource system, and I need a way to determine the size and alignment of image resources before placing them on a heap. This is used for transient resources within a frame. The appropriate method on ID3D12Device is GetResourceAllocationInfo. Unfortunately, this method is quite slow and eats up a pretty significant chunk of time, way more than I would expect for just returning a size and alignment (I'm passing a single D3D12_RESOURCE_DESC each time). Is there a way I can conservatively estimate this value for certain texture resources (e.g. ones without mip chains)? Thanks.
  6. Yeah, I was mistaken. I believe I was confused by the fact that most hardware typically has a single hardware graphics queue. At any rate, the issue was that I was querying GetCompletedValue on my fences at a time when I thought all previous work would have completed, which was not the case.
  7. Hey all, I'm trying to debug some async compute synchronization issues. I've found that if I force all command lists to run through a single ID3D12CommandQueue instance, everything is fine. However, if I create two DIRECT queue instances, and feed my "compute" work into the second direct queue, I start seeing the issues again. I'm not fencing between the two queues at all because they are both direct. According to the docs, it seems as though command lists should serialize properly between the two instances of the direct queue because they are of the same queue class. Another note is that I am feeding command lists to the queues on an async thread, but it's the same thread for both queues, so the work should be serialized properly. Anything obvious I might be missing here? Thanks!
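    For reference, this is what an explicit hand-off between the two queues would look like if it turns out to be needed; the fence and queue names are illustrative, and it assumes the second queue's work consumes results produced on the first queue.

      #include <d3d12.h>

      void SubmitWithHandoff(ID3D12CommandQueue* queueA,                 // first DIRECT queue
                             ID3D12CommandQueue* queueB,                 // second DIRECT queue ("compute" work)
                             ID3D12CommandList* const* listsA, UINT numA,
                             ID3D12CommandList* const* listsB, UINT numB,
                             ID3D12Fence* fence, UINT64& fenceValue)
      {
          queueA->ExecuteCommandLists(numA, listsA);
          queueA->Signal(fence, ++fenceValue);   // GPU-side signal once queue A's work completes

          queueB->Wait(fence, fenceValue);       // GPU-side wait; does not stall the CPU
          queueB->ExecuteCommandLists(numB, listsB);
      }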
  8. I'm reading through the Microsoft docs trying to understand how to properly use aliasing barriers to alias resources. The docs say:

    "Applications must activate a resource with an aliasing barrier on a command list, by passing the resource in D3D12_RESOURCE_ALIASING_BARRIER::pResourceAfter. pResourceBefore can be left NULL during an activation. All resources that share physical memory with the activated resource now become inactive or somewhat inactive, which includes overlapping placed and reserved resources."

    If I understand correctly, it's not necessary to actually provide pResourceBefore for each overlapping resource, as the driver will iterate the pages and invalidate overlapping resources for you. This is the Simple Model. The Advanced Model is different:

    "The active/inactive abstraction can be ignored and the following lower-level rules must be honored, instead: An aliasing barrier must be between two different GPU resource accesses of the same physical memory, as long as those accesses are within the same ExecuteCommandLists call. The first rendering operation to certain types of aliased resource must still be an initialization, just like the Simple Model."

    I'm confused because it looks like, in the Advanced Model, I'm expected to declare pResourceBefore for every resource that overlaps pResourceAfter (so I'd have to submit N aliasing barriers). Is the idea here that the driver can either do it for you (null pResourceBefore), or you can do it yourself (specify every overlapping resource)? That seems like the tradeoff. It would be nice if I could just "activate" resources with AliasingBarrier(NULL, activatingResource), as in the sketch below, and not worry about tracking deactivations. Am I understanding the docs correctly? Thanks.
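    To make that concrete, this is the kind of Simple Model activation I mean; activatingResource stands in for whatever placed resource is being brought into use.

      #include <d3d12.h>

      // Simple Model activation: only pResourceAfter is specified; overlapping placed /
      // reserved resources sharing the same heap memory become inactive.
      void ActivateAliasedResource(ID3D12GraphicsCommandList* cmdList,
                                   ID3D12Resource* activatingResource)
      {
          D3D12_RESOURCE_BARRIER barrier = {};
          barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
          barrier.Aliasing.pResourceBefore = nullptr;            // activation: before may be NULL
          barrier.Aliasing.pResourceAfter  = activatingResource;
          cmdList->ResourceBarrier(1, &barrier);

          // Per the docs quoted above, the first rendering operation on the activated
          // resource must still be an initializing operation (clear, discard, or copy)
          // for the resource types that require it.
      }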
  9. A retrospective on the Infinity project

    Please tell me you're not going to be the only engineer on this. That just isn't working out for you, as brilliant as you are. ;)
  10. Is it valid behavior to map a region of a readback resource while the GPU is simultaneously writing to a disjoint region of it? I've got a profiler subsystem with a single readback buffer that is N times the size of my query heap, one region per frame for N frames in flight (sketch of the mapping below). The debug SDK layer warns that the subresource is mapped while the GPU is writing to it.
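    The mapping pattern looks roughly like this; the frame layout and names are illustrative, and only the completed frame's region is declared in the read range.

      #include <d3d12.h>
      #include <cstring>

      void ReadQueryResults(ID3D12Resource* readbackBuffer,      // N frames of results, back to back
                            UINT completedFrameIndex,
                            UINT64 bytesPerFrame,
                            void* outResults)
      {
          D3D12_RANGE readRange = {};
          readRange.Begin = completedFrameIndex * bytesPerFrame;
          readRange.End   = readRange.Begin + bytesPerFrame;

          void* mapped = nullptr;
          readbackBuffer->Map(0, &readRange, &mapped);           // pointer is to the start of the resource
          std::memcpy(outResults,
                      static_cast<unsigned char*>(mapped) + readRange.Begin,
                      bytesPerFrame);

          D3D12_RANGE writtenRange = {};                          // the CPU wrote nothing
          readbackBuffer->Unmap(0, &writtenRange);
      }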
  11. [D3D12] Debug validation weirdness

    It turns out it's the UAV barrier that's barking at me, and it only seems to happen if I issue a UAV barrier on a command list without first transitioning the resource to the UnorderedAccess state in that same command list, which seems wrong.
  12. I've got a scenario where I am building a command list that involves using UAVs. The UAV resource is transitioned to the UnorderedAccess state in a prior command list, like so:

      Command List A:
        Transition NonPixelShaderResource -> UnorderedAccess

      Command List B:
        UAV barrier
        ClearUnorderedAccessViewUint
        Dispatch
        more UAV barriers

      Direct Queue: (A, B)

    When I try to queue a UAV barrier on the later command list, I get this error spew:

      D3D12 ERROR: ID3D12CommandList::ClearUnorderedAccessViewUint: Resource state (0x0) of resource (0x00000242CA4635A0:'Histogram') (subresource: 0) is invalid for use as a unordered access view. Expected State Bits: 0x8, Actual State: 0x0, Missing State: 0x8. [ EXECUTION ERROR #538: INVALID_SUBRESOURCE_STATE]
      D3D12 ERROR: ID3D12GraphicsCommandList::ResourceBarrier: Before state (0x8) of resource (0x00000242CA4635A0:'Histogram') (subresource: 0) specified transition barrier does not match with the state (0x0) specified in the previous call to ResourceBarrier [ RESOURCE_MANIPULATION ERROR #527: RESOURCE_BARRIER_BEFORE_AFTER_MISMATCH]

    Is the debug layer just over-validating, or is there actually an issue here? For one, the error doesn't really make sense: if I remove the UAV barrier call the errors stop, but my resource is definitely not in the common state (0x0). I get this error even when I create the resource in the UnorderedAccess state. Besides, how can the debug layer know I haven't transitioned the resource properly before I call ExecuteCommandLists? A prior command list could do the transition. Has anyone encountered this issue before?
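    Per the update in the item above, the arrangement that avoids the errors keeps the transition and the UAV barrier in the same command list. A sketch, with illustrative state choices:

      #include <d3d12.h>

      void DispatchHistogramPass(ID3D12GraphicsCommandList* cmdList, ID3D12Resource* histogram)
      {
          // Transition in the same command list that issues the UAV barrier, so the
          // debug layer's per-command-list tracking sees the resource in the UAV state.
          D3D12_RESOURCE_BARRIER toUav = {};
          toUav.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
          toUav.Transition.pResource   = histogram;
          toUav.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
          toUav.Transition.StateBefore = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
          toUav.Transition.StateAfter  = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
          cmdList->ResourceBarrier(1, &toUav);

          // ... ClearUnorderedAccessViewUint / Dispatch writes to the histogram ...

          // UAV barrier: make prior UAV writes visible to the following dispatches.
          D3D12_RESOURCE_BARRIER uavBarrier = {};
          uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
          uavBarrier.UAV.pResource = histogram;
          cmdList->ResourceBarrier(1, &uavBarrier);

          // ... subsequent dispatches that read/write the histogram via UAV ...
      }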
  13. A thing I'm struggling with right now is how to handle mapping of resources across multiple command lists, i.e.:

      // Thread 1
      ConstData cd;
      cd.data = 1;
      ptr = constBuffer.Map();
      memcpy(ptr, &cd, sizeof(cd));
      constBuffer.Unmap();
      CL1->draw(obj1);

      cd.data = 2;
      ptr = constBuffer.Map();
      memcpy(ptr, &cd, sizeof(cd));
      constBuffer.Unmap();
      CL1->draw(obj2);

      // Thread 2
      CL2->draw(obj3);   // uses the const buffer being written on thread 1

      // Submission thread: CL1 -> CL2

    One approach I've seen is to cache changes to the resource in a command-list-local cache, and then update the contents of the buffers when the command lists are serialized to the submission thread, as in the sketch below.
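    A minimal sketch of that cache-and-flush approach; the types and names are made up for illustration. Each command list context records pending constant writes at record time, and the submission thread replays them in submission order into the mapped upload buffer.

      #include <cstring>
      #include <vector>

      struct PendingConstantWrite
      {
          size_t dstOffset;                    // offset into the shared upload buffer
          std::vector<unsigned char> data;     // payload captured at record time
      };

      struct CommandListContext
      {
          std::vector<PendingConstantWrite> pendingWrites;

          void WriteConstants(size_t dstOffset, const void* src, size_t size)
          {
              PendingConstantWrite w;
              w.dstOffset = dstOffset;
              w.data.assign(static_cast<const unsigned char*>(src),
                            static_cast<const unsigned char*>(src) + size);
              pendingWrites.push_back(std::move(w));
          }
      };

      // Called on the submission thread, in command-list submission order, so a later
      // command list never sees a buffer that an earlier one is still rewriting.
      void FlushConstantWrites(const CommandListContext& ctx, unsigned char* mappedUploadBuffer)
      {
          for (const PendingConstantWrite& w : ctx.pendingWrites)
              std::memcpy(mappedUploadBuffer + w.dstOffset, w.data.data(), w.data.size());
      }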
  14. [D3D12] Command list submission ordering

    Great, that's what I expected. I feel like a lot of documents assume you know that and gloss over it.

    On that note, when building a task graph, it seems wise to statically bake out your high-level render passes on your queue submission thread, batch up all the command lists in those passes (e.g. wait until all of your Z-prepass lists come in), and then submit in dependency-sorted order (don't submit the g-buffer group until the Z-prepass group has been submitted). A rough sketch follows.
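    Roughly what I have in mind on the submission thread; the pass structure and names are illustrative, and each pass group goes out with one ExecuteCommandLists call in dependency-sorted order.

      #include <d3d12.h>
      #include <vector>

      struct RenderPassBatch
      {
          std::vector<ID3D12CommandList*> lists;   // filled in by job-system workers
      };

      // passes is already sorted so that producers (Z-prepass, shadows) precede
      // consumers (g-buffer, lighting).
      void SubmitFrame(ID3D12CommandQueue* queue, std::vector<RenderPassBatch>& passes)
      {
          for (RenderPassBatch& pass : passes)
          {
              if (!pass.lists.empty())
                  queue->ExecuteCommandLists(static_cast<UINT>(pass.lists.size()),
                                             pass.lists.data());
          }
      }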
  15. When you submit command lists to a command queue, what ordering guarantees / expectations do you have? According to MSDN:

    "GPU work submission: To execute work on the GPU, an app must explicitly submit a command list to a command queue associated with the Direct3D device. A direct command list can be submitted for execution multiple times, but the app is responsible for ensuring that the direct command list has finished executing on the GPU before submitting it again. Bundles have no concurrent-use restrictions and can be executed multiple times in multiple command lists, but bundles cannot be directly submitted to a command queue for execution. Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order."

    That last sentence is where I'm confused. Is it the case that if I build N command lists and call ExecuteCommandLists(...) with an array of those N command lists, they are processed in order? That much seems to be true. The fuzzier part for me is how transition barriers and fences play into the submission order.

    Say I have a Z-prepass and a shadow pass, and then some g-buffer pass. Assuming I transition-barrier everything correctly, am I expected to submit my Z-prepass / shadow pass command lists before the g-buffer command lists? That would basically mean I have to schedule my submission thread to wait for all the precursor work to come in from the job system before it can submit. This is what I expect I have to do, but it's pretty unclear to me. It doesn't help that none of the samples online actually do a job-system based multithreaded demo :) I would love an elaboration on how the driver actually schedules the command list work. Thanks!