
# DX12 Occlusion Queries

## Recommended Posts

Hi!

I wonder if I can achieve the same (not quite optimal) CPU readback of occlusion queries in DX12 as I have with DX11:

```cpp
u64 result = 0;
HRESULT hr = deviceCtx11->GetData(id3d11Query, &result, sizeof(u64), D3D11_ASYNC_GETDATA_DONOTFLUSH);
if (hr == S_OK) { /* ready: result holds the sample count */ } else { /* not ready yet (S_FALSE) */ }
```

This happens on the CPU. I can see whether it's ready or not, and do other stuff if it isn't.

In DX12, ResolveQueryData obviously happens on the GPU. If I put a fence after ResolveQueryData, I can be sure it copied the results into my buffer. However, I wonder if there's any other way than inserting a fence after each EndQuery to see whether the individual queries have already finished. It sounds bad and I guess the fence might do some flushing.

I first want to implement what the other platforms in our engine do, before changing all of them to some more sensible batched occlusion-query model.

Thanks for any remarks.

##### Share on other sites
15 minutes ago, pcmaster said:

It sounds bad and I guess the fence might do some flushing.

The flushing in the D3D11 flag refers to submitting previously made draw calls to the GPU (the equivalent of finishing the immediate context and calling ID3D12CommandQueue::ExecuteCommandLists). The no-flush flag means "don't call ExecuteCommandLists before checking the query results".

Though, yes, I wouldn't be surprised if fences caused some kind of GPU cache flushing... But this would generally be a requirement for the GPU to be completely sure that data has reached RAM before it tells the CPU that the data is ready.

##### Share on other sites

So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on the CPU.
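For the record, here's roughly what I have in mind; a minimal, untested sketch, and all the buffer/fence names are mine:

```cpp
#include <d3d12.h>

// Record side: end the query, resolve it into a readback buffer, signal a fence.
void ResolveAndSignal(ID3D12GraphicsCommandList* cl, ID3D12CommandQueue* queue,
                      ID3D12QueryHeap* heap, ID3D12Resource* readback,
                      ID3D12Fence* fence, UINT queryIndex, UINT64 fenceValue)
{
    cl->EndQuery(heap, D3D12_QUERY_TYPE_OCCLUSION, queryIndex);
    cl->ResolveQueryData(heap, D3D12_QUERY_TYPE_OCCLUSION, queryIndex, 1,
                         readback, queryIndex * sizeof(UINT64));
    // ... Close() the list and ExecuteCommandLists() here ...
    queue->Signal(fence, fenceValue);
}

// CPU side: the non-blocking equivalent of the DONOTFLUSH poll.
bool TryReadQuery(ID3D12Fence* fence, UINT64 fenceValue,
                  ID3D12Resource* readback, UINT queryIndex, UINT64* outSamples)
{
    if (fence->GetCompletedValue() < fenceValue)
        return false;  // not ready yet, go do other work

    UINT64* data = nullptr;
    D3D12_RANGE read{ queryIndex * sizeof(UINT64), (queryIndex + 1) * sizeof(UINT64) };
    readback->Map(0, &read, reinterpret_cast<void**>(&data));
    *outSamples = data[queryIndex];
    D3D12_RANGE written{ 0, 0 };  // we wrote nothing
    readback->Unmap(0, &written);
    return true;
}
```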

Btw, Hodgman, just out of curiosity: do you happen to know whether on GCN the query results for each of the 4/8 DBs are written, based on counters, straight into the backing memory already at the bottom of the pipe? Or are some caches (DB?) involved?

##### Share on other sites

Out of curiosity, have you considered conditional rendering (predication)?
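Something along these lines; just a sketch with invented names, assuming the binary occlusion result was already resolved into predicationBuffer and the buffer is in the D3D12_RESOURCE_STATE_PREDICATION state:

```cpp
#include <d3d12.h>

// Sketch only: skip the draw when zero samples passed the earlier binary occlusion query.
void DrawPredicated(ID3D12GraphicsCommandList* cl, ID3D12Resource* predicationBuffer,
                    UINT queryIndex, UINT indexCount)
{
    // Offset must be 8-byte aligned; EQUAL_ZERO skips the draw when the value is 0.
    cl->SetPredication(predicationBuffer, queryIndex * sizeof(UINT64),
                       D3D12_PREDICATION_OP_EQUAL_ZERO);
    cl->DrawIndexedInstanced(indexCount, 1, 0, 0, 0);
    cl->SetPredication(nullptr, 0, D3D12_PREDICATION_OP_EQUAL_ZERO);
}
```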

##### Share on other sites

Sure, but the time budget doesn't allow for it right now.

##### Share on other sites
5 hours ago, pcmaster said:

So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on the CPU.

Btw, Hodgman, just out of curiosity: do you happen to know whether on GCN the query results for each of the 4/8 DBs are written, based on counters, straight into the backing memory already at the bottom of the pipe? Or are some caches (DB?) involved?

Yeah. Or you could just fence N times per frame, and check the fence that follows the query that you're checking. You could even just fence once per frame and accept a full frame of query latency.
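Sketched very roughly (all names made up), the once-per-frame batched version could look like:

```cpp
#include <d3d12.h>

// Sketch: one fence value per frame's batch of queries.
struct QueryBatch { UINT firstQuery; UINT queryCount; UINT64 fenceValue; };

// At the end of the frame: resolve the whole batch, then signal once.
void SubmitBatch(ID3D12GraphicsCommandList* cl, ID3D12CommandQueue* queue,
                 ID3D12QueryHeap* heap, ID3D12Resource* readback,
                 ID3D12Fence* fence, QueryBatch& batch, UINT64& nextFenceValue)
{
    cl->ResolveQueryData(heap, D3D12_QUERY_TYPE_OCCLUSION, batch.firstQuery,
                         batch.queryCount, readback, batch.firstQuery * sizeof(UINT64));
    // ... Close() and ExecuteCommandLists() here ...
    batch.fenceValue = ++nextFenceValue;
    queue->Signal(fence, batch.fenceValue);
}

// Every query in the batch is readable once the batch's fence has passed.
bool BatchReady(ID3D12Fence* fence, const QueryBatch& batch)
{
    return fence->GetCompletedValue() >= batch.fenceValue;  // non-blocking
}
```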

Sorry, I'm not too experienced with queries so I don't know any low-level details, because in my book they're a horrible hack for visibility culling (getting results to a problem long after you were required to have answers always rubbed me the wrong way).

##### Share on other sites

I agree it's a horrible solution.

##### Share on other sites

One last thought. By reading back the query results on the CPU, I can decide not to issue the draws on the CPU at all, which saves the CPU time needed to prepare the constant buffers, descriptor tables, other state, etc. With GPU predication, I'd still have to prepare each draw, possibly in vain.

This is all only valid for a "traditional" renderer without fancy on-GPU command list building.

Edited by pcmaster

##### Share on other sites

I recently came across an article on retrofitting a DX11 renderer with GPU-based occlusion culling. Maybe you'll find it useful.

##### Share on other sites

Thank you for the article. It's very interesting; however, in the engine (and the types of games) I'm implementing DX12 into, we don't happen to use instancing very much, and that approach doesn't lower the CPU cost: the higher level still has to prepare the data for each draw, which isn't negligible. But the approach sounds very good for many applications.

Edited by pcmaster