pcmaster

DX12 Occlusion Queries


Hi!

I wonder if I can achieve the same (not quite optimal) CPU readback of occlusion queries as with DX11.

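u64 result = 0;
// GetData returns S_OK once the result is available and S_FALSE while the GPU is still working on it.
HRESULT hr = deviceCtx11->GetData(id3d11Query, &result, sizeof(u64), D3D11_ASYNC_GETDATA_DONOTFLUSH);
if (S_OK == hr) return "ready"; else return "not ready";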

This happens on the CPU. I'm able to see whether the result is ready, and do other stuff if it isn't.

In DX12, ResolveQueryData obviously happens on the GPU. If I put a fence after ResolveQueryData, I can be sure it has copied the results into my buffer. However, I wonder if there's any other way than inserting a fence after each EndQuery to see whether the individual queries have already finished. It sounds bad and I guess the fence might do some flushing.

I first want to implement what other platforms in our engine do, before changing all of them to some more sensible batched occlusion query querying model.

Thanks for any remarks.

15 minutes ago, pcmaster said:

It sounds bad and I guess the fence might do some flushing.

The flushing in the D3D11 flag refers to submitting previously made draw calls to the GPU (the equivalent of finishing the immediate context and calling ID3D12CommandQueue::ExecuteCommandLists). The no-flush flag means "don't call ExecuteCommandLists before checking the query results".

Though, yes, I wouldn't be surprised if fences caused some kind of GPU cache flushing... But this would generally be a requirement for the GPU to be completely sure that data has reached RAM before it tells the CPU that the data is ready. 


So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on CPU.

Btw, Hodgman, just out of curiosity, do you know by any chance on GCN, if already at the bottom-of-pipe it writes the query results for each of the 4/8 DBs, based on counters, into the backing memory? Or are some caches (DB?) involved?

5 hours ago, pcmaster said:

So the expected CPU-readback approach on PC should be inserting a fence after ResolveQueryData and waiting on it on CPU.

Btw, Hodgman, just out of curiosity, do you know by any chance on GCN, if already at the bottom-of-pipe it writes the query results for each of the 4/8 DBs, based on counters, into the backing memory? Or are some caches (DB?) involved?

Yeah. Or you could just fence N times per frame, and check the fence that follows the query that you're checking. You could even fence just once per frame and accept a full frame of query latency.
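To make the "fence N times per frame" idea concrete, here is a minimal, portable sketch of the CPU-side bookkeeping only. It does not touch D3D12: MockFence, PendingQuery and QueryTracker are hypothetical names invented for illustration, and MockFence merely stands in for a monotonically increasing fence counter. Each query remembers the fence value that will be signaled after the resolve covering it, and the CPU polls without blocking.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU fence: a counter that only ever increases.
struct MockFence {
    std::atomic<std::uint64_t> completed{0};
    std::uint64_t GetCompletedValue() const {
        return completed.load(std::memory_order_acquire);
    }
};

// One occlusion query, tagged with the fence value that is signaled
// after the resolve that copies its result into the readback buffer.
struct PendingQuery {
    std::uint32_t slot;        // index into the readback buffer
    std::uint64_t fenceValue;  // fence value that follows this query
};

class QueryTracker {
public:
    explicit QueryTracker(const MockFence& fence) : fence_(fence) {}

    // Record a query issued in the current batch (between two fence signals).
    void AddQuery(std::uint32_t slot) {
        assert(Find(slot) == nullptr && "slot is already pending");
        pending_.push_back({slot, nextFenceValue_});
    }

    // Call at the point where the real code would resolve the batch and
    // signal the fence. Returns the value that should be signaled.
    std::uint64_t CloseBatch() { return nextFenceValue_++; }

    // Non-blocking readiness check: true once the fence that follows the
    // query has been passed. Unknown slots report "not ready".
    bool IsReady(std::uint32_t slot) const {
        const PendingQuery* q = Find(slot);
        return q != nullptr && fence_.GetCompletedValue() >= q->fenceValue;
    }

private:
    const PendingQuery* Find(std::uint32_t slot) const {
        for (const PendingQuery& q : pending_)
            if (q.slot == slot) return &q;
        return nullptr;
    }

    const MockFence& fence_;
    std::vector<PendingQuery> pending_;
    std::uint64_t nextFenceValue_ = 1;
};
```

In a real renderer, CloseBatch would presumably sit next to ResolveQueryData plus ID3D12CommandQueue::Signal, and the counter would be read through ID3D12Fence::GetCompletedValue. With one batch per frame this degenerates into the "full frame of latency" variant; more batches per frame trade extra signals for earlier results.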

Sorry, I'm not too experienced with queries, so I don't know any low-level details, because in my book they're a horrible hack for visibility culling (getting results to a problem long after you were required to have answers always rubbed me the wrong way).


One last thought. By reading back the query results on the CPU, I can decide on the CPU not to issue the draws at all. That saves the CPU time needed to prepare the constant buffers and descriptor tables, set other states, etc. With GPU predication, I'd still have to prepare each draw, possibly in vain.

This is all only valid for a "traditional" renderer without fancy on-GPU command list building.
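A rough sketch of that CPU-side decision, under stated assumptions: the types and the SelectDraws function below are hypothetical and not taken from any engine, and the policy of drawing objects whose query has not come back yet is just one conservative choice. The point is only that occluded objects drop out before any per-draw preparation happens.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class QueryState { NotReady, Ready };

// Last known occlusion query result for one object (hypothetical layout).
struct OcclusionResult {
    QueryState state;
    std::uint64_t samplesPassed;  // meaningful only when state == Ready
};

struct Object {
    int id;
    OcclusionResult lastQuery;
};

// Returns the ids of the objects whose draws should be prepared this frame.
// Objects with a ready result of zero passed samples are skipped entirely;
// objects without a ready result are drawn, to stay on the safe side.
std::vector<int> SelectDraws(const std::vector<Object>& objects) {
    std::vector<int> toDraw;
    for (const Object& o : objects) {
        // Convention assumed here: a result that is not ready carries no count.
        assert(o.lastQuery.state == QueryState::Ready ||
               o.lastQuery.samplesPassed == 0);
        const bool knownOccluded =
            o.lastQuery.state == QueryState::Ready &&
            o.lastQuery.samplesPassed == 0;
        if (!knownOccluded) toDraw.push_back(o.id);
    }
    return toDraw;
}
```

Only the returned ids would then go through constant buffer updates, descriptor table setup and state changes, which is the saving that predication alone does not give.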

Edited by pcmaster


Thank you for the article. It's very interesting. However, in the engine I'm implementing DX12 into (or rather, in the types of games it's used for), we don't use instancing very much, so that approach doesn't lower the CPU cost: the higher level still has to prepare the data for each draw, which isn't negligible. But the approach sounds very good for many applications.

Edited by pcmaster

