ConvexRumbler

D3D11: Forcing update to VRAM and/or checking if it happened


Hey,

I am working on a volumetric video player renderer. It uploads ~100 MB of vertex buffers and textures a couple of times a second, and I would like to make sure that the uploads are fully asynchronous (using the GPU's copy engine). Is there any way to force an upload to VRAM when a D3D11 buffer is unmapped, and to find out later whether the transfer has finished?

The current strategy in the engine is to Map, memcpy and Unmap a dynamic buffer (ping-pong strategy) and then CopyResource to a default buffer, all in the same frame in which the data is requested. I want to change that, but I would like to be smart about it.

The problem is that I don't see any way to tell the driver "hey, I will be using this", like the D3D9 PreLoad function, or for the driver to tell me "your data is ready, fine sir". I'm basically looking for a way to do the same thing as described in this talk, but for D3D11:

http://on-demand.gputechconf.com/gtc/2012/presentations/S0356-GTC2012-Texture-Transfers.pdf

I have been analyzing my data transfers with Nsight and GPUView, and tried these things without much success:

  • Deferred context: this is just a way to record a command buffer asynchronously and replay it on the main context later
  • Creating a separate D3D11 upload device: I wasn't able to make any uploads (or at least verify them) with this approach

It seems that the driver knows when the upload has finished, according to this:

https://msdn.microsoft.com/en-us/windows/hardware/drivers/display/device-paging-queues

I'll be happy for any advice or articles to study on this. 

Edited by ConvexRumbler


You can issue a draw call on it and then wait on a fence, but I'm curious if anyone else knows a more elegant technique. Do keep in mind that the actual transfer can be delayed by a few frames, so you don't want to wait for something you just requested to upload.
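As a rough illustration of "don't wait for something you just requested", here is a minimal sketch of a ping-pong pair that keeps rendering from the old buffer until the new upload reports completion. `PingPongUpload`, `BeginUpload` and `TrySwap` are hypothetical names, not D3D11 API, and the completion check is whatever non-blocking test the renderer has available.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper, not a D3D11 type: Buffer stands in for whatever handle
// the renderer uses (for example a pointer to a default-usage buffer).
template <class Buffer>
class PingPongUpload {
public:
    PingPongUpload(Buffer front, Buffer back)
        : front_(std::move(front)), back_(std::move(back)) {}

    Buffer& Front() { return front_; }  // what the renderer draws from
    Buffer& Back() { return back_; }    // what the next upload writes into

    // Record that an upload into Back() was submitted. isDone is a
    // caller-supplied, non-blocking completion check (assumption: for D3D11
    // it would poll some fence-like object).
    void BeginUpload(std::function<bool()> isDone) {
        assert(!pending_ && "previous upload has not been swapped in yet");
        pending_ = std::move(isDone);
    }

    // Call once per frame. Swaps only when the upload has completed and
    // returns immediately otherwise, so the frame is never stalled.
    bool TrySwap() {
        if (!pending_ || !pending_()) return false;
        std::swap(front_, back_);
        pending_ = nullptr;
        return true;
    }

private:
    Buffer front_;
    Buffer back_;
    std::function<bool()> pending_;
};
```

The renderer would call `TrySwap()` at the start of each frame and simply keep drawing `Front()` for as many frames as the transfer takes.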

Edited by Promit

D3D12/Vulkan have what you need... :(
Earlier APIs put you at the mercy of the driver.

Another option is to transfer a smaller amount of data once per frame, and force it to complete by referencing it with a draw/dispatch... And scale that amount up/down by guesswork on the frametime impact :(


This being D3D11, you would issue a query object created with D3D11_QUERY_EVENT after the Unmap, then test it with GetData, which is documented to return S_OK with TRUE in the pData parameter when the GPU has finished processing commands. See https://msdn.microsoft.com/en-us/library/windows/desktop/ff476191%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396

If the GPU has not finished then GetData will not return S_OK/TRUE and you can decide what, if any, useful work you can do while continuing to wait.
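A minimal sketch of that polling pattern follows. It is written as templates so it compiles without d3d11.h; the assumption is that `Context` is an `ID3D11DeviceContext` and `Query` an `ID3D11Query` created with `D3D11_QUERY_EVENT`. `MarkUploadSubmitted` and `IsUploadDone` are hypothetical helper names. Note that the query only reports that the GPU has processed the preceding commands; whether that means the data is resident in VRAM is still up to the driver.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: Context exposes End/Flush/GetData like ID3D11DeviceContext, and
// Query is an event query (ID3D11Query created with D3D11_QUERY_EVENT).

// Call right after the Unmap/CopyResource you want to track.
template <class Context, class Query>
void MarkUploadSubmitted(Context& ctx, Query* eventQuery) {
    assert(eventQuery != nullptr);
    ctx.End(eventQuery);  // event queries use End() only; Begin() is not valid for them
    ctx.Flush();          // submit queued commands now rather than whenever the driver decides
}

// Non-blocking check; call it once per frame until it returns true.
template <class Context, class Query>
bool IsUploadDone(Context& ctx, Query* eventQuery) {
    assert(eventQuery != nullptr);
    std::int32_t done = 0;  // BOOL in the Windows headers is a 32-bit int
    const long hr = ctx.GetData(eventQuery, &done,
                                static_cast<unsigned>(sizeof(done)), 0);
    return hr == 0 && done != 0;  // 0 is S_OK; S_FALSE (1) means "not finished yet"
}
```

Passing 0 as the last GetData argument lets the runtime flush if needed; with D3D11_ASYNC_GETDATA_DONOTFLUSH a polling loop can spin forever if the commands were never submitted.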


> You can issue a draw call on it and then wait on a fence, but I'm curious if anyone else knows a more elegant technique. Do keep in mind that the actual transfer can be delayed by a few frames, so you don't want to wait for something you just requested to upload.

Are you suggesting this for the solution with a separate D3D11 device for upload? I'm sure that would work, but it sounds like a lot of work for the GPU. I think that in order to force the driver to upload the resources I would have to reference them in a shader, otherwise they would get compiled out. Even the simplest shader with RGBA writes disabled would probably be costly, just for running the pipeline. But a good worst-case idea :)

> D3D12/Vulkan have what you need... :(

I know. I just wrote a GNMX (PS4) renderer and it was easy to do stuff like that there. I miss that control in D3D11 :( . DX12/Vulkan is planned for this renderer, but I would like to speed up what we have. It is possible in OpenGL with extensions (see the talk I linked), so I was really hoping someone would show up with something I was missing.

> Another option is to transfer a smaller amount of data once per frame, and force it to complete by referencing it with a draw/dispatch... And scale that amount up/down by guesswork on the frametime impact :(

Sounds reasonable, but forcing extra draw calls is expensive. I think what you are suggesting would have to be adaptive, since GPUs differ and, because this renderer is a plugin (Unity so far), the situation would be constantly changing as the host program has to upload its own resources too.
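One way such an adaptive per-frame budget could look is sketched below. `UploadBudget` is a hypothetical helper, and the halve-on-overrun, grow-by-one-eighth policy is an assumption chosen for illustration, not something measured on real hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: decides how many bytes to upload in the next frame
// based on how long the previous frame took.
class UploadBudget {
public:
    UploadBudget(std::size_t minBytes, std::size_t maxBytes, double targetFrameMs)
        : min_(minBytes), max_(maxBytes), budget_(minBytes), targetMs_(targetFrameMs) {
        assert(minBytes > 0 && minBytes <= maxBytes && targetFrameMs > 0.0);
    }

    std::size_t Current() const { return budget_; }

    // Feed the last measured frame time; returns the byte budget for the next frame.
    std::size_t Update(double lastFrameMs) {
        if (lastFrameMs > targetMs_) {
            // Frame went over target: back off quickly, but never below the minimum.
            budget_ = std::max(min_, budget_ / 2);
        } else {
            // Frame was fine: grow slowly (by 1/8), capped at the maximum.
            budget_ = std::min(max_, budget_ + std::max<std::size_t>(1, budget_ / 8));
        }
        return budget_;
    }

private:
    std::size_t min_;
    std::size_t max_;
    std::size_t budget_;
    double targetMs_;
};
```

Because the budget reacts only to measured frame time, it also adapts when the host application (Unity here) starts competing for upload bandwidth, at the cost of a few frames of lag before it backs off.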


So I've created a separate D3D11 device on a separate thread and run it alongside my rendering. It just maps and unmaps a dynamic buffer and does a CopyResource to the default buffer. That is all. All buffers are created on the "upload" device and all commands are run on the "upload" device context.

This is my GPUView capture; my program is p45.exe. The selected command (yellow) is a command from my upload device. As you can see, these commands are put onto the 3D queue, stalling my render device's GPU commands. I would expect the upload device calls (which are only Map/Unmap and CopyResource) to run on one of the two copy queues.

I have a GTX 980 Ti, which is Maxwell architecture and has 2 copy engines, so it should be capable of putting these uploads outside the 3D queue, right?

Any idea on what I am missing? I'm running out of ideas on what to try. Thanks!

[Attached image: gpuview.PNG]

