DX11 Update textures every frame

15 comments, last by aWG 8 years ago
Have you tried UpdateSubresource from a CPU memory pointer? In certain very specific circumstances I've found this efficient, despite the dire warnings about it in the documentation & elsewhere, because it will manage resource contention automatically for you, which is where I suspect your primary bottleneck is.
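For reference, a minimal sketch of the suggestion above; `context`, `texture`, `pixels`, `width` and `height` are placeholder names, and a DXGI_FORMAT_R8G8B8A8_UNORM texture (4 bytes per pixel) is assumed:

```cpp
// Upload a full frame from a CPU pointer with UpdateSubresource.
UINT rowPitch   = width * 4;         // bytes per row of the source data
UINT depthPitch = rowPitch * height; // ignored for 2D textures, but harmless

context->UpdateSubresource(
    texture,     // ID3D11Texture2D* with D3D11_USAGE_DEFAULT
    0,           // subresource index (mip 0)
    nullptr,     // whole-resource update, no destination box
    pixels,      // CPU source pointer
    rowPitch,
    depthPitch);
```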

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


Have you tried uploading less data? Depending on what your data looks like, you could compute dirty regions on the CPU and only upload that data (potentially via UpdateSubresource as called out above). Is your data really changing all over the place, non-uniformly, every frame?
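One cheap way to compute a dirty region is to diff rows of the previous and current frames on the CPU; a sketch in plain C++ (the function and parameter names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Find the first and last rows that differ between two frames, so only
// that horizontal band needs to be uploaded. Returns false if nothing
// changed. rowBytes = width * bytes-per-pixel.
bool FindDirtyRows(const unsigned char* prev, const unsigned char* curr,
                   std::size_t rowBytes, std::size_t rows,
                   std::size_t* firstDirty, std::size_t* lastDirty)
{
    std::size_t first = rows, last = 0;
    for (std::size_t y = 0; y < rows; ++y) {
        if (std::memcmp(prev + y * rowBytes, curr + y * rowBytes, rowBytes) != 0) {
            if (y < first) first = y;
            last = y;
        }
    }
    if (first == rows) return false; // frames identical, nothing to upload
    *firstDirty = first;
    *lastDirty  = last;
    return true;
}
```

The returned band of rows could then be uploaded with UpdateSubresource and a D3D11_BOX covering just those rows.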

I'll be darned. UpdateSubResource is actually faster; low 3ms instead of high 3ms. Not ideal yet, but it's better. Thanks for the tip! :D

I've also been looking into this for days. My use case is slightly different: I'm writing a video application and an external source is decoding the video, leaving me with a 4K RGBA texture. I need to display this texture in my 3D App (it's Unity, but I'm writing a native plug-in which means I'm using DX11).

I'm always getting hitches, no matter what I do. The worst case is an Intel HD 4600 which can take up to 25ms just to upload a 1080p texture. As Ashaman73 has mentioned, bus bandwidth is probably playing a large role in this.

I'm using the normally advocated method of using a DYNAMIC texture, writing to that, then CopyResource over into the real texture. Here's an article where someone has gone through all of the scenarios and benchmarked them: https://eatplayhate.me/2013/09/29/d3d11-texture-update-costs/.

My problem is that even the memcpy() of a 1080p RGBA texture into Map()'d memory takes a really long time (5+ms), so when I get up to 4K it's substantial. What I could really use, I think, is a way to begin this copy process asynchronously. Right now the copy blocks the GPU thread (since you must Map()/Unmap() on GPU thread, I'm also generally doing my memcpy there).
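One thing worth checking in this situation: the memory returned by Map() comes with a RowPitch that is often wider than width * 4, so a single big memcpy is only valid when the pitches happen to match. A sketch of a pitch-aware copy in plain C++ (names illustrative; `dst` and `dstPitch` stand in for D3D11_MAPPED_SUBRESOURCE::pData and ::RowPitch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy a tightly packed RGBA frame into a mapped resource whose row pitch
// may include padding.
void CopyRowsToMapped(unsigned char* dst, std::size_t dstPitch,
                      const unsigned char* src, std::size_t srcPitch,
                      std::size_t rowBytes, std::size_t rows)
{
    if (dstPitch == rowBytes && srcPitch == rowBytes) {
        std::memcpy(dst, src, rowBytes * rows); // tight on both sides: one copy
        return;
    }
    for (std::size_t y = 0; y < rows; ++y)      // otherwise copy row by row
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
}
```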

I've read this may be possible in OpenGL with some kind of PixelBufferObject? Is there anything like this in DirectX? I haven't tried reverting my code to UpdateSubResource for this case, but are there any other suggestions?



My problem is that even the memcpy() of a 1080p RGBA texture into Map()'d memory takes a really long time (5+ms), so when I get up to 4K it's substantial. What I could really use, I think, is a way to begin this copy process asynchronously. Right now the copy blocks the GPU thread (since you must Map()/Unmap() on GPU thread, I'm also generally doing my memcpy there).

To be honest, I am more familiar with OGL, so a DX11 expert may have better tips.

For one, once the memory is mapped, you can access it from any other thread, just avoid calling API functions from multiple threads. The basic setup for memory to buffer copy could be:

  1. GPU thread: map buffer A
  2. Worker thread: decode video frame into buffer A
  3. GPU thread: when decoded, unmap buffer A

This will most likely trigger an asynchronous upload from CPU to GPU memory, or it might do nothing if DX11 decides to keep the texture in CPU memory for now (shared memory on the HD 4600?).
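The three steps above can be sketched like this; a plain byte buffer stands in for the pointer returned by Map(), since the only point being illustrated is that the write between Map and Unmap can happen on a worker thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// `mapped` stands in for D3D11_MAPPED_SUBRESOURCE::pData. Map() and Unmap()
// stay on the thread that owns the immediate context; only the write in
// between moves to the worker.
void DecodeFrameInto(unsigned char* mapped, std::size_t bytes, unsigned char value)
{
    std::memset(mapped, value, bytes); // placeholder for the real video decode
}

void RenderThreadFrame(unsigned char* mapped, std::size_t bytes, unsigned char value)
{
    // 1. (GPU thread) map buffer A -- already done, we hold `mapped`
    std::thread worker(DecodeFrameInto, mapped, bytes, value); // 2. worker decodes
    // ... the GPU thread is free to do other work here ...
    worker.join();
    // 3. (GPU thread) when decoded, unmap buffer A here
}
```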

The next issue is when you access the buffer. If you access it too early, e.g. by copying the buffer content to the target texture, the asynchronous upload will suddenly turn into a synchronous stall of your rendering pipeline. So I would test using multiple buffers, at least 3. This kind of delay should not be critical for displaying a video.
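The multiple-buffer suggestion amounts to a small ring; a sketch of the index bookkeeping in plain C++ (illustrative, not tied to any D3D object):

```cpp
#include <cassert>

// Rotate through N staging buffers so the frame being written is never the
// one the GPU may still be reading. With count == 3, the CPU writes frame F
// into slot F % 3 while the GPU consumes the slot written two frames earlier.
struct BufferRing {
    int count; // number of buffers, e.g. 3 as suggested above
    int frame; // frames submitted so far
    int WriteIndex() const { return frame % count; }
    int ReadIndex()  const { return (frame + 1) % count; } // written count-1 frames ago
    void Advance() { ++frame; }
};
```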

Another option would be to look for a codec which can be decoded on the GPU. I'm not familiar with video codecs, but there might be one which allows you to use the GPU to decode it. In that case it could work like this:

  1. map buffer X
  2. copy delta frame (whatever) to buffer (much smaller than full frame)
  3. unmap buffer X
  4. fence X
  5. ..
  6. if(fence X has been reached) start decode shader (buffer->target texture)
  7. swap target texture with rendered texture
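D3D11 (before 11.3/11.4) has no explicit fence object, but an event query can play the role of "fence X" in steps 4 and 6 above. A sketch; `device`, `context` and `query` are placeholder names:

```cpp
// Create the "fence" once, at startup.
D3D11_QUERY_DESC desc = {};
desc.Query = D3D11_QUERY_EVENT;
device->CreateQuery(&desc, &query);

// After Unmap(buffer X):
context->End(query); // 4. "fence X": signals when the GPU has passed this point

// Later, before kicking the decode shader:
if (context->GetData(query, nullptr, 0, 0) == S_OK) {
    // 6. The GPU has finished with buffer X: safe to dispatch the decode
    //    shader (buffer -> target texture), then 7. swap target textures.
}
```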

I've read this may be possible in OpenGL with some kind of PixelBufferObject? Is there anything like this in DirectX? I haven't tried reverting my code to UpdateSubResource for this case, but are there any other suggestions?


An OpenGL PBO is the equivalent of using two textures in D3D, either via CopyResource or CopySubresourceRegion.

To summarise, in OpenGL the workflow with a PBO is (1) map the PBO, (2) write data to it, (3) unmap the PBO and (4) update the texture via glTexImage2D/glTexSubImage2D.

The D3D equivalent is (1) map a staging resource, (2) write data to it, (3) unmap the staging resource, and (4) update the texture via CopyResource/CopySubresourceRegion.
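A sketch of those four steps; `staging` is assumed to be a texture created with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_WRITE, and the other names (`context`, `texture`, `src`, `rowBytes`, `height`) are placeholders:

```cpp
D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(context->Map(staging, 0, D3D11_MAP_WRITE, 0, &mapped))) { // (1)
    for (UINT y = 0; y < height; ++y)                  // (2) write the data,
        memcpy(static_cast<unsigned char*>(mapped.pData) //   respecting RowPitch
                   + y * mapped.RowPitch,
               src + y * rowBytes, rowBytes);
    context->Unmap(staging, 0);                        // (3)
    context->CopyResource(texture, staging);           // (4) GPU-side copy
}
```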


Just a final update: I got it working using the Ashaman73 approach: Map / memcpy / Unmap / CopyResource. For a bit better performance I've added multi-threading for the memcpy, and fences at the Unmap and CopyResource stages to ensure I never touch the texture until it's ready (avoiding all stalls). Performance went through the roof after enforcing no writes to the texture until the fence is finished.

I've talked with a few people who are much more familiar with the issue than I am, and they let me know that OpenGL does have a performance benefit because you don't have to unmap the texture when you perform the upload (you can leave it mapped, reducing some of the complexity and contention). Another issue is that for 4K textures it's better to upload in a compressed format (for video like I'm doing, that's a YUV format as opposed to RGBA because it's about 1/2 the data depending on your encoding scheme). You can then perform the final conversion via shaders (this saves the memory bandwidth and trades it for computation).
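The final shader conversion mentioned above boils down to a few multiply-adds per pixel. Here is the math on the CPU for illustration, using approximate BT.601 limited-range coefficients (an assumption: the correct matrix depends on the video's colorimetry, e.g. BT.601 vs BT.709, limited vs full range):

```cpp
#include <algorithm>
#include <cassert>

struct RGB { unsigned char r, g, b; };

static unsigned char Clamp255(double v)
{
    return static_cast<unsigned char>(std::min(255.0, std::max(0.0, v + 0.5)));
}

// Approximate BT.601 limited-range YUV -> RGB; the same math a conversion
// shader would run per pixel after sampling the Y and UV planes.
RGB YuvToRgb(unsigned char y, unsigned char u, unsigned char v)
{
    const double c = 1.164 * (y - 16);
    const double d = u - 128.0;
    const double e = v - 128.0;
    return { Clamp255(c + 1.596 * e),
             Clamp255(c - 0.392 * d - 0.813 * e),
             Clamp255(c + 2.017 * d) };
}
```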

This topic is closed to new replies.
