souledgeii

DirectX 11: Map vs UpdateSubresource, which one is better for my case?

souledgeii    100
I have a BIG dynamic buffer that receives tons of vertices and is used by tons of draw calls every frame. What I want to do is call Map/Unmap or UpdateSubresource just once per frame.

According to the DirectX documentation, DEFAULT is better for fast GPU reads and DYNAMIC is slower for GPU reads, but why?

I guess DYNAMIC works like this:
1) Map reserves a block of main memory that the CPU can access directly.
2) Unmap copies that memory to GPU memory.
But if the final copy of a DYNAMIC buffer resides in video memory, DYNAMIC shouldn't be slower for GPU reads, right?

I read in another post that UpdateSubresource does two passes of memory copying, which sounds wasteful and slow to me.

So I want fast GPU reads, but I don't want to use UpdateSubresource because of its two copy passes. What's the best solution for me?

Thanks

iedoc    2527
Well, I can tell you that you will surely want to use a dynamic buffer, mapping and unmapping it to update it. UpdateSubresource is really slow compared to that, which is why it's recommended to use UpdateSubresource only when you will not be updating every frame.

I'm not sure if you've seen this link, but it might be helpful:
[url="http://msdn.microsoft.com/en-us/library/windows/desktop/bb205132(v=vs.85).aspx"]http://msdn.microsoft.com/en-us/library/windows/desktop/bb205132(v=vs.85).aspx[/url]

Evil Steve    2017
[quote name='souledgeii' timestamp='1329883905' post='4915403']
According to the DirectX documentation, DEFAULT is better for fast GPU reads and DYNAMIC is slower for GPU reads, but why?
[/quote]Dynamic resources are usually placed in AGP memory, somewhere both the CPU and GPU can access them. That memory is slower for the GPU to access than video memory, but the CPU can access it directly. Otherwise the GPU would have to copy the resource out of video memory into CPU-accessible memory (if you're reading from it), and then back from CPU-accessible memory into VRAM when you unlock / unmap the buffer.

Aqua Costa    3692
[quote name='iedoc' timestamp='1329911008' post='4915463']
Well, I can tell you that you will surely want to use a dynamic buffer, mapping and unmapping it to update it. UpdateSubresource is really slow compared to that, which is why it's recommended to use UpdateSubresource only when you will not be updating every frame.
[/quote]

Hi, have you tested the performance of both methods? I was wondering: since the GPU can access resources with Default usage faster than resources with Dynamic usage, shouldn't UpdateSubresource be faster?

iedoc    2527
I'm sorry, I actually made a mistake in my previous post. What I meant to compare is dynamic and default buffers, but since mapping is used for dynamic buffers and UpdateSubresource is used for default and staging buffers, I still hold to what I said.

I actually have tested it, and I know for a fact that updating a dynamic buffer every frame is usually faster than using UpdateSubresource on a default buffer (if done right). The reason is the intent of the buffer. You could look at default buffers as "drawing" fast, and dynamic buffers as "updating" fast (from the CPU's perspective, which is similar to our perspective).

UpdateSubresource can be slow (not ALWAYS though) because there's a chance that you call Draw, which uses the default buffer, and then call UpdateSubresource while the draw call has not finished with the buffer. The buffer can't be updated in place until the draw is done, and because of this the data ends up being copied twice. I'm not going to pretend I know the details of all this. I'm sure someone can fill in, or at least comment on my likely mistakes.

Dynamic buffers are placed in "mappable" memory, where the CPU is able to copy data into them "on the fly". Because of this, though, if you don't do it right and map the resource at the wrong time, the GPU might have to wait for the resource to be unmapped before it can use it to draw or whatever. Also, a dynamic buffer is placed in memory that is pretty equally accessible to both the CPU and GPU, so it will take slightly longer for the GPU to read a dynamic buffer than a default buffer, which is placed in video memory. Hope that makes sense.

Maybe I can expand or be more clear. Default buffers give the CPU almost no access to the buffer. That is why UpdateSubresource needs to create copies of the data, so that the copy can be handed to the GPU to write into the buffer. Mapping a dynamic buffer and updating it that way is faster because the CPU has more access to it and can copy to it on the fly. There's a trade-off between GPU and CPU performance when using resources. As MSDN says, default gives the GPU pretty much complete access to the resource and the CPU pretty much none at all, while a staging buffer gives the GPU pretty much no access and the CPU full access. Dynamic buffers are right in between as far as I see it.
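
To make the trade-off concrete, here is a rough sketch of it as a lookup table. The `Usage` enum, the `Access` struct and `accessFor` are hypothetical stand-ins written for this post, not the real `D3D11_USAGE` enum from d3d11.h, and the flags reflect my reading of the usage table in the D3D11 docs, so double-check them against MSDN.

```cpp
#include <cassert>

// Hypothetical stand-in for the three usages discussed in this thread.
// This is NOT the real D3D11_USAGE enum.
enum class Usage { Default, Dynamic, Staging };

// Who may touch the resource, and whether Map can be used on it.
struct Access {
    bool gpuRead;
    bool gpuWrite;
    bool cpuRead;
    bool cpuWrite;
    bool mappable;
};

// Assumed access rules, taken from the D3D11 usage table as I understand it.
inline Access accessFor(Usage u) {
    switch (u) {
        case Usage::Default:
            // Full GPU access, no direct CPU access (update via UpdateSubresource or copies).
            return {true, true, false, false, false};
        case Usage::Dynamic:
            // GPU may only read, CPU may only write (through Map/Unmap).
            return {true, false, false, true, true};
        case Usage::Staging:
            // CPU has full access; GPU access is limited to copy operations.
            return {true, true, true, true, true};
    }
    return {false, false, false, false, false};
}
```

So a buffer that the CPU rewrites every frame and the GPU only reads matches the Dynamic row, while something the GPU also writes to has to be Default.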

souledgeii    100
[quote name='iedoc' timestamp='1329911008' post='4915463']
Well, I can tell you that you will surely want to use a dynamic buffer, mapping and unmapping it to update it. UpdateSubresource is really slow compared to that, which is why it's recommended to use UpdateSubresource only when you will not be updating every frame.

I'm not sure if you've seen this link, but it might be helpful:
[url="http://msdn.microsoft.com/en-us/library/windows/desktop/bb205132(v=vs.85).aspx"]http://msdn.microsof...2(v=vs.85).aspx[/url]
[/quote]
Thanks for the link. I'm new to DirectX 11 and hadn't read this doc.

But this doc makes things more confusing. According to it, the CPU creates the dynamic data on frame N, and the GPU runs the command buffer on frame N+1. This hints that the command buffer is double buffered, so DYNAMIC data mapped in system memory has to be double buffered too. Then we should have no sync issue when we do Map/Unmap, right?

In another part of the docs, there is a sync issue between Map/Unmap and the following draw call: "Performing a map operation at the wrong time could potentially cause a severe drop in performance by forcing the GPU and the CPU to synchronize with each other. This synchronization will occur if the application wants to access a resource before the GPU is finished copying it into a resource the CPU can map." It seems a stall happens when the GPU has finished its commands and has to wait for Unmap to finish. From this, it seems the command buffer is a ring buffer, and there are situations where the CPU can catch up with the GPU.

souledgeii    100
Thanks for all the replies! They're really helpful.

Since UpdateSubresource actually takes two copy passes, would it be faster to do it like this: allocate a DEFAULT buffer and a second DYNAMIC buffer, update the DYNAMIC buffer with a single Map/Unmap, then copy it into the DEFAULT buffer with a copy call (CopyResource or CopySubresourceRegion)?

souledgeii    100
Here are some ways to initialize a vertex buffer that changes over time.[list=1]
[*]Create a 2nd buffer with [b]D3D10_USAGE_STAGING[/b]; fill the second buffer using [b]ID3D11DeviceContext::Map[/b], [b]ID3D11DeviceContext::Unmap[/b]; use [b]ID3D11DeviceContext::CopyResource[/b] to copy from the staging buffer to the default buffer.
[*]Use [b]ID3D11DeviceContext::UpdateSubresource[/b] to copy data from memory.
[*]Create a buffer with [b]D3D11_USAGE_DYNAMIC[/b], and fill it with [b]ID3D11DeviceContext::Map[/b], [b]ID3D11DeviceContext::Unmap[/b] (using the Discard and NoOverwrite flags appropriately).
[/list]
#1 and #2 are useful for content that changes less than once per frame. In general, GPU reads will be fast and CPU updates will be slower.
#3 is useful for content that changes more than once per frame. In general, GPU reads will be slower, but CPU updates will be faster.
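
For option #3, "using the Discard and NoOverwrite flags appropriately" is usually read as treating the dynamic buffer as a ring: append with NoOverwrite while there is room, and Discard when you wrap around. Below is a minimal sketch of just that bookkeeping, with no D3D calls in it. `MapMode`, `MapRequest` and `DynamicBufferCursor` are hypothetical names made up for this example. In real code the two modes would correspond to `D3D11_MAP_WRITE_DISCARD` and `D3D11_MAP_WRITE_NO_OVERWRITE` passed to `ID3D11DeviceContext::Map`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for D3D11_MAP_WRITE_DISCARD / D3D11_MAP_WRITE_NO_OVERWRITE.
enum class MapMode { WriteDiscard, WriteNoOverwrite };

// What the caller should do for the next write into the dynamic buffer.
struct MapRequest {
    MapMode mode;        // which map flag to use
    std::size_t offset;  // byte offset inside the buffer for the new data
};

// Tracks the write position in one dynamic vertex buffer (illustrative only).
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacityBytes)
        : capacity_(capacityBytes), next_(capacityBytes) {}  // starts "full" so the first write discards

    MapRequest allocate(std::size_t bytes) {
        assert(bytes > 0 && bytes <= capacity_);
        if (next_ + bytes > capacity_) {
            // Not enough room left: discard the old contents and restart at offset 0.
            next_ = bytes;
            return {MapMode::WriteDiscard, 0};
        }
        // Enough room: append after data the GPU may still be reading.
        MapRequest request{MapMode::WriteNoOverwrite, next_};
        next_ += bytes;
        return request;
    }

private:
    std::size_t capacity_;
    std::size_t next_;
};
```

With one big batch per frame you would call `allocate` once per frame with the size of that frame's vertex data, map with the returned mode, write at the returned offset, unmap, and use the same offset when issuing the draw calls.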

Comparing options 1 and 2, which one is better for performance? It's strange that option 1 only mentions [b]D3D10_USAGE_STAGING[/b] and doesn't consider [b]D3D11_USAGE_DYNAMIC[/b].

So, is it recommended to use dynamic buffers when a mesh or model changes every frame, as in water animations, particles, and skeletal animations? And Default would be for static, non-animated content? If staging has performance issues, is it only recommended for use once in a great while?
