
Asynchronous constant buffer update


5 replies to this topic

#1 David_pb   Members   -  Reputation: 668


Posted 17 July 2013 - 03:13 AM

Hi,

 

I'm currently rethinking the way I handle shader constants in our engine. What I currently do is keep a local backing store for each constant buffer, which gets filled by the shader constant provider(s). After all constants are assembled, the constant buffers are mapped and the whole memory chunk is simply copied via memcpy. Additionally, I do other things to keep the number of updates as low as possible (sharing buffers, packing constants by update frequency).
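
To make the scheme above concrete, here is a minimal, portable sketch of such a backing store with a dirty flag. It is not the engine's actual code: the class name `ConstantBackingStore` and its methods are hypothetical, and the destination pointer passed to `FlushTo` merely stands in for the pointer a `Map` call with `D3D11_MAP_WRITE_DISCARD` would return. The whole store is copied on flush because the previous contents of a buffer mapped with discard are undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU-side shadow copy of one constant buffer.
class ConstantBackingStore {
public:
    // Round the size up to a multiple of 16 bytes, as constant buffers require.
    explicit ConstantBackingStore(std::size_t byteSize)
        : data_((byteSize + 15) / 16 * 16, 0), dirty_(false) {}

    // Write one constant at a byte offset; mark the store dirty only if
    // the bytes actually changed, so redundant uploads can be skipped.
    void Set(std::size_t offset, const void* src, std::size_t size) {
        assert(offset + size <= data_.size());
        if (std::memcmp(data_.data() + offset, src, size) != 0) {
            std::memcpy(data_.data() + offset, src, size);
            dirty_ = true;
        }
    }

    // Copy the whole store to `dst` (assumed to be the mapped pointer)
    // if anything changed. Returns true when a copy was performed.
    bool FlushTo(void* dst) {
        if (!dirty_) return false;
        std::memcpy(dst, data_.data(), data_.size());
        dirty_ = false;
        return true;
    }

    std::size_t Size() const { return data_.size(); }
    bool IsDirty() const { return dirty_; }

private:
    std::vector<unsigned char> data_;
    bool dirty_;
};
```

In a real renderer, `FlushTo` would be called between `Map` and `Unmap` on the context, and only for buffers whose store reports `IsDirty()`.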

 

This does not seem to be efficient, though: the GPU seems to keep renaming buffers, which stalls the CPU far too often. I thought about doing the update asynchronously so that other operations can be done during the update. However, the device context is not thread-safe, so I would have to do the synchronization myself. Does anyone have experience with this topic? Or maybe I'm doing it all wrong and somebody can give me a hint.

 

Cheers


@D13_Dreinig


#2 imoogiBG   Members   -  Reputation: 771


Posted 17 July 2013 - 06:58 AM

Can you provide your D3D11_BUFFER_DESC and UpdateSubresource/Map functions for cbuffers?

Note: the device is only *thread-safe* for resource creation.



#3 David_pb   Members   -  Reputation: 668


Posted 17 July 2013 - 07:36 AM

The BUFFER_DESC is fairly standard:

const bool isStatic = (flags & CBF_STATIC_BUFFER) != 0;

D3D11_BUFFER_DESC desc;
desc.ByteWidth = size; // size is already multiple of 16 here
desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
desc.CPUAccessFlags = 0;

if (isStatic)
{
  desc.Usage = D3D11_USAGE_IMMUTABLE;
}
else
{
  desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
  desc.Usage = D3D11_USAGE_DYNAMIC;
}
      
desc.MiscFlags = 0;
desc.StructureByteStride = 0;

D3D11_SUBRESOURCE_DATA data;
data.pSysMem = p_Data;
data.SysMemPitch = 0;
data.SysMemSlicePitch = 0;

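// Initial data is only passed for immutable buffers; dynamic buffers are filled later via Map.
ID3D11Buffer* buffer = nullptr;
HRESULT hr = device->CreateBuffer(&desc, isStatic ? &data : nullptr, &buffer);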
//...

I can't provide the actual update code, since it's too deeply integrated into the engine. Basically, the buffers that are marked for update are mapped, the memory chunk is copied via memcpy, and the buffers are unmapped afterwards.

"Note: the device is only *thread-safe* for resource creation."

Yes, I'm aware of that. But the context can still be used from multiple threads, as long as access is synchronized manually. I thought maybe someone here has some experience with this.
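
One way to do the manual synchronization mentioned above, sketched here under assumptions rather than taken from any engine: worker threads never touch the context at all. They push finished constant data into a mutex-protected queue, and the single thread that owns the context drains the queue and performs the actual buffer updates. `PendingUpdate`, `PendingUpdateQueue`, and the integer `bufferId` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// One queued constant buffer update: which buffer, and the bytes to upload.
struct PendingUpdate {
    int bufferId;                      // hypothetical handle for a constant buffer
    std::vector<unsigned char> bytes;  // private copy of the constant data
};

class PendingUpdateQueue {
public:
    // Safe to call from any thread: copies the data, then appends under a lock.
    void Push(int bufferId, const void* src, std::size_t size) {
        assert(src != nullptr || size == 0);
        PendingUpdate update;
        update.bufferId = bufferId;
        const unsigned char* first = static_cast<const unsigned char*>(src);
        update.bytes.assign(first, first + size);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(update));
    }

    // Call only from the thread that owns the context. The lock is held just
    // long enough to swap the list out, so producers are barely blocked.
    // Returns the number of updates applied.
    template <typename ApplyFn>
    std::size_t Drain(ApplyFn&& apply) {
        std::vector<PendingUpdate> local;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            local.swap(pending_);
        }
        for (const PendingUpdate& update : local) {
            apply(update);  // e.g. map the buffer, memcpy, unmap
        }
        return local.size();
    }

private:
    std::mutex mutex_;
    std::vector<PendingUpdate> pending_;
};
```

The trade-off is one extra copy per update in exchange for keeping every context call on a single thread.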
@D13_Dreinig

#4 imoogiBG   Members   -  Reputation: 771


Posted 17 July 2013 - 09:10 AM

Everything looks fine.

 

XSSetConstantBuffers (that is, VSSetConstantBuffers, PSSetConstantBuffers, and so on) will update the state immediately; there is no state machine for cbuffers (though that may depend, I'm not quite sure).



#5 mhagain   Crossbones+   -  Reputation: 6328


Posted 17 July 2013 - 05:33 PM

Map with discard can be very efficient with cbuffers - the D3D documentation calls this out as a specific path that can be expected to be optimized in drivers.  It's risky though - you need to be very careful not to read from a mapped cbuffer, otherwise performance could fall off badly.  And totally innocent looking code can do this; consider, for example:

cbstruct->a = cbstruct->b = cbstruct->c = cbstruct->d = 0;

That's going to read from the mapped memory, and it flies in the face of everything you've learned about making this stuff more efficient with pure CPU-side code, but it happens.

 

If you avoid this, you can easily do hundreds of cbuffer updates per frame without appreciably losing performance.
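
A minimal sketch of one way to stay on the write-only path: fill an ordinary local struct first (chained assignments are harmless there), then copy it to the mapped pointer in a single memcpy. `CbStruct` and `WriteConstants` are hypothetical names, and `mapped` stands in for the pointer returned by Map with discard.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layout matching a cbuffer with four float constants.
struct CbStruct {
    float a, b, c, d;
};

// `mapped` is assumed to be the write-only pointer obtained from Map.
// It is only ever used as the destination of a copy, never read.
inline void WriteConstants(void* mapped, const CbStruct& values) {
    assert(mapped != nullptr);
    std::memcpy(mapped, &values, sizeof(CbStruct));
}
```

Usage would look like `CbStruct local; local.a = local.b = local.c = local.d = 0; WriteConstants(mappedPointer, local);`, so any reads the compiler generates hit the local copy rather than the mapped memory.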

 

The alternative is to use UpdateSubresource, which will manage contention automatically for you, and automatically give you an asynchronous update if the driver determines that's what you need, but at the cost of some extra memory copying.

 

This is one of those "it depends" answers.  If you're satisfied that cbuffer updates are a performance problem for you, then the usual advice applies - profile, determine which is best for you, and use that.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#6 Jason Z   Crossbones+   -  Reputation: 3761


Posted 19 July 2013 - 09:18 AM

Have you considered using multithreaded rendering with deferred contexts?  That could allow you to amortize the costs associated with updating over more threads and potentially smooth out any stalls that you are seeing.

 

I mention this because you said that the context is not thread safe, and the answer to this situation is that you can have multiple contexts...
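
To illustrate the idea without pulling in the D3D11 headers, here is a portable toy model of the record-and-replay pattern behind deferred contexts. It is only an analogy: all `Toy*` names are hypothetical. In D3D11 itself the corresponding calls are `ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList`, and `ID3D11DeviceContext::ExecuteCommandList` on the immediate context.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the immediate context: it just logs what was executed.
struct ToyImmediateContext {
    std::vector<std::string> log;
};

using ToyCommand = std::function<void(ToyImmediateContext&)>;

// Toy stand-in for a deferred context. Each worker thread owns one, so
// recording needs no locking at all.
class ToyDeferredContext {
public:
    void Record(ToyCommand cmd) { commands_.push_back(std::move(cmd)); }

    // Hand over everything recorded so far and reset the recorder,
    // loosely mirroring what finishing a command list does.
    std::vector<ToyCommand> Finish() {
        std::vector<ToyCommand> out;
        out.swap(commands_);
        return out;
    }

private:
    std::vector<ToyCommand> commands_;
};

// Replay a finished list on the single thread that owns the immediate context.
inline void Execute(ToyImmediateContext& ctx, const std::vector<ToyCommand>& list) {
    for (const ToyCommand& cmd : list) {
        assert(static_cast<bool>(cmd));
        cmd(ctx);
    }
}
```

The point of the pattern is that the expensive work of assembling commands happens in parallel, while submission stays on one thread in a well-defined order.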


Edited by Jason Z, 19 July 2013 - 09:18 AM.




