Unbinding a constant buffer

5 comments, last by Tispe 9 years, 2 months ago

Hi

I have a render() method that binds a cbuffer using VSSetConstantBuffers so it can be used in the vertex shader. Now after setting the constant buffer I issue a draw call and return to the caller.


void DXDevice::Render(VertexResource &Vert, IndexResource &Idx, InstanceResource &Inst)
{
	m_pImmediateContext->VSSetConstantBuffers(1, 1, Inst.GetBufferAddress());
	m_pImmediateContext->IASetVertexBuffers(0, 1, Vert.GetBufferAddress(), &Vert.GetVertexSize(), &Vert.GetOffset());
	m_pImmediateContext->IASetIndexBuffer(*Idx.GetBufferAddress(), Idx.GetFormat(), 0);
	m_pImmediateContext->IASetPrimitiveTopology(Vert.GetTopology());
	m_pImmediateContext->DrawIndexedInstanced(Idx.GetIndexCount(), Inst.GetInstanceCount(), 0, 0, 0);
}

However, the method does not manage the lifetime of the buffers, so a buffer bound to the pipeline might be Released at some later point while it is still bound.

Should I unbind/unset all buffers before returning from my render method?


However, the method does not manage the lifetime of the buffers, so a buffer bound to the pipeline might be Released at some later point while it is still bound.

This was true of D3D10 but not of 11; VSSetConstantBuffers will itself hold a reference to the buffer, so you can safely Release it in your own code and it won't be destroyed until it is unbound (i.e. by binding another buffer to the same slot, or by calling ClearState); see the documentation at https://msdn.microsoft.com/en-us/library/windows/desktop/ff476491%28v=vs.85%29.aspx

The method will hold a reference to the interfaces passed in. This differs from the device state behavior in Direct3D 10.

So on that note, if you have slots that are rarely used (e.g. only one object uses VS cbuffer slot #13) then you might want to periodically set all slots to NULL (e.g. once a frame) to ensure that Released buffers actually get released :)
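
For illustration, a minimal sketch of that per-frame clear (the helper name is made up; D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT is the 14-slot limit from d3d11.h):


void DXDevice::ClearVSConstantBuffers()
{
	// Binding NULL to every VS cbuffer slot drops the context's hidden
	// references, so buffers you have already Released can be destroyed.
	ID3D11Buffer* NullBuffers[D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT] = {};
	m_pImmediateContext->VSSetConstantBuffers(0, D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT, NullBuffers);
}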

Going off topic - this hidden reference counting is slow, so I'd expect them to get rid of it in D3D12.

I just wanted to post some quick related questions before this thread drops.

I am passing a huge cbuffer array to the vertex shader that can contain up to 4096*4 indices, which will be used to index into a tbuffer of MVP matrices during instancing.

And I get this warning:

D3D11 WARNING: ID3D11DeviceContext::DrawIndexedInstanced: The size of the Constant Buffer at slot 1 of the Vertex Shader unit is too small (1024 bytes provided, 65536 bytes, at least, expected). This is OK, as out-of-bounds reads are defined to return 0. It is also possible the developer knows the missing data will not be used anyway. This is only a problem if the developer actually intended to bind a sufficiently large Constant Buffer for what the shader expects. [ EXECUTION WARNING #351: DEVICE_DRAW_CONSTANT_BUFFER_TOO_SMALL]

So here is some of the code:


cbuffer InstanceIndices : register(b1)
{
	uint4 InstIdxArray[4096];
};

uint ReadInstanceIndex(uint instID)
{
	// Each cbuffer array element occupies a full 16-byte register, so four
	// 32-bit indices are packed per uint4: >> 2 selects the uint4 element,
	// & 3 selects the component within it.
	return InstIdxArray[instID >> 2][instID & 3];
}

std::shared_ptr<InstanceResource> CreateInstanceResource(DXDevice &Renderer, std::vector<DWORD> &InstanceIndices)
{
	return std::make_shared<InstanceResource>(
		InstanceIndices.size(),
		Renderer.CreateInstanceBuffer(InstanceIndices.data(), InstanceIndices.size() * sizeof(DWORD)));
}

CComPtr<ID3D11Buffer> DXDevice::CreateInstanceBuffer(const void* pDataSrc, UINT BufferSize)
{
	if (BufferSize % 16 != 0)						// Round BufferSize up to a multiple of 16
		BufferSize += (16 - (BufferSize % 16));
	// Careful: CreateBuffer reads the full (padded) BufferSize bytes from
	// pDataSrc, so the source allocation must be at least that large.

	return CreateBufferResource(pDataSrc, BufferSize, D3D11_BIND_CONSTANT_BUFFER, D3D11_USAGE_DEFAULT, 0);
}

CComPtr<ID3D11Buffer> DXDevice::CreateBufferResource(const void* pDataSrc, UINT BufferSize, UINT BindFlags, D3D11_USAGE Usage, UINT CPUAccessFlags)
{
	CComPtr<ID3D11Buffer> pBuffer = nullptr;

	try
	{
		if (BufferSize == 0)
			throw std::runtime_error("The requested buffer resource is of size 0");

		D3D11_SUBRESOURCE_DATA sd;
		ZeroMemory(&sd, sizeof(sd));
		sd.pSysMem = pDataSrc;

		D3D11_BUFFER_DESC bd;
		ZeroMemory(&bd, sizeof(bd));
		bd.Usage = Usage;
		bd.ByteWidth = BufferSize;
		bd.BindFlags = BindFlags;
		bd.CPUAccessFlags = CPUAccessFlags;

		HR(m_pDevice->CreateBuffer(&bd, &sd, &pBuffer));
	}
	catch (std::exception &e)
	{
		WriteFile("error.log", e.what());
		return nullptr;
	}

	return pBuffer;
}

With this code I turn a vector of uints into a cbuffer to be used later for indexing. Is it the uint4 InstIdxArray[4096]; in HLSL that is causing the warning?

Also, I've done some testing: at about 250 FPS, I am able to create 260,000 cbuffers (of 1024 bytes each) per second (yes, I am creating 1000 instance buffers and tossing them away every frame):


concurrency::concurrent_vector<std::shared_ptr<InstanceResource>> testvec;
concurrency::parallel_for(0, 1000, [&Renderer, &pCubeMesh, &testvec](int i){
	testvec.push_back(CreateInstanceResource(Renderer, pCubeMesh->InstanceIndices));
});

Is this a viable method of feeding the GPU with data? Say I want to issue 1000 different instanced draw calls. Creating the buffers in parallel this way means I don't have to map them using the device context, which is single-threaded.
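
For reference, the mapping approach I'm trying to avoid would look roughly like this (a sketch; pInstanceCB stands in for a pre-created D3D11_USAGE_DYNAMIC / D3D11_CPU_ACCESS_WRITE buffer):


// Reuse one dynamic cbuffer and refill it on the immediate context per draw.
D3D11_MAPPED_SUBRESOURCE Mapped;
if (SUCCEEDED(m_pImmediateContext->Map(pInstanceCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &Mapped)))
{
	memcpy(Mapped.pData, InstanceIndices.data(), InstanceIndices.size() * sizeof(DWORD));
	m_pImmediateContext->Unmap(pInstanceCB, 0);
}

Since Map has to go through the one immediate context, each of those 1000 updates would be serialized.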


Is it the uint4 InstIdxArray[4096]; in HLSL that is causing the warning?

Yes. Basically it's telling you that the constant buffer you bound isn't big enough to match the HLSL declaration, which comes out to sizeof(uint) * 4 * 4096 = 64KB. It sounds to me like you're using 4096 as a max for your instance index buffer and then only creating buffers based on how many instances you actually render, which would cause that warning to occur whenever you have fewer than 16K elements in your std::vector of instance indices.
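
If you want the binding to always match the declaration, one option is to pad the upload to the full declared size. A sketch, not your code (CreateFullSizeInstanceBuffer is a made-up name):


CComPtr<ID3D11Buffer> DXDevice::CreateFullSizeInstanceBuffer(const void* pDataSrc, UINT BufferSize)
{
	const UINT DeclaredSize = 4096 * 4 * sizeof(UINT);	// uint4 InstIdxArray[4096] = 65536 bytes

	// Copy the real data into a zero-filled 64KB staging block so CreateBuffer
	// never reads past the end of the caller's allocation.
	std::vector<BYTE> Padded(DeclaredSize, 0);
	memcpy(Padded.data(), pDataSrc, BufferSize < DeclaredSize ? BufferSize : DeclaredSize);

	return CreateBufferResource(Padded.data(), DeclaredSize, D3D11_BIND_CONSTANT_BUFFER, D3D11_USAGE_DEFAULT, 0);
}

The trade-off is that every instance buffer then occupies the full 64KB, so at 1000 buffers a frame that's 64MB of uploads; shrinking the HLSL array to what you actually use is the cheaper fix if 4096 is just a generous maximum.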

I'm actually kind of amazed that you can create 1000 constant buffers in a single frame and not get huge slowdowns or driver issues. While it's totally valid, it's not at all common to do things this way, since creating and destroying resources is generally considered to be a heavy operation. Are you doing this on an Nvidia driver? If I were you, I would try your test on hardware from different vendors and see how they handle it.


I'm actually kind of amazed that you can create 1000 constant buffers in a single frame and not get huge slowdowns or driver issues. While it's totally valid, it's not at all common to do things this way, since creating and destroying resources is generally considered to be a heavy operation. Are you doing this on an Nvidia driver? If I were you, I would try your test on hardware from different vendors and see how they handle it.

Yes, I am running a GTX 690 (driver 347.25). I figured that concurrently creating buffers with initial data would outperform buffer reuse via device-context mapping. Maybe I am wrong.


I'm actually kind of amazed that you can create 1000 constant buffers in a single frame and not get huge slowdowns or driver issues.

Using a single texture, I can render 1000 cube meshes with each mesh having 1000 instances. That is 1000 new ID3D11Buffers every frame with updated instance data. Since each of the 1000 cubes has 1000 instances, that is 4000 bytes of instance data per cube (1000 4-byte indices), totaling 4MB of instance data every frame.

Here is the cube:

[screenshot: the textured cube]

Here are one million cubes hovering at about 70 FPS:

[screenshot: one million cubes]

Surprisingly, adding concurrency to ID3D11Buffer recreation made no measurable difference in my latest demo.

EDIT:

I ran a bunch of tests with different numbers of draw calls and instances and found that there was a performance gain from using concurrency when there were more than 100 draw calls with a low number of instances per call (10–100).


		// Multi Threaded results:
		// 1000 Cubes and 1000 Instances = 69 FPS
		// 2000 Cubes and 1000 Instances = 35 FPS
		// 1000 Cubes and 2000 Instances = 35 FPS
		// 500 Cubes and 1000 Instances = 136 FPS
		// 1000 Cubes and 500 Instances = 140 FPS
		// 500 Cubes and 500 Instances = 275 FPS
		// 500 Cubes and 100 Instances = 410 FPS
		// 100 Cubes and 500 Instances = 1200 FPS
		// 100 Cubes and 100 Instances = 1300 FPS
		// 2000 Cubes and 10 Instances = 109 FPS
		// 1000 Cubes and 10 Instances = 220 FPS
		// 500 Cubes and 10 Instances = 440 FPS
		// 100 Cubes and 10 Instances = 1100 FPS


		// Single Threaded results:
		// 1000 Cubes and 1000 Instances = 69 FPS
		// 2000 Cubes and 1000 Instances = 35 FPS
		// 1000 Cubes and 2000 Instances = 35 FPS
		// 500 Cubes and 1000 Instances = 136 FPS
		// 1000 Cubes and 500 Instances = 123 FPS
		// 500 Cubes and 500 Instances = 235 FPS
		// 500 Cubes and 100 Instances = 260 FPS
		// 100 Cubes and 500 Instances = 1000 FPS
		// 100 Cubes and 100 Instances = 1100 FPS
		// 2000 Cubes and 10 Instances = 64 FPS
		// 1000 Cubes and 10 Instances = 130 FPS
		// 500 Cubes and 10 Instances = 260 FPS
		// 100 Cubes and 10 Instances = 1100 FPS

Some code:


std::vector<std::shared_ptr<TextureHandle>> Textures;
Textures.push_back(CreateTextureFromBmpFile(Renderer, std::string("Wood.bmp")));

for (int z = 0; z < 1000; z++)						// 1000 cubes, 1000 instances each
	Textures.at(0)->Meshes.push_back(CreateCube(Renderer, 1000, z));

for (auto &Texture : Textures)
{
	for (auto &Mesh : Texture->Meshes)
		UpdateInstanceResource(Renderer, *Mesh);
}

Renderer.Clear();

for (auto &Texture : Textures)
{
	if (Texture->Meshes.size() > 0)
	{
		Renderer.SetTexture(Texture->pTex.p);
		for (auto &Mesh : Texture->Meshes)
			Renderer.Render(*Mesh->pVertices, *Mesh->pIndices, *Mesh->pInstances);
	}
}

Renderer.Present();

Also, doubling the number of instances per cube to 2000 cut FPS in half, and doubling the number of cubes to 2000 while keeping 1000 instances per cube also cut FPS in half; at these sizes the frame time appears to scale with the total instance count (cubes × instances) rather than with the number of draw calls.
