Single big shader buffer for hardware instancing

16 comments, last by lipsryme 11 years ago

I saw this part of the Battlefield 3 presentation (http://dice.se/wp-content/uploads/GDC11_DX11inBF3_Public.pdf), on page 30, where they show how they transfer each instance's transform to the shader using a single big buffer, plus a constant buffer of indices used to index into it at the right position via the SV_InstanceID semantic. Can someone tell me what exactly this buffer is?

This part here:


Buffer<float4> instanceVectorBuffer : register(t0);

I'm not too familiar with buffers in DirectX 11 beyond constant buffers...

Isn't register(t0) for texture slots? Also, they seem to use 3x4 floats for their transformations, so does that mean they construct their transformation matrix inside the shader using translation, rotation and scale vectors?

And lastly, how would I set such a buffer through the DX11 API?


The MSDN documentation is very lacking in explaining these concepts...

IIRC:

The t registers are similar to the c registers, but they're designed for random memory access patterns (like texture lookups) rather than constant memory access patterns (e.g. it's assumed that every pixel will read the same constants, but may read different texels). You can bind both textures and DX11 buffer objects to the t registers and then load values from them in the shader.

You bind values to the t registers using *SSetShaderResources, which takes "shader resource views". You can create a view of a texture, which is the common case, but you can also create a view of a buffer.

As mentioned above, you should prefer this method of binding buffers to t registers when you're going to be performing random lookups into the buffer. If every pixel/vertex/etc is going to read the same values from the buffer, then you should bind it to a c register using the usual method.
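
As a rough sketch of that call (here context is assumed to be your ID3D11DeviceContext* and instanceSRV an ID3D11ShaderResourceView* you created over the buffer; both names are made up):

	// Slot 0 of the vertex shader stage corresponds to register(t0) in HLSL.
	context->VSSetShaderResources(0, 1, &instanceSRV);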

Also they seem to use 3x4 floats for their transformations, so does that mean they construct their transformation matrix inside the shader using translation, rotation and scale vectors ?

No, in a regular transformation matrix that's been constructed from a translation + rotation + scale, the 4th row/column (depending on your conventions) will always be [0,0,0,1], so you can hard-code that value in the shader to save some space in the buffer.

I'm still not as experienced with DX11 as I am with DX9, so I hope that's correct.

Yes, I meant 3 x float4 (sorry for the confusing syntax).

Alright thanks for the info.

EDIT: By the way, this buffer should then be dynamic, correct? Since I need to add or remove elements during rendering.

Yeah, by the looks of it, they'd potentially be updating the entire buffer every frame.

EDIT: Never mind, got it fixed... my shader wasn't up to date when I ran it.

The advantage of a buffer over a cbuffer is the size (128 MB vs. 64 KB, although in a typical scenario 2-4 MB should be enough), which allows you to store a whole frame's worth of data in the buffer. It also lets you fill the buffer once, so buffer updates are kept to a minimum. The minor inconvenience is that you can only store float data (you can probably work around this with some bit-manipulation tricks).

The buffer is practically a texture, and you'd bind it as a vertex shader resource.

[source]
// Raw per-instance data, bound to the t0 slot as a shader resource.
Buffer<float4> InstanceData : register(t0);

// ElementsPerInstance and StartIndex would typically come from a constant buffer.
float4x4 GetInstanceMatrix(uint InstID, uint Offset)
{
    // InstID              = SV_InstanceID of the instance being drawn
    // ElementsPerInstance = how many float4s there are per drawn instance
    //                       (typically 3 for a single mesh if 4x3 matrices are used)
    // StartIndex          = location of the first float4 inside the buffer for the first instance
    // Offset              = extra offset (used to retrieve bone matrices)
    uint BufferOffset = InstID * ElementsPerInstance + StartIndex + Offset;

    float4 r0 = InstanceData.Load(BufferOffset + 0);
    float4 r1 = InstanceData.Load(BufferOffset + 1);
    float4 r2 = InstanceData.Load(BufferOffset + 2);
    float4 r3 = float4(0.0f, 0.0f, 0.0f, 1.0f);
    return float4x4(r0, r1, r2, r3);
}
[/source]
Inside the vertex shader you may use the above code to retrieve the transform matrix for each vertex.
Cheers!

The advantage of a buffer over a cbuffer is the size (128 MB vs. 64 KB, although in a typical scenario 2-4 MB should be enough)

Another important advantage of a tbuffer over a cbuffer is that cbuffers suffer from constant waterfalling, which makes them a horrible fit for scalable hardware-accelerated vertex skinning (indexing the constant buffer with a different index in each vertex).

When you're not suffering constant waterfalling, though, cbuffers can be faster for, well... constant data.

Ok, this may sound stupid, but what about just using regular instancing? Has this method been benchmarked against regular instancing with a second vertex buffer?


Ok, this may sound stupid, but what about just using regular instancing? Has this method been benchmarked against regular instancing with a second vertex buffer?

If by regular instancing you mean using a second vertex buffer, I remember reading some years ago that one of the big companies wrote that shaders using a second vertex stream take a performance hit. It had something to do with the vertex declaration structure getting pretty big.

I think the problem with classic instancing is also that you'll need to define a vertex declaration for each different vertex shader type whose parameters differ. Normally, of course, you would just have a transform matrix as instance data for the vertex shader, but what if you also need some other per-instance parameters, such as a color, an inverse world matrix, or some other parameter that makes your instance different from the others? When using a second vertex stream you'll need a different vertex declaration for each case.

In the case of the generic buffer, you'll need just one vertex declaration, since the vertex data won't change. After that, it is the job of the vertex shader to extract the desired data from the generic buffer. Of course, you'll need to make sure that the buffer contains the expected data. Also, since the buffer can be pretty big, you'll be uploading data to the graphics card much less often, which reduces API calls. The program-side logic also gets simpler when you don't need to worry so much about "will the data fit in the buffer or not, do I have to make several draw calls instead of one".

Of course, under D3D11 instancing is just another parameter that you can access in the vertex shader (SV_InstanceID), and after that you may read or generate the instancing data in any way you can imagine, though vertex streams can't be randomly accessed.
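
For completeness, a vertex shader pulling its transform via SV_InstanceID might look like this (a sketch only; the cbuffer layout and names are made up, and the convention here assumes the translation sits in the fourth column of each row):

[source]
Buffer<float4> instanceVectorBuffer : register(t0);

cbuffer InstancingParams : register(b0)
{
    uint ElementsPerInstance; // float4s per instance (3 for a 4x3 matrix)
    uint StartIndex;          // first float4 of instance 0 in the buffer
};

struct VSInput
{
    float4 position : POSITION;
    uint   instID   : SV_InstanceID;
};

float4 main(VSInput input) : SV_Position
{
    uint base = input.instID * ElementsPerInstance + StartIndex;
    float4 r0 = instanceVectorBuffer.Load(base + 0);
    float4 r1 = instanceVectorBuffer.Load(base + 1);
    float4 r2 = instanceVectorBuffer.Load(base + 2);
    float4x4 world = float4x4(r0, r1, r2, float4(0, 0, 0, 1));
    return mul(world, input.position); // rows dotted with the column vector
}
[/source]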

Cheers!

So..I've created the buffer like this:


	D3D11_BUFFER_DESC instanceBufferDesc = {};
	
	instanceBufferDesc.Usage = D3D11_USAGE_DYNAMIC;
	instanceBufferDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
	// Room for every instance, not just one float4:
	// 3 float4s (a 4x3 matrix) per instance, maxInstances instances.
	instanceBufferDesc.ByteWidth = maxInstances * 3 * sizeof(XMFLOAT4);
	instanceBufferDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
	instanceBufferDesc.MiscFlags = 0;
	instanceBufferDesc.StructureByteStride = 0;

	hr = this->device->CreateBuffer(&instanceBufferDesc, NULL, &this->instanceTransformBuffer);

is that correct so far ?

Now I'm a little clueless on how to create the shader resource view from this.
Any ideas ? Do I set the shader resource view description to NULL ?
This seems to work without errors:

	D3D11_SHADER_RESOURCE_VIEW_DESC shaderResourceDesc = {};
	shaderResourceDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; // one element = one float4
	shaderResourceDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
	shaderResourceDesc.Buffer.FirstElement = 0;
	shaderResourceDesc.Buffer.NumElements = maxInstances * 3; // float4 elements visible through the view

	hr = this->device->CreateShaderResourceView(this->instanceTransformBuffer, &shaderResourceDesc, &this->instanceTransformBuffer_SRV);

Now if the above is correct, how can I basically push float4's onto the buffer ?
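
One common way (a sketch, untested; names follow the snippets above, and instanceData is assumed to be a CPU-side std::vector<XMFLOAT4> holding 3 float4s per instance) is to Map the dynamic buffer with WRITE_DISCARD once per frame, memcpy the data in, Unmap, and bind the SRV:

	// Refill the whole buffer each frame (D3D11_USAGE_DYNAMIC + WRITE_DISCARD).
	D3D11_MAPPED_SUBRESOURCE mapped;
	hr = context->Map(this->instanceTransformBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
	if (SUCCEEDED(hr))
	{
		memcpy(mapped.pData, instanceData.data(), instanceData.size() * sizeof(XMFLOAT4));
		context->Unmap(this->instanceTransformBuffer, 0);
	}

	// Bind the view to register(t0) of the vertex shader before DrawIndexedInstanced.
	context->VSSetShaderResources(0, 1, &this->instanceTransformBuffer_SRV);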

This topic is closed to new replies.
