Single big shader buffer for hardware instancing

lipsryme · 2013-04-16T14:04:59

I saw this part in the Battlefield 3 presentation (http://dice.se/wp-content/uploads/GDC11_DX11inBF3_Public.pdf), on page 30, where they show how they transfer the transformations of each instance to the shader using a single big buffer, plus a constant buffer with indices that is used together with the SV_InstanceID semantic to index this buffer at the right position. Can someone tell me what exactly this buffer is? This part here: Buffer<float4> instanceVectorBuffer : register(t0); I'm not too familiar with buffers in DirectX 11 beyond constant buffers... Isn't register(t0) for texture slots? Also, they seem to use 3x4 floats for their transformations, so does that mean they construct the transformation matrix inside the shader from translation, rotation and scale vectors? And lastly, how would I set such a buffer through the DX11 API?

Started by lipsryme March 28, 2013 12:51 PM

16 comments, last by lipsryme 11 years ago

kauna

April 02, 2013 12:01 AM

Use DeviceContext->Map(...) with D3D11_MAP_WRITE_DISCARD to get a pointer to an empty buffer.

Fill in the data (with memcpy, for example) and then call Unmap. Don't write outside of the buffer! It will cause memory corruption and may even hang your computer.

It doesn't really make sense to use the buffer for a single float, but I assume that was a test. I used DXGI_FORMAT_R32G32B32A32_FLOAT as the buffer format, since handling things as float4 makes a bit more sense. You'll need at least 3x4 floats (three float4s) to store a typical matrix, but you'll probably want to allocate a few megabytes of storage (within reason, of course).
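A minimal CPU-side sketch of the fill step, assuming the pointer returned by Map is treated as an array of float4. The names `Float4`, `InstanceRows` and `copyInstances` are made up for illustration, and a plain array stands in for the mapped memory so the snippet compiles without the D3D11 headers. The point is the capacity check before the single memcpy:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one DXGI_FORMAT_R32G32B32A32_FLOAT element.
struct Float4 { float x, y, z, w; };

// Three float4s per instance (a 3x4 transform). Hypothetical layout.
struct InstanceRows { Float4 row[3]; };

static_assert(sizeof(Float4) == 16, "float4 is expected to be 16 bytes");
static_assert(sizeof(InstanceRows) == 48, "3 float4s are expected to be 48 bytes");

// Copies as many instances as fit into the mapped region and returns how
// many were written. 'mapped' stands in for the pointer obtained from Map;
// 'capacityInFloat4' is the number of float4 elements the buffer holds.
std::size_t copyInstances(Float4* mapped, std::size_t capacityInFloat4,
                          const std::vector<InstanceRows>& instances)
{
    assert(mapped != nullptr);
    const std::size_t maxInstances = capacityInFloat4 / 3;
    const std::size_t count =
        instances.size() < maxInstances ? instances.size() : maxInstances;
    if (count > 0)
        std::memcpy(mapped, instances.data(), count * sizeof(InstanceRows));
    return count; // never writes past the end of the buffer
}
```

In a real renderer the return value would tell you how many instances can go into the draw call; anything beyond that needs a second batch or a larger buffer.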

Cheers!

lipsryme

Author

April 02, 2013 12:47 PM

Got it working perfectly, thanks!

So the only way is to define a constant maximum size for this buffer in the code...?

Worst case would be 3 x numInstances float4s, which could add up to quite a lot.

By the way, I'm currently assigning each instance its 3 float4s inside a loop. I assume it would be more efficient to put them in a structure which I then copy once using memcpy, or is the difference negligible?

I'm only doing Map / Unmap once, but I'm filling the data like this:


// Update buffer
XMMATRIX worldTransform = XMMatrixTranspose(XMLoadFloat4x4(&scenePrimitive->worldTransform));
for (int u = 0; u < 3; u++)
{
	// Row u of the transposed matrix, i.e. column u of the original
	XMStoreFloat4(&pInstanceData[(numInstances * 3) + u], worldTransform.r[u]);
}

This is basically inside the loop that goes through every instance and counts them, so the instanced draw call can be issued at the end.

Also, am I correct in thinking that I am restricted to one specific material (textures, plus properties from a const buffer) for every unique instance group?

Update: I just read about putting the textures inside a Texture2DArray... so for instance-specific material properties, would it make sense to also store them in a buffer like I did with the transforms? Might be overkill for e.g. booleans, but still...
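For the worst-case size mentioned above, a quick back-of-the-envelope sketch. It assumes 3 float4s (48 bytes) per instance and nothing else in the buffer; the helper names are hypothetical:

```cpp
#include <cstddef>

// Assumed layout: 3 float4s per instance, 16 bytes per float4.
constexpr std::size_t kBytesPerFloat4    = 4 * sizeof(float);
constexpr std::size_t kFloat4PerInstance = 3;

// Bytes needed to hold transforms for 'maxInstances' instances.
constexpr std::size_t instanceBufferBytes(std::size_t maxInstances)
{
    return maxInstances * kFloat4PerInstance * kBytesPerFloat4;
}

// How many instances fit into a buffer of 'bufferBytes' bytes.
constexpr std::size_t instancesThatFit(std::size_t bufferBytes)
{
    return bufferBytes / (kFloat4PerInstance * kBytesPerFloat4);
}

static_assert(instanceBufferBytes(1) == 48, "one 3x4 transform is 48 bytes");
```

With these numbers, 10,000 instances need 480,000 bytes, so even a generous fixed maximum stays under half a megabyte.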

Portfolio/Blog: http://marcel-schindler.weebly.com

mrheisenberg

April 04, 2013 02:16 AM

Wait, can I ask something: what does your draw call look like? I mean, do you call DrawIndexedInstanced with only 1 buffer? Or with 1 vertex buffer and 1 empty instance buffer, or with the ID stored in the instance buffer data?

lipsryme

Author

April 04, 2013 10:14 AM

I only use a vertex buffer, yes. I'm not using an instance buffer at all; the per-instance transformations are supplied through a float4 buffer that holds them.
They're then accessed inside the shader using the instance ID.
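To make the lookup concrete, here is a small CPU-side emulation of what the shader does with the instance ID. It is only a sketch: `float4PerInstance` and `startIndex` are assumed to come from a constant buffer, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one float4 element of the big buffer.
struct Float4 { float x, y, z, w; };

// Index of the first float4 that belongs to the given instance.
// 'startIndex' is where this draw call's data begins in the shared buffer.
std::size_t instanceBaseIndex(std::size_t instanceId,
                              std::size_t float4PerInstance,
                              std::size_t startIndex)
{
    return startIndex + instanceId * float4PerInstance;
}

// Bounds-checked read, emulating a Load from the buffer on the CPU.
const Float4& fetch(const std::vector<Float4>& buffer, std::size_t index)
{
    assert(index < buffer.size());
    return buffer[index];
}
```

Because every draw call only needs its own start index, many instance groups can share the one big buffer.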

mrheisenberg

April 16, 2013 05:54 AM

Hm, I tried to do the same as you to see how it compares to normal instancing, but nothing leaves the vertex shader. The Shader Debugger detects that the sphere goes in, but nothing gets rasterized. It can't be the view matrix, since that works with normal rendering, so it has to be something to do with the large buffer object. Did you ever experience such an issue?

lipsryme

Author

April 16, 2013 10:28 AM

Hmm, I do remember having an issue where no pixel shader was being executed... I can't remember exactly what it was, though.

Have you made sure that your InputLayout is set correctly?

Should look something like this:


	D3D11_INPUT_ELEMENT_DESC lo[] = 
	{
		{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
		{"TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
		{"NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
		{"TANGENT",  0, DXGI_FORMAT_R32G32B32_FLOAT, 3, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
	};

The fourth parameter is important here, as this is the vertex buffer slot...

Also make sure the float4s are combined in the correct order.

I've had a lot of problems with this. In the end I had to transpose the matrix first and then pass it to the cbuffer, because otherwise you'd lose the 4th row.

If you pass your 3 rows with something like:


cbufferData.data[i] = worldMatrix.r[i];

you're passing only 3 vectors of the matrix, and you will lose important data (the 4th row) if you pass the rows instead of the columns.

That's why I changed it over to columns and it worked:


float4x4 GetInstanceTransform(uint instID, uint offset)
{
	uint BufferOffset = instID * elementsPerInstance + startIndex + offset;

	float4 c0 = InstanceDataBuffer.Load(BufferOffset + 0);
	float4 c1 = InstanceDataBuffer.Load(BufferOffset + 1);
	float4 c2 = InstanceDataBuffer.Load(BufferOffset + 2);
	float4 c3 = float4(0.0f, 0.0f, 0.0f, 1.0f);

	float4x4 _World = { c0.x, c1.x, c2.x, c3.x,
						c0.y, c1.y, c2.y, c3.y,
						c0.z, c1.z, c2.z, c3.z,
						c0.w, c1.w, c2.w, c3.w };


	return _World;
}

I don't see a way around that... but maybe someone does?
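To show why storing the columns works, here is a small CPU-side round trip that mirrors the shader function above. It is a sketch under one assumption: the world matrix follows the row-vector convention, so its 4th column is always (0, 0, 0, 1) and can be implied. `packColumns` and `unpackColumns` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Float4   = std::array<float, 4>;
using Float4x4 = std::array<Float4, 4>; // indexed as m[row][col]

// Keeps the first three columns of the world matrix (3 float4s).
// Assumes the dropped 4th column is (0, 0, 0, 1).
std::array<Float4, 3> packColumns(const Float4x4& world)
{
    std::array<Float4, 3> cols{};
    for (std::size_t c = 0; c < 3; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            cols[c][r] = world[r][c];
    return cols;
}

// Mirrors the shader: rebuilds the full matrix from the three stored
// columns plus the implied 4th column (0, 0, 0, 1).
Float4x4 unpackColumns(const std::array<Float4, 3>& cols)
{
    Float4x4 world{};
    for (std::size_t r = 0; r < 4; ++r) {
        world[r][0] = cols[0][r];
        world[r][1] = cols[1][r];
        world[r][2] = cols[2][r];
        world[r][3] = (r == 3) ? 1.0f : 0.0f;
    }
    return world;
}
```

Packing the first three rows instead would drop row 3 entirely, and that row cannot be reconstructed from a constant.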

mrheisenberg

April 16, 2013 01:59 PM

That's why I changed it over to columns and it worked

Oh I see, because in your other post you specify #pragma pack_matrix( row_major ).

I'll try to get it to work tonight when I can and see what happens.

lipsryme

Author

April 16, 2013 02:04 PM

Yes, but that only applies to float4x4 values coming from the cbuffer. It doesn't apply to what you use inside the vertex shader, nor to float4s.

This topic is closed to new replies.