lipsryme

DX11
Single big shader buffer for hardware instancing

17 posts in this topic

I saw this part of the Battlefield 3 presentation (http://dice.se/wp-content/uploads/GDC11_DX11inBF3_Public.pdf) on page 30, where they show how they transfer each instance's transformation to the shader using a single big buffer, plus a constant buffer of indices that, combined with the SV_InstanceID semantic, point to the right position in this buffer. Can someone tell me what exactly this buffer is?

This part here: 

Buffer<float4> instanceVectorBuffer : register(t0);

 

I'm not too familiar with buffers in DirectX 11 beyond constant buffers...

Isn't register(t0) for texture slots? Also, they seem to use 3x4 floats for their transformations, so does that mean they construct their transformation matrix inside the shader from translation, rotation and scale vectors?

Finally, how would I set such a buffer in the DX11 API?


The MSDN documentation is very lacking in explaining these concepts...

IIRC:

The t registers are similar to the constant registers (c registers in DX9; in DX10/11, cbuffers are bound to b registers), but they're designed for random memory access patterns (like texture lookups) rather than uniform access patterns (e.g. it's assumed that every pixel will read the same constants, but may read different texels). You can bind both textures and DX11 buffer objects to the t registers and then load values from them in the shader.

 

You bind resources to the t registers using the *SSetShaderResources calls (VSSetShaderResources, PSSetShaderResources, etc.), which take "shader resource views". You can create a view of a texture, which is the common case, but you can also create a view of a buffer.

 

As mentioned above, you should prefer binding buffers to t registers when you're going to perform random lookups into the buffer. If every pixel/vertex/etc. is going to read the same values, bind the data as a constant buffer instead, using the usual *SSetConstantBuffers method.

Also they seem to use 3x4 floats for their transformations, so does that mean they construct their transformation matrix inside the shader using translation, rotation and scale vectors ?

No. In a regular transformation matrix constructed from a translation + rotation + scale, the 4th row/column (depending on your conventions) is always [0,0,0,1], so you can hard-code that value in the shader and save some space in the buffer.
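Written out for the row-vector convention used by D3DX/DirectXMath (with column vectors, transpose everything), an affine matrix looks like this, where the a entries are the combined rotation and scale and t is the translation. Only the last column is constant, so storing the first three columns of M, i.e. the first three rows of its transpose, takes three float4s (48 bytes instead of 64) and loses nothing:

```latex
M =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0 \\
a_{21} & a_{22} & a_{23} & 0 \\
a_{31} & a_{32} & a_{33} & 0 \\
t_x    & t_y    & t_z    & 1
\end{bmatrix},
\qquad
M^{\mathsf{T}} =
\begin{bmatrix}
a_{11} & a_{21} & a_{31} & t_x \\
a_{12} & a_{22} & a_{32} & t_y \\
a_{13} & a_{23} & a_{33} & t_z \\
0      & 0      & 0      & 1
\end{bmatrix}
```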

 

I'm still not as experienced with DX11 as I am with DX9, so I hope that's correct.

Edited by Hodgman

Yes, I meant 3 x float4 (sorry for the confusing syntax).

Alright, thanks for the info.

 

EDIT: By the way, this buffer should then be dynamic, correct? Since I need to add or remove elements during rendering.

Edited by lipsryme

EDIT: Never mind, got it fixed... my shader wasn't up to date when I ran it.

Edited by lipsryme

The advantage of a buffer over a cbuffer is the size (128 MB vs 64 KB, although in a typical scenario 2-4 MB should be enough).

Another important advantage of a tbuffer over a cbuffer is that cbuffers suffer from constant waterfalling, which makes them a horrible fit for scalable hardware-accelerated vertex skinning (where each vertex indexes the constant buffer with a different index).

When constant waterfalling isn't an issue, though, cbuffers can be faster for, well... constant data.

OK, this may sound stupid, but what about just using regular instancing? Has this method been benchmarked against regular instancing with a second vertex buffer?



If by regular instancing you mean using a second vertex buffer: I remember reading some years ago that one of the big companies wrote that shaders using a second vertex stream take a performance hit. It had something to do with the vertex declaration structure getting pretty big.

 

I think another problem with classic instancing is that you need to define a vertex declaration (an input layout in D3D11) for each vertex shader type whose per-instance parameters differ. Normally you would just have a transform matrix as instance data for the vertex shader, but what if you also need other per-instance parameters, such as a color, an inverse world matrix or some other parameter that makes your instance different from the others? When using a second vertex stream, you'll need a different vertex declaration for each case.

 

With the generic buffer, you need just one vertex declaration, since the vertex data doesn't change. It is then the vertex shader's job to extract the desired data from the generic buffer. Of course, you'll need to make sure that the buffer contains the expected data. Also, since the buffer can be pretty big, you'll be uploading data to the graphics card less often, which reduces API calls. The program-side logic also gets simpler when you don't need to worry so much about "will the data fit in the buffer, or do I have to make several draw calls instead of one?"

 

Of course, under D3D11 the instance index is just another input you can access in the vertex shader (SV_InstanceID), and from there you can read or generate the instance data in whatever way works best; vertex streams, on the other hand, can't be randomly accessed.

 

Cheers!


So... I've created the buffer like this:

	D3D11_BUFFER_DESC instanceBufferDesc = {};
	
	instanceBufferDesc.Usage = D3D11_USAGE_DYNAMIC;
	instanceBufferDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
	instanceBufferDesc.ByteWidth = sizeof(XMFLOAT4);
	instanceBufferDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
	instanceBufferDesc.MiscFlags = 0;
	instanceBufferDesc.StructureByteStride = 0;

	hr = this->device->CreateBuffer(&instanceBufferDesc, NULL, &this->instanceTransformBuffer);

 

Is that correct so far?

 

 
Now I'm a little clueless about how to create the shader resource view from this.
Any ideas? Do I set the shader resource view description to NULL?
 
This seems to work without errors:
	D3D11_SHADER_RESOURCE_VIEW_DESC shaderResourceDesc = {};
	shaderResourceDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; // one element = one float4, matching Buffer<float4>
	shaderResourceDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
	shaderResourceDesc.Buffer.FirstElement = 0; // shares a union with ElementOffset
	shaderResourceDesc.Buffer.NumElements = 1;  // shares a union with ElementWidth; = ByteWidth / sizeof(XMFLOAT4)

	hr = this->device->CreateShaderResourceView(this->instanceTransformBuffer, &shaderResourceDesc, &this->instanceTransformBuffer_SRV);

 

 

Now, if the above is correct, how can I push float4s into the buffer?

Edited by lipsryme

Use DeviceContext->Map(...) with D3D11_MAP_WRITE_DISCARD to get a pointer to fresh buffer memory (the previous contents are discarded and undefined, so don't read from it).

 

Fill in the data (with memcpy, for example) and then call Unmap. Don't write outside the buffer! It will cause memory corruption and may even hang your computer.

 

It doesn't really make sense to use the buffer for a single float4, but I assume it's a test. I used DXGI_FORMAT_R32G32B32A32_FLOAT as the view format, since handling things as float4s makes a bit more sense. You'll need at least 3 float4s (12 floats) to store a typical matrix, but you'll probably want to allocate some megabytes of storage (within reason, of course).

 

Cheers!

Edited by kauna

Got it working perfectly, thanks!

So the only way is to define a constant maximum size for this buffer in the code...?

The worst case would be 3 x numInstances float4s, which could add up to quite a lot.

 

By the way, I'm currently assigning each instance its 3 float4s inside a loop. I assume it would be more efficient to put them in a structure that I then copy once using memcpy, or is the difference negligible?

 

I'm only calling Map/Unmap once, but I'm filling the data like this:

	// Update buffer
	XMMATRIX worldTransform = XMMatrixTranspose(XMLoadFloat4x4(&scenePrimitive->worldTransform));
	for (int u = 0; u < 3; u++)
	{
		XMStoreFloat4(&pInstanceData[(numInstances * 3) + u], worldTransform.r[u]);
	}

This is basically inside the loop that goes through every instance and counts them for the instanced draw call at the end.

 

Also, am I correct in thinking that I'm restricted to one specific material (textures and properties from the constant buffer) for each unique instance group?

Update: Just read about putting the textures in a Texture2DArray... so for instance-specific material properties, would it make sense to also store them in a buffer like I did with the transforms? Might be overkill for e.g. booleans, but still...

Edited by lipsryme


Wait, can I ask something: what does your draw call look like? I mean, do you call DrawIndexedInstanced with only one buffer? Or with one vertex buffer plus an empty instance buffer, or one holding the IDs as instance data?


Yes, I only use a vertex buffer. I'm not using an instance buffer at all; instead I supply the per-instance transformations through a float4 buffer.
The shader then accesses them using the instance ID.

Edited by lipsryme


Hm, I tried doing the same as you to see how it compares to normal instancing, but nothing leaves the vertex shader. The Shader Debugger shows that the sphere goes in:
[screenshot]
However, nothing gets rasterized. It can't be the view matrix, since that works with normal rendering, so it has to be something to do with the large buffer object. Did you ever run into such an issue?


Hmm, I do remember having an issue where no pixel shader was being executed... can't remember exactly what it was, though.

Have you made sure that your InputLayout is set correctly?

 

Should look something like this:

	D3D11_INPUT_ELEMENT_DESC lo[] = 
	{
		{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
		{"TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
		{"NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
		{"TANGENT",  0, DXGI_FORMAT_R32G32B32_FLOAT, 3, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
	};

 

The fourth parameter (InputSlot) is important here, as it is the vertex buffer slot...

 

Also make sure the float4s are combined in the correct order.

I've had a lot of problems with this. In the end I had to transpose the matrix first and then pass it to the buffer, because otherwise you lose the 4th row.

If you pass your 3 rows like this:

cbufferData.data[i] = worldMatrix.r[i];

 

you're passing only 3 vectors of the matrix, and with the row-vector convention of XMMATRIX the 4th row you drop holds the translation. Passing the columns (the rows of the transposed matrix) instead loses only the constant [0,0,0,1].

That's why I changed it over to columns, and it worked:

float4x4 GetInstanceTransform(uint instID, uint offset)
{
	// elementsPerInstance and startIndex are declared elsewhere (e.g. in a cbuffer)
	uint BufferOffset = instID * elementsPerInstance + startIndex + offset;

	// Each float4 is one column of the world matrix (a row of its transpose)
	float4 c0 = InstanceDataBuffer.Load(BufferOffset + 0);
	float4 c1 = InstanceDataBuffer.Load(BufferOffset + 1);
	float4 c2 = InstanceDataBuffer.Load(BufferOffset + 2);
	float4 c3 = float4(0.0f, 0.0f, 0.0f, 1.0f); // constant 4th column, never stored

	float4x4 _World = { c0.x, c1.x, c2.x, c3.x,
	                    c0.y, c1.y, c2.y, c3.y,
	                    c0.z, c1.z, c2.z, c3.z,
	                    c0.w, c1.w, c2.w, c3.w };

	return _World;
}

I don't see a way around that... but maybe someone does?

Edited by lipsryme

That's why I changed it over to columns and it worked

Oh, I see, because in your other post you specify #pragma pack_matrix( row_major ).

I'll try to get it working tonight, when I can, to see what happens.


Yes, but that only applies to float4x4 values coming from the cbuffer. It doesn't apply to matrices you build inside the vertex shader, nor to float4s.


  • Similar Content

    • By lonewolff
      Hi Guys,
      I am revisiting an old DX11 framework I was creating a while back and am scratching my head with a small issue.
      I am trying to set the pixel shader resources and am getting the following error on every loop.
      As you can see in the below code, I am clearing out the shader resources as per the documentation. (Even going overboard and doing it both sides of the main PSSet call). But I just can't get rid of the error. Which results in the render target not being drawn.
      ID3D11ShaderResourceView* srv = { 0 };
      d3dContext->PSSetShaderResources(0, 1, &srv);
      for (std::vector<RenderTarget>::iterator it = rtVector.begin(); it != rtVector.end(); ++it)
      {
          if (it->szName == name)
          {
              //std::cout << it->srv <<"\r\n";
              d3dContext->PSSetShaderResources(0, 1, &it->srv);
              break;
          }
      }
      d3dContext->PSSetShaderResources(0, 1, &srv);
      I am storing the RT's in a vector and setting them by name. I have tested the it->srv and am retrieving a valid pointer.
      At this stage I am out of ideas.
      Any help would be greatly appreciated
       
    • By bowerbirdcn
      hi, guys, how to understand the math used in CDXUTDirectionWidget ::UpdateLightDir 
      the following code snippet is taken from the MS DXUT source code
       
        D3DXMATRIX mInvView;
          D3DXMatrixInverse( &mInvView, NULL, &m_mView );
          mInvView._41 = mInvView._42 = mInvView._43 = 0;
          D3DXMATRIX mLastRotInv;
          D3DXMatrixInverse( &mLastRotInv, NULL, &m_mRotSnapshot );
          D3DXMATRIX mRot = *m_ArcBall.GetRotationMatrix();
          m_mRotSnapshot = mRot;
          // Accumulate the delta of the arcball's rotation in view space.
          // Note that per-frame delta rotations could be problematic over long periods of time.
          m_mRot *= m_mView * mLastRotInv * mRot * mInvView;
          // Since we're accumulating delta rotations, we need to orthonormalize 
          // the matrix to prevent eventual matrix skew
          D3DXVECTOR3* pXBasis = ( D3DXVECTOR3* )&m_mRot._11;
          D3DXVECTOR3* pYBasis = ( D3DXVECTOR3* )&m_mRot._21;
          D3DXVECTOR3* pZBasis = ( D3DXVECTOR3* )&m_mRot._31;
          D3DXVec3Normalize( pXBasis, pXBasis );
          D3DXVec3Cross( pYBasis, pZBasis, pXBasis );
          D3DXVec3Normalize( pYBasis, pYBasis );
          D3DXVec3Cross( pZBasis, pXBasis, pYBasis );
       
       
      https://github.com/Microsoft/DXUT/blob/master/Optional/DXUTcamera.cpp
    • By YixunLiu
      Hi,
      I have a surface mesh and I want to use a cone to cut a hole on the surface mesh.
      Anybody know a fast method to calculate the intersected boundary of these two geometries?
       
      Thanks.
       
      YL
       
    • By hiya83
      Hi, I tried searching for this but either I failed or couldn't find anything. I know there's D11/D12 interop and there are extensions for GL/D11 (though not very efficient). I was wondering if there's any Vulkan/D11 or Vulkan/D12 interop?
      Thanks!
    • By lonewolff
      Hi Guys,
      I am just wondering if it is possible to acquire the address of the backbuffer if an API (based on DX11) only exposes the 'device' and 'context' pointers?
      Any advice would be greatly appreciated