# DX11 [DirectX11] Instancing

I've managed to get basic instancing working: I can add instances of a given model and they all render correctly. For now I can only change the positions, but the buffers already include data for rotations and scales. The problem starts when I make the instance buffer dynamic, allow the CPU to write to it, and try to change the data (the position of the model) each frame.

    Texture2D color_map : register( t0 );
    SamplerState sample_type : register( s0 );

    cbuffer world_view_proj : register( b0 )
    {
        matrix world;
        matrix view;
        matrix projection;
    };

    struct Vertex_Input_Type
    {
        float4 position : POSITION;
        float2 tex : TEXCOORD0;
        float3 normal : NORMAL;
        float3 tangent : TANGENT;
        float3 binormal : BINORMAL;
        float3 instance_pos : TEXCOORD1;
        float3 instance_rot : TEXCOORD2;
        float3 instance_scale : TEXCOORD3;
    };

    struct Pixel_Input_Type
    {
        float4 position : SV_POSITION;
        float2 tex : TEXCOORD0;
    };

    Pixel_Input_Type VS( Vertex_Input_Type input )
    {
        Pixel_Input_Type output;

        input.position.w = 1.0f;
        input.position.x += input.instance_pos.x;
        input.position.y += input.instance_pos.y;
        input.position.z += input.instance_pos.z;

        output.position = mul( input.position, world );
        output.position = mul( output.position, view );
        output.position = mul( output.position, projection );
        output.tex = input.tex;

        return output;
    }

    float4 PS( Pixel_Input_Type input ) : SV_TARGET
    {
        return color_map.Sample( sample_type, input.tex );
    }

    technique11 Render
    {
        pass P0
        {
            SetVertexShader( CompileShader( vs_4_0, VS() ) );
            SetGeometryShader( 0 );
            SetPixelShader( CompileShader( ps_4_0, PS() ) );
        }
    }

Creation of the layout:

    D3D11_INPUT_ELEMENT_DESC polygon_layout[8];

    polygon_layout[0].SemanticName = "POSITION";
    polygon_layout[0].SemanticIndex = 0;
    polygon_layout[0].Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
    polygon_layout[0].InputSlot = 0;
    polygon_layout[0].AlignedByteOffset = 0;
    polygon_layout[0].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[0].InstanceDataStepRate = 0;

    polygon_layout[1].SemanticName = "TEXCOORD";
    polygon_layout[1].SemanticIndex = 0;
    polygon_layout[1].Format = DXGI_FORMAT_R32G32_FLOAT;
    polygon_layout[1].InputSlot = 0;
    polygon_layout[1].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[1].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[1].InstanceDataStepRate = 0;

    polygon_layout[2].SemanticName = "NORMAL";
    polygon_layout[2].SemanticIndex = 0;
    polygon_layout[2].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[2].InputSlot = 0;
    polygon_layout[2].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[2].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[2].InstanceDataStepRate = 0;

    polygon_layout[3].SemanticName = "TANGENT";
    polygon_layout[3].SemanticIndex = 0;
    polygon_layout[3].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[3].InputSlot = 0;
    polygon_layout[3].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[3].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[3].InstanceDataStepRate = 0;

    polygon_layout[4].SemanticName = "BINORMAL";
    polygon_layout[4].SemanticIndex = 0;
    polygon_layout[4].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[4].InputSlot = 0;
    polygon_layout[4].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[4].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[4].InstanceDataStepRate = 0;

    // INSTANCED DATA
    // position
    polygon_layout[5].SemanticName = "TEXCOORD";
    polygon_layout[5].SemanticIndex = 1;
    polygon_layout[5].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[5].InputSlot = 1;
    polygon_layout[5].AlignedByteOffset = 0;
    polygon_layout[5].InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA;
    polygon_layout[5].InstanceDataStepRate = 1;

    // rotation
    polygon_layout[6].SemanticName = "TEXCOORD";
    polygon_layout[6].SemanticIndex = 2;
    polygon_layout[6].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[6].InputSlot = 1;
    polygon_layout[6].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[6].InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA;
    polygon_layout[6].InstanceDataStepRate = 1;

    // scale
    polygon_layout[7].SemanticName = "TEXCOORD";
    polygon_layout[7].SemanticIndex = 3;
    polygon_layout[7].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[7].InputSlot = 1;
    polygon_layout[7].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[7].InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA;
    polygon_layout[7].InstanceDataStepRate = 1;

    unsigned int num_elements = ARRAYSIZE( polygon_layout );

    ID3DX11EffectTechnique* technique;
    technique = m_effect->GetTechniqueByName( "Render" );
    ID3DX11EffectPass* pass = technique->GetPassByIndex( 0U );

    D3DX11_PASS_SHADER_DESC pass_desc;
    D3DX11_EFFECT_SHADER_DESC shader_desc;
    pass->GetVertexShaderDesc( &pass_desc );
    pass_desc.pShaderVariable->GetShaderDesc( pass_desc.ShaderIndex, &shader_desc );

    if( FAILED( device->CreateInputLayout( polygon_layout, num_elements, shader_desc.pBytecode, shader_desc.BytecodeLength, &m_layout ) ) )
        return false;

The effect file loads properly and textures show up as expected, so I've cut that part out. Creating the vertex buffer:
vertices - array that contains the vertex data of type Vertex_Type

    D3D11_BUFFER_DESC vertex_buf_desc;
    D3D11_SUBRESOURCE_DATA vertex_data;

    vertex_buf_desc.Usage = D3D11_USAGE_DEFAULT;
    vertex_buf_desc.ByteWidth = sizeof( Vertex_Type ) * vertex_count;
    vertex_buf_desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    vertex_buf_desc.CPUAccessFlags = 0;
    vertex_buf_desc.MiscFlags = 0;
    vertex_buf_desc.StructureByteStride = 0;

    vertex_data.pSysMem = vertices;
    vertex_data.SysMemPitch = 0;
    vertex_data.SysMemSlicePitch = 0;

    if( FAILED( device->CreateBuffer( &vertex_buf_desc, &vertex_data, &m_vertex_buf ) ) )
    {
        delete[] vertices;
        return false;
    }

    delete[] vertices;

And here is the instance buffer:
instances - array that contains the instance data of type Model_Instance_Type

    D3D11_BUFFER_DESC instance_buf_desc;

    instance_buf_desc.Usage = D3D11_USAGE_DYNAMIC;
    instance_buf_desc.ByteWidth = sizeof( Model_Instance_Type ) * m_model_instance_list.size();
    instance_buf_desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    instance_buf_desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
    instance_buf_desc.MiscFlags = 0;
    instance_buf_desc.StructureByteStride = 0;

    D3D11_SUBRESOURCE_DATA instance_data;
    instance_data.pSysMem = instances;
    instance_data.SysMemPitch = 0;
    instance_data.SysMemSlicePitch = 0;

    if( FAILED( device->CreateBuffer( &instance_buf_desc, &instance_data, &m_instance_buf ) ) )
    {
        delete[] instances;
        return false;
    }

    delete[] instances;

Everything works perfectly until I try to access the data in the instance buffer and change it (on a per-frame basis). Here is the function that does that, along with the structs I'm using (in case there's an error).

    D3D11_MAPPED_SUBRESOURCE mapped_subresource;
    if( FAILED( device_context->Map( m_instance_buf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped_subresource ) ) )
        return false;

    Model_Instance_Type* instance_data = static_cast<Model_Instance_Type*>( mapped_subresource.pData );
    instance_data[index].pos = XMFLOAT3( posX, posY, posZ );
    instance_data[index].rot = XMFLOAT3( rotX, rotY, rotZ );
    instance_data[index].scale = XMFLOAT3( scaleX, scaleY, scaleZ );

    device_context->Unmap( m_instance_buf, 0 );

I think that the functions that actually render the model and set up the shader variables will be needed:

    void IceModel::Render( ID3D11DeviceContext* device_context )
    {
        RenderBuffers( device_context );

        XMFLOAT4X4 world, view, projection;
        XMMATRIX xna_world, xna_view, xna_projection;

        GetWorldMatrix( xna_world );
        XMStoreFloat4x4( &world, xna_world );
        GetViewMatrix( xna_view );
        XMStoreFloat4x4( &view, xna_view );
        GetProjectionMatrix( xna_projection );
        XMStoreFloat4x4( &projection, xna_projection );

        shader->Render( device_context, m_model.size(), m_model_instance_list.size(), world, view, projection, m_tex );
    }

    void Model::RenderBuffers( ID3D11DeviceContext* device_context )
    {
        unsigned int strides[] = { sizeof( Vertex_Type ), sizeof( Model_Instance_Type ) };
        unsigned int offsets[] = { 0, 0 };
        ID3D11Buffer* buf_ptrs[] = { m_vertex_buf, m_instance_buf };

        device_context->IASetVertexBuffers( 0, 2, buf_ptrs, strides, offsets );
        device_context->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
    }

    bool Shader2D::Render( ID3D11DeviceContext* device_context, const int& vertex_count, const int& instance_count, XMFLOAT4X4 world, XMFLOAT4X4 view, XMFLOAT4X4 projection, const std::vector<Texture*>& tex )
    {
        XMMATRIX xna_world = XMLoadFloat4x4( &world );
        XMMATRIX xna_view = XMLoadFloat4x4( &view );
        XMMATRIX xna_projection = XMLoadFloat4x4( &projection );

        ID3DX11EffectShaderResourceVariable* color_map = m_effect->GetVariableByName( "color_map" )->AsShaderResource();
        if( FAILED( color_map->SetResource( tex[1]->GetTexture() ) ) )
            return false;

        ID3DX11EffectSamplerVariable* sample_type = m_effect->GetVariableByName( "sample_type" )->AsSampler();
        if( FAILED( sample_type->SetSampler( 0, m_sample_state ) ) )
            return false;

        ID3DX11EffectMatrixVariable* world_matrix = m_effect->GetVariableByName( "world" )->AsMatrix();
        if( FAILED( world_matrix->SetMatrix( reinterpret_cast<float*>( &xna_world ) ) ) )
            return false;

        ID3DX11EffectMatrixVariable* view_matrix = m_effect->GetVariableByName( "view" )->AsMatrix();
        if( FAILED( view_matrix->SetMatrix( reinterpret_cast<float*>( &xna_view ) ) ) )
            return false;

        ID3DX11EffectMatrixVariable* projection_matrix = m_effect->GetVariableByName( "projection" )->AsMatrix();
        if( FAILED( projection_matrix->SetMatrix( reinterpret_cast<float*>( &xna_projection ) ) ) )
            return false;

        RenderShader( device_context, vertex_count, instance_count );
        return true;
    }

    void Shader::RenderShader( ID3D11DeviceContext* device_context, const int& vertex_count, const int& instance_count )
    {
        device_context->IASetInputLayout( m_layout );

        ID3DX11EffectTechnique* technique = m_effect->GetTechniqueByName( "Render" );
        D3DX11_TECHNIQUE_DESC tech_desc;
        technique->GetDesc( &tech_desc );

        ID3DX11EffectPass* pass;
        for( unsigned int i = 0; i < tech_desc.Passes; ++i )
        {
            pass = technique->GetPassByIndex( i );
            if( pass )
            {
                pass->Apply( 0, device_context );
                device_context->DrawInstanced( vertex_count, instance_count, 0, 0 );
            }
        }
    }

    struct Vertex_Type
    {
        XMFLOAT4 pos;
        XMFLOAT2 tex;
        XMFLOAT3 normal;
        XMFLOAT3 tangent;
        XMFLOAT3 binormal;
    };

    struct Model_Instance_Type
    {
        XMFLOAT3 pos;
        XMFLOAT3 rot;
        XMFLOAT3 scale;
    };

Now about how it's not working. The model I want to move (the one I'm updating with a new position) renders perfectly and moves with no artifacts. However, all the other instanced objects are blinking, as if they were rendered only every second frame, so they're clearly not being rendered properly. On top of that, the instance I'm moving not only renders in the proper spot, but also keeps rendering at its original position with the same kind of blinking. I'm completely lost on this, since, as I understand it, when you update the instance buffer data, the old content gets overwritten. So how come the object renders itself at the original position?

I know that it's a lot of code, but I thought I understood instancing, since it works for static objects (I can set up as many instances of each model as I wish, with any coordinates), and this just ruins my day. If there's anything more you need to know, please ask, as I'd really like to get this going.

##### Share on other sites
Could you post all of the code for the part that maps the instance buffer and writes instance_data[index]? You have an index, which suggests you're using a loop, but there's no loop here.

##### Share on other sites
That's not really a loop index, but rather the index of the instance in the vector that contains them (the raw data, not the instance buffer). Here is the whole function:

    bool Model::UpdateInstance( const int& index, const float& posX, const float& posY, const float& posZ, const float& rotX, const float& rotY, const float& rotZ, const float& scaleX, const float& scaleY, const float& scaleZ )
    {
        m_model_instance_list[index]->pos = XMFLOAT3( posX, posY, posZ );
        m_model_instance_list[index]->rot = XMFLOAT3( rotX, rotY, rotZ );
        m_model_instance_list[index]->scale = XMFLOAT3( scaleX, scaleY, scaleZ );

        D3D11_MAPPED_SUBRESOURCE mapped_subresource;
        if( FAILED( device_context->Map( m_instance_buf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped_subresource ) ) )
            return false;

        Model_Instance_Type* instance_data = static_cast<Model_Instance_Type*>( mapped_subresource.pData );
        instance_data[index].pos = XMFLOAT3( posX, posY, posZ );
        instance_data[index].rot = XMFLOAT3( rotX, rotY, rotZ );
        instance_data[index].scale = XMFLOAT3( scaleX, scaleY, scaleZ );

        device_context->Unmap( m_instance_buf, 0 );
        return true;
    }

m_model_instance_list:

    std::vector<Model_Instance_Type*> m_model_instance_list;

The idea is to not update the whole buffer when I want to update one given instance. As the data is an array, I assumed the [] operator should work fine. The data gets changed when I debug that part, and the instance moves. What happens is the blinking and "copying itself" I described earlier.

EDIT: No idea if that helps, but I'm unsure of the memory alignments. While I did manage to get around the XMMATRIX alignment requirements (storing them as XMFLOAT4X4 and using the Store and Load functions), I might have an error in the layout description. As I understand it, the per-instance part of a layout does not need to be aligned to the per-vertex data, so the first instance element, position, has a byte offset of 0. I also set InputSlot to 0 for per-vertex data and 1 for per-instance data, as there are 2 vertex buffers (I haven't seen a D3D11_BIND_INSTANCE_BUFFER flag, so I used the vertex one).
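One way to rule out the layout as the culprit is to check the struct offsets at compile time. The sketch below is only an illustration: Float2, Float3 and Float4 are stand-ins for XMFLOAT2, XMFLOAT3 and XMFLOAT4 (which are plain structs of floats) so that it compiles without the SDK, and the expected numbers assume D3D11_APPEND_ALIGNED_ELEMENT places each element directly after the previous one in the same slot.

```cpp
#include <cstddef>

// Stand-ins for DirectXMath's XMFLOAT2/3/4 (assumption: same size and layout).
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

struct Vertex_Type
{
    Float4 pos;
    Float2 tex;
    Float3 normal;
    Float3 tangent;
    Float3 binormal;
};

struct Model_Instance_Type
{
    Float3 pos;
    Float3 rot;
    Float3 scale;
};

// Slot 0 (per-vertex): R32G32B32A32, R32G32, then three R32G32B32 elements.
static_assert( offsetof( Vertex_Type, pos ) == 0, "POSITION offset" );
static_assert( offsetof( Vertex_Type, tex ) == 16, "TEXCOORD0 offset" );
static_assert( offsetof( Vertex_Type, normal ) == 24, "NORMAL offset" );
static_assert( offsetof( Vertex_Type, tangent ) == 36, "TANGENT offset" );
static_assert( offsetof( Vertex_Type, binormal ) == 48, "BINORMAL offset" );
static_assert( sizeof( Vertex_Type ) == 60, "stride of slot 0" );

// Slot 1 (per-instance): offsets restart at 0 because it is a separate buffer.
static_assert( offsetof( Model_Instance_Type, pos ) == 0, "TEXCOORD1 offset" );
static_assert( offsetof( Model_Instance_Type, rot ) == 12, "TEXCOORD2 offset" );
static_assert( offsetof( Model_Instance_Type, scale ) == 24, "TEXCOORD3 offset" );
static_assert( sizeof( Model_Instance_Type ) == 36, "stride of slot 1" );
```

If these hold for the real XMFLOAT-based structs as well, the layout and the strides passed to IASetVertexBuffers agree, and padding is unlikely to be the cause.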

##### Share on other sites

The idea is to not update the whole buffer when I want to update one given instance. As the data is an array, I assumed the [] operator should work fine. The data gets changed when I debug that part, and the instance moves. What happens is the blinking and "copying itself" I described earlier.

Ah, well that's your problem. When you map with D3D11_MAP_WRITE_DISCARD (which is what you should be doing for a dynamic VB) the entire contents of the vertex buffer are invalidated. So you can't just copy in the data for one instance at a time, you have to copy in the data for all instances.
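A minimal sketch of what "copy in the data for all instances" could look like, assuming the CPU-side list stays the source of truth. WriteAllInstances is a hypothetical helper, not part of Direct3D, and Float3 stands in for XMFLOAT3 so the sketch compiles without the SDK. Because m_model_instance_list holds pointers, the elements are copied one by one instead of with a single memcpy of the vector's storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for DirectXMath's XMFLOAT3 (assumption: three packed floats).
struct Float3 { float x, y, z; };

struct Model_Instance_Type
{
    Float3 pos;
    Float3 rot;
    Float3 scale;
};

// Hypothetical helper: rewrites every instance into the memory returned by Map().
// After D3D11_MAP_WRITE_DISCARD the previous contents are undefined, so each
// element the draw call will read has to be written again.
inline void WriteAllInstances( void* mapped, const std::vector<Model_Instance_Type*>& list )
{
    assert( mapped != nullptr );
    Model_Instance_Type* dst = static_cast<Model_Instance_Type*>( mapped );
    for( std::size_t i = 0; i < list.size(); ++i )
        dst[i] = *list[i];
}

// Intended call site (Direct3D calls shown as comments, not compiled here):
//   m_model_instance_list[index]->pos = XMFLOAT3( posX, posY, posZ );
//   D3D11_MAPPED_SUBRESOURCE mapped_subresource;
//   if( FAILED( device_context->Map( m_instance_buf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped_subresource ) ) )
//       return false;
//   WriteAllInstances( mapped_subresource.pData, m_model_instance_list );
//   device_context->Unmap( m_instance_buf, 0 );
```

If several instances change in one frame, it is probably cheaper to update the CPU-side list for all of them first and then map and rewrite the buffer once.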

##### Share on other sites

Aw, my bad. I've read what the flags mean, I just forgot that :/. Thanks for pointing this out. I'll try fixing this ASAP, but I've got one question: what do you do if you have thousands of instances? The buffer can get quite big. Isn't there a way to avoid writing the whole buffer each frame when just 1% of it has changed?

EDIT: Worked perfectly. Thank you very much. I'd been trying to solve it by adding padding values to the buffer structs (I thought it could be reading "dirty" data, not what I'd assigned).

##### Share on other sites
If you don't want to update the whole buffer, then you can't use DISCARD. For dynamic buffers, the driver will create multiple buffers behind the scenes and cycle through them whenever you update them so that you avoid any synchronization issues with the GPU (since you don't want to write to an area of memory while the GPU is accessing it). This fits in nicely with the semantics of DISCARD: the driver can cycle to the next buffer because the contents are undefined by the spec. If you don't want to update the entire buffer you can use NO_OVERWRITE, but when you do that you can only update a portion of the buffer that the GPU isn't currently using. This can work for adding new instances, but not for updating existing instances.
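The "adding new instances" case can be sketched as a small bookkeeping class that decides which map flag to use. This is only an illustration under assumptions: InstanceAppender, AppendSlot and MapMode are hypothetical names, MapMode stands in for D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD, and the class does no Direct3D calls itself. New instances go into space no draw call has referenced yet with NO_OVERWRITE; once the buffer is full, DISCARD hands back a fresh buffer and writing starts again at index 0.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the two D3D11_MAP flags discussed above.
enum class MapMode { WriteNoOverwrite, WriteDiscard };

struct AppendSlot
{
    MapMode mode;        // flag to pass to Map()
    std::size_t first;   // index of the first instance to write
};

// Hypothetical bookkeeping for appending instances to a dynamic buffer.
class InstanceAppender
{
public:
    explicit InstanceAppender( std::size_t capacity )
        : m_capacity( capacity ), m_used( 0 ) {}

    // Reserve room for `count` new instances and report how to map the buffer.
    AppendSlot Reserve( std::size_t count )
    {
        assert( count <= m_capacity );
        if( m_used + count > m_capacity )
        {
            // No untouched space left: discard and start again from the front.
            m_used = count;
            return { MapMode::WriteDiscard, 0 };
        }
        // The range [m_used, m_used + count) has not been used by any draw yet.
        AppendSlot slot = { MapMode::WriteNoOverwrite, m_used };
        m_used += count;
        return slot;
    }

private:
    std::size_t m_capacity;  // total instances the buffer can hold
    std::size_t m_used;      // instances written since the last discard
};
```

Note that after a discard everything written earlier is gone, so this fits data that is resubmitted anyway (each batch drawn with its own start instance), not a persistent list where one existing entry changes. For the latter, rewriting the whole buffer as described above remains the straightforward option.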
