# DX11 [DirectX11] Instancing

I've managed to get basic instancing working: I can easily add instances of a given model and they all render correctly. For now I only change the positions, but the buffers already include data for rotations and scales. The problem starts when I make the instance buffer dynamic, allow the CPU to write to it, and try to change the data (the position of a model) each frame.

    Texture2D color_map : register( t0 );
    SamplerState sample_type : register( s0 );

    cbuffer world_view_proj : register( b0 )
    {
        matrix world;
        matrix view;
        matrix projection;
    };

    struct Vertex_Input_Type
    {
        float4 position : POSITION;
        float2 tex : TEXCOORD0;
        float3 normal : NORMAL;
        float3 tangent : TANGENT;
        float3 binormal : BINORMAL;
        float3 instance_pos : TEXCOORD1;
        float3 instance_rot : TEXCOORD2;
        float3 instance_scale : TEXCOORD3;
    };

    struct Pixel_Input_Type
    {
        float4 position : SV_POSITION;
        float2 tex : TEXCOORD0;
    };

    Pixel_Input_Type VS( Vertex_Input_Type input )
    {
        Pixel_Input_Type output;

        input.position.w = 1.0f;
        input.position.x += input.instance_pos.x;
        input.position.y += input.instance_pos.y;
        input.position.z += input.instance_pos.z;

        output.position = mul( input.position, world );
        output.position = mul( output.position, view );
        output.position = mul( output.position, projection );
        output.tex = input.tex;

        return output;
    }

    float4 PS( Pixel_Input_Type input ) : SV_TARGET
    {
        return color_map.Sample( sample_type, input.tex );
    }

    technique11 Render
    {
        pass P0
        {
            SetVertexShader( CompileShader( vs_4_0, VS() ) );
            SetGeometryShader( 0 );
            SetPixelShader( CompileShader( ps_4_0, PS() ) );
        }
    }

Creation of the layout:

    D3D11_INPUT_ELEMENT_DESC polygon_layout[8];

    polygon_layout[0].SemanticName = "POSITION";
    polygon_layout[0].SemanticIndex = 0;
    polygon_layout[0].Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
    polygon_layout[0].InputSlot = 0;
    polygon_layout[0].AlignedByteOffset = 0;
    polygon_layout[0].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[0].InstanceDataStepRate = 0;

    polygon_layout[1].SemanticName = "TEXCOORD";
    polygon_layout[1].SemanticIndex = 0;
    polygon_layout[1].Format = DXGI_FORMAT_R32G32_FLOAT;
    polygon_layout[1].InputSlot = 0;
    polygon_layout[1].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[1].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[1].InstanceDataStepRate = 0;

    polygon_layout[2].SemanticName = "NORMAL";
    polygon_layout[2].SemanticIndex = 0;
    polygon_layout[2].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[2].InputSlot = 0;
    polygon_layout[2].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[2].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[2].InstanceDataStepRate = 0;

    polygon_layout[3].SemanticName = "TANGENT";
    polygon_layout[3].SemanticIndex = 0;
    polygon_layout[3].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[3].InputSlot = 0;
    polygon_layout[3].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[3].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[3].InstanceDataStepRate = 0;

    polygon_layout[4].SemanticName = "BINORMAL";
    polygon_layout[4].SemanticIndex = 0;
    polygon_layout[4].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[4].InputSlot = 0;
    polygon_layout[4].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[4].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
    polygon_layout[4].InstanceDataStepRate = 0;

    // INSTANCED DATA
    // position
    polygon_layout[5].SemanticName = "TEXCOORD";
    polygon_layout[5].SemanticIndex = 1;
    polygon_layout[5].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[5].InputSlot = 1;
    polygon_layout[5].AlignedByteOffset = 0;
    polygon_layout[5].InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA;
    polygon_layout[5].InstanceDataStepRate = 1;

    // rotation
    polygon_layout[6].SemanticName = "TEXCOORD";
    polygon_layout[6].SemanticIndex = 2;
    polygon_layout[6].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[6].InputSlot = 1;
    polygon_layout[6].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[6].InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA;
    polygon_layout[6].InstanceDataStepRate = 1;

    // scale
    polygon_layout[7].SemanticName = "TEXCOORD";
    polygon_layout[7].SemanticIndex = 3;
    polygon_layout[7].Format = DXGI_FORMAT_R32G32B32_FLOAT;
    polygon_layout[7].InputSlot = 1;
    polygon_layout[7].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;
    polygon_layout[7].InputSlotClass = D3D11_INPUT_PER_INSTANCE_DATA;
    polygon_layout[7].InstanceDataStepRate = 1;

    unsigned int num_elements = ARRAYSIZE( polygon_layout );

    ID3DX11EffectTechnique* technique;
    technique = m_effect->GetTechniqueByName( "Render" );
    ID3DX11EffectPass* pass = technique->GetPassByIndex( 0U );

    D3DX11_PASS_SHADER_DESC pass_desc;
    D3DX11_EFFECT_SHADER_DESC shader_desc;
    pass->GetVertexShaderDesc( &pass_desc );
    pass_desc.pShaderVariable->GetShaderDesc( pass_desc.ShaderIndex, &shader_desc );

    if( FAILED( device->CreateInputLayout( polygon_layout, num_elements, shader_desc.pBytecode, shader_desc.BytecodeLength, &m_layout ) ) )
        return false;

The effect file gets loaded properly and textures show up as expected, so I've cut that part out. Creating the vertex buffer (vertices is an array that contains the vertex data of type Vertex_Type):

    D3D11_BUFFER_DESC vertex_buf_desc;
    D3D11_SUBRESOURCE_DATA vertex_data;

    vertex_buf_desc.Usage = D3D11_USAGE_DEFAULT;
    vertex_buf_desc.ByteWidth = sizeof( Vertex_Type ) * vertex_count;
    vertex_buf_desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    vertex_buf_desc.CPUAccessFlags = 0;
    vertex_buf_desc.MiscFlags = 0;
    vertex_buf_desc.StructureByteStride = 0;

    vertex_data.pSysMem = vertices;
    vertex_data.SysMemPitch = 0;
    vertex_data.SysMemSlicePitch = 0;

    if( FAILED( device->CreateBuffer( &vertex_buf_desc, &vertex_data, &m_vertex_buf ) ) )
    {
        delete[] vertices;
        return false;
    }

    delete[] vertices;

And here is the instance buffer (instances is an array that contains the instance data of type Model_Instance_Type):

    D3D11_BUFFER_DESC instance_buf_desc;

    instance_buf_desc.Usage = D3D11_USAGE_DYNAMIC;
    instance_buf_desc.ByteWidth = sizeof( Model_Instance_Type ) * m_model_instance_list.size();
    instance_buf_desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    instance_buf_desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
    instance_buf_desc.MiscFlags = 0;
    instance_buf_desc.StructureByteStride = 0;

    D3D11_SUBRESOURCE_DATA instance_data;
    instance_data.pSysMem = instances;
    instance_data.SysMemPitch = 0;
    instance_data.SysMemSlicePitch = 0;

    if( FAILED( device->CreateBuffer( &instance_buf_desc, &instance_data, &m_instance_buf ) ) )
    {
        delete[] instances;
        return false;
    }

    delete[] instances;

Everything works perfectly until I try to access the data in the instance buffer and change it (on a per-frame basis). Here is the function that does that, along with the structs I'm using (in case there's an error).

    D3D11_MAPPED_SUBRESOURCE mapped_subresource;

    if( FAILED( device_context->Map( m_instance_buf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped_subresource ) ) )
        return false;

    Model_Instance_Type* instance_data = static_cast<Model_Instance_Type*>( mapped_subresource.pData );

    instance_data[index].pos = XMFLOAT3( posX, posY, posZ );
    instance_data[index].rot = XMFLOAT3( rotX, rotY, rotZ );
    instance_data[index].scale = XMFLOAT3( scaleX, scaleY, scaleZ );

    device_context->Unmap( m_instance_buf, 0 );

I think that the functions that actually render the model and set up the shader variables will be needed:

    void IceModel::Render( ID3D11DeviceContext* device_context )
    {
        RenderBuffers( device_context );

        XMFLOAT4X4 world, view, projection;
        XMMATRIX xna_world, xna_view, xna_projection;

        GetWorldMatrix( xna_world );
        XMStoreFloat4x4( &world, xna_world );
        GetViewMatrix( xna_view );
        XMStoreFloat4x4( &view, xna_view );
        GetProjectionMatrix( xna_projection );
        XMStoreFloat4x4( &projection, xna_projection );

        shader->Render( device_context, m_model.size(), m_model_instance_list.size(), world, view, projection, m_tex );
    }

    void Model::RenderBuffers( ID3D11DeviceContext* device_context )
    {
        unsigned int strides[] = { sizeof( Vertex_Type ), sizeof( Model_Instance_Type ) };
        unsigned int offsets[] = { 0, 0 };
        ID3D11Buffer* buf_ptrs[] = { m_vertex_buf, m_instance_buf };

        device_context->IASetVertexBuffers( 0, 2, buf_ptrs, strides, offsets );
        device_context->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
    }

    bool Shader2D::Render( ID3D11DeviceContext* device_context, const int& vertex_count, const int& instance_count,
                           XMFLOAT4X4 world, XMFLOAT4X4 view, XMFLOAT4X4 projection, const std::vector<Texture*>& tex )
    {
        XMMATRIX xna_world = XMLoadFloat4x4( &world );
        XMMATRIX xna_view = XMLoadFloat4x4( &view );
        XMMATRIX xna_projection = XMLoadFloat4x4( &projection );

        ID3DX11EffectShaderResourceVariable* color_map = m_effect->GetVariableByName( "color_map" )->AsShaderResource();
        if( FAILED( color_map->SetResource( tex[1]->GetTexture() ) ) )
            return false;

        ID3DX11EffectSamplerVariable* sample_type = m_effect->GetVariableByName( "sample_type" )->AsSampler();
        if( FAILED( sample_type->SetSampler( 0, m_sample_state ) ) )
            return false;

        ID3DX11EffectMatrixVariable* world_matrix = m_effect->GetVariableByName( "world" )->AsMatrix();
        if( FAILED( world_matrix->SetMatrix( reinterpret_cast<float*>( &xna_world ) ) ) )
            return false;

        ID3DX11EffectMatrixVariable* view_matrix = m_effect->GetVariableByName( "view" )->AsMatrix();
        if( FAILED( view_matrix->SetMatrix( reinterpret_cast<float*>( &xna_view ) ) ) )
            return false;

        ID3DX11EffectMatrixVariable* projection_matrix = m_effect->GetVariableByName( "projection" )->AsMatrix();
        if( FAILED( projection_matrix->SetMatrix( reinterpret_cast<float*>( &xna_projection ) ) ) )
            return false;

        RenderShader( device_context, vertex_count, instance_count );

        return true;
    }

    void Shader::RenderShader( ID3D11DeviceContext* device_context, const int& vertex_count, const int& instance_count )
    {
        device_context->IASetInputLayout( m_layout );

        ID3DX11EffectTechnique* technique = m_effect->GetTechniqueByName( "Render" );
        D3DX11_TECHNIQUE_DESC tech_desc;
        technique->GetDesc( &tech_desc );

        ID3DX11EffectPass* pass;
        for( unsigned int i = 0; i < tech_desc.Passes; ++i )
        {
            pass = technique->GetPassByIndex( i );
            if( pass )
            {
                pass->Apply( 0, device_context );
                device_context->DrawInstanced( vertex_count, instance_count, 0, 0 );
            }
        }
    }

    struct Vertex_Type
    {
        XMFLOAT4 pos;
        XMFLOAT2 tex;
        XMFLOAT3 normal;
        XMFLOAT3 tangent;
        XMFLOAT3 binormal;
    };

    struct Model_Instance_Type
    {
        XMFLOAT3 pos;
        XMFLOAT3 rot;
        XMFLOAT3 scale;
    };

Now about how it's not working. The model I want to move (the one I'm updating with a new position) renders perfectly: it moves, with no artifacts. However, all the other instances are blinking, as if they were rendered only every second frame, so they are clearly not rendered properly. If that wasn't enough, the instance I'm moving not only renders in the proper spot, but also keeps rendering at its original position with the same kind of blinking. I'm completely lost on this, since, if I understand correctly, updating the instance buffer overwrites the old content. So how come the object still renders at the original position?

I know that it's a lot of code, but I thought I understood instancing, since it works for static objects (I can place as many instances of each model as I wish at any coordinates), and this just ruins the day. If there's anything more you need to know, please ask, as I'd really like to get this going.

Could you post all of the code for the part that maps the instance buffer and writes instance_data[index]? You have an index, which suggests you're using a loop, but there's no loop here.

That's not really a loop index, but rather the index of the instance in the vector that contains them (the raw data, not the instance buffer). Here is the whole function:

    bool Model::UpdateInstance( const int& index,
                                const float& posX, const float& posY, const float& posZ,
                                const float& rotX, const float& rotY, const float& rotZ,
                                const float& scaleX, const float& scaleY, const float& scaleZ )
    {
        m_model_instance_list[index]->pos = XMFLOAT3( posX, posY, posZ );
        m_model_instance_list[index]->rot = XMFLOAT3( rotX, rotY, rotZ );
        m_model_instance_list[index]->scale = XMFLOAT3( scaleX, scaleY, scaleZ );

        D3D11_MAPPED_SUBRESOURCE mapped_subresource;

        if( FAILED( device_context->Map( m_instance_buf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped_subresource ) ) )
            return false;

        Model_Instance_Type* instance_data = static_cast<Model_Instance_Type*>( mapped_subresource.pData );

        instance_data[index].pos = XMFLOAT3( posX, posY, posZ );
        instance_data[index].rot = XMFLOAT3( rotX, rotY, rotZ );
        instance_data[index].scale = XMFLOAT3( scaleX, scaleY, scaleZ );

        device_context->Unmap( m_instance_buf, 0 );

        return true;
    }

m_model_instance_list :
 std::vector<Model_Instance_Type*> m_model_instance_list;

The idea is to avoid updating the whole buffer when I want to update a single instance. As the data is an array, I assumed the [] operator should work fine. The data does get changed when I debug that part, and the instance moves. What happens is the blinking and "copying itself" I described earlier.

EDIT: No idea if this helps, but I'm unsure about the memory alignment. While I did manage to get around the XMMATRIX alignment requirements (storing the matrices as XMFLOAT4X4 and using the Store and Load functions), I might have an error in the layout description. As I understand it, the per-instance part of a layout does not need to be aligned to the per-vertex data, so the first per-instance element, the position, has a byte offset of 0. I also set InputSlot to 0 for per-vertex data and 1 for per-instance data, as there are two vertex buffers (I haven't seen a D3D11_BIND_INSTANCE_BUFFER flag, so I used the vertex buffer one).
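As a sanity check on the layout question, the byte offsets and strides implied by the two structs can be verified at compile time. This is only a sketch: Float2, Float3 and Float4 are stand-ins assumed to match the size of XMFLOAT2, XMFLOAT3 and XMFLOAT4 (tightly packed floats), so that it compiles without the DirectX headers, and instance_buffer_bytes is a hypothetical helper, not something from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins (assumption): same size as XMFLOAT2 / XMFLOAT3 / XMFLOAT4.
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Mirrors Vertex_Type from the thread (input slot 0).
struct VertexType {
    Float4 pos;
    Float2 tex;
    Float3 normal;
    Float3 tangent;
    Float3 binormal;
};

// Mirrors Model_Instance_Type from the thread (input slot 1).
struct InstanceType {
    Float3 pos;
    Float3 rot;
    Float3 scale;
};

// Per-vertex elements: each one starts right after the previous one.
static_assert(offsetof(VertexType, tex) == 16, "tex follows a 16-byte float4");
static_assert(offsetof(VertexType, normal) == 24, "normal follows an 8-byte float2");
static_assert(offsetof(VertexType, tangent) == 36, "tangent follows a 12-byte float3");
static_assert(offsetof(VertexType, binormal) == 48, "binormal follows a 12-byte float3");
static_assert(sizeof(VertexType) == 60, "stride passed for slot 0");

// Per-instance elements: offsets restart at 0 because this is a separate buffer.
static_assert(offsetof(InstanceType, pos) == 0, "first instance element at offset 0");
static_assert(offsetof(InstanceType, rot) == 12, "rot follows a 12-byte float3");
static_assert(offsetof(InstanceType, scale) == 24, "scale follows a 12-byte float3");
static_assert(sizeof(InstanceType) == 36, "stride passed for slot 1");

// Hypothetical helper: size in bytes of a buffer holding `count` instances.
inline std::size_t instance_buffer_bytes(std::size_t count) {
    assert(count > 0);
    return count * sizeof(InstanceType);
}
```

Under these assumptions the structs contain no padding, so the strides are 60 and 36 bytes, which is consistent with a layout that uses offset 0 for the first per-instance element and appends the rest.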

Ah, well, that's your problem: you are updating one instance without rewriting the whole buffer. When you map with D3D11_MAP_WRITE_DISCARD (which is what you should be doing for a dynamic VB), the entire contents of the vertex buffer are invalidated. So you can't just copy in the data for one instance at a time; you have to copy in the data for all instances.
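To make the effect of DISCARD concrete, here is a toy model that runs without Direct3D. Everything in it is an assumption for illustration: FakeDynamicBuffer is a hypothetical stand-in for a dynamic vertex buffer, and it fills the storage with a sentinel on every discard map, whereas the real contents after a discard are simply undefined. InstanceData mirrors Model_Instance_Type with plain floats, and the CPU-side copy is assumed to be stored by value in a contiguous vector (the thread uses a vector of pointers, which would need an element-by-element copy instead of memcpy).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Plain-float stand-in for Model_Instance_Type.
struct InstanceData {
    float pos[3];
    float rot[3];
    float scale[3];
};

// Sentinel used by the toy model to mark "contents lost".
constexpr float kGarbage = -9999.0f;

// Hypothetical model of a dynamic buffer mapped with a discard flag.
class FakeDynamicBuffer {
public:
    explicit FakeDynamicBuffer(std::size_t count) : storage_(count) {}

    // Every discard map hands back storage whose old contents are gone.
    InstanceData* map_write_discard() {
        const InstanceData garbage = {
            {kGarbage, kGarbage, kGarbage},
            {kGarbage, kGarbage, kGarbage},
            {kGarbage, kGarbage, kGarbage}};
        std::fill(storage_.begin(), storage_.end(), garbage);
        return storage_.data();
    }

    const std::vector<InstanceData>& contents() const { return storage_; }

private:
    std::vector<InstanceData> storage_;
};

// What the question does: write a single slot after a discard map.
// All other slots are left with whatever the discard produced.
void update_one_slot(FakeDynamicBuffer& buf, std::size_t index, const InstanceData& value) {
    InstanceData* mapped = buf.map_write_discard();
    mapped[index] = value;
}

// The fix: change the CPU-side copy, then rewrite every instance.
void update_and_upload_all(FakeDynamicBuffer& buf, std::vector<InstanceData>& cpu_copy,
                           std::size_t index, const InstanceData& value) {
    assert(index < cpu_copy.size());
    cpu_copy[index] = value;
    InstanceData* mapped = buf.map_write_discard();
    std::memcpy(mapped, cpu_copy.data(), cpu_copy.size() * sizeof(InstanceData));
}
```

In this model, update_one_slot leaves every slot except the written one holding the sentinel, while update_and_upload_all leaves the buffer matching the CPU-side copy.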

Aw, so bad of me. I had read what the flags mean and just forgot that :/. Thanks for pointing this out. I'll try fixing this ASAP, but I've got one question: what do you do if you have thousands of instances? The buffer can get quite big. Isn't there a way to avoid writing the whole buffer each frame when just 1% of it changed?

EDIT: Worked perfectly. Thank you very much. I had been trying to solve it for a while, even adding padding values to the buffer structs (I thought it could be reading "dirty" data rather than what I had assigned).
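On the cost question: with the 36-byte instance struct from this thread, 10,000 instances come to 360,000 bytes per full rewrite. One way to keep that to a single rewrite per frame, sketched below as an assumption rather than as what the working fix looks like, is to let per-instance updates touch only a CPU-side copy and upload everything once before drawing if anything changed. InstanceStore and its methods are hypothetical names; the mapped pointer stands for what a discard map would return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Plain-float stand-in for Model_Instance_Type (36 bytes).
struct InstanceData {
    float pos[3];
    float rot[3];
    float scale[3];
};

// Hypothetical CPU-side store: many set() calls, one upload per frame.
class InstanceStore {
public:
    explicit InstanceStore(std::size_t count) : data_(count), dirty_(true) {}

    // Changes only the CPU-side copy and remembers that an upload is due.
    void set(std::size_t index, const InstanceData& value) {
        assert(index < data_.size());
        data_[index] = value;
        dirty_ = true;
    }

    bool needs_upload() const { return dirty_; }

    std::size_t byte_size() const { return data_.size() * sizeof(InstanceData); }

    // Copies every instance into `mapped`, which must point to at least
    // byte_size() writable bytes. Returns the number of bytes written.
    std::size_t upload_all(void* mapped) {
        std::memcpy(mapped, data_.data(), byte_size());
        dirty_ = false;
        return byte_size();
    }

private:
    std::vector<InstanceData> data_;
    bool dirty_;
};
```

A render loop would then check needs_upload() once per frame and only map the instance buffer when it returns true, so moving a hundred instances in one frame still costs one buffer rewrite.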

If you don't want to update the whole buffer, then you can't use DISCARD. For dynamic buffers, the driver will create multiple buffers behind the scenes and cycle through them whenever you update them, so that you avoid any synchronization issues with the GPU (you don't want to write to an area of memory while the GPU is accessing it). This fits in nicely with the semantics of DISCARD: because the contents are undefined by the spec, the driver is free to cycle to the next buffer. If you don't want to update the entire buffer you can use NO_OVERWRITE, but when you do that you can only update a portion of the buffer that the GPU isn't currently using. This can work for adding new instances, but not for updating existing instances.
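A common way to combine the two flags is to treat the dynamic buffer as an append-only region: new data goes into space that has not been handed to the GPU yet (NO_OVERWRITE), and once the buffer is full the whole thing is discarded and filling starts again from offset 0. The sketch below only does the bookkeeping and is an assumption about how one might structure it; AppendCursor, Allocation and MapMode are hypothetical names, and MapMode is meant to correspond to D3D11_MAP_WRITE_DISCARD and D3D11_MAP_WRITE_NO_OVERWRITE in real code.

```cpp
#include <cassert>
#include <cstddef>

// Which map flag a write should use.
enum class MapMode { WriteDiscard, WriteNoOverwrite };

// Result of reserving space: the flag to map with and where to write.
struct Allocation {
    MapMode mode;
    std::size_t offset;  // index of the first element to write
};

// Hypothetical bookkeeping for an append-only dynamic buffer.
class AppendCursor {
public:
    // Start "full" so that the very first reservation maps with discard.
    explicit AppendCursor(std::size_t capacity) : capacity_(capacity), used_(capacity) {}

    // Reserve `count` elements. `count` must not exceed the capacity.
    Allocation reserve(std::size_t count) {
        assert(count <= capacity_);
        if (used_ + count > capacity_) {
            // Not enough untouched space left: discard and restart at 0.
            used_ = count;
            return Allocation{MapMode::WriteDiscard, 0};
        }
        // Untouched space is available: append without disturbing
        // regions the GPU may still be reading.
        Allocation result{MapMode::WriteNoOverwrite, used_};
        used_ += count;
        return result;
    }

private:
    std::size_t capacity_;
    std::size_t used_;
};
```

With instancing, the returned offset would then select where drawing starts, for example through the last argument of DrawInstanced (StartInstanceLocation). As noted above, this suits data that is appended each frame; changing an instance that was already written still means writing it again somewhere new.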

