Ximsu

Instancing with ID3DXMESH?


First things first, hello GameDev.net! Long time browser here, love the site, finally registered when I had a question that I couldn't find the answer to while browsing the rest of the internet, so here goes!

I've been working on instancing with DirectX 9 using the following article: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html (3.3.4 Batching with the Geometry Instancing API). However, up to this point I've been using ID3DXMESH to load and render my meshes from a .x file, so I never actually call SetStreamSource() for stream 0 myself.

I tried setting all the others like so

 

D3DDEV->SetStreamSourceFreq( 0, D3DSTREAMSOURCE_INDEXEDDATA | GetNumInstances() );
D3DDEV->SetStreamSourceFreq( 1, D3DSTREAMSOURCE_INSTANCEDATA | 1 );
D3DDEV->SetStreamSource( 1, InstanceBuffer, 0, sizeof(OBJECT_INSTANCE) );

and then calling DrawSubset on my mesh to see if it would work, but so far I haven't been too successful.

 

So, does anyone have any ideas on what I should try? Do I need to use the get functions of my mesh to render it manually? I still need to debug other parts of the code to see if they're at fault, but any tips in the right direction would be appreciated!

 

 

 


Based on what they're doing in the article, it looks like you'll want to use the D3DX mesh API to get pointers to the mesh's VB and IB. That gets you the VB and IB for your "geometry packet", as they refer to it. Then use DrawIndexedPrimitive once you've set up your buffers, stream sources, and frequencies.


Thanks for the replies! Using the responses and some meddling of my own, I found that one of the biggest things I was missing was making sure the format of my mesh matched the format I needed to pass, so the main thing I needed was a call to CloneMesh after loading it. Also, for anyone running into this topic with a similar issue, the first reply to this topic http://www.gamedev.net/topic/591634-directx-9-hardware-instancing/ has most of the things I needed.


Could you post your source code and HLSL related to this instancing?

 

I want to try the same thing you are. I have many objects that use the same ID3DXMESH, but sometimes different subsets of it. This could be the "batching" solution I am looking for.

 

Is your InstanceBuffer VB dynamic, do you reuse it, or do you create one each frame? What pool is it in?


Not sure mine is the ideal approach, but I'll post what I have. I'm using one dynamic vertex buffer, at a size equal to a const UINT MaxInstances.

 

First, here are my vertex declarations:

//used to clone the mesh
const D3DVERTEXELEMENT9 VBD_GEOMETRYDATA[] =
{
	{0, 0,  D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
	{0, 12,  D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0},
	{0, 24,  D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
	D3DDECL_END()
};

//this declaration is actually used for rendering
extern LPDIRECT3DVERTEXDECLARATION9 DECL_GEOMETRYPACKET;
const D3DVERTEXELEMENT9 VBD_GEOMETRYPACKET[] = 
{
	{0, 0,  D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
	{0, 12,  D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0},
	{0, 24,  D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
	{1, 0,  D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 1},
	{1, 16, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 2},
	{1, 32, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 3},
	{1, 48, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 4},
	D3DDECL_END()
};

//class describing a single element of the instance buffer
struct OBJECT_INSTANCE
{
	//the four rows of the world matrix of an instance
	float Row1[4];
	float Row2[4];
	float Row3[4];
	float Row4[4];
};

Here is the function that creates the instance buffer (it should only be called once, but apart from the performance cost it won't break anything if it's called multiple times):

void CreateInstanceBuffer()
{
	//checks if the buffer already exists and releases it if it does
	ReleaseInstanceBuffer();

	BufferSize = MaxInstances*sizeof(OBJECT_INSTANCE);
	D3DDEV->CreateVertexBuffer(BufferSize, D3DUSAGE_DYNAMIC, NULL, D3DPOOL_DEFAULT, &InstanceBuffer, NULL);
	D3DDEV->CreateVertexDeclaration(VBD_GEOMETRYPACKET, &DECL_GEOMETRYPACKET);
}

Then, after loading the mesh, clone it with your geometry declaration:

//tempmesh is the loaded mesh you wish to instance
LPD3DXMESH newmesh = NULL;
tempmesh->CloneMesh(D3DXMESH_MANAGED, VBD_GEOMETRYDATA, D3DDEV, &newmesh);

//store the vertex and index buffers once (each call adds a reference, so release them on shutdown)
newmesh->GetVertexBuffer(&MeshVB);
newmesh->GetIndexBuffer(&MeshIB);

//if you no longer need the old mesh, release it and replace it with the new one
tempmesh->Release();
tempmesh = newmesh;

Here is the code I use to fill the instance buffer. AddIndex keeps track of where to add the next instance, and therefore also of how many instances have been added to the buffer so far. When I'm done rendering one mesh I just set AddIndex to 0, and the next batch of instances overwrites whatever data is already there. Note that the instance buffer must already be locked, with the locked pointer stored in OBJECT_INSTANCE *LockedBuffer.

bool AddInstanceToLockedBuffer(D3DXMATRIX World)
{ 
	//the buffer is not locked or you have reached the max instances
	if (!BufferLocked) return false;
	if (AddIndex >= MaxInstances) return false;

	D3DXMATRIX InverseWorld;
	D3DXMatrixInverse(&InverseWorld, 0, &World);

	/*----------------------------------------------
	fill LockedBuffer[AddIndex] with World
	----------------------------------------------*/

	//increment the instance buffers current index
	AddIndex++;

	return true;
}

UINT GetNumInstances()
{
	return AddIndex;
}

void ResetInstanceBuffer()
{
	AddIndex = 0;
}
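
The "fill LockedBuffer[AddIndex] with World" step is left as a comment above. As a rough sketch of what that step could look like, assuming the world matrix is 16 row-major floats (as D3DXMATRIX is) and that each instance slot holds exactly the four rows: Matrix4 and WriteInstance are hypothetical stand-ins so the snippet compiles without the DirectX SDK, not names from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for D3DXMATRIX (assumption): 16 floats stored row by row.
struct Matrix4 { float m[4][4]; };

// Same layout as the OBJECT_INSTANCE struct in the post.
struct OBJECT_INSTANCE
{
    float Row1[4];
    float Row2[4];
    float Row3[4];
    float Row4[4];
};

// The copy below is only valid if both types are 64 bytes with no padding.
static_assert(sizeof(OBJECT_INSTANCE) == sizeof(Matrix4), "layouts must match");

// Hypothetical helper: writes `world` into slot `index` of a locked buffer
// that has room for `capacity` instances. Returns false when the slot is
// out of range, mirroring the MaxInstances check in the post.
inline bool WriteInstance(OBJECT_INSTANCE* locked, std::size_t capacity,
                          std::size_t index, const Matrix4& world)
{
    assert(locked != nullptr);
    if (index >= capacity) return false;
    std::memcpy(&locked[index], &world, sizeof(OBJECT_INSTANCE));
    return true;
}
```

In the real function, `locked` would be LockedBuffer, `capacity` would be MaxInstances and `index` would be AddIndex.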

Lastly, the rendering:

//set the instancing parameters
D3DDEV->SetVertexDeclaration(DECL_GEOMETRYPACKET);
D3DDEV->SetIndices(MeshIB);

D3DDEV->SetStreamSourceFreq( 0, D3DSTREAMSOURCE_INDEXEDDATA | GetNumInstances());
D3DDEV->SetStreamSource(0, _vb, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 0));

D3DDEV->SetStreamSourceFreq( 1, D3DSTREAMSOURCE_INSTANCEDATA | 1 );
D3DDEV->SetStreamSource(1, InstanceBuffer, 0, D3DXGetDeclVertexSize
                               (VBD_GEOMETRYPACKET, 1));

//set all the properties of your effect

//render the mesh
UINT numpasses;
Effect->Begin(&numpasses, 0);

for (UINT i=0; i<numpasses; i++)
{
	Effect->BeginPass(i);

	D3DDEV->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, Mesh->GetNumVertices(),
	                              0, Mesh->GetNumFaces());

	Effect->EndPass();
}

Effect->End();

//reset instancing parameters
D3DDEV->SetStreamSourceFreq(0, 1);
D3DDEV->SetStreamSourceFreq(1, 1);
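
For anyone wondering what the values passed to SetStreamSourceFreq() actually are: each one is a mode flag in the top bits OR'd with a count in the low bits. The sketch below redefines the two flags locally (to the best of my knowledge they match the values in d3d9types.h, but treat that as an assumption) so that it compiles without the SDK. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the D3D9 flags (assumed to match d3d9types.h).
constexpr std::uint32_t kIndexedData  = 1u << 30; // D3DSTREAMSOURCE_INDEXEDDATA
constexpr std::uint32_t kInstanceData = 2u << 30; // D3DSTREAMSOURCE_INSTANCEDATA

// Stream 0 (geometry): draw the indexed mesh `instanceCount` times.
inline std::uint32_t GeometryStreamFreq(std::uint32_t instanceCount)
{
    // The count must be non-zero and must not spill into the flag bits.
    assert(instanceCount > 0 && instanceCount < kIndexedData);
    return kIndexedData | instanceCount;
}

// Stream 1 (per-instance data): step to the next element once per instance.
inline std::uint32_t InstanceStreamFreq()
{
    return kInstanceData | 1u;
}
```

Resetting both streams to a frequency of 1 afterwards, as in the code above, clears the flags again so that later non-instanced draws behave normally.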

Pretty long, but hope that helps someone out. Also if anyone has any suggestions that would be cool too!

 

Edit: You've probably noticed I haven't gotten around to rendering multiple subsets of the mesh yet; I will be doing that next. One of the links posted in the replies above has some material on that, though.

Edited by Ximsu


Thanks.

 

 

 


D3DDEV->CreateVertexBuffer(BufferSize, D3DUSAGE_DYNAMIC, NULL, D3DPOOL_DEFAULT, &InstanceBuffer, NULL);

 

You can probably add the D3DUSAGE_WRITEONLY flag next to dynamic.

 

What about using a std::vector<OBJECT_INSTANCE> InstancesArray, adding all instances to it, and then memcpy-ing InstancesArray.size() * sizeof(OBJECT_INSTANCE) bytes from &InstancesArray[0] into the locked InstanceBuffer?
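
A minimal sketch of that staging idea, with the locked buffer simulated by plain memory so it compiles without D3D. UploadInstances is a hypothetical helper name; in real code `dst` would be the pointer returned by Lock() and `maxInstances` the capacity the buffer was created with.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct OBJECT_INSTANCE { float Row1[4], Row2[4], Row3[4], Row4[4]; };

// Copies the staged instances into `dst` with a single memcpy, clamped to
// the buffer capacity. Returns how many instances were copied, which is
// also the instance count to pass to SetStreamSourceFreq for stream 0.
inline std::size_t UploadInstances(const std::vector<OBJECT_INSTANCE>& staged,
                                   void* dst, std::size_t maxInstances)
{
    assert(dst != nullptr);
    const std::size_t count = std::min(staged.size(), maxInstances);
    if (count != 0)
        std::memcpy(dst, staged.data(), count * sizeof(OBJECT_INSTANCE));
    return count;
}
```

The point of the clamp is that the vector can grow past the buffer size without the copy overrunning the locked region.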

Edited by Tispe


I wanted to post this in my previous post but at the time I was not quite sure if it was relevant. Now after looking at some code at msdn I want to ask some questions.

 


D3DDEV->SetStreamSourceFreq( 0, D3DSTREAMSOURCE_INDEXEDDATA | GetNumInstances());
D3DDEV->SetStreamSource(0, _vb, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 0));

D3DDEV->SetStreamSourceFreq( 1, D3DSTREAMSOURCE_INSTANCEDATA | 1 );
D3DDEV->SetStreamSource(1, InstanceBuffer, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 1));

 

Why are you using VBD_GEOMETRYPACKET for both Stream sources? The example from http://msdn.microsoft.com/en-us/library/windows/desktop/bb173349(v=vs.85).aspx has two declarations, one for vertex data and one for instance data:

// Set up the geometry data stream
pd3dDevice->SetStreamSourceFreq(0, (D3DSTREAMSOURCE_INDEXEDDATA | g_numInstancesToDraw));
pd3dDevice->SetStreamSource(0, g_VB_Geometry, 0, D3DXGetDeclVertexSize( g_VBDecl_Geometry, 0 ));

// Set up the instance data stream
pd3dDevice->SetStreamSourceFreq(1, (D3DSTREAMSOURCE_INSTANCEDATA | 1));
pd3dDevice->SetStreamSource(1, g_VB_InstanceData, 0, D3DXGetDeclVertexSize( g_VBDecl_InstanceData, 1 ));

The vertex data and instance data would then automatically merge to form the input for the vertex shader:

struct vsInput
{
  // stream 0
  float4 position : POSITION;
  float3 normal   : NORMAL;

  // stream 1
  float4 model_matrix0 : TEXCOORD0;
  float4 model_matrix1 : TEXCOORD1;
  float4 model_matrix2 : TEXCOORD2;
  float4 model_matrix3 : TEXCOORD3;
  float4 instance_color: D3DCOLOR;
};

Right?


There's no particular reason to use one declaration or two. VBD_GEOMETRYPACKET (which contains both geometry and instance data) is needed for D3DDEV->SetVertexDeclaration(); MSDN just has one more vertex declaration, for the instance data, that I don't.

 

If you look at the two calls to this function

D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 0)

D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 1)

I pass 0 as the second parameter in the first call and 1 in the second. That parameter is the stream index: the function returns the vertex size of that stream rather than of the entire declaration, so in essence VBD_GEOMETRYPACKET functions as two different declarations merged into one. The only reason I have a separate vertex declaration for the mesh is for cloning it; if you didn't need to do that, you would technically only need the one vertex declaration.

 

Regardless, it's a matter of preference. If you want another declaration for the instance data like MSDN has, it won't make any difference to rendering, though it may be useful elsewhere in your code.
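
To make the per-stream size idea concrete, here is a simplified model of it. This is not the D3DX implementation: Element is a cut-down stand-in for D3DVERTEXELEMENT9 in which a byte size replaces the D3DDECLTYPE enum, and both function names are invented for the sketch.

```cpp
#include <cassert>
#include <vector>

// Cut-down stand-in for D3DVERTEXELEMENT9 (assumption for this sketch).
struct Element { unsigned stream; unsigned offset; unsigned sizeBytes; };

// Size of one vertex in `stream`: the furthest byte reached by any element
// that belongs to that stream. Elements of other streams are ignored.
inline unsigned StreamVertexSize(const std::vector<Element>& decl, unsigned stream)
{
    assert(!decl.empty());
    unsigned size = 0;
    for (const Element& e : decl)
        if (e.stream == stream && e.offset + e.sizeBytes > size)
            size = e.offset + e.sizeBytes;
    return size;
}

// The combined declaration from this thread: position, normal and UV in
// stream 0, four float4 matrix rows in stream 1.
inline std::vector<Element> GeometryPacketDecl()
{
    return {
        {0, 0, 12}, {0, 12, 12}, {0, 24, 8},
        {1, 0, 16}, {1, 16, 16}, {1, 32, 16}, {1, 48, 16},
    };
}
```

With this declaration, stream 0 comes out at 32 bytes and stream 1 at 64 bytes, which are the two strides the SetStreamSource() calls need.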



The vertex data and instance data would then automatically merge to form the input for the vertex shader:

 

Not automatically, no. Ximsu got it right and the MSDN page is misleading (there's also a community contribution at the bottom pointing this out): you can only set one declaration at a time.


Ah yes. SetVertexDeclaration is not per stream, but as a whole. So it is just a matter of matching the final declaration as the sum of all stream components?

 

Also, something tells me D3DXGetDeclVertexSize() has a redundant parameter. Why not just use sizeof(MYVERTEXSTRUCT)?

 

From the docs:

When calling SetStreamSource, the stride is normally required to be equal to the vertex size. However, there are times when you may want to draw multiple instances of the same or similar geometry (such as when using instancing to draw). For this case, use a zero stride to tell the runtime not to increment the vertex buffer offset (ie: use the same vertex data for all instances).

 

This leads me to believe that the parameter should be zero?

Edited by Tispe


When calling SetStreamSource, the stride is normally required to be equal to the vertex size. However, there are times when you may want to draw multiple instances of the same or similar geometry (such as when using instancing to draw). For this case, use a zero stride to tell the runtime not to increment the vertex buffer offset (ie: use the same vertex data for all instances).

Not that I've tried, but that doesn't quite make sense to me, honestly. At least I can't think of a sensible use case.

The parameter is not redundant, read Ximsu's post again.

The stride does not need to match the declaration, but in this case (and if you actually have a struct) it should be sizeof, yes.

So it is just a matter of matching the final declaration as the sum of all stream components?


Yes, though I would rather use the word concatenation (of the arrays). You can actually write a helper function if you need to (just make sure you don't accidentally copy the D3DDECL_END() marker in the middle).
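
A sketch of what such a concatenation helper might look like, under the assumption that each input array ends with its own end marker. Element is a simplified stand-in for D3DVERTEXELEMENT9, and the marker is modelled the way D3DDECL_END() does it, with a stream value of 0xFF. ConcatDecls and IsEnd are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned kEndStream = 0xFF; // stream value used by the end marker

// Simplified stand-in for D3DVERTEXELEMENT9 (assumption for this sketch).
struct Element { unsigned stream, offset, sizeBytes; };

inline bool IsEnd(const Element& e) { return e.stream == kEndStream; }

// Joins two terminated declarations into one. Each input's end marker is
// skipped and a single new marker is appended at the very end.
inline std::vector<Element> ConcatDecls(const std::vector<Element>& a,
                                        const std::vector<Element>& b)
{
    assert(!a.empty() && IsEnd(a.back()));
    assert(!b.empty() && IsEnd(b.back()));
    std::vector<Element> out;
    for (const Element& e : a) { if (IsEnd(e)) break; out.push_back(e); }
    for (const Element& e : b) { if (IsEnd(e)) break; out.push_back(e); }
    out.push_back(Element{kEndStream, 0, 0});
    return out;
}
```

The elements keep their own stream indices, so the geometry part stays on stream 0 and the instance part on stream 1 after joining.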


I still don't get why you need the first parameter if setting the second (stream) will return the vertex size of that stream?

 

Or the other way around, how can the helper function D3DXGetDeclVertexSize() know anything about the stream without having access to the d3dDevice?

 

 

D3DDEV->SetStreamSource(0, _vb, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 0));

D3DDEV->SetStreamSource(1, InstanceBuffer, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 1));

 

To me this looks like Stream0 and Stream1 will output the same vertex format/size:

Stream0: Pos, Norm, UV, Matrix

Stream1: Pos, Norm, UV, Matrix

 

instead of:

Stream0: Pos, Norm, UV

Stream1: Matrix

 

EDIT:

I think I got it now :)

D3DXGetDeclVertexSize takes the stream index passed as the second argument, looks through the declaration passed as the first argument, and returns the size of that stream's chunk :)

Edited by Tispe


Your edit is correct: the declaration has all the size information. But as one of the posts above mentions, I didn't strictly need D3DXGetDeclVertexSize() for this; either Mesh->GetNumBytesPerVertex() or sizeof( /*your vertex data structure*/ ) would work as well. The vertex declaration is only needed for SetVertexDeclaration(). Also, thanks for the tips in the posts above.

Edited by Ximsu


I'm halfway through implementing this now; I hope to finish a working demo today.

 

I scratched the idea of instancing mesh subsets for now. Right now a certain pair of gloves, for example, is split into 4 different ID3DXMESHes instead of 1 mesh with 4 subsets. Then 4 vectors are created, one to instance each mesh. That way, when 10 characters use the same pair of gloves, they "sign up" in the instancing vectors for the meshes they need.

 

For example, Player1 might need to draw all 4 subsets, so he will add an OBJECT_INSTANCE to each vector. Player2 might have very long sleeves, so parts of the gloves are hidden; he only needs to draw 2 meshes, and so he signs up only for those by adding OBJECT_INSTANCEs to only two of the instance vectors.
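
Roughly, the sign-up bookkeeping could be sketched like this. The names (MeshId, InstanceLists) are invented for the example, and a plain integer stands in for whatever identifies one ID3DXMESH in the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct OBJECT_INSTANCE { float Row1[4], Row2[4], Row3[4], Row4[4]; };

using MeshId = unsigned; // hypothetical handle for one split-out mesh

// One instance vector per mesh. Characters add themselves to the vectors
// of the meshes they want drawn this frame.
class InstanceLists
{
public:
    void SignUp(const std::vector<MeshId>& meshes, const OBJECT_INSTANCE& inst)
    {
        assert(!meshes.empty());
        for (MeshId id : meshes) lists_[id].push_back(inst);
    }

    // Number of instances to draw for `mesh` (0 means: skip the draw call).
    std::size_t Count(MeshId mesh) const
    {
        auto it = lists_.find(mesh);
        return it == lists_.end() ? 0 : it->second.size();
    }

    // Called once per frame after rendering, keeping the vectors' capacity.
    void Clear() { for (auto& kv : lists_) kv.second.clear(); }

private:
    std::map<MeshId, std::vector<OBJECT_INSTANCE>> lists_;
};
```

Each non-empty vector then turns into one instanced draw call, however many characters signed up for that mesh.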


Just did a demo, I went from 400fps to 100fps using instancing with only one character present....

 

Is there a lot of overhead using this method? I must be doing something wrong, because I have only 24 draw calls at 400fps, which becomes 100fps using instancing on a GTX690....

 

Maybe it is because I am locking and copying data to the instancing buffer 24 times between beginScene() and endScene()? Should I just do that outside, before I call beginScene(), using 24 different buffers?


Can't say for sure what's causing your performance drop, but I do have a few recommendations. First, I would try a different method of locking/creating your instance buffer and see if you can get any performance gain there; hopefully you will be able to recover most of that 300fps drop. However, with only one character present in your demo you are not going to see the benefit of instancing: you won't have any fewer draw calls than without instancing, but you still pay for locking the instance buffer. So another thing you can try is using several characters, to see if the instancing method has any benefit. I'm still working this stuff out in my own implementation (multiple subsets right now), so if anyone else has suggestions for Tispe, I would find them handy as well!
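
One "different method of locking" worth trying is the usual pattern for D3DUSAGE_DYNAMIC buffers: keep appending each batch at a new offset with D3DLOCK_NOOVERWRITE, and only when the buffer is full start again from the beginning with D3DLOCK_DISCARD. I can't promise it fixes this particular drop. The sketch below only models the offset and flag bookkeeping, with a local enum standing in for the two real lock flags; InstanceRing and LockRange are made-up names.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.
enum class LockMode { NoOverwrite, Discard };

// Where to write the next batch (in instances) and which flag to lock with.
struct LockRange { std::size_t offset; LockMode mode; };

class InstanceRing
{
public:
    explicit InstanceRing(std::size_t capacity) : capacity_(capacity), cursor_(0) {}

    // Reserves `count` consecutive instance slots for one draw call.
    LockRange Reserve(std::size_t count)
    {
        assert(count <= capacity_);
        LockMode mode = LockMode::NoOverwrite;
        if (cursor_ + count > capacity_)
        {
            // Not enough room left: wrap to the start and ask for fresh memory.
            cursor_ = 0;
            mode = LockMode::Discard;
        }
        const LockRange range{cursor_, mode};
        cursor_ += count;
        return range;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_;
};
```

In real code, `offset * sizeof(OBJECT_INSTANCE)` would be used both as the Lock() offset and as the OffsetInBytes argument of SetStreamSource() for stream 1, so each draw call reads its own region of the buffer.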


Hmm. I moved some stuff around in the code but the problem still exists. My fps starts at around 140, then drops over time to around 90fps, and when I move the camera around it goes up and down between 80fps and 130fps...

 

How can I profile in PIX? Do I just have to examine the timeline for a frame capture? I have a strong gut feeling my InstanceBuffer locks and memcpys are the culprits.

 

EDIT: In release mode I get stable 200fps. Idk if it should be higher with only 24 drawcalls and 7 texture changes.

	for(std::map<DWORD, std::shared_ptr<InstancedMesh> >::iterator it = Scene.InstancedMeshes.begin(); it != Scene.InstancedMeshes.end(); it++){
		for(size_t j = 0;j<(*it).second->MeshSubsets.size();j++){
			if((*it).second->MeshSubsets[j]->Members.size() == 0) continue;

			if(pCurrentTex != (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex){									//Prevent to set new texture if it already is set
				DeviceInterface.GetDevice()->SetTexture(0, (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex);
				pCurrentTex = (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex;
			}
			
			LPDIRECT3DVERTEXBUFFER9 MeshVB;
			LPDIRECT3DINDEXBUFFER9 MeshIB;
			(*it).second->MeshSubsets[j]->Mesh->pMesh->GetVertexBuffer(&MeshVB);
			(*it).second->MeshSubsets[j]->Mesh->pMesh->GetIndexBuffer(&MeshIB);

			DeviceInterface.GetDevice()->SetIndices(MeshIB);

			//Copy isntance data over
			VOID* pVoid;
			Scene.InstanceBuffer->Lock(0, sizeof(InstanceMember) * (*it).second->MeshSubsets[j]->Members.size(), (void**)&pVoid, 0);
			memcpy(pVoid, &(*it).second->MeshSubsets[j]->Members[0], sizeof(InstanceMember) * (*it).second->MeshSubsets[j]->Members.size());
			Scene.InstanceBuffer->Unlock();

			// Set up the geometry data stream
			DeviceInterface.GetDevice()->SetStreamSourceFreq(0, (D3DSTREAMSOURCE_INDEXEDDATA | (*it).second->MeshSubsets[j]->Members.size()));
			DeviceInterface.GetDevice()->SetStreamSource(0, MeshVB, 0, D3DXGetDeclVertexSize(DeviceInterface.GetInstancedDeclaration(), 0 ));

			// Set up the instance data stream
			DeviceInterface.GetDevice()->SetStreamSourceFreq(1, (D3DSTREAMSOURCE_INSTANCEDATA | 1));
			DeviceInterface.GetDevice()->SetStreamSource(1, Scene.InstanceBuffer, 0, D3DXGetDeclVertexSize(DeviceInterface.GetInstancedDeclaration(), 1 ));

			DeviceInterface.GetDevice()->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 
				(*it).second->MeshSubsets[j]->Mesh->pMesh->GetNumVertices(), 0, 
				(*it).second->MeshSubsets[j]->Mesh->pMesh->GetNumFaces());
		}
	}

	DeviceInterface.GetDevice()->SetStreamSourceFreq(0,1);
	DeviceInterface.GetDevice()->SetStreamSourceFreq(1,1);
Edited by Tispe




I scratched the idea of instancing mesh subsets for now. Right now a certain pair of gloves, for example, is split into 4 different ID3DXMESHes instead of 1 mesh with 4 subsets. Then 4 vectors are created, one to instance each mesh. That way, when 10 characters use the same pair of gloves, they "sign up" in the instancing vectors for the meshes they need.
 
For example, Player1 might need to draw all 4 subsets, so he will add an OBJECT_INSTANCE to each vector. Player2 might have very long sleeves, so parts of the gloves are hidden; he only needs to draw 2 meshes, and so he signs up only for those by adding OBJECT_INSTANCEs to only two of the instance vectors.

 

 

This may be the more "d3d" way to go about it.

Mesh subsets are primarily intended to group triangles based on material and texture, not for controlling how much of a mesh is drawn, although it would work perfectly fine.

They're intended for use on models that use multiple texture maps. So a hammer with a wood handle could be one mesh, and the handle and head would be separate subsets, with wood and steel textures respectively. Modeling software tends to create these types of models.

BTW, did you use material and texture to set / control the subset numbers, or did you edit them in a modeler, etc.? Just curious.

Splitting gloves into 4 meshes instead of one mesh with 4 subsets should run faster. The next step would be to lose the d3dxmesh as data structure, and switch to a struct of vb, ib, numverts, and numtris.


Just did some testing in release mode. In the first test I only drew one subset for each texture. In the second test I drew all subsets for all textures.

 

7 draw calls, 7 texture swaps, 7 instancebuffer updates, 180-230fps, 100 instances
7 draw calls, 7 texture swaps, 7 instancebuffer updates, 60-70fps, 1000 instances
 
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 100-130fps, 100 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 70-90fps, 300 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 60-80fps, 500 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 30-50fps, 1000 instances
 
My character needs 28 draw calls to get all subsets drawn. I am pondering whether I can merge them into 7 draw calls by placing meshes with the same texture in the same VB, maybe with a method such as skipping/scratching, in the VS, the vertices of meshes that the current instance does not need to draw. That way I can reduce the instance buffer updates from 28 to 7 as well.
 
Something weird happens when the window loses focus: the fps drops and won't come back up. At launch it has 230fps, but after going back and forth to another window it stays at 180.
Edited by Tispe


My EVGA Precision X tells me that my application is only using 45% of GPU1 and 10% of GPU2 with 37fps, 1000 instances and 28 state changes/draw calls and buffer updates.

 

Using NVIDIA's DX10 SkinnedInstancing demo, I am able to get the GPU1 usage to 99% by adjusting LOD to max distance. That is 10,000 instances and 163 draw calls, which gives me about 18fps in the demo. GPU2 usage is at about 10%. Turning instances down to 1,000, I get 60fps with 64 draw calls. I am trying to force adaptive vsync, but it sticks to either 120Hz or 60Hz or lower.

 

So the DX10 demo has 60+ fps with 1000 instances and 64 draw calls at 60% GPU1 usage and 10% GPU2 usage, whereas I only get 37fps using DX9 at 45% GPU1 usage.

 

I hope I can achieve 60+ fps on my application by perhaps creating multiple instance buffers instead of sharing just one. That way I don't have to lock and copy so much during rendering.

 

How I can scale to use more of GPU2 is something I am clueless on, if anyone has ideas please let me know.

Edited by Tispe


Good news everyone! I am now running 1000 Skinned Instanced Meshes on DX9 hardware at 120fps! 

 

I managed to recapture most of my fps!!! fps goes 350+ with 10 instances.

 

Instead of sharing one instance buffer for all meshes which needed to be updated for each draw call during rendering, I created a Pool of vertex buffers. Each mesh has its own instance buffer which is updated before rendering.

 

The buffer pool:

	for(int i=0;i<1024;i++){				//68*1024*1024 = 68MB data for all buffers
		IDirect3DVertexBuffer9 *pTempBuf;
		hr = d3ddev->CreateVertexBuffer(BufferSize, D3DUSAGE_DYNAMIC, NULL, D3DPOOL_DEFAULT, &pTempBuf, NULL);

		if( hr!=D3D_OK ){
			MessageBox(NULL, L"Failed to create instance buffer", L"Error", MB_OK);
			return false;
		} else {
			std::shared_ptr<IDirect3DVertexBuffer9> spNewBuffer(pTempBuf, [](IDirect3DVertexBuffer9 *p) {p->Release();});
			InstBuffers.push_back(spNewBuffer);
		}
	}

Updating the buffers before rendering:

//Copy member arrays to instance buffers
	for(std::map<DWORD, std::shared_ptr<InstancedMesh> >::iterator it = Scene.InstancedMeshes.begin(); it != Scene.InstancedMeshes.end(); it++){
		for(size_t j = 0;j<(*it).second->MeshSubsets.size();j++){
			if((*it).second->MeshSubsets[j]->Members.size() == 0) continue;

			void* pVoid;
			(*it).second->MeshSubsets[j]->spInstBuffer->Lock(0, sizeof(InstanceMember) * (*it).second->MeshSubsets[j]->Members.size(), (void**)&pVoid, 0);
			memcpy(pVoid, &(*it).second->MeshSubsets[j]->Members[0], sizeof(InstanceMember) * (*it).second->MeshSubsets[j]->Members.size());
			(*it).second->MeshSubsets[j]->spInstBuffer->Unlock();
		}
	} 

The render routine:

	// Set up the instance data stream
	DeviceInterface.GetDevice()->SetStreamSourceFreq(1, (D3DSTREAMSOURCE_INSTANCEDATA | 1));

	LPDIRECT3DTEXTURE9 pCurrentTex = NULL;

	for(std::map<DWORD, std::shared_ptr<InstancedMesh> >::iterator it = Scene.InstancedMeshes.begin(); it != Scene.InstancedMeshes.end(); it++){
		for(size_t j = 0;j<(*it).second->MeshSubsets.size();j++){
			if((*it).second->MeshSubsets[j]->Members.size() == 0) continue;

			if(pCurrentTex != (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex){									//Prevent to set new texture if it already is set
				DeviceInterface.GetDevice()->SetTexture(0, (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex);
				pCurrentTex = (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex;
			}
			
			LPDIRECT3DVERTEXBUFFER9 MeshVB;
			LPDIRECT3DINDEXBUFFER9 MeshIB;
			(*it).second->MeshSubsets[j]->Mesh->pMesh->GetVertexBuffer(&MeshVB);
			(*it).second->MeshSubsets[j]->Mesh->pMesh->GetIndexBuffer(&MeshIB);

			DeviceInterface.GetDevice()->SetIndices(MeshIB);

			// Set up the instance data stream
			DeviceInterface.GetDevice()->SetStreamSource(1, (*it).second->MeshSubsets[j]->spInstBuffer.get(), 0, D3DXGetDeclVertexSize(DeviceInterface.GetInstancedDeclaration(), 1 ));

			// Set up the geometry data stream
			DeviceInterface.GetDevice()->SetStreamSourceFreq(0, (D3DSTREAMSOURCE_INDEXEDDATA | (*it).second->MeshSubsets[j]->Members.size()));
			DeviceInterface.GetDevice()->SetStreamSource(0, MeshVB, 0, D3DXGetDeclVertexSize(DeviceInterface.GetInstancedDeclaration(), 0 ));

			DeviceInterface.GetDevice()->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 
				(*it).second->MeshSubsets[j]->Mesh->pMesh->GetNumVertices(), 0, 
				(*it).second->MeshSubsets[j]->Mesh->pMesh->GetNumFaces());
		}
	}

	DeviceInterface.GetDevice()->SetStreamSourceFreq(0,1);
	DeviceInterface.GetDevice()->SetStreamSourceFreq(1,1); 

Remember to clear the references:

Scene.InstancedMeshes.clear();			//clear references to buffers, to not crash when resetting device elsewhere (InstBuffers.clear()) 


Edited by Tispe
