Instancing with ID3DXMESH?

20 comments, last by Tispe 10 years, 8 months ago

Ah yes. SetVertexDeclaration is not per stream, but as a whole. So it is just a matter of matching the final declaration as the sum of all stream components?

Also, something tells me D3DXGetDeclVertexSize() has a redundant parameter. Why not just use sizeof(MYVERTEXSTRUCT)?

From the docs:

When calling SetStreamSource, the stride is normally required to be equal to the vertex size. However, there are times when you may want to draw multiple instances of the same or similar geometry (such as when using instancing to draw). For this case, use a zero stride to tell the runtime not to increment the vertex buffer offset (ie: use the same vertex data for all instances).

This leads me to believe that the parameter should be zero?


When calling SetStreamSource, the stride is normally required to be equal to the vertex size. However, there are times when you may want to draw multiple instances of the same or similar geometry (such as when using instancing to draw). For this case, use a zero stride to tell the runtime not to increment the vertex buffer offset (ie: use the same vertex data for all instances).

Not that I've tried, but that doesn't quite make sense to me, honestly. At least I can't think of a sensible use case.

The parameter is not redundant, read Ximsu's post again.

The stride does not need to match the declaration, but in this case (and if you actually have a struct) it should be sizeof, yes.

So it is just a matter of matching the final declaration as the sum of all stream components?


Yes, though I would rather use the word concatenation (of the arrays). You can actually write a helper function if you need to (just make sure you don't accidentally copy the D3DDECL_END() marker in the middle ;)
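A rough sketch of what such a helper could look like. Element here is a hypothetical stand-in for D3DVERTEXELEMENT9 with only three fields, and concatDeclarations and kEndStream are made-up names; the only thing borrowed from D3D9 is that the end marker uses stream 0xFF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for D3DVERTEXELEMENT9 (assumption: only these fields matter here).
struct Element {
    unsigned short stream;  // stream index; 0xFF marks the end of a declaration
    unsigned short offset;  // byte offset inside that stream's vertex
    unsigned short bytes;   // size of the element in bytes
};

const unsigned short kEndStream = 0xFF;  // same stream value D3DDECL_END() uses

inline bool isEnd(const Element& e) { return e.stream == kEndStream; }

// Copies every element of each part except its end marker, then appends
// exactly one end marker at the very end of the combined declaration.
std::vector<Element> concatDeclarations(const std::vector<const Element*>& parts) {
    std::vector<Element> out;
    for (const Element* part : parts) {
        assert(part != nullptr);
        for (const Element* e = part; !isEnd(*e); ++e) {
            out.push_back(*e);
        }
    }
    out.push_back(Element{kEndStream, 0, 0});
    return out;
}
```

With real D3D9 types the result would be passed to CreateVertexDeclaration as a pointer to the first element; the point is only that each part's terminator is dropped and a single one is written last.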

I still don't get why you need the first parameter if setting the second (stream) will return the vertex size of that stream?

Or the other way around, how can the helper function D3DXGetDeclVertexSize() know anything about the stream without having access to the d3dDevice?

D3DDEV->SetStreamSource(0, _vb, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 0));

D3DDEV->SetStreamSource(1, InstanceBuffer, 0, D3DXGetDeclVertexSize(VBD_GEOMETRYPACKET, 1));

To me this looks like Stream0 and Stream1 will output the same vertex format/size:

Stream0: Pos, Norm, UV, Matrix

Stream1: Pos, Norm, UV, Matrix

instead of:

Stream0: Pos, Norm, UV

Stream1: Matrix

EDIT:

I think I got it now :)

D3DXGetDeclVertexSize uses the index passed by the second argument, enters the declaration passed in the first argument, then returns the chunk size from there :)
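To make that concrete, here is a small model of the behaviour described above. It is not the actual D3DX implementation (I have not seen it): Element, streamVertexSize and kGeometryPacket are hypothetical, and the element sizes assume float3 position, float3 normal, float2 UV and a matrix stored as four float4 rows.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for D3DVERTEXELEMENT9 (assumption: reduced to three fields).
struct Element {
    unsigned short stream;  // which stream the element is read from
    unsigned short offset;  // byte offset inside that stream's vertex
    unsigned short bytes;   // size of the element in bytes
};

const unsigned short kEndStream = 0xFF;  // stream value used by the end marker

// Walks the whole declaration but only looks at elements bound to `stream`,
// returning the largest (offset + size) found, i.e. that stream's stride.
std::size_t streamVertexSize(const Element* decl, unsigned short stream) {
    assert(decl != nullptr);
    std::size_t size = 0;
    for (const Element* e = decl; e->stream != kEndStream; ++e) {
        if (e->stream != stream) continue;
        const std::size_t end = static_cast<std::size_t>(e->offset) + e->bytes;
        if (end > size) size = end;
    }
    return size;
}

// One combined declaration: stream 0 = Pos, Norm, UV; stream 1 = Matrix.
const Element kGeometryPacket[] = {
    {0, 0, 12}, {0, 12, 12}, {0, 24, 8},
    {1, 0, 16}, {1, 16, 16}, {1, 32, 16}, {1, 48, 16},
    {kEndStream, 0, 0},
};
```

Under these assumptions, asking for stream 0 gives 32 bytes and asking for stream 1 gives 64 bytes, even though both calls receive the same declaration. No device is needed because the declaration array itself carries the stream index of every element.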

Your edit is correct: the declaration has all the size information. But as one of the above posts mentions, what I was using D3DXGetDeclVertexSize() for was not necessary; using either Mesh->GetNumBytesPerVertex() or sizeof( /*your vertex data structure*/ ) would work as well. The vertex declaration is only needed for SetVertexDeclaration(). Also, thanks for the tips in the above posts.

I'm halfway through implementing this now, hope that I finish a working demo today.

I scratched the idea of instancing mesh subsets for now. Right now a certain pair of gloves for example is split into 4 different ID3DXMESHes instead of 1 mesh and 4 subsets. Then 4 vectors are created to instance each mesh. That way, when 10 characters use the same pair of gloves, they "sign up" in the instancing vectors for those meshes they need.

For example, Player1 might need to draw all 4 subsets, so he will add an OBJECT_INSTANCE to each vector. Player2 might have very long sleeves and thus parts of the gloves are hidden; he only needs to draw 2 meshes, and so he signs up only for those by adding OBJECT_INSTANCEs to only two of the instance vectors.
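Roughly, the sign-up scheme looks like this. The names are placeholders (ObjectInstance stands in for OBJECT_INSTANCE, and the float[16] payload is an assumption), and the D3D side is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance payload; assumed to be just a world matrix.
struct ObjectInstance {
    float world[16];
};

const std::size_t kPieceCount = 4;  // the gloves are split into 4 separate meshes

// One mesh of the gloves plus the instances that signed up for it this frame.
struct InstancedPiece {
    std::vector<ObjectInstance> members;
};

struct Gloves {
    InstancedPiece pieces[kPieceCount];
};

// A character adds itself only to the pieces that are visible on it.
void signUp(Gloves& gloves, const ObjectInstance& inst, const bool (&visible)[kPieceCount]) {
    for (std::size_t i = 0; i < kPieceCount; ++i) {
        if (visible[i]) gloves.pieces[i].members.push_back(inst);
    }
}

// Number of instances one piece will be drawn with.
std::size_t instanceCount(const Gloves& gloves, std::size_t piece) {
    assert(piece < kPieceCount);
    return gloves.pieces[piece].members.size();
}

// Pieces nobody signed up for need no draw call at all.
std::size_t drawCallCount(const Gloves& gloves) {
    std::size_t calls = 0;
    for (std::size_t i = 0; i < kPieceCount; ++i) {
        if (!gloves.pieces[i].members.empty()) ++calls;
    }
    return calls;
}
```

So with Player1 signing up for all four pieces and Player2 for two of them, there are still at most four draw calls, and each piece is drawn with however many members it collected.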

Just did a demo, I went from 400fps to 100fps using instancing with only one character present....

Is there a lot of overhead using this method? I must be doing something wrong, because I have only 24 draw calls at 400fps, which becomes 100fps using instancing on a GTX690....
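For reference, fps numbers are easier to compare as time per frame: 400fps is 2.5 ms per frame and 100fps is 10 ms, so the drop corresponds to about 7.5 ms of extra work per frame. The helper names below are made up for illustration.

```cpp
#include <cassert>

// Converts a frame rate into the time spent on one frame, in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame when the frame rate changes from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```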

Maybe it is because I am locking and copying data to the instancing buffer 24 times between beginScene() and endScene()? Should I just do that outside, before I beginScene(), using 24 different buffers?

Can't say for sure what causes your performance drop, but I do have a few recommendations. First, I would try a different method of locking/creating your instance buffer and see if you can get any sort of performance gain there; hopefully you will be able to recover most of that 300fps drop. However, with only one character present in your demo, you are not going to see the benefit of instancing: you make the same number of draw calls as without instancing, but with instancing you also need to lock your instance buffer. So another thing you can try is using several characters, to see if the instancing method has any benefit. I'm still working out this stuff in my implementation right now (actually multiple subsets right now), so if anyone else has some suggestions for Tispe, I would find them handy as well!
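One locking variation that might be worth trying (untested here, so treat it as an idea rather than a fix): gather the instance data of all batches into one contiguous array on the CPU, copy it with a single Lock/memcpy/Unlock per frame, and give each draw its own byte offset through the OffsetInBytes parameter of SetStreamSource. The sketch below only shows the CPU-side packing; InstanceMember's layout, PackedInstances and packInstances are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance data (assumption: one 4x4 float matrix, 64 bytes).
struct InstanceMember {
    float world[16];
};

struct PackedInstances {
    std::vector<InstanceMember> data;      // copy this once per frame into the instance buffer
    std::vector<std::size_t> byteOffsets;  // per batch: offset to pass as OffsetInBytes
    std::vector<std::size_t> counts;       // per batch: instance count for the draw
};

// Appends every batch back to back and records where each one starts.
PackedInstances packInstances(const std::vector<std::vector<InstanceMember>>& batches) {
    PackedInstances packed;
    for (const auto& batch : batches) {
        packed.byteOffsets.push_back(packed.data.size() * sizeof(InstanceMember));
        packed.counts.push_back(batch.size());
        packed.data.insert(packed.data.end(), batch.begin(), batch.end());
    }
    assert(packed.byteOffsets.size() == batches.size());
    return packed;
}
```

Batches with a count of zero can simply be skipped when drawing. Whether this actually helps depends on how the buffer was created and locked, so it is best measured against the per-draw locking version.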

Hmm. I moved some stuff around in the code but the problem still exists. My fps starts at like 140, then it drops over time to like 90fps, and when I move the camera around it goes up and down between 80fps and 130fps...

How can I profile in PIX? do I just have to examine the timeline for a frame capture? I have a strong gut feeling my InstanceBuffer locks and memcopies are the culprits.

EDIT: In release mode I get stable 200fps. Idk if it should be higher with only 24 drawcalls and 7 texture changes.


	for(std::map<DWORD, std::shared_ptr<InstancedMesh> >::iterator it = Scene.InstancedMeshes.begin(); it != Scene.InstancedMeshes.end(); it++){
		for(size_t j = 0;j<(*it).second->MeshSubsets.size();j++){
			if((*it).second->MeshSubsets[j]->Members.size() == 0) continue;

			if(pCurrentTex != (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex){	// Skip SetTexture if this texture is already bound
				DeviceInterface.GetDevice()->SetTexture(0, (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex);
				pCurrentTex = (*it).second->MeshSubsets[j]->Mesh->spTexture->pTex;
			}
			
			LPDIRECT3DVERTEXBUFFER9 MeshVB;
			LPDIRECT3DINDEXBUFFER9 MeshIB;
			(*it).second->MeshSubsets[j]->Mesh->pMesh->GetVertexBuffer(&MeshVB);
			(*it).second->MeshSubsets[j]->Mesh->pMesh->GetIndexBuffer(&MeshIB);

			DeviceInterface.GetDevice()->SetIndices(MeshIB);

			// Copy instance data over
			VOID* pVoid;
			Scene.InstanceBuffer->Lock(0, sizeof(InstanceMember) * (*it).second->MeshSubsets[j]->Members.size(), (void**)&pVoid, 0);
			memcpy(pVoid, &(*it).second->MeshSubsets[j]->Members[0], sizeof(InstanceMember) * (*it).second->MeshSubsets[j]->Members.size());
			Scene.InstanceBuffer->Unlock();

			// Set up the geometry data stream
			DeviceInterface.GetDevice()->SetStreamSourceFreq(0, (D3DSTREAMSOURCE_INDEXEDDATA | (*it).second->MeshSubsets[j]->Members.size()));
			DeviceInterface.GetDevice()->SetStreamSource(0, MeshVB, 0, D3DXGetDeclVertexSize(DeviceInterface.GetInstancedDeclaration(), 0 ));

			// Set up the instance data stream
			DeviceInterface.GetDevice()->SetStreamSourceFreq(1, (D3DSTREAMSOURCE_INSTANCEDATA | 1));
			DeviceInterface.GetDevice()->SetStreamSource(1, Scene.InstanceBuffer, 0, D3DXGetDeclVertexSize(DeviceInterface.GetInstancedDeclaration(), 1 ));

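			DeviceInterface.GetDevice()->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 
				(*it).second->MeshSubsets[j]->Mesh->pMesh->GetNumVertices(), 0, 
				(*it).second->MeshSubsets[j]->Mesh->pMesh->GetNumFaces());

			// GetVertexBuffer/GetIndexBuffer each add a reference, so release them here to avoid leaking one per draw
			MeshVB->Release();
			MeshIB->Release();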
		}
	}

	DeviceInterface.GetDevice()->SetStreamSourceFreq(0,1);
	DeviceInterface.GetDevice()->SetStreamSourceFreq(1,1);



I scratched the idea of instancing mesh subsets for now. Right now a certain pair of gloves for example is split into 4 different ID3DXMESHes instead of 1 mesh and 4 subsets. Then 4 vectors are created to instance each mesh. That way, when 10 characters use the same pair of gloves, they "sign up" in the instancing vectors for those meshes they need.

For example, Player1 might need to draw all 4 subsets, so he will add an OBJECT_INSTANCE to each vector. Player2 might have very long sleeves and thus parts of the gloves are hidden; he only needs to draw 2 meshes, and so he signs up only for those by adding OBJECT_INSTANCEs to only two of the instance vectors.

this may be the more "d3d" way to go about it.

mesh subsets are primarily intended to group triangles based on material and texture, not for controlling how much of a mesh is drawn, although it would work perfectly fine.

they're intended for use on models that use multiple texture maps. so a hammer with a wood handle could be one mesh, and the handle and head would be separate subsets, with wood and steel textures respectively. modeling software tends to create these types of models.

BTW, did you use material and texture to set / control the subset numbers? or did you edit them in a modeler, etc? just curious....

splitting gloves into 4 meshes instead of one mesh with 4 subsets should run faster. the next step would be to lose the d3dxmesh as data structure, and switch to a struct of vb, ib, numverts, and numtris.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Just did some testing in release mode. First test I only draw one subset for each texture. Second test was to draw all subsets for all textures.

7 draw calls, 7 texture swaps, 7 instancebuffer updates, 180-230fps, 100 instances
7 draw calls, 7 texture swaps, 7 instancebuffer updates, 60-70fps, 1000 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 100-130fps, 100 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 70-90fps, 300 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 60-80fps, 500 instances
28 draw calls, 7 texture swaps, 28 instancebuffer updates, 30-50fps, 1000 instances

My character needs 28 draw calls to get all subsets drawn. I am pondering whether I can merge them into 7 draw calls by placing meshes with the same texture in the same VB, maybe with a method such as skipping/discarding, in the VS, the vertices of meshes that a given instance does not need to draw. That way I can reduce the instancebuffer updates from 28 to 7 as well.

Something weird happens when the window loses focus: the fps drops and won't come back up. For example, at launch it has 230fps, but after going back and forth to another window it stays at 180.

This topic is closed to new replies.
