# Very frustrating hardware instancing problem


frankoguy    122
I've been programming an HLSL shader-driven game engine for the past two years, and I've been working on it in total since 2002, mostly as a learning experience. It is now becoming much more than that, so I have a lot of time and effort invested in the problem below.

The problem has to do with a particle system I wrote. Unfortunately I cannot use Direct3D's point-sprite rendering (where each vertex is rendered as a billboarded point sprite), because some customized visual effects have to be handled in the vertex and pixel shaders. Billboarding is not the only effect I need to imitate, so point sprites won't do.

Each particle represents a missile fired from a spaceship. The original particle system works great and does exactly what I want under all circumstances. The drawback is that I execute a separate DrawIndexedPrimitive(...) for each particle.

Then I read about hardware instancing, and have since added macros and preprocessor ifdefs to conditionally compile code that takes advantage of it, which should let me keep my original shader code and the original visual effects in place.

Here are the important details of my hardware instancing technique:

//different (important) class methods executed:
//please excuse my class names: they are part of my shader-driven resource management system.
//CDXHelper and CDXEffectInterface are my own classes and are not part of the DirectX SDK, so the order in which
//arguments are passed to my versions usually differs from the order used by the ID3DXEffect and IDirect3DDevice9 methods.
//Just note what the arguments are.
virtual void CreateInstances()
{
WORD PositionIndex=0;
WORD DiffuseOffset=sizeof(float)*3;
WORD TexCoordIndex=DiffuseOffset+sizeof(DWORD);
D3DVERTEXELEMENT9 InstanceData[] =
{
{m_GeometryStreamIndex,PositionIndex,D3DDECLTYPE_FLOAT3,D3DDECLMETHOD_DEFAULT,D3DDECLUSAGE_POSITION,0},
{m_GeometryStreamIndex,DiffuseOffset,D3DDECLTYPE_D3DCOLOR,D3DDECLMETHOD_DEFAULT,D3DDECLUSAGE_COLOR,0},
{m_GeometryStreamIndex,TexCoordIndex,D3DDECLTYPE_FLOAT2,D3DDECLMETHOD_DEFAULT,D3DDECLUSAGE_TEXCOORD,0},
{m_InstanceStreamIndex, 0,	D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 1},
D3DDECL_END()
};
m_pDeclInstance=CDXHelper::CreateVertexDeclaration(InstanceData);
m_SizeofInstance=D3DXGetDeclVertexSize(InstanceData,m_InstanceStreamIndex);
m_SizeofVertex=D3DXGetDeclVertexSize(InstanceData,m_GeometryStreamIndex);
CDXHelper::CreateVertexBuffer(m_MaxInstances*m_SizeofInstance,
D3DUSAGE_DYNAMIC|D3DUSAGE_WRITEONLY,
0,//empty FVF on purpose
D3DPOOL_DEFAULT,
&m_pVBInstances);
}
virtual inline void SetStatesForTechnique()
{
////since this is called per instance, this should remain
////totally empty
using namespace USER;
CMissile* pParticle=static_cast<CMissile*>(m_stpvEffect_pParams);
CDXEffectInterface::SetTexture("g_ColorMap0",pParticle->m_pTexCurrent);
CDXEffectInterface::SetTexture("g_ColorMap1",pParticle->m_pTexCurrent2);
{
LPD3DXMATRIX pmWorld=(LPD3DXMATRIX)CDXHelper::GetParams();
CDXHelper::SetVertexDeclaration(this->m_pDeclInstance);//11/14/06
CDXHelper::SetIndices(m_pIBGeometry);//11/14/06
CDXHelper::AlignWorldToView(*pmWorld,pmWorld);
CDXEffectInterface::SetMatrix("g_mWorld",pmWorld);
CDXEffectInterface::SetMatrix("g_mView",pmWorld+1);
CDXEffectInterface::SetMatrix("g_mViewProj",pmWorld+4);
}
}
virtual void PrepareStreamsForTechnique()
{
//m_GeometryStreamIndex=8
//m_InstanceStreamIndex=9
CDXHelper::SetVertexDeclaration(this->m_pDeclInstance);//11/14/06
CDXHelper::SetStreamSource(m_pVBGeometry,m_SizeofVertex,m_GeometryStreamIndex,0);
GUARDERRORDXVOID(GetD3DDevice()->SetStreamSourceFreq(
m_GeometryStreamIndex,
D3DSTREAMSOURCE_INDEXEDDATA | m_NumInstancesToRender //vs_3_0
));
CDXHelper::SetStreamSource(m_pVBInstances,m_SizeofInstance,m_InstanceStreamIndex,0);
GUARDERRORDXVOID(GetD3DDevice()->SetStreamSourceFreq(
m_InstanceStreamIndex,
D3DSTREAMSOURCE_INSTANCEDATA|1ul
));
CDXHelper::SetIndices(m_pIBGeometry);
}
virtual void RenderForTechnique()
{
/***
The assumption I'm making here is this:
in The above method (PrepareStreamsForTechnique), my SetStreamSourceFreq(params) set up the below DrawIndexedPrimitive call
to draw the standard billboard:  (-1,1,0) (1,1,0),(1,-1,0), (-1,-1,0) "m_NumInstancesToRender" times) in a single call, only
having to specify the four vertices (of a single billboard), and the two primitives to draw (that make up that billboard) (D3DPT_TRIANGLELIST primitive type)
***/
CDXHelper::DrawIndexedPrimitive(m_NumVertices,(m_NumIndices/3));//out 11/12/06..consider leaving in...since I don't
}


//MY HLSL code below:
float4 VS_PSystemWeapon3(
in float4 PosIn : POSITION, //stream index8  (streamsourceFreq of D3DSTREAMSOURCE_INDEXEDDATA|NumParticlesToDraw)
inout float2 TexIn : TEXCOORD0, //stream index 8 (same)
in float4 WorldPosIn : TEXCOORD1, //w = scale, (uses stream index 9), StreamSourceFreq of D3DSTREAMSOURCE_INSTANCEDATA|1ul
out float4 ColorOut : TEXCOORD1
) : POSITION
{
//so basically, PosIn, and TexIn are a part of streamsource 8 (not specifying ColorIn, since colorIn in this shader isn't used)
//and the instance stream (9) is the float4(x,y,z,particle size) of each particle
//This is the entire HLSL vertex shader:
//MY PROBLEM IS THIS:
/***
After I specified all the correct hardware instancing commands listed above this shader,
The below commands do exactly what I think they will do, except for one VERY big problem:
The hardware instancing never occurs.
Given two instanced particles to draw (for example):
1st execution
PosIn: (-1,1,0,1)
WorldPosIn (particle #1 world position and scale)
2nd execution:
PosIn: (1,1,0,1)
WorldPosIn (particle #2 world position and scale)
3rd execution:
PosIn: (1,-1,0,1)
WorldPosIn (junk junk junk junk)

The above is behaving as if I never executed the d3ddevice->SetStreamSourceFreq(source,freq) commands
But in a debug trace, I verified that these commands were executed and were never reset to a freq of one
(verified via GetStreamSourceFreq)

The desired behavior:
1st execution:
PosIn: (-1,1,0,1)
WorldPosIn (particle #1 world position and scale)
2nd execution:
PosIn: (1,1,0,1)
WorldPosIn (particle #1 world position and scale)
3rd execution:
PosIn: (1,-1,0,1)
WorldPosIn (particle #1 world position and scale)
4th execution:
PosIn: (-1,-1,0,1)
WorldPosIn (particle #1 world position and scale)
(next instance)
1st execution:
PosIn: (-1,1,0,1)
WorldPosIn (particle #2 world position and scale)
2nd execution:
PosIn: (1,1,0,1)
WorldPosIn (particle #2 world position and scale)
3rd execution:
PosIn: (1,-1,0,1)
WorldPosIn (particle #2 world position and scale)
4th execution:
PosIn: (-1,-1,0,1)
WorldPosIn (particle #2 world position and scale)

My graphics card hardware (on a Qosmio laptop) is NVIDIA GeForce Go 6600.  This problem is eating away at me, since the DirectX sdk's hardware instancing sample program does work correctly.  This would only lead me to believe that somehow I'm setting some kind of renderstate parameter switch incorrectly.  I would appreciate any/all suggestions for solving this problem.  Thanks.
***/
//I'm listing all the shader code for total clarity:
//Begin important instancing issues:
float3 vPos={WorldPosIn.x,WorldPosIn.y,WorldPosIn.z};
g_mWorld._41=vPos.x;
g_mWorld._42=vPos.y;
g_mWorld._43=vPos.z;
//End important instancing issues
float4x4 mFinalTransform=mul(g_mWorld,g_mViewProj);
float3 vCamToPos=vPos-g_vCamPos;
float3 vLookAt={g_mView._13,g_mView._23,g_mView._33};

float fValue=length(vCamToPos);

3.9269908169872415480783042290994f,
2.3561944901923449288469825374596f,
5.4977871437821381673096259207391f,
0.78539816339744830961566084581988f};
//g_BBIndex%=4;
float2 temp=PosIn;
temp.x=(temp.x+1.0f)*0.5f;
temp.y=(temp.y+1.0f)*0.5f;

//ColorOut=clamp(fValue,0.0f,1.0f);//out 11/10/05
ColorOut=(float4)0;
ColorOut.y=1.0f-saturate(fValue/5000.0f);//from 5000.0f to 20000.0f
return mul(PosIn2,mFinalTransform);
}

//just for clarity: (works fine)
float4 PS_PSystemWeapon2(
in float2 TexIn : TEXCOORD0,//TEXCOORD0,
in float4 ColorIn : TEXCOORD1//TEXCOORD1
) : COLOR
{
//return tex2D(PSystemWeapon_Sampler,TexIn);//old way
float4 Color2=tex2D(PSystemWeapon_Sampler,TexIn);
float4 Color=tex2D(PSystemWeapon_Sampler2,TexIn);
#if(!defined(USE_DIFFUSE_WEAPON_EFFECT))
return lerp(Color,Color2,ColorIn.y);//out 11/10/05
#else
float4 Color3=lerp(Color,Color2,ColorIn.y);
float4 Color4=Color3;
//Color4+=Color3*ColorIn.x*0.8f;
Color4+=Color3*ColorIn.x;
return Color4;
#endif
#undef USE_DIFFUSE_WEAPON_EFFECT
}
technique Tech_PSystemWeapon_Optimized
{
//my renderstates which work great in my previous non-instancing technique (not listed here by me)
pass
{
ZEnable=true;
ZWriteEnable=true;
AlphaBlendEnable=true;
AlphaTestEnable=true;
AlphaFunc=greaterequal;
AlphaRef=0x55;
SrcBlend=SrcAlpha;
DestBlend=InvSrcAlpha;
//below: put back in if necessary:
//ColorOp[0]=SelectArg1;
//ColorArg1[0]=Texture;
//AlphaOp[0]=SelectArg1;
//AlphaArg1[0]=Texture;
cullmode=none;//make ccw later
VertexShader=compile vs_3_0 VS_PSystemWeapon3(); //11/12/06..for CPSystemWeapon_RenderOptimizer implementation vs_3_0+ is needed
}
}


The problem as a whole is that no particles are drawn and no crash occurs. In fact, only four vertices are processed, rather than four vertices times NumInstances (what I want is four vertices per world particle position in the instance stream). Thanks for your time and suggestions; I really appreciate them. [Edited by - frankoguy on November 29, 2006 10:39:31 PM]
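
To make the expected behavior concrete, here is a minimal CPU-side sketch of the fetch order described in the shader comments: the geometry stream is walked once per instance, while the instance stream advances only when an instance is finished. `Float3`, `Float4`, `VertexFetch`, and `SimulateInstancedFetch` are hypothetical names made up for this illustration. Nothing here calls Direct3D, and real hardware walks the index buffer rather than visiting each vertex exactly once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of the inputs; not Direct3D types.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Inputs seen by one vertex shader execution.
struct VertexFetch {
    Float3 posIn;       // from the geometry stream (POSITION)
    Float4 worldPosIn;  // from the instance stream (TEXCOORD1), w = scale
};

// Lists the expected shader inputs in order: every quad vertex is paired
// with the same instance element until the quad is complete, and only
// then does the instance stream step forward by one element.
std::vector<VertexFetch> SimulateInstancedFetch(
    const std::vector<Float3>& quad,
    const std::vector<Float4>& instances)
{
    std::vector<VertexFetch> fetches;
    fetches.reserve(quad.size() * instances.size());
    for (const Float4& instance : instances) {
        for (const Float3& vertex : quad) {
            fetches.push_back({vertex, instance});
        }
    }
    return fetches;
}
```

With the four billboard corners and two instances, this yields eight entries: the first four carry particle #1's position and scale, the last four carry particle #2's.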

##### Share on other sites
A particle system doesn't seem like a good candidate for instancing, unless you were actually using complex meshes for each particle. If you're doing a billboard particle system, you're probably going to be better off just using a vertex buffer with discard locking and writing the quads into it each frame.
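
As a rough illustration of that alternative, the sketch below expands each particle into four vertices and six indices on the CPU. `Particle`, `QuadVertex`, and `BuildQuadBatch` are hypothetical names, and the vertex layout is only an assumption. In a real renderer the vertices would be written through the pointer returned by `IDirect3DVertexBuffer9::Lock` with `D3DLOCK_DISCARD` rather than into a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical particle record: world position plus scale.
struct Particle { float x, y, z, size; };

// Hypothetical vertex layout: the quad corner, a texture coordinate, and
// the particle's world position and scale repeated in all four vertices.
struct QuadVertex {
    float cornerX, cornerY;
    float u, v;
    float worldX, worldY, worldZ, scale;
};

// Fills 'vertices' and 'indices' with one quad (two triangles) per particle,
// so the whole batch can be drawn with a single indexed draw call.
void BuildQuadBatch(const std::vector<Particle>& particles,
                    std::vector<QuadVertex>& vertices,
                    std::vector<std::uint16_t>& indices)
{
    // 16-bit indices can address at most 65536 vertices, i.e. 16384 quads.
    assert(particles.size() <= 16384);

    static const float kCorners[4][2] = {
        {-1.0f, 1.0f}, {1.0f, 1.0f}, {1.0f, -1.0f}, {-1.0f, -1.0f}};
    static const std::uint16_t kQuadIndices[6] = {0, 1, 2, 0, 2, 3};

    vertices.clear();
    indices.clear();
    vertices.reserve(particles.size() * 4);
    indices.reserve(particles.size() * 6);

    for (std::size_t i = 0; i < particles.size(); ++i) {
        const Particle& p = particles[i];
        const std::uint16_t base = static_cast<std::uint16_t>(i * 4);
        for (const auto& c : kCorners) {
            // Corner in [-1,1] is remapped to [0,1] for the texture coordinate.
            vertices.push_back({c[0], c[1],
                                (c[0] + 1.0f) * 0.5f, (c[1] + 1.0f) * 0.5f,
                                p.x, p.y, p.z, p.size});
        }
        for (std::uint16_t qi : kQuadIndices) {
            indices.push_back(static_cast<std::uint16_t>(base + qi));
        }
    }
}
```

Because the index pattern never changes, the index buffer could be filled once for the maximum particle count and only the vertex data rewritten each frame.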

Also, you should read up on the forum markup tags available so your code is actually readable.

##### Share on other sites
frankoguy    122

I thought about this option as well: why not just write out the quads into a vertex stream along with the world position info (inside each quad vertex--many other similar ideas exist, of course), instead of using instancing?

Mainly because I just can't get hardware instancing to work at all, and this bugs me. If I can get it working, I could apply the same technique to my other particle systems and reduce the number of DrawPrimitive calls to a very small amount. At any given point in time, many different kinds of particles from my different particle systems are in play: some of them have complex representations (meshes), and some do not (billboards).

Has anyone else enabled hardware instancing in the way I described in my previous post, without getting any kind of Direct3D error, and still seen the HLSL shader behave incorrectly, with the quad vertices and the elements of the "instance" vertex buffer being stepped through at the same rate?

Again, it's as if I'm not even making the call to SetStreamSourceFreq() at all, when in fact, GetStreamSourceFreq() verified the new frequency values at several different points in the rendering loop. And, according to the device caps, vs_3_0/ps_3_0 is supported, and the technique passed validation.
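
For anyone repeating that check, here is a small sketch of how a value read back from GetStreamSourceFreq() can be split into its flag bits and its count. The two constants mirror the `D3DSTREAMSOURCE_INDEXEDDATA` and `D3DSTREAMSOURCE_INSTANCEDATA` values from d3d9types.h and are redeclared here only so the snippet compiles without the SDK. `StreamFreq` and `DecodeStreamFreq` are hypothetical helper names, not part of Direct3D.

```cpp
#include <cstdint>

// Same bit values as the d3d9types.h macros, redeclared for this sketch.
const std::uint32_t kIndexedData  = 1u << 30;  // D3DSTREAMSOURCE_INDEXEDDATA
const std::uint32_t kInstanceData = 2u << 30;  // D3DSTREAMSOURCE_INSTANCEDATA

// Hypothetical decoded view of one stream's frequency setting.
struct StreamFreq {
    bool indexed;         // stream holds the indexed geometry
    bool instanced;       // stream holds per-instance data
    std::uint32_t count;  // instance count, or step rate for instance data
};

// Separates the two flag bits from the numeric part of the setting.
inline StreamFreq DecodeStreamFreq(std::uint32_t setting)
{
    StreamFreq result;
    result.indexed = (setting & kIndexedData) != 0;
    result.instanced = (setting & kInstanceData) != 0;
    result.count = setting & ~(kIndexedData | kInstanceData);
    return result;
}
```

For the setup in my first post, the geometry stream should decode as indexed with a count equal to m_NumInstancesToRender, and the instance stream as instanced with a count of 1.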

So, in short, my source code doesn't complain at all, but my HLSL input vertex elements behave as if the calls to SetStreamSourceFreq() were never made. There are always other ways to write a particle system, but since I don't know the real cause, the same thing would happen on all my future hardware instancing attempts too. That alarms me: if this technique works for other people but not for me, my rendering code will be less efficient than it could be.

Basically, I still don't have a real answer to the problem.

Help! :(

##### Share on other sites
Well, I'll be honest: Without the source code being marked up properly so it's legible, I'm not going to read it. If you edit your original post to add the tags, or add a new post with the tags, then I might, in which case there's a much higher likelihood of me (or someone else) seeing anything wrong with it.

And yes, I can understand that you would want to get the technique right so you can make use of it elsewhere. I do know that, at least on my hardware (7800), instancing does work - I've got a cube of 512 animated characters rendering using it.

##### Share on other sites
frankoguy    122
Jason,

I edited my earlier post with the code listing and wrapped the C++ and HLSL source code in the tags you indicated.

I believe the code looks much more readable now. If not, please let me know.

Thanks.

##### Share on other sites
Yes, much more legible now. I did look through it, but unfortunately, nothing jumped out at me.

You might want to try something more straightforward, like rendering a mesh with different colors or the like.
