# An idea for renderstate management


## Recommended Posts

I am quite new to the idea of state management, but I know it is important for keeping rendering fast. I was wondering what the best way is to render while keeping state changes to a minimum. I have come up with a scheme for state management that works on a stack system. Any comments are greatly appreciated!

Basically, my engine consists of entities such as meshes. You can set certain properties on these entities: wireframe, fullbright, and so on. My idea is to create a second thread that runs a continuous loop ordering the entities for rendering. The loop finds all of the state changes in use, then builds stacks for rendering the entities, grouped by the states they use. It's kind of hard to explain, so below is a diagram that shows what I mean.

In this example, the renderer would push the first state change, then the second, then the third, and render all the entities that need all three. It would then pop back to the second state and render those entities, then pop to the first and render the remaining ones. The same would happen for the second stack in the diagram (illustrated as a pyramid-type thing).

------------------------------------------------------

So, does this sound like a good idea? The reordering would take place on a second thread, so hopefully not too much speed would be lost.

Thanks, aCiD2
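
As a rough illustration of why grouping entities by state helps, here is a minimal sketch that uses a plain sort by state key instead of the push/pop stack described above. The flag names, `DrawItem`, `countStateFlips`, `sortByState` and `exampleFlips` are all hypothetical and are not taken from any engine; the "device" is simulated by counting how many state bits change between consecutive draws.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-entity flags; the names are illustrative only.
enum StateBits : std::uint32_t {
    kWireframe  = 1u << 0,
    kFullbright = 1u << 1,
};

struct DrawItem {
    int           entityId;  // which entity to draw
    std::uint32_t stateKey;  // OR-ed StateBits this entity needs
};

// Counts how many individual state bits flip when drawing the list in order,
// starting from a device with all bits cleared.
inline int countStateFlips(const std::vector<DrawItem>& items) {
    std::uint32_t current = 0;
    int flips = 0;
    for (const DrawItem& item : items) {
        std::uint32_t changed = current ^ item.stateKey;
        for (; changed != 0; changed &= changed - 1) ++flips;  // one per set bit
        current = item.stateKey;
    }
    return flips;
}

// Orders items so that entities with identical state keys become adjacent.
inline void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) { return a.stateKey < b.stateKey; });
}

// Returns {flips before sorting, flips after sorting} for a small sample list.
inline std::pair<int, int> exampleFlips() {
    std::vector<DrawItem> items = {
        {0, kWireframe}, {1, 0}, {2, kWireframe}, {3, kWireframe | kFullbright}, {4, 0}};
    const int before = countStateFlips(items);
    sortByState(items);
    const int after = countStateFlips(items);
    assert(after <= before);  // holds for this sample, not guaranteed in general
    return {before, after};
}
```

Sorting by a numeric key only groups identical state sets; it does not guarantee the minimum number of changes between different groups, which is the part the stack idea tries to address.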

##### Share on other sites
This seems very similar to what we used in Midtown Madness 3.
Basically, we packed states that were likely to change at the same time into 32-bit words. To scan for a change we first tested the words against each other; if they differed, we tested the individual states (we actually went further than that).
Even after all optimizations were applied, the state manager was still one of the more costly processes in the game, at a whopping 4% of total CPU time on average (no single function showed over 2% of the CPU).

I.e.

    Word0: aaaa.bbbb bbbb.ccdd eeee.eeee eeff.ffff   (a = a state, b = state likely to change if a changed etc)
    Word1: ggg... and so on.

    if (Previous.Word0 != Current.Word0) {
        if (Previous.aa != Current.aa)
            SetRenderState(Current.aa)

        Previous.Word0 = Current.Word0
    }
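
The word-then-field comparison can be sketched in C++ roughly as follows. This is only an illustration under assumed names: the three fields, their bit widths and the helpers `getField`, `setField` and `applyWord` are hypothetical, and the device call is replaced by a list of recorded changes so the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: three small states packed into one 32-bit word.
struct Field { std::uint32_t shift, mask; int id; };
constexpr Field kCullMode  {0, 0x3u, 0};  // 2 bits
constexpr Field kZFunc     {2, 0x7u, 1};  // 3 bits
constexpr Field kAlphaTest {5, 0x1u, 2};  // 1 bit
constexpr Field kFields[] = {kCullMode, kZFunc, kAlphaTest};

inline std::uint32_t getField(std::uint32_t word, Field f) {
    return (word >> f.shift) & f.mask;
}

inline std::uint32_t setField(std::uint32_t word, Field f, std::uint32_t v) {
    assert(v <= f.mask);  // value must fit in the field
    return (word & ~(f.mask << f.shift)) | (v << f.shift);
}

struct Change { int id; std::uint32_t value; };

// Stand-in for the SetRenderState calls: returns the fields that would be set.
inline std::vector<Change> applyWord(std::uint32_t& previous, std::uint32_t current) {
    std::vector<Change> changes;
    if (previous == current) return changes;          // cheap whole-word test
    for (const Field& f : kFields) {
        if (getField(previous, f) != getField(current, f))  // per-state test
            changes.push_back({f.id, getField(current, f)});
    }
    previous = current;
    return changes;
}
```

When nothing in the word changed, the cost is a single integer comparison; the per-field loop only runs for words that actually differ.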

I've since abandoned the idea; there is too much logic code involved.
I now have a system built around "States". A State contains all render states, texture states, sampler states, the index buffer, etc. Basically, it contains all the information that you can set in DirectX. It also has a flag (bit) for each item in the State.
The renderer holds a POINTER to the previous State; when a new State is to be applied, its POINTER is passed in.
Then, using a map, I check whether the state change PREV-CUR already exists.
If it doesn't, I generate the code (actual assembled machine code) that sets the items that have changed from PREV to CUR. It looks something like "push, push, push, call SetRenderState, push, push, call SetIndices, ret".
The code is then stored in the map and executed for the state change.
An engine's items are quite often rendered in the same linear order, so the cache doesn't grow big at all. The flags I mentioned are for things that the user wants or needs to set every time before a DP call (in my case I have a material language that does this for me).

Here's my apply-state method:
    void Renderer::applyState(State* const state)
    {
        // Find state difference
        StateCache::iterator it = m_stateCache.find(std::make_pair(m_prevState, state));

        // If state difference is invalid
        // (I allow hot-loading of materials and states, need to recreate the diff if this happens.
        // This test could be skipped by adding more code into the hot-loading mechanism.)
        if ((it != m_stateCache.end()) && (!it->second->isValid())) {
            // Remove difference from cache
            it->second->release();
            m_stateCache.erase(it);
            it = m_stateCache.end();
        }

        // If no state difference is in the cache
        if (it == m_stateCache.end()) {
            // Create a new difference
            StateDifference* diff = StateDifference::create(m_prevState, state); // Generates the asm code
            if (!diff) {
                // If we couldn't create a difference, apply the whole state! SLOW AS HELL!
                logWarning("Couldn't create state difference!");
                state->apply();
                m_prevState = state;
                return;
            }

            // Insert state difference to the cache
            std::pair<StateCache::iterator, bool> res =
                m_stateCache.insert(std::make_pair(std::make_pair(m_prevState, state), diff));
            if (!res.second) {
                diff->release();
                logWarning("Couldn't add state difference to the cache!");
                state->apply();
                m_prevState = state;
                return;
            }
            it = res.first;
        }

        // Apply state difference
        it->second->apply(); // Basically a call into the generated asm code
        m_prevState = state;
    }
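
For readers who want to try the caching idea without generating machine code, here is a self-contained sketch that stores each cached difference as a plain list of writes instead. This is a swapped-in simplification, not the method above: `State`, `Write`, `Diff`, `kStateCount` and this small `Renderer` class are hypothetical stand-ins, and the "device" is just an array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

constexpr std::size_t kStateCount = 4;                 // hypothetical, tiny on purpose
using State = std::array<std::uint32_t, kStateCount>;  // one value per state slot

struct Write { std::size_t slot; std::uint32_t value; };
using Diff = std::vector<Write>;                       // stands in for generated code

class Renderer {
public:
    // Assumes the device already holds the values of `initial`.
    // State objects are keyed by address, so they must outlive the Renderer.
    explicit Renderer(const State& initial) : m_prev(&initial), m_device(initial) {}

    // Applies only the slots that differ between the previous state and `next`.
    // Returns the number of device writes performed.
    std::size_t applyState(const State* next) {
        assert(next != nullptr);
        const auto key = std::make_pair(m_prev, next);
        auto it = m_cache.find(key);
        if (it == m_cache.end()) {
            // Uncached transition: compare every slot once and remember the result.
            Diff diff;
            for (std::size_t i = 0; i < kStateCount; ++i)
                if ((*m_prev)[i] != (*next)[i]) diff.push_back({i, (*next)[i]});
            it = m_cache.emplace(key, std::move(diff)).first;
        }
        for (const Write& w : it->second) m_device[w.slot] = w.value;
        m_prev = next;
        return it->second.size();
    }

    std::size_t cachedTransitions() const { return m_cache.size(); }
    const State& device() const { return m_device; }

private:
    const State* m_prev;
    State m_device;  // what the simulated device currently holds
    std::map<std::pair<const State*, const State*>, Diff> m_cache;
};
```

Replaying a list of writes is slower than jumping into generated code, but it shows the same property: a repeated transition costs one map lookup plus only the writes that are actually needed.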

Here's a sample material

    // Horizon map material
    variables       vertexBuffer,
                    vertexBufferStride,
                    vertexDecl
                    vertexWorldConsts
    method          "TheOnlyOne"
    pass            "One"
    //  Shader
    VertexShader
    {
        ;c0 = ProjectionMatrix0
        ;c1 = ProjectionMatrix1
        ;c2 = ProjectionMatrix2
        ;c3 = ProjectionMatrix3
        ;c4 = Z-near, Z-Far, 1 / (zFar - zNear), 0.5
        ;c5 = CameraPosX, Y, Z, ?
        ;c6 = 2.0f, 1.0f, ?, ?
        ;c7 = 2.0 / DestWidth, -2.0 / DestHeight, -1, 1
        vs.1.1
        dcl_position    v0  ; x, y, z
        dcl_texcoord    v1  ; u, v
        ; Unpack position
        mad     r0, v0.xyz, c6.x, -c6.y
        ; Project
        mul     r4, r0.x, c0
        mad     r4, r0.y, c1, r4
        mad     oPos, r0.z, c2, r4
        ; Output texcoord
        mov     oT0, v1
    }
    PixelShader     "Data\\Shader\\SingleTexture.psh"
    //  Shader data
    VertexConsts    vertexWorldConsts
    //  Geometry
    VertexDecl      vertexDecl
    VertexBuffer    0, vertexBuffer, 0, vertexBufferStride
    CullMode        None
    ZWriteEnable    false
    Texture         0, "Data\\Texture\\Horizon.dds"
    AddressU        0, Clamp
    AddressV        0, Clamp
    MagFilter       0, Linear
    MinFilter       0 Linear
    MipFilter       0 Linear
    AlphaTestEnable true
    AlphaRef        255
    AlphaFunc       Equal

and the code to use this

    void HorizonMap::draw(void)
    {
        Variable vars[4];
        vars[0] = m_vertices;
        vars[1] = sizeof(Vertex);
        vars[2] = m_decl;
        vars[3] = getCommonVertexShaderConsts();

        uint passCount = m_material->getPassCount();
        uint pass;
        for (pass = 0; pass < passCount; ++pass) {
            if (m_material->apply(pass, vars))
                m_material->engine()->getDevice()->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, m_triangleCount);
        }
    }

##### Share on other sites
I didn't realise state changes were so expensive. I'd assumed that checking whether the current state is already what you want would be slower than just making the SetRenderState call. I guess I'm wrong, but if you don't use a pure D3D device, doesn't it check state changes before doing them anyway? Certainly the debug version will tell you about redundant changes.

The nicest way would be to have an array of DWORDs holding the current value for each state. Then you could just do:
if(states[D3DRS_ZFUNC]!=newValue) SetRenderState(...)
However, there are obviously problems that make this impossible. But if, for every D3D render state you use, you create your own mapping to it (e.g. D3DRS_ZFUNC = MYRS_ZFUNC), then you can number your own states sequentially, keep just that array, and use the mapping. Is this acceptable? Apart from typing in the (long) list of render state mappings, it seems like a fast method to me...
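
A minimal sketch of that shadow-array idea, assuming a sequential engine-side enum and a lookup table to the D3D9 state numbers (7, 22 and 23 are the D3DRS_ZENABLE, D3DRS_CULLMODE and D3DRS_ZFUNC values listed later in this thread). The `MYRS_*` names and the `StateShadow` class are hypothetical, and the real SetRenderState call is replaced by a callback so the example compiles without the DirectX headers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Sequentially numbered engine-side states (hypothetical names).
enum MyRenderState : std::size_t { MYRS_ZENABLE, MYRS_CULLMODE, MYRS_ZFUNC, MYRS_COUNT };

// Engine index -> D3D9 render state number.
constexpr std::array<std::uint32_t, MYRS_COUNT> kToD3D = {7u, 22u, 23u};

class StateShadow {
public:
    // Stand-in for the device call; receives (D3D state number, value).
    using SetFn = std::function<void(std::uint32_t, std::uint32_t)>;

    explicit StateShadow(SetFn setter) : m_set(std::move(setter)) {}

    // Forwards to the device only when the value differs from the shadow copy.
    // Returns true if a device call was made.
    bool set(MyRenderState s, std::uint32_t value) {
        assert(s < MYRS_COUNT);
        if (m_known[s] && m_value[s] == value) return false;  // redundant, skip
        m_set(kToD3D[s], value);
        m_value[s] = value;
        m_known[s] = true;
        return true;
    }

private:
    SetFn m_set;
    std::array<std::uint32_t, MYRS_COUNT> m_value{};
    std::array<bool, MYRS_COUNT> m_known{};  // false until the state is first set
};
```

The `m_known` flags avoid having to guess the device's initial values: the first write to each state always goes through, and only later identical writes are filtered.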

##### Share on other sites
Well, in GL state changes are fast, but in D3D9 they're not, though they will be in D3D10. To cut down on frequent state changes you could batch items that share a texture, or pack multiple textures into one, but I think that's for old-school stuff. I used to have some wild logic to prevent redundant state changes, but I dropped it. I set a lot of states because I do so much multipass rendering, with each pass having different state settings.

##### Share on other sites
Quote:
 I didn't realise state changes were so expensive

It all depends on how many "materials" you have. In MM3 I think we ended up using around 70 materials.

The problem is that there are so many states. Consider setting all these states before calling a bunch of DPs. On the Xbox many state changes were just a move into a table, i.e.:

mov eax, 0x12345678
mov [ebx + 0x1234], eax

But still, doing 300+ of these moves 70 times each frame does kill performance.

Render states:

    D3DRS_ZENABLE = 7,
    D3DRS_FILLMODE = 8,
    D3DRS_SHADEMODE = 9,
    D3DRS_ZWRITEENABLE = 14,
    D3DRS_ALPHATESTENABLE = 15,
    D3DRS_LASTPIXEL = 16,
    D3DRS_SRCBLEND = 19,
    D3DRS_DESTBLEND = 20,
    D3DRS_CULLMODE = 22,
    D3DRS_ZFUNC = 23,
    D3DRS_ALPHAREF = 24,
    D3DRS_ALPHAFUNC = 25,
    D3DRS_DITHERENABLE = 26,
    D3DRS_ALPHABLENDENABLE = 27,
    D3DRS_FOGENABLE = 28,
    D3DRS_SPECULARENABLE = 29,
    D3DRS_FOGCOLOR = 34,
    D3DRS_FOGTABLEMODE = 35,
    D3DRS_FOGSTART = 36,
    D3DRS_FOGEND = 37,
    D3DRS_FOGDENSITY = 38,
    D3DRS_RANGEFOGENABLE = 48,
    D3DRS_STENCILENABLE = 52,
    D3DRS_STENCILFAIL = 53,
    D3DRS_STENCILZFAIL = 54,
    D3DRS_STENCILPASS = 55,
    D3DRS_STENCILFUNC = 56,
    D3DRS_STENCILREF = 57,
    D3DRS_STENCILMASK = 58,
    D3DRS_STENCILWRITEMASK = 59,
    D3DRS_TEXTUREFACTOR = 60,
    D3DRS_WRAP0 = 128,
    D3DRS_WRAP1 = 129,
    D3DRS_WRAP2 = 130,
    D3DRS_WRAP3 = 131,
    D3DRS_WRAP4 = 132,
    D3DRS_WRAP5 = 133,
    D3DRS_WRAP6 = 134,
    D3DRS_WRAP7 = 135,
    D3DRS_CLIPPING = 136,
    D3DRS_LIGHTING = 137,
    D3DRS_AMBIENT = 139,
    D3DRS_FOGVERTEXMODE = 140,
    D3DRS_COLORVERTEX = 141,
    D3DRS_LOCALVIEWER = 142,
    D3DRS_NORMALIZENORMALS = 143,
    D3DRS_DIFFUSEMATERIALSOURCE = 145,
    D3DRS_SPECULARMATERIALSOURCE = 146,
    D3DRS_AMBIENTMATERIALSOURCE = 147,
    D3DRS_EMISSIVEMATERIALSOURCE = 148,
    D3DRS_VERTEXBLEND = 151,
    D3DRS_CLIPPLANEENABLE = 152,
    D3DRS_POINTSIZE = 154,
    D3DRS_POINTSIZE_MIN = 155,
    D3DRS_POINTSPRITEENABLE = 156,
    D3DRS_POINTSCALEENABLE = 157,
    D3DRS_POINTSCALE_A = 158,
    D3DRS_POINTSCALE_B = 159,
    D3DRS_POINTSCALE_C = 160,
    D3DRS_MULTISAMPLEANTIALIAS = 161,
    D3DRS_MULTISAMPLEMASK = 162,
    D3DRS_PATCHEDGESTYLE = 163,
    D3DRS_DEBUGMONITORTOKEN = 165,
    D3DRS_POINTSIZE_MAX = 166,
    D3DRS_INDEXEDVERTEXBLENDENABLE = 167,
    D3DRS_COLORWRITEENABLE = 168,
    D3DRS_TWEENFACTOR = 170,
    D3DRS_BLENDOP = 171,
    D3DRS_POSITIONDEGREE = 172,
    D3DRS_NORMALDEGREE = 173,
    D3DRS_SCISSORTESTENABLE = 174,
    D3DRS_SLOPESCALEDEPTHBIAS = 175,
    D3DRS_ANTIALIASEDLINEENABLE = 176,
    D3DRS_MINTESSELLATIONLEVEL = 178,
    D3DRS_MAXTESSELLATIONLEVEL = 179,
    D3DRS_ADAPTIVETESS_X = 180,
    D3DRS_ADAPTIVETESS_Y = 181,
    D3DRS_ADAPTIVETESS_Z = 182,
    D3DRS_ADAPTIVETESS_W = 183,
    D3DRS_ENABLEADAPTIVETESSELLATION = 184,
    D3DRS_TWOSIDEDSTENCILMODE = 185,
    D3DRS_CCW_STENCILFAIL = 186,
    D3DRS_CCW_STENCILZFAIL = 187,
    D3DRS_CCW_STENCILPASS = 188,
    D3DRS_CCW_STENCILFUNC = 189,
    D3DRS_COLORWRITEENABLE1 = 190,
    D3DRS_COLORWRITEENABLE2 = 191,
    D3DRS_COLORWRITEENABLE3 = 192,
    D3DRS_BLENDFACTOR = 193,
    D3DRS_SRGBWRITEENABLE = 194,
    D3DRS_DEPTHBIAS = 195,
    D3DRS_WRAP8 = 198,
    D3DRS_WRAP9 = 199,
    D3DRS_WRAP10 = 200,
    D3DRS_WRAP11 = 201,
    D3DRS_WRAP12 = 202,
    D3DRS_WRAP13 = 203,
    D3DRS_WRAP14 = 204,
    D3DRS_WRAP15 = 205,
    D3DRS_SEPARATEALPHABLENDENABLE = 206,
    D3DRS_SRCBLENDALPHA = 207,
    D3DRS_DESTBLENDALPHA = 208,
    D3DRS_BLENDOPALPHA = 209,

Sampler states, at least 8 of these:

    D3DSAMP_ADDRESSU = 1,
    D3DSAMP_ADDRESSV = 2,
    D3DSAMP_ADDRESSW = 3,
    D3DSAMP_BORDERCOLOR = 4,
    D3DSAMP_MAGFILTER = 5,
    D3DSAMP_MINFILTER = 6,
    D3DSAMP_MIPFILTER = 7,
    D3DSAMP_MIPMAPLODBIAS = 8,
    D3DSAMP_MAXMIPLEVEL = 9,
    D3DSAMP_MAXANISOTROPY = 10,
    D3DSAMP_SRGBTEXTURE = 11,
    D3DSAMP_ELEMENTINDEX = 12,
    D3DSAMP_DMAPOFFSET = 13,

Texture stage states, at least 8 of these:

    D3DTSS_COLOROP = 1,
    D3DTSS_COLORARG1 = 2,
    D3DTSS_COLORARG2 = 3,
    D3DTSS_ALPHAOP = 4,
    D3DTSS_ALPHAARG1 = 5,
    D3DTSS_ALPHAARG2 = 6,
    D3DTSS_BUMPENVMAT00 = 7,
    D3DTSS_BUMPENVMAT01 = 8,
    D3DTSS_BUMPENVMAT10 = 9,
    D3DTSS_BUMPENVMAT11 = 10,
    D3DTSS_TEXCOORDINDEX = 11,
    D3DTSS_BUMPENVLSCALE = 22,
    D3DTSS_BUMPENVLOFFSET = 23,
    D3DTSS_TEXTURETRANSFORMFLAGS = 24,
    D3DTSS_COLORARG0 = 26,
    D3DTSS_ALPHAARG0 = 27,
    D3DTSS_RESULTARG = 28,
    D3DTSS_CONSTANT = 32,

Vertex streams, at least 8.
Shader constants.
And so on.

Of course you can say things like "my engine is not allowed to change state D3DTSS_BUMPENVMAT01", so I can skip that one, and so on.
But if someone suddenly uses one of the "banned" states, all hell breaks loose. Rendering can break everywhere, and you need to add that state to the list of allowed states. All rendering code then needs to set that state to the desired value. A state manager removes that hassle.

I just declare a State (it's automatically set up to the default DirectX state), then change the values I need. When rendering I call applyState, set some of the dynamic states (with or without testing the previous state), then call DP.
If the code executed just before my rendering changes state XXX, the state manager detects that, and during my rendering I get the desired state.

##### Share on other sites
Quote:
Original post by eq
Quote:
 I didn't realise state changes were so expensive

It all depends on how many "materials" you have. In MM3 I think we ended up using around 70 materials.

The problem is that there are so many states. Consider setting all these states before calling a bunch of DPs. On the Xbox many state changes were just a move into a table, i.e.:

mov eax, 0x12345678
mov [ebx + 0x1234], eax

But still, doing 300+ of these moves 70 times each frame does kill performance.

I don't use materials at all, but I do change states quite a bit. I end up having to set ALL the states I might alter in each object's render function, to make sure some other state hasn't been changed in a way that screws things up. It's not good because, e.g., if the terrain renderer uses a new state, every other object has to explicitly make sure that state is now correct!

Doing 300+ 2-MOV ops each frame is absolutely undetectable. That's probably about the same as one call to sprintf()!

##### Share on other sites
Quote:
 Doing 300+ 2-MOV ops

That was on an Xbox, i.e. a 733 MHz Pentium III.
On the PC a render state change does A LOT more. In your code it pushes the arguments and makes a call into D3D (a normal stdcall). Inside D3D I *think* it batches up the render states that you modified, and when the DP call is made, all state updates are sent to the driver.
In the game I referred to, the CPU time spent setting render states (using essentially just a move) was around 15% without any state management. The 300+ states figure was probably an understatement. I recall that we did 300+ DP calls, and I expect that we changed quite a few states in between (same material, but different textures, transforms, etc.). The reason for having such a huge number of different materials was purely memory limitations (we compressed a lot of data differently to maximize memory usage). I expect a PC game to use fewer different mesh types, so the number of state switches is smaller.
If you don't need a state manager, fine, don't write one. In my current engine I probably don't need one either. I just got the idea for how to code a very efficient manager while doing the one in MM3, and there it WAS needed.
For an uncached state-to-state transition, my code tests every state, saving the ones that differ (as code). The assembling is more or less just a memcpy, which is very fast (the building is probably faster than calling ALL the functions). After building the code, I execute it, which calls only the state-changing functions that are needed. If the state transition is in the cache, the code is simply called (currently all state transitions are cached within the first frame). The thing I like most is, as I said before, that you can change ANY state without ever caring about the other rendering parts.
