
An idea for render state management

This topic is 4590 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I am quite new to the idea of state management, but I know that it is pretty important for keeping speed up. I was wondering what the best way of rendering is while keeping state changes to a minimum. I have come up with a theory of state management that works on a stack system. Any comments are greatly appreciated!

Basically, my engine consists of entities like meshes, etc. You can set certain properties on these entities: wireframe, fullbright, and so on. My idea was to create another thread running a continuous loop that orders the entities for rendering. This loop finds all of the state changes that are used, then builds stacks for rendering the entities, depending on the states they use.

It's kind of hard to explain, so below is a diagram that explains what I mean.

[Diagram not preserved: two state stacks, drawn as pyramids]

In this example, the renderer would push the first state, push the second state, and push the third. It would render all these entities, pop back to the second state, render those entities, then pop to the first and render the remaining entities. The same would happen for the second stack in the diagram (illustrated as a pyramid-type thing).

So, does this sound like a good idea? The reordering would take place on a second thread, so hopefully not too much speed would be lost.

Thanks,
aCiD2
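A minimal sketch of the stack idea described above, assuming each entity carries an ordered list of state IDs (outermost, most widely shared state first). The names Entity, StackRenderer and renderSorted are hypothetical, and push/pop/draw only record what a real renderer would do. Sorting the entities in descending lexicographic order of their state lists keeps entities that share a state prefix next to each other, so each shared state is pushed once, as in the pyramid description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entity: a name plus the ordered list of state IDs it needs,
// outermost (most widely shared) state first. IDs are illustrative only.
struct Entity {
    std::string name;
    std::vector<int> states;
};

// Stand-in for the real renderer: records pushes, pops and draw order
// instead of calling the graphics API.
struct StackRenderer {
    std::vector<int> stack;
    int pushes = 0;
    int pops = 0;
    std::vector<std::string> drawn;

    void push(int s) { stack.push_back(s); ++pushes; }
    void pop() { assert(!stack.empty()); stack.pop_back(); ++pops; }

    void draw(const Entity& e) {
        // Find how much of the current stack this entity can reuse.
        std::size_t common = 0;
        while (common < stack.size() && common < e.states.size() &&
               stack[common] == e.states[common])
            ++common;
        // Pop back to the shared prefix, then push the states still missing.
        while (stack.size() > common) pop();
        for (std::size_t i = common; i < e.states.size(); ++i) push(e.states[i]);
        drawn.push_back(e.name);
    }
};

// Descending lexicographic order keeps entities with a common state prefix
// adjacent and puts a longer stack before its own prefix (push 1,2,3; pop to 2; pop to 1).
StackRenderer renderSorted(std::vector<Entity> entities) {
    std::stable_sort(entities.begin(), entities.end(),
                     [](const Entity& a, const Entity& b) { return a.states > b.states; });
    StackRenderer r;
    for (const Entity& e : entities) r.draw(e);
    return r;
}
```

The threading part of the proposal is left out here; the sort is the work a background thread would be doing each frame.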

This seems very similar to what we used in Midtown Madness 3.
Basically, we packed states that were likely to change at the same time into 32-bit words. To scan for a change we first tested the words against each other; if they differed, we tested the individual states (we actually went further than that).
After all optimizations were applied, the state manager was still one of the more costly processes in the game, at a whopping 4% of the total CPU time on average (no single function showed over 2% of the CPU).

For example:

Word0: aaaa.bbbb bbbb.ccdd eeee.eeee eeff.ffff (a = a state, b = a state likely to change if a changed, etc.)
Word1: ggg... and so on.

if (Previous.Word0 != Current.Word0) {
    if (Previous.aa != Current.aa)
        SetRenderState(Current.aa);
    // ...test the other states packed into Word0 the same way...
    Previous.Word0 = Current.Word0;
}
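A sketch of the packed-word comparison described above, assuming a made-up layout of four states spread over two 32-bit words. FieldDesc, kFields, PackedStates and applyChanges are hypothetical names, and applyChanges returns the (state ID, value) pairs rather than calling SetRenderState. A whole word is compared first, and individual fields are only extracted when that word differs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical field layout: each state occupies a bit range of one 32-bit word.
// States likely to change together share a word, as in the Word0 example.
struct FieldDesc {
    int word;            // which 32-bit word holds the state
    int shift;           // bit offset inside that word
    std::uint32_t mask;  // mask applied after shifting down
    int id;              // illustrative state ID passed to the setter
};

const FieldDesc kFields[] = {
    {0, 0, 0xFu, 0},     // "a"
    {0, 4, 0xFFu, 1},    // "b", likely to change when "a" does
    {0, 12, 0xFu, 2},    // "c"
    {1, 0, 0xFFFFu, 3},  // "g", first state of Word1
};

constexpr int kWordCount = 2;

struct PackedStates {
    std::uint32_t words[kWordCount] = {};

    std::uint32_t get(const FieldDesc& f) const {
        return (words[f.word] >> f.shift) & f.mask;
    }
    void set(const FieldDesc& f, std::uint32_t v) {
        words[f.word] = (words[f.word] & ~(f.mask << f.shift)) | ((v & f.mask) << f.shift);
    }
};

// Compares whole words first and only inspects the fields of words that differ.
// Returns the (id, value) pairs that would go to SetRenderState and brings
// `prev` up to date.
std::vector<std::pair<int, std::uint32_t>> applyChanges(PackedStates& prev,
                                                        const PackedStates& cur) {
    std::vector<std::pair<int, std::uint32_t>> changes;
    for (int w = 0; w < kWordCount; ++w) {
        if (prev.words[w] == cur.words[w]) continue;  // nothing in this word changed
        for (const FieldDesc& f : kFields)
            if (f.word == w && prev.get(f) != cur.get(f))
                changes.emplace_back(f.id, cur.get(f));
        prev.words[w] = cur.words[w];
    }
    return changes;
}
```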

I've since abandoned the idea; there's too much logic code involved.
I now have a system based on "States". A State contains all render states, texture states, sampler states, the index buffer, etc. Basically it contains all the information that you can set in DirectX. It also has a flag (bit) for each item in the State.
The renderer keeps a POINTER to the previous State; when a new State is to be applied, its POINTER is passed in.
Then, using a map, I check whether the PREV-CUR state change already exists.
If it doesn't, I generate the code (actual assembled machine code) that sets the items that changed from PREV to CUR. It looks something like "push, push, push, call SetRenderState, push, push, call SetIndices, ret".
The code is then stored in the map and executed for the state change.
An engine's items are quite often rendered linearly, so the cache doesn't grow big at all. The flags I mentioned are for things that the user needs to set every time before a DP (DrawPrimitive) call (in my case I have a material language that does this for me).

Here's my apply-state method:

void
Renderer::applyState(State* const state)
{
    // Find the cached state difference for the (previous, new) pair
    StateCache::iterator it = m_stateCache.find(std::make_pair(m_prevState, state));
    // If the cached difference is invalid, drop it.
    // Materials and states can be hot-loaded, so the diff must be recreated when that happens.
    // This test could be skipped by adding more code to the hot-loading mechanism.
    if ((it != m_stateCache.end()) && (!it->second->isValid())){
        // Remove difference from cache
        it->second->release();
        m_stateCache.erase(it);
        it = m_stateCache.end();
    }
    // If no state difference is in the cache
    if (it == m_stateCache.end()){
        // Create a new difference
        StateDifference* diff = StateDifference::create(m_prevState, state); // Generates the asm code
        if (!diff){
            // If we couldn't create a difference, apply the whole state! SLOW AS HELL!
            logWarning("Couldn't create state difference!");
            state->apply();
            m_prevState = state;
            return;
        }
        // Insert the state difference into the cache
        std::pair<StateCache::iterator, bool> res = m_stateCache.insert(std::make_pair(std::make_pair(m_prevState, state), diff));
        if (!res.second){
            diff->release();
            logWarning("Couldn't add state difference to the cache!");
            state->apply();
            m_prevState = state;
            return;
        }
        it = res.first;
    }
    // Apply state difference
    it->second->apply(); // Basically a call into the generated asm code
    m_prevState = state;
}
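The StateDifference above generates x86 code; as a portable stand-in (not the author's method), this sketch stores each cached transition as a list of (index, new value) pairs keyed by the (previous, new) State pointer pair, which mirrors the lookup in applyState. SimpleState, Diff, DiffCache and the `device` block are hypothetical, hot-load invalidation is left out, and the sketch assumes nothing else touches the device between calls.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

constexpr std::size_t kStateCount = 8;  // illustrative; a real State covers every D3D item

// Hypothetical complete State: every item always has a value.
struct SimpleState {
    std::array<std::uint32_t, kStateCount> values{};
};

// Stand-in for the generated code: only the items that differ between two States.
using Diff = std::vector<std::pair<std::size_t, std::uint32_t>>;

Diff makeDiff(const SimpleState& from, const SimpleState& to) {
    Diff d;
    for (std::size_t i = 0; i < kStateCount; ++i)
        if (from.values[i] != to.values[i]) d.emplace_back(i, to.values[i]);
    return d;
}

using StatePair = std::pair<const SimpleState*, const SimpleState*>;

// std::less guarantees a total order on pointers, unlike raw operator<.
struct StatePairLess {
    bool operator()(const StatePair& a, const StatePair& b) const {
        std::less<const SimpleState*> lt;
        if (lt(a.first, b.first)) return true;
        if (lt(b.first, a.first)) return false;
        return lt(a.second, b.second);
    }
};

class DiffCache {
public:
    // `device` stands in for the API-side state that applying a diff writes to.
    explicit DiffCache(SimpleState& device) : m_device(device) {}

    void applyState(const SimpleState* state) {
        assert(state != nullptr);
        if (!m_prev) {  // first call: nothing to diff against, apply everything
            m_device = *state;
            m_prev = state;
            return;
        }
        const StatePair key(m_prev, state);
        auto it = m_cache.find(key);
        if (it == m_cache.end()) {  // uncached transition: build it once
            it = m_cache.emplace(key, makeDiff(*m_prev, *state)).first;
            ++m_builds;
        }
        for (const auto& change : it->second) {  // replay only the changed items
            m_device.values[change.first] = change.second;
            ++m_writes;
        }
        m_prev = state;
    }

    int builds() const { return m_builds; }
    int writes() const { return m_writes; }

private:
    SimpleState& m_device;
    const SimpleState* m_prev = nullptr;
    std::map<StatePair, Diff, StatePairLess> m_cache;
    int m_builds = 0;
    int m_writes = 0;
};
```

Alternating between two States builds each direction once; every later switch is a cache hit that writes only the differing items.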







Here's a sample material


// Horizon map material

variables vertexBuffer,
vertexBufferStride,
vertexDecl,
vertexWorldConsts

method "TheOnlyOne"

pass "One"

// Shader
VertexShader
{
;c0 = ProjectionMatrix0
;c1 = ProjectionMatrix1
;c2 = ProjectionMatrix2
;c3 = ProjectionMatrix3
;c4 = Z-near, Z-Far, 1 / (zFar - zNear), 0.5
;c5 = CameraPosX, Y, Z, ?
;c6 = 2.0f, 1.0f, ?, ?
;c7 = 2.0 / DestWidth, -2.0 / DestHeight, -1, 1

vs.1.1

dcl_position v0 ; x, y, z
dcl_texcoord v1 ; u, v

; Unpack position
mad r0, v0.xyz, c6.x, -c6.y

; Project
mul r4, r0.x, c0
mad r4, r0.y, c1, r4
mad oPos, r0.z, c2, r4

; Output texcoord
mov oT0, v1

}

PixelShader "Data\\Shader\\SingleTexture.psh"

// Shader data
VertexConsts vertexWorldConsts

// Geometry
VertexDecl vertexDecl
VertexBuffer 0, vertexBuffer, 0, vertexBufferStride

CullMode None
ZWriteEnable false

Texture 0, "Data\\Texture\\Horizon.dds"
AddressU 0, Clamp
AddressV 0, Clamp

MagFilter 0, Linear
MinFilter 0, Linear
MipFilter 0, Linear

AlphaTestEnable true
AlphaRef 255
AlphaFunc Equal







And the code that uses it:


void
HorizonMap::draw(void)
{
    // Values for the material's variables, in declaration order
    Variable vars[4];
    vars[0] = m_vertices;                     // vertexBuffer
    vars[1] = sizeof(Vertex);                 // vertexBufferStride
    vars[2] = m_decl;                         // vertexDecl
    vars[3] = getCommonVertexShaderConsts();  // vertexWorldConsts
    uint passCount = m_material->getPassCount();
    uint pass;
    for (pass = 0; pass < passCount; ++pass){
        if (m_material->apply(pass, vars))
            m_material->engine()->getDevice()->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, m_triangleCount);
    }
}

I didn't realise state changes were so expensive. I'd assumed that checking whether the current state is already what you want would be slower than just making the SetRenderState call. I guess I'm wrong, but if you don't use a pure D3D device, doesn't it check state changes before doing them anyway? Certainly the debug version will tell you about redundant changes.

The nicest way would be to have an array of DWORDs holding the current value of each state. Then you could just do:
if (states[D3DRS_ZFUNC] != newValue) SetRenderState(...)
However, there are obviously problems that make this impossible. But if for every D3D render state you use you create your own mapping (e.g. D3DRS_ZFUNC -> MYRS_ZFUNC), then you can number your own states sequentially, keep just that array and use the mapping. Is this acceptable? Apart from typing in the (long) list of render state mappings, it seems like a fast method to me...
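A sketch of the shadow-array idea with a sequential mapping, under stated assumptions: only three states are mapped, the D3D numbers are copied from the list later in this thread (prefixed with k so they don't clash with the real d3d9types.h names), and forward() stands in for IDirect3DDevice9::SetRenderState by counting calls. MyRenderState, kToD3D and StateShadow are hypothetical. A per-state "known" flag makes the first set of each state always reach the device.

```cpp
#include <cassert>
#include <cstdint>

// A few D3D9 render-state numbers copied from the list in this thread.
enum D3DValue : std::uint32_t {
    kD3DRS_ZENABLE = 7,
    kD3DRS_CULLMODE = 22,
    kD3DRS_ZFUNC = 23,
};

// Our own, densely numbered states (hypothetical names).
enum MyRenderState { MYRS_ZENABLE, MYRS_CULLMODE, MYRS_ZFUNC, MYRS_COUNT };

// Maps each dense index back to the sparse D3D value.
const std::uint32_t kToD3D[MYRS_COUNT] = {kD3DRS_ZENABLE, kD3DRS_CULLMODE, kD3DRS_ZFUNC};

// Hypothetical filter in front of the device.
class StateShadow {
public:
    void set(MyRenderState s, std::uint32_t value) {
        assert(s < MYRS_COUNT);
        if (m_known[s] && m_values[s] == value) return;  // redundant: skip the call
        m_values[s] = value;
        m_known[s] = true;
        forward(kToD3D[s], value);
    }
    int calls() const { return m_calls; }
    std::uint32_t lastD3DState() const { return m_lastState; }

private:
    // Stand-in for the real SetRenderState call; only records it.
    void forward(std::uint32_t d3dState, std::uint32_t /*value*/) {
        m_lastState = d3dState;
        ++m_calls;
    }
    std::uint32_t m_values[MYRS_COUNT] = {};
    bool m_known[MYRS_COUNT] = {};
    std::uint32_t m_lastState = 0;
    int m_calls = 0;
};
```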

Well, in GL state changes are fast, but in D3D9 they're not (they will be in D3D10). To cut down on frequent state changes you could batch items that use the same texture, or pack multiple textures into one, but I think that's old-school stuff. I used to do some wild logic thingy to prevent redundant state changes, but I dropped it. I set a lot of states because I multipass so much, with each pass having different state settings.
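Batching by texture, as mentioned above, amounts to sorting the draw list by texture before submitting it. This sketch assumes a hypothetical DrawItem whose integer texture ID stands in for a real texture object, and counts the texture binds a given order would need.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw item: the texture it binds plus an ID for the mesh.
struct DrawItem {
    int texture;
    int mesh;
};

// Counts how many texture binds a draw order needs, binding only when the
// texture differs from the previous item's.
int countTextureSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    int bound = -1;  // -1 = nothing bound yet
    for (const DrawItem& item : items) {
        if (item.texture != bound) {
            bound = item.texture;
            ++switches;
        }
    }
    return switches;
}

// Groups items that share a texture so each texture is bound once per pass;
// stable_sort keeps the original order within each texture group.
void sortByTexture(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) { return a.texture < b.texture; });
}
```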

Quote:
I didn't realise state changes were so expensive

It all depends on how many "materials" you have. In MM3 I think we ended up using around 70 materials.

The problem is that there are so many states. Consider setting all these states before calling a bunch of DPs. On the Xbox many state changes were just a move into a table, i.e.:

mov eax, 0x12345678
mov [ebx + 0x1234], eax

But still, doing 300+ of these moves 70 times each frame does kill performance.
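For scale, the figures in this post multiply out as follows. This treats 300 as the stated lower bound, counts the two instructions of the mov pair above per state write, and assumes 60 frames per second, which is not a figure from the post:

$$
300 \times 70 = 21\,000 \text{ state writes per frame}, \qquad 21\,000 \times 2 = 42\,000 \text{ instructions per frame}, \qquad 42\,000 \times 60 = 2\,520\,000 \text{ instructions per second}
$$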


Render states:
D3DRS_ZENABLE = 7,
D3DRS_FILLMODE = 8,
D3DRS_SHADEMODE = 9,
D3DRS_ZWRITEENABLE = 14,
D3DRS_ALPHATESTENABLE = 15,
D3DRS_LASTPIXEL = 16,
D3DRS_SRCBLEND = 19,
D3DRS_DESTBLEND = 20,
D3DRS_CULLMODE = 22,
D3DRS_ZFUNC = 23,
D3DRS_ALPHAREF = 24,
D3DRS_ALPHAFUNC = 25,
D3DRS_DITHERENABLE = 26,
D3DRS_ALPHABLENDENABLE = 27,
D3DRS_FOGENABLE = 28,
D3DRS_SPECULARENABLE = 29,
D3DRS_FOGCOLOR = 34,
D3DRS_FOGTABLEMODE = 35,
D3DRS_FOGSTART = 36,
D3DRS_FOGEND = 37,
D3DRS_FOGDENSITY = 38,
D3DRS_RANGEFOGENABLE = 48,
D3DRS_STENCILENABLE = 52,
D3DRS_STENCILFAIL = 53,
D3DRS_STENCILZFAIL = 54,
D3DRS_STENCILPASS = 55,
D3DRS_STENCILFUNC = 56,
D3DRS_STENCILREF = 57,
D3DRS_STENCILMASK = 58,
D3DRS_STENCILWRITEMASK = 59,
D3DRS_TEXTUREFACTOR = 60,
D3DRS_WRAP0 = 128,
D3DRS_WRAP1 = 129,
D3DRS_WRAP2 = 130,
D3DRS_WRAP3 = 131,
D3DRS_WRAP4 = 132,
D3DRS_WRAP5 = 133,
D3DRS_WRAP6 = 134,
D3DRS_WRAP7 = 135,
D3DRS_CLIPPING = 136,
D3DRS_LIGHTING = 137,
D3DRS_AMBIENT = 139,
D3DRS_FOGVERTEXMODE = 140,
D3DRS_COLORVERTEX = 141,
D3DRS_LOCALVIEWER = 142,
D3DRS_NORMALIZENORMALS = 143,
D3DRS_DIFFUSEMATERIALSOURCE = 145,
D3DRS_SPECULARMATERIALSOURCE = 146,
D3DRS_AMBIENTMATERIALSOURCE = 147,
D3DRS_EMISSIVEMATERIALSOURCE = 148,
D3DRS_VERTEXBLEND = 151,
D3DRS_CLIPPLANEENABLE = 152,
D3DRS_POINTSIZE = 154,
D3DRS_POINTSIZE_MIN = 155,
D3DRS_POINTSPRITEENABLE = 156,
D3DRS_POINTSCALEENABLE = 157,
D3DRS_POINTSCALE_A = 158,
D3DRS_POINTSCALE_B = 159,
D3DRS_POINTSCALE_C = 160,
D3DRS_MULTISAMPLEANTIALIAS = 161,
D3DRS_MULTISAMPLEMASK = 162,
D3DRS_PATCHEDGESTYLE = 163,
D3DRS_DEBUGMONITORTOKEN = 165,
D3DRS_POINTSIZE_MAX = 166,
D3DRS_INDEXEDVERTEXBLENDENABLE = 167,
D3DRS_COLORWRITEENABLE = 168,
D3DRS_TWEENFACTOR = 170,
D3DRS_BLENDOP = 171,
D3DRS_POSITIONDEGREE = 172,
D3DRS_NORMALDEGREE = 173,
D3DRS_SCISSORTESTENABLE = 174,
D3DRS_SLOPESCALEDEPTHBIAS = 175,
D3DRS_ANTIALIASEDLINEENABLE = 176,
D3DRS_MINTESSELLATIONLEVEL = 178,
D3DRS_MAXTESSELLATIONLEVEL = 179,
D3DRS_ADAPTIVETESS_X = 180,
D3DRS_ADAPTIVETESS_Y = 181,
D3DRS_ADAPTIVETESS_Z = 182,
D3DRS_ADAPTIVETESS_W = 183,
D3DRS_ENABLEADAPTIVETESSELLATION = 184,
D3DRS_TWOSIDEDSTENCILMODE = 185,
D3DRS_CCW_STENCILFAIL = 186,
D3DRS_CCW_STENCILZFAIL = 187,
D3DRS_CCW_STENCILPASS = 188,
D3DRS_CCW_STENCILFUNC = 189,
D3DRS_COLORWRITEENABLE1 = 190,
D3DRS_COLORWRITEENABLE2 = 191,
D3DRS_COLORWRITEENABLE3 = 192,
D3DRS_BLENDFACTOR = 193,
D3DRS_SRGBWRITEENABLE = 194,
D3DRS_DEPTHBIAS = 195,
D3DRS_WRAP8 = 198,
D3DRS_WRAP9 = 199,
D3DRS_WRAP10 = 200,
D3DRS_WRAP11 = 201,
D3DRS_WRAP12 = 202,
D3DRS_WRAP13 = 203,
D3DRS_WRAP14 = 204,
D3DRS_WRAP15 = 205,
D3DRS_SEPARATEALPHABLENDENABLE = 206,
D3DRS_SRCBLENDALPHA = 207,
D3DRS_DESTBLENDALPHA = 208,
D3DRS_BLENDOPALPHA = 209,
Sampler states (at least 8 sets of these):
D3DSAMP_ADDRESSU = 1,
D3DSAMP_ADDRESSV = 2,
D3DSAMP_ADDRESSW = 3,
D3DSAMP_BORDERCOLOR = 4,
D3DSAMP_MAGFILTER = 5,
D3DSAMP_MINFILTER = 6,
D3DSAMP_MIPFILTER = 7,
D3DSAMP_MIPMAPLODBIAS = 8,
D3DSAMP_MAXMIPLEVEL = 9,
D3DSAMP_MAXANISOTROPY = 10,
D3DSAMP_SRGBTEXTURE = 11,
D3DSAMP_ELEMENTINDEX = 12,
D3DSAMP_DMAPOFFSET = 13,
Texture stage states (at least 8 sets of these):
D3DTSS_COLOROP = 1,
D3DTSS_COLORARG1 = 2,
D3DTSS_COLORARG2 = 3,
D3DTSS_ALPHAOP = 4,
D3DTSS_ALPHAARG1 = 5,
D3DTSS_ALPHAARG2 = 6,
D3DTSS_BUMPENVMAT00 = 7,
D3DTSS_BUMPENVMAT01 = 8,
D3DTSS_BUMPENVMAT10 = 9,
D3DTSS_BUMPENVMAT11 = 10,
D3DTSS_TEXCOORDINDEX = 11,
D3DTSS_BUMPENVLSCALE = 22,
D3DTSS_BUMPENVLOFFSET = 23,
D3DTSS_TEXTURETRANSFORMFLAGS = 24,
D3DTSS_COLORARG0 = 26,
D3DTSS_ALPHAARG0 = 27,
D3DTSS_RESULTARG = 28,
D3DTSS_CONSTANT = 32,
Vertex streams (at least 8).
Shader constants.
And so on






Of course you can say things like "my engine is not allowed to change state D3DTSS_BUMPENVMAT01", so I can skip that one, and so on.
But if someone suddenly uses one of the "banned" states, all hell breaks loose. All rendering will (or can) break, and you need to add that state to the list of allowed states. All rendering code then needs to set that state to the desired value. A state manager removes that hassle.

I just declare a State (it's automatically set up to the default DirectX state), then change the values I need. When rendering I call applyState, set some of the dynamic states (with or without testing the previous state), then call DP.
If the code executed just before my rendering changes state XXX, the state manager detects that, and during my rendering I get the desired state.
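A sketch of why a complete, default-initialized State helps, assuming three illustrative state slots with made-up default values. StateId, StateBlock, kDefaults, makeState and applyFull are hypothetical names. Because every State starts from the defaults, a State that never mentions a slot still says which value it expects, so a value left behind by the previous renderer (here the terrain's ZFUNC) gets reset.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical state slots; a real State would cover every render, sampler
// and texture-stage state.
enum StateId : std::size_t { STATE_ZFUNC, STATE_CULLMODE, STATE_ALPHATEST, STATE_COUNT };

using StateBlock = std::array<std::uint32_t, STATE_COUNT>;

// Illustrative default values for the three slots.
constexpr StateBlock kDefaults = {4, 3, 0};

// Every State starts as a full copy of the defaults.
StateBlock makeState() { return kDefaults; }

// Applies `next` over whatever the device currently holds and returns how
// many items actually had to change.
int applyFull(StateBlock& device, const StateBlock& next) {
    int changed = 0;
    for (std::size_t i = 0; i < STATE_COUNT; ++i) {
        if (device[i] != next[i]) {
            device[i] = next[i];
            ++changed;
        }
    }
    return changed;
}
```

Usage: the terrain State overrides only ZFUNC and the object State overrides only CULLMODE, yet switching from terrain to object still restores ZFUNC, because the object State carries the default for it.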

Quote:
Original post by eq
Quote:
I didn't realise state changes were so expensive

It all depends on how many "materials" you have. In MM3 I think we ended up using around 70 materials.

The problem is that there are so many states. Consider setting all these states before calling a bunch of DPs. On the Xbox many state changes were just a move into a table, i.e.:

mov eax, 0x12345678
mov [ebx + 0x1234], eax

But still, doing 300+ of these moves 70 times each frame does kill performance.

I don't use materials at all, but I do change states quite a bit. I end up having to set ALL the states I might alter in each object's render, to make sure some other state hasn't been changed and screwed stuff up. That's not good, because e.g. if the terrain render uses a new state, every other object has to explicitly make sure that state is now correct!

Doing 300+ 2-MOV ops each frame is absolutely undetectable. That's probably about the same as one call to sprintf()!

Quote:
Doing 300+ 2-MOV ops

That was on an Xbox, i.e. a 733 MHz Pentium III.
On the PC a render state change does A LOT more. In your code it's pushing the arguments and doing a call into D3D (a normal stdcall). Inside D3D I *think* it batches up the render states that you modified, and when the DP call is made, all state updates are sent to the driver.
In the game I referred to, the CPU time spent setting render states (using essentially just a move) was around 15% without any state management. The 300+ states figure was probably an understatement. I recall that we did 300+ DP calls, and I expect that we changed quite a few states in between (same material, but different texture, transforms, etc.). The reason for having such a huge number of different materials etc. was only the memory limitations (we compressed a lot of data differently to maximize memory usage). I expect a PC game to use fewer different mesh types, so the number of state switches is smaller.
If you don't need a state manager, fine, don't write one. In my current engine I probably don't need one either. I just got the idea of how to code a very efficient manager when doing the one in MM3, and there it WAS needed.
For an uncached state-to-state transition, my code tests every state, saving the ones that differ (as code). The assembling is more or less just a memcpy, which is very fast (building the code is probably faster than calling ALL the functions). After building the code, I execute it, which only calls the state-changing functions that are needed. If the state transition is in the cache, the cached code is called (currently all state transitions are cached within the first frame). The thing I like the most, as I said before, is that you can change ANY state without ever caring about the other rendering parts.
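The post builds each transition once as machine code and afterwards just calls it. As a portable stand-in for the code generation (not the author's method), this sketch "compiles" a transition into a vector of std::function calls with their arguments already bound, so replaying it only invokes the setters for items that changed. Setter, Transition, buildTransition and run are hypothetical names; in a real engine each setter would wrap a call such as SetRenderState.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// One setter per state item (hypothetical wrappers around the D3D calls).
using Setter = std::function<void(std::uint32_t)>;

// Stand-in for the generated code: calls with their arguments already bound.
using Transition = std::vector<std::function<void()>>;

// Tests every item once and records a call only for the ones that differ.
Transition buildTransition(const std::vector<std::uint32_t>& from,
                           const std::vector<std::uint32_t>& to,
                           const std::vector<Setter>& setters) {
    assert(from.size() == to.size() && to.size() == setters.size());
    Transition code;
    for (std::size_t i = 0; i < to.size(); ++i) {
        if (from[i] != to[i]) {
            Setter set = setters[i];
            std::uint32_t value = to[i];
            code.push_back([set, value] { set(value); });  // bind now, call later
        }
    }
    return code;
}

// Executes a previously built transition.
void run(const Transition& code) {
    for (const auto& call : code) call();
}
```

Building costs one pass over all items; each later replay touches only the changed ones, which is the trade-off described above.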
