Reducing state changes

Started by
3 comments, last by L. Spiro 11 years, 8 months ago
Do you use any subsystem to help reduce state changes (for example, ignoring a call to bind a texture, shader, or render target if it is already bound)? How did you design it?

Currently I keep track of what is bound to every resource, shader, and render-target slot using arrays, and before binding a new resource I check whether it's already in the array. However, the code is kind of messy because I'm trying to batch all resource bindings into a single API call. For example, if 3 textures need to be bound to the PS, I only have to call PSSetShaderResources() once.

Does calling PSSetShaderResources(0, D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT, resources); increase driver overhead when 'resources' is an array in which all pointers but one are NULL, given that I actually only need to bind a single texture?
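One way to avoid passing the full slot array is to track which slots actually changed and bind only the smallest range that covers them. The sketch below is not D3D11 code: `Resource`, `FakeContext`, and `SlotCache` are hypothetical stand-ins so the idea can run anywhere, and `SetShaderResources(start, count, views)` only mirrors the shape of PSSetShaderResources(). Keep in mind that a NULL pointer in the array unbinds that slot, so passing every slot with mostly-NULL pointers is not equivalent to binding one texture.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Resource { int id; };  // hypothetical stand-in for a shader resource view

constexpr std::size_t kSlotCount = 128;  // value of D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT

// Hypothetical stand-in for the device context; it records calls instead of issuing them.
struct FakeContext {
    struct Call { std::size_t start, count; };
    std::vector<Call> calls;
    void SetShaderResources(std::size_t start, std::size_t count, Resource* const* /*views*/) {
        calls.push_back({start, count});
    }
};

class SlotCache {
public:
    // Record the desired binding; nothing is sent to the context yet.
    void Set(std::size_t slot, Resource* r) {
        assert(slot < kSlotCount);
        desired_[slot] = r;
    }

    // Issue at most one call, covering only the range of slots that differ
    // from what was last sent.
    void Flush(FakeContext& ctx) {
        std::size_t first = kSlotCount, last = 0;
        for (std::size_t i = 0; i < kSlotCount; ++i) {
            if (desired_[i] != bound_[i]) {
                if (first == kSlotCount) first = i;
                last = i;
            }
        }
        if (first == kSlotCount) return;  // nothing changed, skip the API call
        ctx.SetShaderResources(first, last - first + 1, &desired_[first]);
        for (std::size_t i = first; i <= last; ++i) bound_[i] = desired_[i];
    }

private:
    std::array<Resource*, kSlotCount> desired_{};  // what the caller asked for
    std::array<Resource*, kSlotCount> bound_{};    // what was last sent to the context
};
```

Setting slots 2 and 4 and then flushing produces a single call with start 2 and count 3; flushing again without changes produces no call at all.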
The most expensive GPU operation is binding a texture to a slot. I order the rendered objects with an algorithm that produces the fewest possible texture slot resets.
Render state changes are cheap and, to my surprise, so is render target binding.
In Hieroglyph 3 I follow a method similar to what you described, except that I do it with all of the states (for shader stages and fixed stages too). And you are right, it does get kind of messy at times. The general concept that I followed was to have a single class that represents the state of a pipeline stage. Then I keep two copies of that object: one for the desired state, and one for the current state.

Then, at state-setting time, I track whether there is a difference between the two. If so, the next time I bind the states for a draw call, I push the API changes accordingly. That is the best way I have found to do this management. However, this doesn't try to find the best order of object rendering or anything like that; something like that is left up to either the application itself, or the objects being rendered.
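A minimal sketch of the desired/current idea, assuming a stage whose state is just three integer IDs. `StageState`, `FakeDevice`, and `StageMonitor` are made-up names for illustration and are not taken from Hieroglyph 3.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stage state; the integer IDs stand in for API objects.
struct StageState {
    int shader = 0;
    int blendState = 0;
    int depthState = 0;
};

// Hypothetical device that logs each change instead of calling an API.
struct FakeDevice {
    std::vector<std::string> log;
    void Apply(const std::string& name, int id) {
        log.push_back(name + "=" + std::to_string(id));
    }
};

class StageMonitor {
public:
    StageState desired;  // freely modified by the caller between draws

    // Called right before a draw: push only the members that differ
    // from what the device currently has.
    void ApplyBeforeDraw(FakeDevice& dev) {
        if (desired.shader != current_.shader) dev.Apply("shader", desired.shader);
        if (desired.blendState != current_.blendState) dev.Apply("blend", desired.blendState);
        if (desired.depthState != current_.depthState) dev.Apply("depth", desired.depthState);
        current_ = desired;
    }

private:
    StageState current_;  // assumed to match the device's initial state (all zero here)
};
```

Because nothing is pushed until the draw, setting the same member twice before a draw costs one device call, and setting it to the value it already has costs none.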
"Currently I keep track of what is bound to every resource/shader/render target slots using arrays and before binding a new resource I check if it's already in the array"
I used to use this method (my 'device' class would cache all states), but have ended up changing it quite a lot.

I group my state-changes into logical groups (e.g. a "material" group might set some cbuffers, some textures and a blend-mode). Each type of render-state is allocated a bit in a bit-field (states with multiple slots get multiple bits), e.g. when designing the bitfield:

StateName         NumBits
VertexStreams     1
ShaderPrograms    1
DepthStencil      1
PsConstantBuffer  14
PsTexture         16

Each group of state-changes (aka state-group) can then have a mask indicating which states it's going to set.
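As a rough illustration, the example layout above fits in a 64-bit integer (1 + 1 + 1 + 14 + 16 = 33 bits). The constant names, the `Bit` helper, and the contents of the "material" mask below are assumptions made for this sketch, not the poster's actual code.

```cpp
#include <cassert>
#include <cstdint>

using StateMask = std::uint64_t;

// First bit index of each state type, following the example table.
enum : unsigned {
    kVertexStreamsBit    = 0,   // 1 bit
    kShaderProgramsBit   = 1,   // 1 bit
    kDepthStencilBit     = 2,   // 1 bit
    kPsConstantBufferBit = 3,   // 14 bits: 3..16
    kPsTextureBit        = 17,  // 16 bits: 17..32
    kNumStateBits        = 33
};
static_assert(kNumStateBits <= 64, "layout must fit in StateMask");

// Mask bit for slot `slot` of the state type that starts at bit `first`.
constexpr StateMask Bit(unsigned first, unsigned slot = 0) {
    return StateMask{1} << (first + slot);
}

// Hypothetical "material" group: one pixel-shader cbuffer and two textures.
constexpr StateMask kMaterialMask =
    Bit(kPsConstantBufferBit, 0) | Bit(kPsTextureBit, 0) | Bit(kPsTextureBit, 1);
```

Checking whether a group touches a given state is then a single AND, for example `kMaterialMask & Bit(kDepthStencilBit)` is zero because the material group sets no depth-stencil state.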

Each "render-item" then contains a draw-call and a collection of state-groups. I run a sorting function over a collection of render-items to get an appropriate order (e.g. back-to-front for a transparent pass, or sorted by expensive states for a regular pass), but this sorting step is decoupled/optional.

When submitting a collection of render-items, I do initialize a 'state cache' to track which states I've set as I go (e.g. State* stateCache[numStates] = {};), but it's just a local variable, not a persistent cache. I also have another array that contains 'default' values for all states, which are used if a render-item's state-group doesn't contain a value for a specific state (this is provided as a 'default' state-group for the current pass). This completely changes the abstraction of the device being a state-machine (i.e. if you don't change a state, it's got whatever value the last user left it in) and instead makes all states explicit and deterministic (i.e. it doesn't matter who used the device before you), which I think is an important feature for a rendering API.
When iterating through each render-item in the collection, I first have to iterate through each of that item's state-groups. As each state-group is processed, its bitmask is ORed into a running mask of the states set so far, and any state already present in that running mask is ignored. This allows a render-item to contain 'layers' of state-groups containing the same state, and the 'top' instance of that state's value will be used while the lower ones are ignored. Any state within a state-group that passes this test then undergoes the regular redundancy test and is passed to the device / written to a command buffer.
After iterating through the render-item's state-groups, I find any states that weren't set and aren't in their default state, again using the bitmasks:

for(...each state in each state-group of the current render-item...)
{
    if( (statesSet & state.bit) == state.bit ) //'layering' test. Earlier state-groups take precedence.
        continue;
    statesSet |= state.bit; //mark as set even if the cache test below skips the submit
    if( stateCache[state.idx] != &state ) //regular cache test, don't set redundant states.
    {
        stateCache[state.idx] = &state;
        Submit(state.cmd);
    }
}
needsReset = dirtyStates & ~statesSet;
dirtyStates = statesSet; //for the next render-item

Any bits that come up in the needsReset mask then have their states set back to the default values, and then the render-item's draw-call is submitted.
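The whole submission loop can be sketched in a runnable form. Everything here is a guess at the shape of the data: `State`, `StateGroup`, `RenderItem`, and `SubmitItems` are illustrative names, commands are plain integers, and `dirtyStates` starts with every bit set so the first item applies defaults for anything it does not set (that starting value is an assumption). The sketch records a state as set once it wins the layering test, even when the cache test skips the submit, so that a redundant state is not later reset to its default.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct State {
    std::uint64_t bit;  // this state's bit in the mask
    int idx;            // index into the state cache
    int cmd;            // hypothetical command id recorded on submit
};

using StateGroup = std::vector<const State*>;

struct RenderItem {
    std::vector<const StateGroup*> groups;  // earlier groups take precedence
    int drawCmd;
};

// Returns the command stream; defaults[i] is the default state for cache index i.
std::vector<int> SubmitItems(const std::vector<RenderItem>& items,
                             const std::vector<const State*>& defaults) {
    std::vector<int> out;
    std::vector<const State*> stateCache(defaults.size(), nullptr);  // local, not persistent
    std::uint64_t dirtyStates = ~std::uint64_t{0};  // assumption: treat everything as dirty at start

    for (const RenderItem& item : items) {
        std::uint64_t statesSet = 0;
        for (const StateGroup* group : item.groups) {
            for (const State* s : *group) {
                if (statesSet & s->bit) continue;       // layering test
                statesSet |= s->bit;
                if (stateCache[s->idx] == s) continue;  // redundancy test
                stateCache[s->idx] = s;
                out.push_back(s->cmd);
            }
        }
        // States the previous item changed that this item did not set go back to default.
        const std::uint64_t needsReset = dirtyStates & ~statesSet;
        for (const State* d : defaults) {
            if ((needsReset & d->bit) && stateCache[d->idx] != d) {
                stateCache[d->idx] = d;
                out.push_back(d->cmd);
            }
        }
        dirtyStates = statesSet;  // for the next render-item
        out.push_back(item.drawCmd);
    }
    return out;
}
```

With this shape, an item that repeats the previous item's texture submits nothing for it, and an item that drops a blend state the previous item set gets the default blend state submitted before its draw.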
"most expensive GPU operation is binding a texture to a slot."
Is that the CPU cost of issuing the command, or the GPU impact on render times? This probably differs between APIs, GPU models, driver versions, specific applications...
I concur that textures are the most costly thing not to redundancy-check in DirectX 11. Although I only have one card for testing DirectX 11, it is likely to be universal because textures are the most costly on bandwidth in most cases.
Setting shaders is more expensive in DirectX 9 but DirectX 11 has eliminated the internal state changes that caused that to be the case, and has made bandwidth the most prominent issue instead of internal state changes, flushes, etc.

My method for redundancy-checking is to allow the user to set states at any time, but all that happens is that my own variables get changed and nothing is actually issued to DirectX 11.
Then I have one function for rendering, and within it I check my own state copy against the state copy last time things were drawn and send only commands to DirectX 11 that are not redundant.

The reason I wait until the actual draw call is twofold:
#1: You can add a “set default render states” function with no impact on performance, as we do at work (but I do not personally do within my engine), similar to what Hodgman does.
#2: It is the only way to set multiple textures at once, avoiding unnecessary calls to PSSetShaderResources().


And then I have a render queue to ensure that redundancies are maximized, resulting in the fewest possible state changes between draw calls.
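A render queue of this kind is often built around a sort key. The sketch below assumes textures are the costliest change (as discussed above) and therefore puts the texture ID in the highest bits; `DrawItem`, `SortKey`, `SortQueue`, and `CountTextureChanges` are hypothetical names, not part of any engine mentioned in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; the IDs stand in for real API objects.
struct DrawItem {
    std::uint16_t texture;
    std::uint16_t shader;
    std::uint32_t mesh;
};

// Texture goes in the highest bits so items sharing a texture end up adjacent,
// then shader, then mesh as a tie-breaker.
inline std::uint64_t SortKey(const DrawItem& d) {
    return (std::uint64_t{d.texture} << 48) | (std::uint64_t{d.shader} << 32) | d.mesh;
}

inline void SortQueue(std::vector<DrawItem>& queue) {
    std::sort(queue.begin(), queue.end(),
              [](const DrawItem& a, const DrawItem& b) { return SortKey(a) < SortKey(b); });
}

// Number of texture binds needed when drawing the queue in order.
inline int CountTextureChanges(const std::vector<DrawItem>& queue) {
    int changes = 0;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == 0 || queue[i].texture != queue[i - 1].texture) ++changes;
    }
    return changes;
}
```

For a queue whose textures arrive as 2, 1, 2, 1, drawing in submission order needs four texture binds, while the sorted order (1, 1, 2, 2) needs two. A transparent pass would need a depth-based key instead.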


L. Spiro

