

Reducing state changes


4 replies to this topic

#1 TiagoCosta   Crossbones+   -  Reputation: 2343


Posted 04 September 2012 - 09:12 AM

Do you use any subsystem to help reduce state changes (e.g. ignoring a call to bind a texture/shader/render target if it is already bound)? How did you design it?

Currently I keep track of what is bound to every resource/shader/render-target slot using arrays, and before binding a new resource I check whether it's already in the array. However, the code is kind of messy because I'm trying to batch all resource bindings into a single API call; for example, if 3 textures need to be bound to the PS, I only have to call PSSetShaderResources() once.

Does calling
    PSSetShaderResources(0, D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT, resources);
increase driver overhead even if 'resources' is an array in which all pointers but one are NULL, because I actually only need to bind a single texture?
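One way to get single-call batching without always passing every slot is to track the smallest contiguous range of slots whose view changed and bind only that range. This is a sketch under assumptions, not an answer measured against any driver: `SRV` stands in for `ID3D11ShaderResourceView`, and the hypothetical `BindFn` callback stands in for `ID3D11DeviceContext::PSSetShaderResources(StartSlot, NumViews, ppShaderResourceViews)` so the code compiles without the D3D11 headers. Unchanged slots that fall inside the range are re-sent, so whether this beats several smaller calls is driver-dependent.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D11ShaderResourceView; only its address matters here.
struct SRV {};

constexpr std::size_t kSlotCount = 128; // value of D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT

// Hypothetical callback standing in for PSSetShaderResources(StartSlot, NumViews, ppViews).
using BindFn = std::function<void(unsigned start, unsigned count, SRV* const* views)>;

class PsResourceCache {
public:
    explicit PsResourceCache(BindFn bind) : bind_(std::move(bind)) {
        bound_.fill(nullptr);
        pending_.fill(nullptr);
    }

    // Record the request only; nothing reaches the API yet.
    void Set(std::size_t slot, SRV* view) {
        assert(slot < kSlotCount);
        pending_[slot] = view;
    }

    // Issue at most one call covering the smallest contiguous range
    // [first, last] that contains every slot whose view changed.
    void Flush() {
        std::size_t first = kSlotCount, last = 0;
        for (std::size_t i = 0; i < kSlotCount; ++i) {
            if (pending_[i] != bound_[i]) {
                if (first == kSlotCount) first = i;
                last = i;
            }
        }
        if (first == kSlotCount) return; // everything requested is already bound
        const unsigned count = static_cast<unsigned>(last - first + 1);
        bind_(static_cast<unsigned>(first), count, pending_.data() + first);
        for (std::size_t i = first; i <= last; ++i) bound_[i] = pending_[i];
    }

private:
    BindFn bind_;
    std::array<SRV*, kSlotCount> bound_;   // what the API currently has
    std::array<SRV*, kSlotCount> pending_; // what the next draw wants
};
```

Setting slots 2 and 5 and then flushing produces one call with `StartSlot = 2` and `NumViews = 4`; flushing again without changes produces no call.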

Edited by TiagoCosta, 04 September 2012 - 09:12 AM.



#2 JohnnyCode   Members   -  Reputation: 267


Posted 04 September 2012 - 10:08 AM

The most expensive GPU operation is binding a texture to a slot. I order the rendered objects with an algorithm that produces the fewest possible texture-slot resets.
Render-state changes are cheap, and, to my surprise, so is render-target binding.
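The post does not say which ordering algorithm is used, so the following is only a simple stand-in: sort draws by texture and count how many rebinds a given order needs, modelling a single texture slot. `DrawItem`, `CountTextureBinds` and `SortByTexture` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw record: only the texture it needs matters for this sketch.
struct DrawItem {
    int textureId; // must be >= 0; -1 is reserved for "nothing bound"
    int meshId;
};

// Count how many times the texture slot must be rebound when drawing in
// the given order (consecutive draws that share a texture need no rebind).
int CountTextureBinds(const std::vector<DrawItem>& items) {
    int binds = 0;
    int bound = -1; // nothing bound yet
    for (const DrawItem& item : items) {
        assert(item.textureId >= 0);
        if (item.textureId != bound) {
            bound = item.textureId;
            ++binds;
        }
    }
    return binds;
}

// Group draws that share a texture; stable_sort keeps the original
// submission order within each texture group.
void SortByTexture(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) { return a.textureId < b.textureId; });
}
```

For draws using textures 1, 2, 1, 2, 1 in submission order, `CountTextureBinds` reports 5 binds; after `SortByTexture` it reports 2.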

#3 Jason Z   Crossbones+   -  Reputation: 5163


Posted 04 September 2012 - 10:26 AM

I follow a similar method to what you described in Hieroglyph 3, except I do it with all of the states (for shader stages and fixed stages too). And you are right, it does get kind of messy at times. The general concept that I followed was to have a single class that represents the state of a pipeline stage. Then I keep two copies of that object - one for the desired state, and one for the current state.

At state-setting time I track whether the two copies differ. If they do, the next time I bind the states for a draw call I push the corresponding API changes. That is the best way I have found to do this management. However, this doesn't try to find the best order of object rendering or anything like that; that is left up to either the application itself or the objects being rendered.
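A minimal sketch of the desired/current pair described above, not Hieroglyph 3's actual code. It assumes a hypothetical `StageState` with one shader and four resource slots, and hypothetical `applyShader`/`applyResource` callbacks in place of real API calls. Setters only touch the desired copy and note whether it diverged; `Apply()` runs before each draw and pushes only the fields that differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical description of one pipeline stage; handles are opaque pointers.
struct StageState {
    const void* shader = nullptr;
    std::array<const void*, 4> resources{}; // 4 slots keeps the sketch small
};

class PipelineStage {
public:
    // Hypothetical callbacks standing in for the real API calls.
    std::function<void(const void*)> applyShader;
    std::function<void(std::size_t, const void*)> applyResource;

    // Setters only touch the desired copy and record whether it now differs.
    void SetShader(const void* s) {
        desired_.shader = s;
        dirty_ = dirty_ || desired_.shader != current_.shader;
    }
    void SetResource(std::size_t slot, const void* r) {
        assert(slot < desired_.resources.size());
        desired_.resources[slot] = r;
        dirty_ = dirty_ || desired_.resources[slot] != current_.resources[slot];
    }

    // Called right before a draw: push only the fields that differ, then sync.
    void Apply() {
        if (!dirty_) return;
        if (desired_.shader != current_.shader) applyShader(desired_.shader);
        for (std::size_t i = 0; i < desired_.resources.size(); ++i)
            if (desired_.resources[i] != current_.resources[i])
                applyResource(i, desired_.resources[i]);
        current_ = desired_;
        dirty_ = false;
    }

private:
    StageState desired_; // what the caller asked for
    StageState current_; // what the API was last given
    bool dirty_ = false;
};
```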

#4 Hodgman   Moderators   -  Reputation: 31122


Posted 04 September 2012 - 10:58 AM

TiagoCosta wrote:
"Currently I keep track of what is bound to every resource/shader/render target slots using arrays and before binding a new resource I check if it's already in the array"

I used to use this method (my 'device' class would cache all states), but have ended up changing it quite a lot.

I group my state-changes into logical groups (e.g. a "material" group might set some cbuffers, some textures and a blend-mode). Each type of render-state is allocated a bit in a bit-field (states with multiple slots get multiple bits), e.g. when designing the bitfield:
 StateName         NumBits
 VertexStreams     1
 ShaderPrograms    1
 DepthStencil      1
 PsConstantBuffer  14
 PsTexture         16
Each group of state-changes (aka state-group) can then have a mask indicating which states it's going to set.
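A sketch of how the table above might map onto mask bits. The enum names come from the table; `BitOffset`, `StateBit` and `StateMask` are hypothetical helpers. Summing the widths gives 33 bits, so the mask does not fit in 32 bits and this sketch uses a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// State types and bit widths from the table above.
enum StateType { VertexStreams, ShaderPrograms, DepthStencil, PsConstantBuffer, PsTexture, StateTypeCount };
constexpr unsigned kNumBits[StateTypeCount] = {1, 1, 1, 14, 16};

using StateMask = std::uint64_t;

// First bit of each state type: the sum of the widths of the types before it.
constexpr unsigned BitOffset(StateType t) {
    unsigned offset = 0;
    for (int i = 0; i < t; ++i) offset += kNumBits[i];
    return offset;
}

// 1 + 1 + 1 + 14 + 16 = 33 bits, one more than a 32-bit mask can hold.
constexpr unsigned kTotalBits = BitOffset(StateTypeCount);
static_assert(kTotalBits <= 64, "state mask must fit in StateMask");

// Bit for one slot of a state type (single-slot types only use slot 0).
constexpr StateMask StateBit(StateType t, unsigned slot = 0) {
    assert(slot < kNumBits[t]);
    return StateMask{1} << (BitOffset(t) + slot);
}
```

A state-group that sets cbuffer 0 and textures 0 and 1 would then carry the mask `StateBit(PsConstantBuffer, 0) | StateBit(PsTexture, 0) | StateBit(PsTexture, 1)`, i.e. bits 3, 17 and 18.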

Each "render-item" then contains a draw-call and a collection of state-groups. I run a sorting function over a collection of render-items to get an appropriate order (e.g. back-to-front for a transparent pass, or sorted by expensive states for a regular pass), but this sorting step is decoupled/optional.
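The two example orders can be sketched as separate sort functions over the same items, which keeps sorting independent of submission. Everything here is an assumption for illustration: the hypothetical `RenderItem` keeps only a shader id, a texture id, a view-space depth and an index standing in for the draw-call, and the "expensive state" order is modelled as shader first, then texture, packed into one 64-bit key.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical render-item: the draw-call and state-groups are omitted;
// only the values the sorts need are kept.
struct RenderItem {
    std::uint32_t shaderId;  // treated as the most expensive state (high bits)
    std::uint32_t textureId; // cheaper state (low bits)
    float viewDepth;         // distance from the camera
    int drawIndex;           // stands in for the actual draw-call
};

// Pack states into one key so a single integer comparison groups items by
// shader first, then by texture within each shader.
std::uint64_t StateKey(const RenderItem& r) {
    return (std::uint64_t{r.shaderId} << 32) | r.textureId;
}

// Regular pass: group by expensive state.
void SortByState(std::vector<RenderItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const RenderItem& a, const RenderItem& b) { return StateKey(a) < StateKey(b); });
}

// Transparent pass: farthest first.
void SortBackToFront(std::vector<RenderItem>& items) {
    // A NaN depth would break the strict weak ordering std::sort requires.
    for (const RenderItem& r : items) assert(!std::isnan(r.viewDepth));
    std::sort(items.begin(), items.end(),
              [](const RenderItem& a, const RenderItem& b) { return a.viewDepth > b.viewDepth; });
}
```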

When submitting a collection of render-items, I do initialize a 'state cache' to track which states I've set as I go (e.g. State* stateCache[numStates] = {};), but it's just a local variable, not a persistent cache. I also have another array that contains 'default' values for all states, which are used if none of a render-item's state-groups contains a value for a specific state (these defaults are provided as a 'default' state-group for the current pass). This replaces the abstraction of the device as a state machine (i.e. if you don't change a state, it keeps whatever value the last user left it in) and instead makes all states explicit and deterministic (i.e. it doesn't matter who used the device before you), which I think is an important feature for a rendering API.
When iterating through each render-item in the collection, I first iterate through each of that item's state-groups. As each state-group is processed, each state's bit is ORed into a running mask, and any state whose bit is already in that mask is ignored. This allows a render-item to contain 'layers' of state-groups that set the same state: the 'top' instance of that state's value is used, while the lower ones are ignored. Any state within a state-group that passes this test then undergoes the regular redundancy test and is passed to the device / written to a command buffer.
After iterating through the render-item's state-groups, I find any states that weren't set and aren't in their default state, again using the bitmasks:
statesSet = 0;
for(...each state in each state-group of the current render-item...)
{
    if( (statesSet & state.bit)==state.bit ) //'layering' test. Earlier state-groups take precedence.
        continue;
    statesSet |= state.bit;                  //mark as set even if the cache test below finds it redundant
    if( stateCache[state.idx]!=&state )      //regular cache test, don't set redundant states.
    {
      stateCache[state.idx] = &state;
      Submit(state.cmd);
    }
}
needsReset = dirtyStates & ~statesSet;
dirtyStates = statesSet;//for the next render-item
Any bits that come up in the needsReset mask then have their states set back to the default values, and then the render-item's draw-call is submitted.
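A self-contained sketch of the per-item loop described above, including the reset-to-default step. Assumptions not taken from the post: each state occupies exactly one bit with `bit == 1 << idx`, `State::value` stands in for the command `Submit` would record, and `ItemSubmitter` with its `submitted`/`draws` logs is a hypothetical name for illustration. One `ItemSubmitter` is meant to live for a single collection, matching the post's local (non-persistent) cache. A state's bit is recorded in `statesSet` even when the redundancy test skips it; otherwise a render-item reusing the previous item's state-group would see that state flagged in `needsReset` and reset to its default.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using StateMask = std::uint64_t;

// Hypothetical state: `idx` selects its cache entry, `bit` is its mask bit
// (assumed to be 1 << idx), `value` stands in for the command to submit.
struct State {
    unsigned idx;
    StateMask bit;
    int value;
};

using StateGroup = std::vector<State>;

// Hypothetical render-item: state-groups ordered top layer first, plus a draw id.
struct RenderItem {
    std::vector<const StateGroup*> groups;
    int drawId;
};

constexpr unsigned kNumStates = 4; // hypothetical number of states

class ItemSubmitter {
public:
    // `defaults` is the pass's default state-group; it must outlive the submitter.
    explicit ItemSubmitter(const StateGroup& defaults) {
        for (const State& s : defaults) {
            assert(s.idx < kNumStates);
            defaults_[s.idx] = &s;
        }
    }

    void Submit(const RenderItem& item) {
        StateMask statesSet = 0;
        for (const StateGroup* group : item.groups) {
            for (const State& state : *group) {
                if (statesSet & state.bit) continue; // a higher layer already set it
                statesSet |= state.bit;              // mark it even if it turns out redundant
                Apply(state);
            }
        }
        // States an earlier item changed but this item does not set go back to default.
        const StateMask needsReset = dirtyStates_ & ~statesSet;
        for (unsigned i = 0; i < kNumStates; ++i)
            if (needsReset & (StateMask{1} << i)) Apply(*defaults_[i]);
        dirtyStates_ = statesSet;
        draws.push_back(item.drawId); // the draw-call would be submitted here
    }

    std::vector<int> submitted; // values actually sent, in order
    std::vector<int> draws;     // draw ids, in order

private:
    // Regular redundancy test against the local cache.
    void Apply(const State& state) {
        if (cache_[state.idx] == &state) return;
        cache_[state.idx] = &state;
        submitted.push_back(state.value);
    }

    std::array<const State*, kNumStates> cache_{};
    std::array<const State*, kNumStates> defaults_{};
    StateMask dirtyStates_ = 0;
};
```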

JohnnyCode wrote:
"most expensive GPU operation is binding a texture to a slot."

Is that the CPU cost of issuing the command, or the GPU impact on render times? This probably differs between APIs, GPU models, driver versions, specific applications...

Edited by Hodgman, 04 September 2012 - 11:02 AM.


#5 L. Spiro   Crossbones+   -  Reputation: 14027


Posted 04 September 2012 - 05:57 PM

I concur that textures are the most costly state to leave without a redundancy check in DirectX 11. Although I have only one card for testing DirectX 11, this is likely universal, because textures are the biggest consumers of bandwidth in most cases.
Setting shaders is more expensive in DirectX 9, but DirectX 11 has eliminated the internal state changes that made it so, making bandwidth the most prominent issue rather than internal state changes, flushes, etc.

My method for redundancy-checking is to let the user set states at any time, but all that happens is that my own variables change; nothing is actually issued to DirectX 11.
Then I have one function for rendering, in which I compare my own state copy against the copy from the last time something was drawn and send DirectX 11 only the commands that are not redundant.

There are two reasons I wait until the actual draw call:
#1: You can add a “set default render states” function with no impact on performance, as we do at work (though I do not do this in my own engine). This is similar to what Hodgman does.
#2: It is the only way to set multiple textures at once, avoiding unnecessary calls to PSSetShaderResources().
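A sketch of reason #1 under assumptions: setters write only a shadow copy, and the single `Draw()` entry point compares it against the states used by the previous draw. Because nothing is issued at set time, calling `SetDefaultRenderStates()` before every object and then re-setting the same values costs no API calls. `RenderStates`, `DeferredStateDevice` and the `issue` callback are hypothetical names, not L. Spiro's engine code; the same deferral is what makes batching several textures into one `PSSetShaderResources()` call possible (reason #2), which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical snapshot of the states this sketch tracks.
struct RenderStates {
    std::uint32_t blendMode = 0;
    std::uint32_t depthMode = 0;
    std::uint32_t rasterMode = 0;
};

class DeferredStateDevice {
public:
    // Hypothetical callback standing in for the real API call for one state.
    std::function<void(const char* name, std::uint32_t value)> issue;

    // Setters only write the shadow copy; nothing reaches the API here.
    void SetBlendMode(std::uint32_t v) { pending_.blendMode = v; }
    void SetDepthMode(std::uint32_t v) { pending_.depthMode = v; }
    void SetRasterMode(std::uint32_t v) { pending_.rasterMode = v; }
    void SetDefaultRenderStates() { pending_ = RenderStates{}; }

    // The single render entry point: compare against what the last draw used
    // and issue only the differences.
    void Draw() {
        assert(issue && "issue callback must be set");
        if (pending_.blendMode != lastDrawn_.blendMode) issue("blend", pending_.blendMode);
        if (pending_.depthMode != lastDrawn_.depthMode) issue("depth", pending_.depthMode);
        if (pending_.rasterMode != lastDrawn_.rasterMode) issue("raster", pending_.rasterMode);
        lastDrawn_ = pending_;
        ++drawCount; // the actual draw call would be issued here
    }

    int drawCount = 0;

private:
    RenderStates pending_;
    RenderStates lastDrawn_; // assumes the device starts in the default states
};
```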


And then I have a render queue to ensure that redundancies are maximized, resulting in the fewest possible state changes between draw calls.


L. Spiro

Edited by L. Spiro, 04 September 2012 - 06:00 PM.




