Stateblocks useful?

5 comments, last by Krohm 17 years, 11 months ago
I recently read a GDC paper in which ATI said state blocks hurt performance. When I first read about state blocks I liked them quite a bit and was planning to use them heavily, so this was somewhat disappointing. Am I right that this functionality is not mature yet (or is it just ATI screwing up again, even in D3D)?

Previously "Krohm"

I imagine this refers to something like what's mentioned in this presentation. It's easy to create state blocks with a lot of redundant state changes, and that'd be bad for performance.
No doubt about that, but it isn't mentioned at all in the other presentation, so I guess that one is describing a different problem.

Has anyone had real-world experience with state blocks?

Previously "Krohm"

I did use them. All my texture stages were defined using them (I didn't use effects then). All I can say is that they work. The rest is a matter of optimisation, and I never got as far as recognising them as a bottleneck. There's a chance that they did contribute an overhead.
The whole D3D sample framework and D3DX make heavy use of state blocks. Personally, I use them to store and restore state for my hooking solutions.
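
To illustrate the store/restore pattern for hooks, here is a minimal sketch using a toy device rather than the real API. In D3D9 the equivalent would be IDirect3DDevice9::CreateStateBlock with D3DSBT_ALL, then IDirect3DStateBlock9::Capture and Apply. ToyDevice, StateGuard, drawOverlay and the state ids are all made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy stand-in for a device; NOT the Direct3D API.
struct ToyDevice {
    std::map<int, std::uint32_t> states;
    void setState(int id, std::uint32_t v) { states[id] = v; }
};

// Snapshots the device state on construction and puts it back on destruction.
class StateGuard {
public:
    explicit StateGuard(ToyDevice& dev) : dev_(dev), saved_(dev.states) {}
    ~StateGuard() { dev_.states = saved_; }
    StateGuard(const StateGuard&) = delete;
    StateGuard& operator=(const StateGuard&) = delete;
private:
    ToyDevice& dev_;
    std::map<int, std::uint32_t> saved_;
};

// Hypothetical hook: draws an overlay with its own states, then leaves
// the host application's state exactly as it found it.
void drawOverlay(ToyDevice& dev) {
    StateGuard guard(dev);
    dev.setState(0, 0u);   // e.g. disable Z writes for the overlay
    dev.setState(7, 1u);   // e.g. enable alpha blending
    // ... overlay draw calls would go here ...
}
```

After drawOverlay returns, the device holds the same states it held before the call, including the removal of any state the overlay introduced.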

IIRC a driver can include native support for stateblocks if it runs in pure mode. In this case stateblocks can help to reduce some of the overhead.
Why stateblocks can/should be good:

The render states available in Direct3D represent an idealised version of the way graphics hardware works.

Real-world graphics hardware often packs many of these states into one place. For a [contrived] example: enabling Z writes, setting the alpha test function, and changing the cull mode are three separate render states in D3D, but may be just a few bits in a single register on the actual graphics hardware.
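
A rough sketch of that contrived example, assuming an invented register layout (the bit positions, field widths and type names below are made up and don't describe any real GPU):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hardware register layout, invented for illustration only:
//   bit  0     : Z write enable
//   bits 1..3  : alpha test function (8 comparison modes)
//   bits 4..5  : cull mode
enum class CmpFunc : std::uint32_t {
    Never = 0, Less, Equal, LessEqual, Greater, NotEqual, GreaterEqual, Always
};
enum class CullMode : std::uint32_t { None = 0, CW = 1, CCW = 2 };

// The three states as the API exposes them: separate and independent.
struct ApiState {
    bool zWrite;
    CmpFunc alphaFunc;
    CullMode cull;
};

// The driver's job: fold all three into one register value.
inline std::uint32_t packRegister(const ApiState& s) {
    return (s.zWrite ? 1u : 0u)
         | (static_cast<std::uint32_t>(s.alphaFunc) << 1)
         | (static_cast<std::uint32_t>(s.cull) << 4);
}
```

The point is that changing any one of the three API states means rebuilding and rewriting the whole register, which is why the driver has to look at them together.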

Additionally, the individual states that are presented by D3D sometimes have no direct equivalent in hardware, so the functionality of those states has to be emulated using combinations of other states.

One of the jobs of a graphics card's device driver is to 'translate' between the ideal D3D state and the actual state used by the hardware - in D3D, your SetRenderState() calls are [usually] cached. Those render states are passed to the graphics device driver when you actually draw something (e.g. DrawPrimitive etc).

The translation of the render states set with SetRenderState() happens every time you draw.

Since the translation of a single state can actually affect and is affected by other states, it can involve translating the *whole* of the state currently set for the device. Translating all/most of the hardware state every time a draw call is made can be wasteful...

The [original] idea behind state blocks is that the translation between D3D state and the real hardware state can be done once, when the state block is created. Then, when the state block is applied, the driver can use the pre-translated version of the state and so save lots of time inside the driver.
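
Here is a toy model of that idea, just to make the cost difference concrete. None of this is the Direct3D API: ToyDriver, ToyDevice, NativeBlock and the fake translate() function are assumptions for illustration, and the model only re-translates when something changed since the last draw.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using StateMap = std::map<int, std::uint32_t>;   // state id -> value

struct HardwareState { std::uint32_t reg = 0; };

struct ToyDriver {
    int translations = 0;   // counts the expensive API-to-hardware translations

    // Stand-in for the real translation: mixes *all* states into one value.
    HardwareState translate(const StateMap& api) {
        ++translations;
        HardwareState hw;
        for (const auto& kv : api)
            hw.reg = hw.reg * 31u + static_cast<std::uint32_t>(kv.first) * 7u + kv.second;
        return hw;
    }
};

struct ToyDevice {
    ToyDriver driver;
    StateMap cached;        // SetRenderState-style calls only land here
    HardwareState current;
    bool dirty = false;

    void setState(int id, std::uint32_t v) { cached[id] = v; dirty = true; }

    // Translation is deferred until something is actually drawn.
    void draw() {
        if (dirty) { current = driver.translate(cached); dirty = false; }
    }
};

// A "native" state block: translated once, at creation time.
struct NativeBlock {
    StateMap api;
    HardwareState hw;
};

NativeBlock makeBlock(ToyDriver& d, const StateMap& api) {
    return {api, d.translate(api)};
}

// Applying the block reuses the stored translation; draw() has nothing to redo.
// Assumes the block describes the complete device state.
void applyBlock(ToyDevice& dev, const NativeBlock& b) {
    dev.cached = b.api;
    dev.current = b.hw;
    dev.dirty = false;
}
```

In this model, alternating between two materials for ten draws costs ten translations with individual calls, but only two (one per block, at creation) with pre-translated blocks.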

State blocks can be particularly beneficial with fixed function texture stage states (SetTextureStageState) where the TSS setup is even further abstracted from hardware than other D3D render state.


Why stateblocks can be bad:

If a device driver doesn't support state blocks natively, then when a state block is recorded, D3D simply stores up a list of the render states (and their values) in that block.

When D3D comes to apply the state block (on a driver that doesn't support them natively), it effectively just calls SetRenderState() for each state in the recorded state block.

The problem with this is a state block usually represents *all* the state required to render something - so that's lots of individual SetRenderState() calls (that will all get translated - individually!).

When people use state blocks, they tend to be far less aggressive about minimising unnecessary state changes, particularly when a state block is meant to represent a complete unit of device state without much/any translation.

Since the state in a state block always gets applied, if it turns into a bunch of SetRenderState calls, having redundant state in there can cost more than calling SetRenderState individually.
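
A small sketch of the emulated case, again with a toy device and not the real API (ToyDevice, RecordedBlock, applyEmulated and applyFiltered are made-up names). It compares blindly replaying a recorded block against hand-written code that skips states which already have the right value:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Toy model of a device whose driver has no native state block support.
struct ToyDevice {
    std::map<int, std::uint32_t> current;
    int setCalls = 0;       // every call pays the per-call overhead

    void setState(int id, std::uint32_t value) {
        ++setCalls;
        current[id] = value;
    }
};

// A recorded block is just a list of (state id, value) pairs.
using RecordedBlock = std::vector<std::pair<int, std::uint32_t>>;

// Emulated apply: the block is unrolled into individual set calls,
// whether or not each value differs from what is already on the device.
void applyEmulated(ToyDevice& dev, const RecordedBlock& block) {
    for (const auto& s : block) dev.setState(s.first, s.second);
}

// Hand-written alternative: only touch states whose value actually changes.
void applyFiltered(ToyDevice& dev, const RecordedBlock& block) {
    for (const auto& s : block) {
        auto it = dev.current.find(s.first);
        if (it == dev.current.end() || it->second != s.second)
            dev.setState(s.first, s.second);
    }
}
```

With two 50-state blocks that differ in only two states, applying one after the other costs 100 set calls when emulated, versus 52 when redundant states are filtered out.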

Finally, since modern applications use pixel shaders, SetTextureStageState() is less prevalent, so the translation work that /does/ still need to happen is a lot simpler than it was when state blocks were first introduced.


ATI:

They were once terrible for state block support, IIRC to the point of a release of the Rage Pro drivers (going back a tad ;o) pretending they supported state blocks to D3D, but never actually recording/replaying anything.

But modern ATI drivers have usually /worked/ with state blocks; however, there's probably a good reason why they don't like state blocks these days either - likely something to do with the internal workings of their drivers.


"real world" context if required: I've used state blocks in (amongst others) a commercial product (2000, Hasbro Interactive), and not used state blocks in (amongst others) another commercial product (2005, Atari).

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing

Quote:Original post by ET3D
I did use them. All my texture stages were defined using them (I didn't use effects then). All I can say is that they work. The rest is a matter of optimisation, and I never got as far as recognising them as a bottleneck. There's a chance that they did contribute an overhead.

That's for sure. If I manage to run some experiments, I'll post the results, since I see this is a bit unclear. I would expect there's a break-even point, as always.
Quote:Original post by Demirug
IIRC a driver can include native support for stateblocks if it runs in pure mode.

Thank you for pointing this out. I'll remember to go for pure devices directly.
Quote:Original post by S1CA
Since the translation of a single state can actually affect and is affected by other states, it can involve translating the *whole* of the state currently set for the device. Translating all/most of the hardware state every time a draw call is made can be wasteful...

The [original] idea behind state blocks is that the translation between D3D state and the real hardware state can be done once, when the state block is created. Then, when the state block is applied, the driver can use the pre-translated version of the state and so save lots of time inside the driver.

It definitely makes sense, something like uniform access and such!
Quote:Original post by S1CA
The problem with this is a state block usually represents *all* the state required to render something - so that's lots of individual SetRenderState() calls (that will all get translated - individually!).

I understand. In other words, they get 'unrolled', but the validation is done each time something changes, so it's really doing N calls + N validations instead of N calls + 1 validation... simply awful!
Quote:Original post by S1CA
But modern ATI drivers have usually /worked/ with state blocks; however, there's probably a good reason why they don't like state blocks these days either - likely something to do with the internal workings of their drivers.

I see they're screwing up again. :(

I understand it comes down to whether you're CPU bound or not. Since I'll be trying to minimize state changes anyway, maybe I'll first go for a simpler approach using individual calls. This may be a win with the next driver model, in which calls are expected to be less expensive.

Thank you very much!

Previously "Krohm"
