Clever & efficient management of render states in DX10

Started by
3 comments, last by ET3D 15 years, 9 months ago
Hi guys,

While adding DX10 support to my graphics engine, which is DX9-style, I need to decide how best to manage render states. The old way of handling render states was simple: just set each individual render state separately. To support both DX9 and DX10 I'll obviously need to refactor render state management to be DX10-compatible, i.e. encapsulate groups of render states into raster / blending / depth etc. state groups.

While grouping render states should lead to fewer DX API calls (I guess this was the main purpose of the DX10 change in that area), it all seems a bit inconvenient to use because it requires me to create all render states up front, i.e. before I start rendering the scene. Otherwise it would be too inefficient, I guess.

Now I was wondering how people usually manage these extra objects. Do you store the render states of each type in a separate hash table to find the desired render state more quickly, or do you rely on DX10's built-in lookup (the docs clearly state that if an identical state already exists you'll be given a pointer to it; only the ref count increases)?

Also, in real life, in more complex scenes / games, is it really no problem that you need to know all render states up front to be able to create them? I'm assuming that creating render states only when they're needed is not an option for efficiency reasons.
Maciej Sawitus
Quote:Original post by MickeyMouse
While grouping render states should lead to fewer DX API calls (I guess this was the main purpose of the DX10 change in that area), it all seems a bit inconvenient to use because it requires me to create all render states up front, i.e. before I start rendering the scene. Otherwise it would be too inefficient, I guess.


That's not the only reason for the grouping.

Actually, creating the states beforehand allows validation (making sure the state values are in range) and sometimes command buffer optimizations (generating hardware commands in advance, so that when you set the state all that's required is pushing a pointer to an array or doing a memcpy of all the commands; but that's internal driver soup). Creating a state object can fail (on validation); setting the state at runtime should not fail (except when the device has been removed).

If you do a statistical analysis of the commands you send each frame, you'll see that most of the states are repeated, either several times per frame or across frames when the same objects get rendered again and again. With the old D3D9 interface (or even OpenGL) the validation cost and the creation of the hardware commands are repeated every time, but they can be spared in D3D10 with some clever management on your side.

What you SHOULDN'T do:
- have a mechanism that recreates state objects each frame or each time an object is drawn. State object creation comes with some overhead, and if that overhead isn't counterbalanced by the savings described above it's a net loss. Constantly recreating state objects at runtime is the mistake of a naive D3D9->D3D10 conversion. Creating state objects on an as-needed basis is still acceptable (provided not too many are created in a single frame, or it's going to stutter).
- have an expensive lookup function that matches a set of unrelated D3D9-ish states to an existing D3D10 state object. Though I'm sure it can be made more efficient than the recreation above, an expensive lookup per object drawn can easily end up less efficient than it needs to be. Ideally your material system would hold a list of the state objects it uses and simply have to dereference a few pointers to apply them at runtime. State objects may or may not be shared between materials with some internal refcount; that depends on the total cost of having those objects around.
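
To make the "dereference a few pointers" idea concrete, here is a minimal sketch in portable C++. Everything in it is hypothetical: `BlendState`, `RasterState`, `DepthState`, `FakeDevice` and `Material` are stand-ins invented for illustration, not D3D10 types, and a real engine would hold the API's own state object pointers instead.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for immutable, pre-created state objects.
// These are NOT D3D10 types; they only model "create once, bind by pointer".
struct BlendState  { bool alphaBlend; };
struct RasterState { bool cullBackFaces; };
struct DepthState  { bool depthTest; };

// Hypothetical device that just records what is currently bound.
struct FakeDevice {
    const BlendState*  blend  = nullptr;
    const RasterState* raster = nullptr;
    const DepthState*  depth  = nullptr;
    std::size_t        bindCalls = 0;

    void bindBlend(const BlendState* s)   { blend = s;  ++bindCalls; }
    void bindRaster(const RasterState* s) { raster = s; ++bindCalls; }
    void bindDepth(const DepthState* s)   { depth = s;  ++bindCalls; }
};

// A material only stores pointers that were resolved at load time.
struct Material {
    const BlendState*  blend;
    const RasterState* raster;
    const DepthState*  depth;
};

// Applying a material is three pointer hand-offs: no lookup, no creation.
void applyMaterial(FakeDevice& dev, const Material& m) {
    assert(m.blend && m.raster && m.depth);  // states must exist before drawing
    dev.bindBlend(m.blend);
    dev.bindRaster(m.raster);
    dev.bindDepth(m.depth);
}
```

The point of the layout is that all creation and validation cost is paid when the material is built, so the per-draw path never touches a descriptor.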

Add to those guidelines any variations that apply to your unique engine.

Hope that helps,
LeGreg
Well, I tend to do something along these lines:

I have a pool of device state objects stored in my renderer, and they get created on demand.

Whenever an object/material is created, it asks the pool whether the state object it wants already exists. If it does, the material grabs a pointer to that object and is happy from that point on. If it doesn't, the material creates a new render state object, adds it to the pool, and grabs the pointer.
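
A rough sketch of such a pool, assuming the state can be described by a small comparable descriptor. `BlendDesc`, `BlendStateObject` and `BlendStatePool` are hypothetical names made up for this example; in a real renderer the stored object would wrap whatever the graphics API returns from its create call.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical descriptor: the full set of values that defines one state object.
struct BlendDesc {
    bool enable;
    int  srcFactor;
    int  dstFactor;

    // Strict ordering so the descriptor can be used as a std::map key.
    bool operator<(const BlendDesc& o) const {
        return std::tie(enable, srcFactor, dstFactor) <
               std::tie(o.enable, o.srcFactor, o.dstFactor);
    }
};

// Hypothetical immutable state object (stand-in for the API object).
struct BlendStateObject { BlendDesc desc; };

class BlendStatePool {
public:
    // Returns the existing object for this descriptor, creating it on first use.
    const BlendStateObject* acquire(const BlendDesc& desc) {
        auto it = pool_.find(desc);
        if (it == pool_.end()) {
            std::unique_ptr<BlendStateObject> obj(new BlendStateObject{desc});
            it = pool_.emplace(desc, std::move(obj)).first;
            ++created_;
        }
        return it->second.get();
    }

    std::size_t createdCount() const { return created_; }

private:
    std::map<BlendDesc, std::unique_ptr<BlendStateObject>> pool_;
    std::size_t created_ = 0;
};
```

Because the lookup happens when a material is created rather than when it is drawn, the cost of the map search stays out of the per-frame path.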


I wasn't aware that DX10 did that sort of thing for you automatically behind the scenes. But, if you are supporting both DX9 and DX10, then you can share the same idea of render state objects between them. I might just have to investigate this for myself.

I doubt that there is much overhead to creating a render state object and never using it, since I can't imagine that they are very big. But if you can avoid doing that, then it's probably for the best. There is a limit (4096, I believe) on the number of these objects you can create (though that seems like a ridiculously large limit that you'll never reach).

It's also a bit more efficient, since you can check whether an object uses the same render states as the previous one with a simple comparison of the shared render state object's pointer, instead of having to check each state individually. I imagine that the D3D10 driver does that for you as well.
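
The pointer comparison could look something like the sketch below. `StateObject`, `DrawItem` and `submit` are hypothetical names for illustration, and the actual bind and draw calls are left as comments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shared state object; identity (the pointer) is all that matters here.
struct StateObject { int id; };

// Hypothetical draw item: which state object to use and which mesh to draw.
struct DrawItem {
    const StateObject* state;
    int                mesh;
};

// Walks the draw list and rebinds state only when the pointer changes.
// Returns the number of state changes that were actually needed.
std::size_t submit(const std::vector<DrawItem>& items) {
    const StateObject* bound = nullptr;
    std::size_t changes = 0;
    for (const DrawItem& item : items) {
        if (item.state != bound) {   // one pointer compare replaces N per-state compares
            bound = item.state;      // ... bind the state object here ...
            ++changes;
        }
        // ... issue the draw call for item.mesh here ...
    }
    return changes;
}
```

If the draw list is sorted by state pointer first, each distinct state object is bound at most once per pass.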
Thanks guys,

The point about state validation is indeed a good one.

Still it feels highly inconvenient in some scenarios, for example:

1. When streaming textures from disk I sometimes lock the higher mip levels, and so need to change the highest LOD that may be used (passed as the D3DSAMP_MAXMIPLEVEL sampler state).

It can potentially be any mip level, so I'd need to create a bunch of sampler states differing only in that single variable.

2. For multi-level reflections I'm using a few stencil tricks. Again, I would need a bunch of depth-stencil states with different stencil mask / ref value combinations.
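
For case 1, one possible workaround is to pre-create a small table of sampler variants, one per mip clamp, and index into it at runtime. This is only a sketch under assumptions: `SamplerDesc`, `SamplerState`, `SamplerVariants` and the limit `kMaxMipLevels = 16` are hypothetical, and the `minLod` field assumes the clamp is expressed as a minimum LOD in the sampler description, which is my understanding of how D3D10 replaces D3DSAMP_MAXMIPLEVEL.

```cpp
#include <array>
#include <cstddef>

// Assumed upper bound on mip levels per texture (hypothetical, pick your own).
constexpr std::size_t kMaxMipLevels = 16;

// Hypothetical sampler description: everything is shared except the LOD clamp.
struct SamplerDesc {
    int   filter;
    float minLod;   // most detailed mip level the sampler may touch
};

// Hypothetical immutable sampler object (stand-in for the API object).
struct SamplerState { SamplerDesc desc; };

// All variants of one base sampler, created once up front.
class SamplerVariants {
public:
    explicit SamplerVariants(int filter) : states_{} {
        for (std::size_t i = 0; i < kMaxMipLevels; ++i) {
            states_[i] = SamplerState{ SamplerDesc{ filter, static_cast<float>(i) } };
        }
    }

    // Picks the variant whose clamp matches the most detailed resident mip.
    const SamplerState* forTopMip(std::size_t topMip) const {
        const std::size_t clamped = topMip < kMaxMipLevels ? topMip : kMaxMipLevels - 1;
        return &states_[clamped];
    }

private:
    std::array<SamplerState, kMaxMipLevels> states_;
};
```

With a handful of base samplers this adds a few dozen objects in total, and switching the clamp while a texture streams in becomes an array index rather than a state creation.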

Also, concerning your point about the inefficiency of the lookup:

Quote:Original post by LeGreg
What you SHOULDN'T do:
- have an expensive lookup function that matches a set of unrelated D3D9-ish states to an existing D3D10 state object. Though I'm sure it can be made more efficient than the recreation above, an expensive lookup per object drawn can easily end up less efficient than it needs to be.

I'm actually wondering whether that wouldn't be a viable solution. It could be implemented as a single map lookup into a cache for each DX10 render state type (blending / sampler / depth-stencil etc.) that has changed since last time, which doesn't have to be that bad.

The advantages in my case would be simple:
- wouldn't need to manage render state objects outside of the graphics library
- wouldn't need to refactor the whole code that's using render states (i.e. stay DX9 / OpenGL style compatible in that area)

Disadvantages are obvious:
- an extra lookup per draw call per DX10 render state type is needed (in real life, though, objects would already be presorted by render state, so the actual lookup would happen rarely)
- render states would be created on demand, which might introduce some hitching when a bunch of new render states need to be created
- no initialization time pre-validation of the render state
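
The trade-off above can be sketched for a single state group as follows. All names here (`BlendField`, `BlendKey`, `BlendObject`, `StateShim`) are hypothetical and only model the idea: DX9-style setters mark the group dirty, and the map lookup runs at draw time only if something actually changed.

```cpp
#include <array>
#include <cstddef>
#include <map>

// Hypothetical DX9-style individual states that belong to one DX10-style group.
enum BlendField { kBlendEnable = 0, kSrcBlend, kDstBlend, kBlendFieldCount };

// The group's current values double as the cache key.
using BlendKey = std::array<int, kBlendFieldCount>;

// Hypothetical immutable state object (stand-in for the API object).
struct BlendObject { BlendKey key; };

class StateShim {
public:
    // DX9-style setter: records the value and marks the group dirty on change.
    void setRenderState(BlendField field, int value) {
        if (pending_[field] != value) {
            pending_[field] = value;
            dirty_ = true;
        }
    }

    // Called once per draw: does a map lookup only if the group changed.
    const BlendObject* resolveForDraw() {
        if (dirty_) {
            ++lookups_;
            auto it = cache_.find(pending_);
            if (it == cache_.end()) {
                // On-demand creation: this is where hitching could occur.
                it = cache_.emplace(pending_, BlendObject{pending_}).first;
            }
            current_ = &it->second;
            dirty_ = false;
        }
        return current_;
    }

    std::size_t lookupCount() const  { return lookups_; }
    std::size_t createdCount() const { return cache_.size(); }

private:
    BlendKey pending_{};                       // all fields start at zero
    std::map<BlendKey, BlendObject> cache_;    // node addresses stay valid on insert
    const BlendObject* current_ = nullptr;
    bool dirty_ = true;                        // force a resolve on the first draw
    std::size_t lookups_ = 0;
};
```

A full version would keep one such key, cache and dirty flag per state group, so a draw call that changes nothing costs only a few flag checks.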
Maciej Sawitus
Sadly, this sounds reasonable to me as an initial solution. Retrofitting code is always a problem. Still, I think that in the long term it's better to look into creating as many of the states as you can up front. This can initially be integrated with the state cache, and perhaps later the D3D9/OpenGL state handling could be changed to make D3D10 state handling more efficient.

Note that the stencil ref is not part of the state, so at least this saves you one value to consider.
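
A small sketch of what that buys you, assuming the ref is supplied when the state is bound (as far as I recall, OMSetDepthStencilState takes it as a separate argument). The types `DepthStencilDesc`, `DepthStencilObject`, `DepthStencilCache` and `FakeContext` are hypothetical stand-ins, not the D3D10 interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical descriptor: the masks are part of the key, the ref is not.
struct DepthStencilDesc {
    bool         stencilEnable;
    std::uint8_t readMask;
    std::uint8_t writeMask;

    bool operator<(const DepthStencilDesc& o) const {
        return std::tie(stencilEnable, readMask, writeMask) <
               std::tie(o.stencilEnable, o.readMask, o.writeMask);
    }
};

// Hypothetical immutable state object (stand-in for the API object).
struct DepthStencilObject { DepthStencilDesc desc; };

class DepthStencilCache {
public:
    const DepthStencilObject* acquire(const DepthStencilDesc& desc) {
        auto it = cache_.find(desc);
        if (it == cache_.end()) {
            it = cache_.emplace(desc, DepthStencilObject{desc}).first;
        }
        return &it->second;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<DepthStencilDesc, DepthStencilObject> cache_;
};

// Hypothetical context: the ref is passed alongside the object at bind time.
struct FakeContext {
    const DepthStencilObject* bound = nullptr;
    unsigned stencilRef = 0;
    void setDepthStencil(const DepthStencilObject* s, unsigned ref) {
        bound = s;
        stencilRef = ref;
    }
};

// One state object serves every reflection level; only the ref changes.
void drawReflectionLevels(FakeContext& ctx, DepthStencilCache& cache, unsigned levels) {
    const DepthStencilObject* state = cache.acquire({true, 0xFF, 0xFF});
    for (unsigned level = 0; level < levels; ++level) {
        ctx.setDepthStencil(state, level + 1);
        // ... issue the draw calls for this reflection level here ...
    }
}
```

Different mask combinations would still need their own objects, but the number of distinct masks is usually much smaller than the number of ref values in use.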
