Render state caching problems

Started by
5 comments, last by paic 15 years, 8 months ago
Hello, I just implemented a render state cache (not compiled or debugged yet). The general concept seems simple: keep an array of values for all of D3D's render states, texture stage states (for up to 8 stages) and sampler states (for up to 8 samplers). Each time a state is about to be changed, we first compare the incoming value against the cached value. If they differ, we cache the new value and then call the D3D API to set the state. If the state is already set, we return immediately. However, while implementing all this, I ran into two problems:

1) All cached states must be initialized to default values according to the D3D documentation. This seems very inelegant, because it's a huge list of assignments to array elements in my Graphics class constructor. I had no problem typing all that, but it made me wonder if that's what I really had to put up with, or if there was a better way. For all of you out there: is that how you did this? My device is created with the pure flag, so I couldn't loop through all states and just call the API to get the current state value to initialize the cache with.

2) When capturing state blocks, I have to use D3D API calls directly. Otherwise some important states might not end up in the state block if they happen to be equal to the current device states at the time of capture. Again, this results in a bunch of inelegant code that has to set each device state individually, check the return value of each API call, throw an error if it failed, etc. I did a quick work-around by creating "unchecked" versions of the state setter functions, reducing cache-bypassing state setting to one line of code for each state.

An even larger problem is applying those captured state blocks. The device states will change when I call Apply(), but the internal cached states will not. So I am forced to do a bunch of array element assignments once again after each call to Apply(). There are quite a few places where I have to apply state blocks, so this results in lots of code. The logical solution is to create my own state block class that holds a pointer to the D3D state block object plus a vector of cached states, which get written into the cache array when the state block is applied. However, all of this together sounds like it will produce a huge overhead: not only are state blocks slow in DirectX as is, but I will also have to loop through my own vector<> and write a bunch of values to memory each time I apply a state block. Again, is that how you implemented this, or is there a better way?
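For readers following along, the compare-then-set idea and the "unchecked" setters from the post above might look roughly like this. It is only a sketch: `FakeDevice`, `RenderStateCache` and the 256-entry bound are made up for illustration, and the fake device just counts calls instead of talking to D3D.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the D3D device; it only counts API calls.
struct FakeDevice {
    int setCalls = 0;
    void SetRenderState(std::size_t /*state*/, std::uint32_t /*value*/) { ++setCalls; }
};

class RenderStateCache {
public:
    static constexpr std::size_t kNumStates = 256; // assumed upper bound on state IDs

    explicit RenderStateCache(FakeDevice& device) : device_(device) {
        cached_.fill(0); // placeholder; a real cache would hold the documented defaults
    }

    // Sets the state through the cache; returns true if the device was called.
    bool Set(std::size_t state, std::uint32_t value) {
        assert(state < kNumStates);
        if (cached_[state] == value) return false; // redundant, skip the API call
        cached_[state] = value;
        device_.SetRenderState(state, value);
        return true;
    }

    // Bypasses the comparison, e.g. while a state block is being recorded.
    void SetUnchecked(std::size_t state, std::uint32_t value) {
        assert(state < kNumStates);
        cached_[state] = value;
        device_.SetRenderState(state, value);
    }

private:
    FakeDevice& device_;
    std::array<std::uint32_t, kNumStates> cached_{};
};
```

The unchecked path still updates the cache, so the cache and the device stay in agreement even when the comparison is skipped.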
Hi,
I can't really answer your questions, but I can explain how I'm doing things.

I'm not using a "cache" for render/sampler/texture stage states. There's simply too much to handle. Instead, I use a StateBlock class which looks something like this:

class StateBlock
{
public:
    StateBlock(void);
    ~StateBlock(void);

    void AddRenderState(D3DRENDERSTATETYPE state, DWORD value);
    void AddSamplerState(D3DSAMPLERSTATETYPE state, DWORD value, DWORD sampler);
    void AddTextureStageState(D3DTEXTURESTAGESTATETYPE state, DWORD value, DWORD stage);

    void Finalize(void);
    void Apply(void);
    void Capture(void);
    void Restore(void);

private:
    IDirect3DStateBlock9 *   m_pStateBlock;   ///< the actual set of states
    IDirect3DStateBlock9 *   m_pBackup;       ///< the stateblock used to capture states and restore them later
};


I simplified the class a lot. It stores a list of little structures describing the states, and when I call Finalize(), it builds the D3D state blocks. The 3 methods Apply(), Capture() and Restore() are then self-explanatory :)

The point is: I only cache the StateBlock currently in use. I don't care about single states. I just manipulate the global set of states as a whole. It's pretty easy to implement and manipulate, and for the moment it performs well. I'm using typical scenes of ~500k tris with at most ~20 different StateBlocks.
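To make the "collect, finalize, then only track the current block" idea concrete, here is a small sketch without any D3D calls. `StateList` and `BlockTracker` are hypothetical names, and plain integers stand in for the D3D enums. In a real engine, Finalize() is where the D3D state block would be recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One recorded state assignment; the kinds mirror the Add* methods above.
enum class StateKind { Render, Sampler, TextureStage };
struct StateEntry {
    StateKind kind;
    std::uint32_t state;
    std::uint32_t value;
    std::uint32_t index; // sampler or stage; unused for render states
};

// Hypothetical stand-in for the wrapper class: collects entries, then is frozen.
class StateList {
public:
    void AddRenderState(std::uint32_t state, std::uint32_t value) {
        assert(!finalized_);
        entries_.push_back({StateKind::Render, state, value, 0});
    }
    void AddSamplerState(std::uint32_t state, std::uint32_t value, std::uint32_t sampler) {
        assert(!finalized_);
        entries_.push_back({StateKind::Sampler, state, value, sampler});
    }
    void Finalize() { finalized_ = true; } // real code would record the D3D block here
    bool IsFinalized() const { return finalized_; }
    std::size_t Size() const { return entries_.size(); }

private:
    std::vector<StateEntry> entries_;
    bool finalized_ = false;
};

// Tracks only which block is current, not individual states.
class BlockTracker {
public:
    // Returns true if the block actually had to be applied.
    bool Apply(const StateList& block) {
        assert(block.IsFinalized());
        if (current_ == &block) return false; // already active, nothing to do
        current_ = &block;
        ++applyCount_;
        return true;
    }
    int ApplyCount() const { return applyCount_; }

private:
    const StateList* current_ = nullptr;
    int applyCount_ = 0;
};
```

Comparing one pointer per draw group is far less bookkeeping than comparing every individual state, which is the trade-off described above.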
Quote:
1) All cached states must be initialized to default values according to the D3D documentation. This seems very inelegant, because it's a huge list of assignments to array elements in my Graphics class constructor. I had no problem typing all that, but it made me wonder if that's what I really had to put up with, or if there was a better way. For all of you out there: is that how you did this? My device is created with the pure flag, so I couldn't loop through all states and just call the API to get the current state value to initialize the cache with.

D3D render states are DWORDs, if I remember right, so you could just memset your cached values to 0xffffffff.
That way, the first time you set a render state you'll get a warning (something like "ignoring redundant state change"), but I think you can afford a single redundant state change. From there on, your cache will be in sync with D3D's internal states.
I'd personally opt for something like Giallanon suggests. That or some sort of bitfield to signify a cache element is "dirty" and exempt from checking - the dirty concept can work nicely for the lost-device scenario as well as first-time loading...
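A rough sketch of the dirty-flag variant, assuming a hypothetical `DirtyStateCache` with a made-up 256-entry bound. Compared with a 0xffffffff sentinel, a separate flag cannot collide with a value that a state may legitimately hold (a full stencil mask, for instance), and one call can invalidate everything after a lost device.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

class DirtyStateCache {
public:
    static constexpr std::size_t kNumStates = 256; // assumed upper bound on state IDs

    DirtyStateCache() { InvalidateAll(); }

    // Marks every entry as unknown: at startup, after a lost device,
    // or after anything else that changed device state behind the cache.
    void InvalidateAll() { dirty_.set(); }

    // Returns true when the caller must forward the value to the device.
    bool NeedsSet(std::size_t state, std::uint32_t value) {
        assert(state < kNumStates);
        if (!dirty_[state] && values_[state] == value) return false; // known and equal
        values_[state] = value;
        dirty_.reset(state);
        return true;
    }

private:
    std::array<std::uint32_t, kNumStates> values_{};
    std::bitset<kNumStates> dirty_;
};
```

With this layout the constructor needs no list of default values at all, at the cost of one extra API call per state the first time it is touched.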

hth
Jack

Jack Hoxley

But that doesn't solve the problems with state blocks. And in DirectX, it's better to set a state block containing a few redundant state changes than to set each state separately with SetRenderState() and such, even if the latter avoids every redundant change.
So it sounds like the best way is not to track every single state and call the API when it changes, but instead to put every state change used throughout the game or the engine into a D3D state block. Which, I've heard, is also how D3D10 handles this.

I am still struggling to understand how render state blocks fit into the rendering pipeline, but so far I have arrived at the following (my engine is 2D):

// Each actor is a sprite, so rendering it would result in a quad added to the render queue.
// Quads in the render queue would need to be sorted by Z Order and Material.
Actor
    Position // Applied to verts of the quad as POSITION0.
             // Specified as world units, cached into units used with ortho projection.
    Z Order // Applied to verts of the quad in POSITION0
            // but since I don't use Z-buffer, only useful
            // to renderer for determining when to break a batch
    Color // Applied to verts of the quad as COLOR0, if vertex declaration has a slot
    SpriteInstance // Sprite used, which contains texture coordinates in animation frames
        Animation
            Frame
                TextureCoords[] // One set for every texture. How many textures depends
                                // on vertex declaration, which is a part of Material
                                // specified as float U,V, but can be set from pixel values
                                // applied to verts of the quad as TEXCOORDn
    AnimationInstance // Specifies which Animation to use in Sprite, and which frame of that animation
    Rotation // Specified in degrees. When changes,
             // call D3DXMatrixTransformation2D to update internal SRT matrix
    Pivot // Specified in world units, cached into ortho units.
          // When changes, call D3DXMatrixTransformation2D to update SRT
    Scale // Specified as percentage. When changes,
          // call D3DXMatrixTransformation2D to update SRT
    SRT Matrix // Not exposed, but used for collision detection and rendering
    MaterialInstance // Stored in pool of MaterialInstances. Every time some material property gets
                     // changed, a new material instance may have to be created
        Material
            VertexDeclaration // Used to figure out the contents of quad's vertices
            Effect // Optional. If not using an effect, use fixed pipeline
                D3DEffect // Not exposed
            Effect Technique // Used only when effect is specified
        Properties[] // Values of Effect's exported constants. If material has no effect, not used.
                     // Exposed, but compiled into internal D3D state block which is not exposed
                     // Some properties with special semantics, like WorldViewProj,
                     // would be set by engine on every frame
        Textures[] // Exposed, but compiled into internal D3D state block which is not exposed
        States[] // Exposed, but compiled into internal
                 // D3D state block which is not exposed
        D3DStateBlock // Not exposed, contains all states
                      // that must be set before object is rendered
                      // this includes API calls resulting from Properties, Textures and States


Does this look like something that can be accomplished? I really want a flexible way to render, but all of this looks like it will take years to implement. There are also blank spots which I don't know how to implement. For example, I know that it's possible for Effect code to set rendering states. But then, how do I figure out whether or not it did this, and which states it changed, so I may include those in un-exposed D3D state block in the Material, and then also prevent it from setting those states again?
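As a side note on the render queue mentioned in the outline, sorting by Z order first and by material second, then breaking a batch whenever the material changes, could be sketched like this. `Quad`, `SortQueue` and `CountBatches` are hypothetical names, and the integer material handle is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Quad {
    int zOrder;     // draw order; lower values are drawn first
    int materialId; // hypothetical handle to a material instance
};

// Orders by Z first (required without a Z-buffer), then by material within a layer.
inline void SortQueue(std::vector<Quad>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const Quad& a, const Quad& b) -> bool {
                         if (a.zOrder != b.zOrder) return a.zOrder < b.zOrder;
                         return a.materialId < b.materialId;
                     });
}

// Counts batches in an already sorted queue: a new batch starts
// whenever the material differs from the previous quad's material.
inline std::size_t CountBatches(const std::vector<Quad>& sorted) {
    std::size_t batches = 0;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (i == 0 || sorted[i].materialId != sorted[i - 1].materialId) ++batches;
    }
    return batches;
}
```

Because Z order takes priority, quads sharing a material but sitting on different layers may still land in separate batches.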

So anyway, before an actor can be rendered, I would have to re-create the material instance's state block.

1. Device.BeginStateBlock

2. Somehow force the Effect to set all states needed for the selected Technique, if an Effect is specified, so they can be recorded in the state block

3. Loop through all Properties and set an Effect constant for each. This will cause SetVertexShaderConstantF/SetPixelShaderConstantF calls to be recorded in the state block

4. Loop through all Textures and set all of them, to cause SetTexture calls to be recorded in the state block

5. Loop through all States and set them to cause SetRenderState/SetSamplerState/SetTextureStageState to be recorded in the state block

6. Device.EndStateBlock

Each time actor's material instance changes because we set a Property, a Texture, a State or Technique, the internal state block will be marked as "dirty". When attempting to render this object next time around, steps above will be followed to update it.
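The dirty-marking scheme just described could be sketched as below. This is a simplified model, not D3D code: `MaterialInstance` here is hypothetical, only the States[] part is modeled, and copying into a vector stands in for the BeginStateBlock/EndStateBlock recording in steps 1 to 6.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical material instance; the "state block" is modeled as a recorded list.
class MaterialInstance {
public:
    using Block = std::vector<std::pair<std::uint32_t, std::uint32_t>>;

    void SetState(std::uint32_t state, std::uint32_t value) {
        states_[state] = value;
        dirty_ = true; // the recorded block no longer matches the settings
    }

    // Returns the recorded block, re-recording it first if anything changed.
    const Block& GetBlock() {
        if (dirty_) {
            // Stands in for recording all states between Begin/EndStateBlock.
            block_.assign(states_.begin(), states_.end());
            ++rebuilds_;
            dirty_ = false;
        }
        return block_;
    }

    int Rebuilds() const { return rebuilds_; }

private:
    std::map<std::uint32_t, std::uint32_t> states_;
    Block block_;
    bool dirty_ = true;
    int rebuilds_ = 0;
};
```

The rebuild counter makes the cost visible: a material that changes every frame re-records its block every frame.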
A few things, in no particular order:

When using .fx effects, you can indeed change states. But when you use ID3DXEffect in your engine, the flags passed to Begin() control whether the effect saves the device states and restores them once it is done, so the states can end up exactly as they were before you used it.

About state blocks: they should only be created during initialization. They are not meant to be recreated or modified each frame! The point of a stateblock is to set many render/sampler/texture stage states using only one API call (IDirect3DStateBlock9::Apply).
It's not as flexible as setting states individually with SetRenderState and the like, but it's a lot faster.
Think about your engine: most of the sprites you are drawing will use the exact same states (alpha blending, alpha testing, etc.). Then you'll probably have a particle system which uses a shader and probably another set of states, and so on.
So during initialization, create a few stateblocks for the different kinds of entities in your engine. Before rendering, group entities by the stateblock they use and draw them, changing the stateblock only when needed.

For example, in an extremely simplified game, you'd have 3 stateblocks : one for sprites, one for effects, one for GUI. Then rendering would look something like :
- Set sprites stateblock
    - Render all sprites (set textures, set matrices, etc.)
- Set effect stateblock
    - Set shader
        - Render all effects
- Set gui stateblock
    - Render GUI
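The grouping in the outline above can be sketched without any D3D calls. `Entity` and `RenderGrouped` are made-up names, state blocks are identified by a non-negative integer (an assumption), and incrementing a counter stands in for IDirect3DStateBlock9::Apply.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Entity {
    int stateBlockId; // hypothetical index of the state block this entity uses (>= 0)
    int id;           // identifies the entity in the resulting draw order
};

// Draws entities grouped by state block and returns how many times
// a state block had to be applied. Draw calls are modeled by appending
// the entity id to drawOrder.
inline int RenderGrouped(std::vector<Entity> entities, std::vector<int>& drawOrder) {
    std::stable_sort(entities.begin(), entities.end(),
                     [](const Entity& a, const Entity& b) {
                         return a.stateBlockId < b.stateBlockId;
                     });
    int applies = 0;
    int current = -1; // no state block applied yet
    for (const Entity& e : entities) {
        if (e.stateBlockId != current) { // switch blocks only when needed
            current = e.stateBlockId;
            ++applies;
        }
        drawOrder.push_back(e.id);
    }
    return applies;
}
```

With five entities spread over three state blocks this applies a block three times instead of five. In a 2D engine without a Z-buffer, the grouping would still have to respect draw order.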



But again, that's just the way I'd do it. If you use many different states, using stateblocks might not be the best option.
