# Not enough hours in the day...

I hinted at it in my last post, but I took some time to contemplate methods of material-based sorting for Direct3D-based rendering.

I'm sure there are various papers/articles available online, but I quite like these sorts of problems - so I sat down with a blank bit of paper and tried to work out how I'd do it [smile]

-- Why optimize state changes? --

Simply put, they're a relatively slow set of operations to call during the high-performance part of your rendering code. Fewer of them means, in theory, more speed.

According to Circlesoft's KB entry, a call to SetTexture() will hurt you to the tune of 2500-3100 cycles. It's difficult to determine real-world times (e.g. number of microseconds) because they depend heavily on the underlying hardware. Other material-related calls can be somewhat more expensive, particularly changing which shader program is being used.

-- An example --

To keep things simple, I'm gonna cover an unoptimized rendering path and an optimized rendering path in terms of textures. Obviously there'll be various other variables (e.g. changing shaders and their constants) in a real system, but for the purpose of what I want to show we'll ignore them [wink]

   Mat0 = Diffuse0 + Normals0 + Specular0 + CubeEnvMap0
   Mat1 = Diffuse1 + Normals0 + Specular1
   Mat2 = Diffuse0 + Normals1 + Specular1
   Mat3 = Diffuse1 + Normals2 + Specular0
   Mat4 = Diffuse2 + Normals1 + CubeEnvMap0

So, we have the following 9 textures loaded into VRAM: Diffuse0, Diffuse1, Diffuse2, Normals0, Normals1, Normals2, Specular0, Specular1, CubeEnvMap0.

Before we can render anything that has a given material associated with it, we must change the textures (via SetTexture()) to those specified above. We'll be assuming that each frame starts from a "default" unconfigured state and that changing a texture (or removing it) is 1 operation.

The above diagram shows, along each arrow, how many changes are necessary to get from one material to another.

Based on the diagram, it takes 20 steps to traverse the following path when each material is set up completely, irrespective of the previous configuration: Reset->Mat0->Mat1->Mat2->Mat3->Mat4

If we were to only change those states that needed to be changed, following the same path yields only 16 steps.
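To make the counting concrete, here's a rough Python sketch of the two approaches for the original order. It's only a model, not rendering code: the slot names (`diffuse`, `normals`, `specular`, `cube`) and the function names are my own, and I'm assuming "setting up each material completely" means touching all four slots every time.

```python
# Hypothetical slot names; each material maps a slot to a texture name.
SLOTS = ("diffuse", "normals", "specular", "cube")

MATERIALS = {
    "Mat0": {"diffuse": "Diffuse0", "normals": "Normals0",
             "specular": "Specular0", "cube": "CubeEnvMap0"},
    "Mat1": {"diffuse": "Diffuse1", "normals": "Normals0",
             "specular": "Specular1"},
    "Mat2": {"diffuse": "Diffuse0", "normals": "Normals1",
             "specular": "Specular1"},
    "Mat3": {"diffuse": "Diffuse1", "normals": "Normals2",
             "specular": "Specular0"},
    "Mat4": {"diffuse": "Diffuse2", "normals": "Normals1",
             "cube": "CubeEnvMap0"},
}

ORDER = ["Mat0", "Mat1", "Mat2", "Mat3", "Mat4"]


def changes(bound, material):
    """Slots whose texture differs; removing a texture counts as one change."""
    return sum(1 for slot in SLOTS if bound.get(slot) != material.get(slot))


def full_setup_cost(order):
    """Every slot is (re)set for every material, regardless of current state."""
    return len(order) * len(SLOTS)


def incremental_cost(order):
    """Only slots that differ from the currently bound state are changed."""
    bound = {}  # "Reset": nothing bound at the start of the frame
    total = 0
    for name in order:
        total += changes(bound, MATERIALS[name])
        bound = MATERIALS[name]
    return total
```

With the original order, `full_setup_cost(ORDER)` gives 20 and `incremental_cost(ORDER)` gives 16 (4 + 3 + 2 + 3 + 4).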

Alternatively, if we were to take the optimal route that starts from Reset and visits each material once, we'd end up with only 12 steps, traversing in the following order: Reset->Mat1->Mat2->Mat0->Mat3->Mat4.

Keeping the initial order of materials the same, but changing only those that are required saves 20% overall. Allowing us to re-arrange the order that the materials are used gives us another 25% - a total saving of 40% [grin]

-- A better example --

Obviously we're going to be rendering some meshes (or just general geometry) that actually put these materials to use. If we take the ID3DXMesh implementation as an example, the geometry can be stored as one block yet partitioned into multiple materials.

   Mesh0 = Mat0 + Mat4 + Mat2
   Mesh1 = Mat1 + Mat3
   Mesh2 = Mat4

And, for a given scene we might render the objects multiple times - maybe something like this:

   Mesh0 : (Mat0->Mat4->Mat2)
   Mesh0 : (Mat0->Mat4->Mat2)
   Mesh0 : (Mat0->Mat4->Mat2)
   Mesh0 : (Mat0->Mat4->Mat2)
   Mesh1 : (Mat1->Mat3)
   Mesh1 : (Mat1->Mat3)
   Mesh2 : (Mat4)

Notice that the above list is ordered in a simple manner: render each mesh as many times as you need before moving on to the next [smile]. It's essentially the most trivial way to go about rendering.

Now, if I put the previous list together with the materials list for each mesh, we get a flow something like:
Reset->(Mat0 -> Mat4 -> Mat2) -> (Mat0 -> Mat4 -> Mat2) -> (Mat0 -> Mat4 -> Mat2) -> (Mat0 -> Mat4 -> Mat2) -> (Mat1 -> Mat3) -> (Mat1 -> Mat3) -> (Mat4).

Looking at the previous material-to-material diagram, the above sequence would require 45 texture-changing steps to execute, even when changing only those texture slots that need to be changed.

However, based on the optimal pattern for visiting all materials (mentioned above) being Reset->Mat1->Mat2->Mat0->Mat3->Mat4, we can re-arrange drawing to be:
   Mesh1 : (Mat1)
   Mesh1 : (Mat1)
   Mesh0 : (Mat2)
   Mesh0 : (Mat2)
   Mesh0 : (Mat2)
   Mesh0 : (Mat2)
   Mesh0 : (Mat0)
   Mesh0 : (Mat0)
   Mesh0 : (Mat0)
   Mesh0 : (Mat0)
   Mesh1 : (Mat3)
   Mesh1 : (Mat3)
   Mesh0 : (Mat4)
   Mesh0 : (Mat4)
   Mesh0 : (Mat4)
   Mesh0 : (Mat4)
   Mesh2 : (Mat4)

That should be only 12 texture changes, an improvement of a cool 73% [grin]. Based on the originally quoted cycle count for a SetTexture() call, the difference is between 112,500 to 139,500 cycles for the trivial implementation and 30,000 to 37,200 cycles for the optimized sequence.
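The re-arrangement itself boils down to a stable sort of the (mesh, material) draw list by each material's position in the chosen order. Here's a minimal Python sketch of that idea; the names (`MATERIAL_ORDER`, `draws`, `material_switches`) are made up for illustration, and it only counts how often the active material switches, not the individual texture changes.

```python
# The material visiting order chosen earlier in the post.
MATERIAL_ORDER = ["Mat1", "Mat2", "Mat0", "Mat3", "Mat4"]

# (mesh, materials in subset order, number of instances) for the example scene.
SCENE = [
    ("Mesh0", ["Mat0", "Mat4", "Mat2"], 4),
    ("Mesh1", ["Mat1", "Mat3"], 2),
    ("Mesh2", ["Mat4"], 1),
]

# Trivial order: draw every subset of every instance, mesh by mesh.
draws = []
for mesh, materials, count in SCENE:
    for _ in range(count):
        draws.extend((mesh, material) for material in materials)

# sorted() is stable, so draws sharing a material keep their relative order.
rank = {material: i for i, material in enumerate(MATERIAL_ORDER)}
sorted_draws = sorted(draws, key=lambda draw: rank[draw[1]])


def material_switches(sequence):
    """Count how many times the active material changes, including the first."""
    switches = 0
    previous = None
    for _, material in sequence:
        if material != previous:
            switches += 1
            previous = material
    return switches
```

For this scene the trivial list switches material on every one of its 17 draws, while the sorted list switches only 5 times, once per material.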

On my system, a very crude estimate puts 139,500 cycles at about 45 microseconds and 37,200 cycles at about 12 microseconds. What difference that makes to the frame-rate is anyone's guess [grin]
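For anyone wanting to redo the arithmetic with their own numbers, here's a small Python sketch. The 3.1 GHz clock is an assumption I've backed out of the figures above (139,500 cycles in roughly 45 microseconds), and the 60 fps frame budget is purely illustrative.

```python
# Quoted cost range for one SetTexture() call, in CPU cycles.
CYCLES_PER_SETTEXTURE = (2500, 3100)

# Assumption: clock speed inferred from 139,500 cycles taking about 45 us.
ASSUMED_CLOCK_HZ = 3.1e9


def cycle_range(texture_changes):
    """Lower and upper cycle estimates for a number of texture changes."""
    low, high = CYCLES_PER_SETTEXTURE
    return texture_changes * low, texture_changes * high


def to_microseconds(cycles, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock speed."""
    return cycles / clock_hz * 1e6


def share_of_frame(microseconds, fps=60):
    """Fraction of a single frame's time budget at the given frame rate."""
    frame_microseconds = 1e6 / fps
    return microseconds / frame_microseconds
```

`cycle_range(45)` and `cycle_range(12)` reproduce the cycle ranges quoted above, and `share_of_frame(45)` comes out at well under 1% of a 60 fps frame, although that only covers these calls as modelled here and nothing else the driver or GPU might be doing.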

-- Summary --

I've tried to keep this pretty simple by focusing only on the texture changing aspects - but obviously there are other parameters (SetStreamSource(), SetIndices(), Set*Shader()...) that feature heavily in this process. It is likely that the final mesh-rendering listing above is not optimal for another type of state change. You would then need to weigh up whether the optimality of texture changing outweighs the sub-optimality of the other state changing.
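One common way to juggle more than one kind of state is to sort on a compound key, with the state you believe is most expensive first. The Python sketch below assumes shader changes cost more than texture changes (as hinted earlier) and uses made-up shader names and a hypothetical `Draw` record; it illustrates the trade-off rather than recommending a particular ordering.

```python
from typing import NamedTuple


class Draw(NamedTuple):
    mesh: str
    shader: str    # hypothetical shader identifier
    material: str


def sort_key(draw):
    """Group by shader first (assumed costlier), then by material."""
    return (draw.shader, draw.material)


def switches(draws, field):
    """Count how often the given field changes along the draw list."""
    count = 0
    previous = None
    for draw in draws:
        value = getattr(draw, field)
        if value != previous:
            count += 1
            previous = value
    return count


# Illustrative draw list; the shader assignments are invented.
frame = [
    Draw("Mesh0", "ShaderA", "Mat0"),
    Draw("Mesh1", "ShaderB", "Mat1"),
    Draw("Mesh0", "ShaderA", "Mat2"),
    Draw("Mesh1", "ShaderB", "Mat3"),
    Draw("Mesh2", "ShaderA", "Mat0"),
]

sorted_frame = sorted(frame, key=sort_key)
```

Here the sort drops shader switches from 5 to 2 and material switches from 5 to 4. Putting the material first in the key instead would favour texture changes at the expense of shader changes, which is exactly the weighing-up described above.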

It's not simple [attention]

Hopefully this will prove useful for people.
Till next time.
Jack
