Frostbite rendering architecture question.

75 comments, last by n3Xus 12 years, 8 months ago
Hello.
I was reading an old DICE presentation from GDC 07 about the Frostbite engine's rendering architecture (slideshare). Most of it is pretty easy to understand, but I can't work out how they designed the high-level states for the render system (how they pass a dynamic number of state combinations to the shading system, and how they find solutions for those combinations efficiently) (page 35).


* User queues up render blocks
    Geometry & high-level state combinations
* Looks up solutions for the state combinations
    Pipeline created these offline
* Blocks dispatched by backend to D3D/GCM
    Blocks are sorted (category & depth)
    Backend sets platform-specific states and shaders
        Determined by pipeline for that solution
        Thin & dumb
    Draw


Can anyone help me with some pseudocode that implements this? Many thanks.
I don't know how Dice does it, but to represent high-level states like that, I use a 64-bit bitfield.
For the look-up, to find a shader that satisfies some particular combination of states, I use a linear search.
ShaderPrograms* FindProgramsForStates( u64 stateBitfield )
{
    // m_permutations is sorted so that permutations with more options set come first.
    for( size_t i = 0; i != m_permutations.size(); ++i )
    {
        u64 shaderStates = m_permutations[i].supportedStates;
        // Accept the first permutation that uses only options present in the requested states.
        if( (stateBitfield & shaderStates) == shaderStates )
            return m_permutations[i].shaderPrograms;
    }
    // A valid permutation should always be found; this code should never be reached.
    assert(false);
    return NULL;
}

For example, let's say we've got two shader options, and particular bits associated with them:
u64 F_DiffuseTexture = 1<<0;
u64 F_VertexColours = 1<<1;
We've also got a simple HLSL program that supports these options:
float4 ps_main( Interpolator IN ) : COLOR
{
    float4 diffuse = float4(1,1,1,1);

#ifdef F_DiffuseTexture
    diffuse *= tex2D( Diffuse, IN.uv );
#endif

#ifdef F_VertexColours
    diffuse *= IN.color;
#endif

    return diffuse;
}

For every combination of these options, a "shader permutation" will be compiled. Each permutation has a compiled shader program and a bit-mask saying which options it was compiled with:
permutations[0].supportedStates = 3 (F_DiffuseTexture|F_VertexColours)
permutations[0].shaderProgram = (the above code compiled with "/D F_DiffuseTexture /D F_VertexColours")

permutations[1].supportedStates = 2 (F_VertexColours)
permutations[1].shaderProgram = (the above code compiled with "/D F_VertexColours")

permutations[2].supportedStates = 1 (F_DiffuseTexture)
permutations[2].shaderProgram = (the above code compiled with "/D F_DiffuseTexture")

permutations[3].supportedStates = 0
permutations[3].shaderProgram = (the above code compiled with no defines)


In order for the linear search to work properly, the permutations have to be sorted in a particular order -- permutations with more options set need to come before ones with fewer options set. Also, there must always be a permutation with zero options set (which will be the last in the sorted list) so that a compatible permutation will always be found by the for loop (because the loop's if condition will always be true for "(stateBitfield & 0) == 0").
e.g. The "F_DiffuseTexture|F_VertexColours" permutation comes first, because it's got two options set. The "F_VertexColours" permutation and the "F_DiffuseTexture" permutation come next, because they're tied for having one option set. The "no states" one always comes last, because it's got zero options set.
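To make the ordering requirement concrete, here is a minimal, self-contained sketch of the sort plus the lookup. The `Permutation` struct, the `programId` stand-in for a compiled program, and the function names are assumptions made for illustration; they are not Frostbite's types or the exact ones used in the snippet above.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t u64;

// Option bits matching the two-option example above.
const u64 F_DiffuseTexture = u64(1) << 0;
const u64 F_VertexColours  = u64(1) << 1;

// Hypothetical permutation record; programId stands in for the compiled shader.
struct Permutation
{
    u64 supportedStates; // options this permutation was compiled with
    int programId;
};

// Number of options set in a bitfield.
inline std::size_t CountOptions( u64 states )
{
    return std::bitset<64>( states ).count();
}

// Sort so that permutations with more options set come first.
inline void SortPermutations( std::vector<Permutation>& permutations )
{
    std::stable_sort( permutations.begin(), permutations.end(),
        []( const Permutation& a, const Permutation& b )
        { return CountOptions( a.supportedStates ) > CountOptions( b.supportedStates ); } );
    // The zero-option fallback must exist and will end up last.
    assert( !permutations.empty() && permutations.back().supportedStates == 0 );
}

// Return the first permutation whose options are all present in the request.
inline const Permutation* FindPermutation( const std::vector<Permutation>& permutations, u64 requested )
{
    for( const Permutation& p : permutations )
        if( (requested & p.supportedStates) == p.supportedStates )
            return &p;
    return nullptr; // only reachable if the zero-option permutation is missing
}
```

After sorting, a request that carries a bit no permutation knows about still resolves to the permutation that covers the most of the remaining requested options, which is the "closest match" behaviour described above.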
Yes, except I think you have your array backwards, Hodgman. The actual fallback case would be an instance that supports *everything* (i.e. every bit is on), and you want permutations with fewer bits flipped on first. You pick those first because that suggests the shader is a closer match to the incoming draw call (and probably faster). Edit: That would also alter the if condition of your loop: you should be checking whether the result of the bitwise-and is the incoming value, not the one stored in the permutation.
You've misunderstood the workings of that loop.

If your fallback is a shader that implements all features then your rendering will be corrupt. The selected program must only implement features that the user asked for, and no more.
e.g. if the user hasn't bound a diffuse texture, but you select a program that was compiled with F_DiffuseTexture defined, then the shader will try to sample from a texture which has not been bound!

As it is, the loop already selects the shader that is the closest match to the incoming draw-call.
If you sort in the other direction (fewer bits on first), then the loop will always terminate on the first iteration. If you also change the if condition as you suggest, then it will fail to select a program at all in cases where there is no permutation which exactly matches the current render-state (which is a common occurrence).
e.g. If the user has enabled alpha-testing, but the current shader is opaque (constant alpha value), then it will not contain any permutations for the alpha-testing option.
If you fix this by requiring every shader to support all 64 possible options (in order to guarantee an exact match to the current state), then you'll be searching through a list of ~10^19 permutations.
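The three cases in this reply can be checked with a small sketch. `F_AlphaTest`, the two predicate names and `FirstMatch` are hypothetical names introduced here; the permutation lists are just the four bitmasks from the two-option example, in both sort orders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t u64;

const u64 F_DiffuseTexture = u64(1) << 0;
const u64 F_VertexColours  = u64(1) << 1;
const u64 F_AlphaTest      = u64(1) << 2; // hypothetical bit for the alpha-testing example

// Subset test from the original loop: the shader may only use options that were requested.
inline bool IsCompatible( u64 requested, u64 shaderStates )
{
    return (requested & shaderStates) == shaderStates;
}

// The suggested alternative: the result of the AND must equal the incoming value.
inline bool IsCompatibleReversed( u64 requested, u64 shaderStates )
{
    return (requested & shaderStates) == requested;
}

// Returns the index of the first permutation accepted by 'test', or -1 if none is.
template<class Test>
int FirstMatch( const std::vector<u64>& permutations, u64 requested, Test test )
{
    for( std::size_t i = 0; i != permutations.size(); ++i )
        if( test( requested, permutations[i] ) )
            return static_cast<int>( i );
    return -1;
}
```

With an opaque shader that has no alpha-test permutations and a request of F_AlphaTest|F_DiffuseTexture, the original test on a most-options-first list picks the diffuse-only permutation, the original test on a fewest-first list stops at the zero-option permutation on the first iteration, and the reversed test finds nothing at all.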
Now it is clear. Many thanks!
One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a prepass of all shader combinations that are actually required to solve this?

But even still, isn't the act of changing shaders a very expensive operation?

Whilst sorting to limit the number of changes is going to help, I can see it being problematic, as you're generally not just sorting by shader but also by other things like geometry, depth, transparency, etc.
@Hodgman

-Is there a straightforward way to add more shader options and their possible combinations? Or do I have to "calculate" them one by one myself?

-Using a u64 it's possible to have 64 shader options that can be turned on/off, right?

One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive.


Their system generates solutions offline. See page 34.

One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a prepass of all shader combinations that are actually required to solve this?


Indeed, I would recommend using a distributed build system for anyone going down this route. Unless you like waiting hours for shaders to compile. :P


But even still, isn't the act of changing shaders a very expensive operation?


It has some associated CPU and GPU overhead, sure. But you'll also save on GPU performance by precompiling the variations of your shader, rather than relying on branching at runtime. I know they also attempt to use instancing whenever they can in BF3, which helps reduce overheads.
-Is there a straightforward way to add more shader options and their possible combinations? Or do I have to "calculate" them one by one myself?
Generating permutations is a pretty old problem, and there are a lot of algorithms for it.

The most basic one is:
for( int permutation = 0; permutation != 1<<(maxOption+1); ++permutation )
{
    // do something with the permutation:
    if( permutation & 1<<0 ) // has option 0 enabled
        ...
    if( permutation & 1<<1 ) // has option 1 enabled
        ...
}
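As one way of filling in the "do something" step, the sketch below turns each permutation index into the list of compiler defines it would be built with, in the "/D" style used earlier in the thread. `BuildDefines` and `BuildAllDefines` are hypothetical helper names, and actually invoking a shader compiler is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::uint64_t u64;

// Builds the defines for one permutation, e.g. "/D F_DiffuseTexture /D F_VertexColours".
inline std::string BuildDefines( u64 permutation, const std::vector<std::string>& optionNames )
{
    std::string defines;
    for( std::size_t bit = 0; bit != optionNames.size(); ++bit )
    {
        if( permutation & (u64(1) << bit) ) // option 'bit' is enabled
        {
            if( !defines.empty() )
                defines += " ";
            defines += "/D " + optionNames[bit];
        }
    }
    return defines;
}

// Enumerates every on/off combination of the given options (2^N entries).
inline std::vector<std::string> BuildAllDefines( const std::vector<std::string>& optionNames )
{
    assert( optionNames.size() < 64 ); // keeps the shift below well-defined
    std::vector<std::string> result;
    const u64 count = u64(1) << optionNames.size();
    for( u64 permutation = 0; permutation != count; ++permutation )
        result.push_back( BuildDefines( permutation, optionNames ) );
    return result;
}
```

Adding an option is then just a matter of appending a name to the list; the loop picks up the extra combinations on its own, at the cost of doubling the count.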
-Using a u64 it's possible to have 64 shader options that can be turned on/off, right?
Yes, or alternatively, you can use several bits to represent an integer option.
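A rough sketch of packing an integer option next to boolean ones might look like this. The bit layout (two boolean bits followed by a 3-bit light count) and the helper names are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t u64;

// Hypothetical layout: bits 0-1 are boolean options, bits 2-4 hold a light count.
const u64 F_DiffuseTexture = u64(1) << 0;
const u64 F_VertexColours  = u64(1) << 1;
const unsigned LightCountShift = 2;
const u64 LightCountMask = u64(7) << LightCountShift; // 3 bits, values 0..7

// Returns 'states' with the light-count field replaced by 'count'.
inline u64 SetLightCount( u64 states, unsigned count )
{
    assert( count <= 7 ); // must fit in the 3-bit field
    return (states & ~LightCountMask) | (u64(count) << LightCountShift);
}

// Extracts the light-count field from 'states'.
inline unsigned GetLightCount( u64 states )
{
    return static_cast<unsigned>( (states & LightCountMask) >> LightCountShift );
}
```

One thing to watch: a bitwise subset test like the one in the linear search treats each bit independently, so a multi-bit field such as this would probably need to be compared for equality on its own rather than through the same mask test.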
One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a prepass of all shader combinations that are actually required to solve this?
Yeah, it's an asset-build-time cost, not a runtime cost. At runtime you usually get better performance by eliminating branches from your shader programs.

The number of permutations can be massive. If a graphics programmer uses 32 boolean options in one file, they'll end up with 4 billion permutations! If they use 64 boolean options, that blows out to more permutations than there are grains of sand on the earth!

There's a number of ways to mitigate this:
* Be aware that adding an option can double your build time. Don't go crazy with them.
* As assets are built, cache them on your network so other team-members don't also have to build them.
* Allow options to form a hierarchy. E.g. perhaps the 'parallax' option is nested inside the 'normal-map' option. This stops you from generating permutations that will never be used.
* If you have integer options, allow a valid range to be specified. e.g. if you support 1-to-6 lights, that requires 3 bits, which a naive solution would treat as having 8 combinations (instead of 6).
* Make your asset builder smarter by inspecting the dependency chain. By looking at the materials that use a shader, you can determine which options have actually been used by the artists. E.g., maybe your shader has a "has diffuse texture" option, but it turns out that this option is always enabled -- in that case, you don't need to generate the "no diffuse texture" permutations.
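The hierarchy and valid-range ideas from the list above can be sketched as a filter applied while enumerating. The layout here (a normal-map bit, a parallax bit nested inside it, and a 3-bit light count limited to 1-6) and all names are assumptions for illustration, not how any particular engine stores its options.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint64_t u64;

// Hypothetical layout: bit 0 = normal-map, bit 1 = parallax (nested inside normal-map),
// bits 2-4 = light count with a valid range of 1 to 6.
const u64 F_NormalMap = u64(1) << 0;
const u64 F_Parallax  = u64(1) << 1;
const unsigned LightShift = 2;
const u64 LightMask = u64(7) << LightShift;
const unsigned TotalBits = 5;

inline bool IsValidPermutation( u64 p )
{
    // Hierarchy: parallax is only meaningful when normal-mapping is enabled.
    if( (p & F_Parallax) && !(p & F_NormalMap) )
        return false;
    // Range: only light counts 1..6 are supported.
    const u64 lights = (p & LightMask) >> LightShift;
    return lights >= 1 && lights <= 6;
}

// Walks all 2^TotalBits combinations and keeps only the ones worth compiling.
inline std::vector<u64> EnumerateValidPermutations()
{
    std::vector<u64> valid;
    for( u64 p = 0; p != (u64(1) << TotalBits); ++p )
        if( IsValidPermutation( p ) )
            valid.push_back( p );
    return valid;
}
```

With this layout the naive loop visits 32 combinations and 18 survive the filter: three legal normal-map/parallax combinations times six light counts.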

In practice, the worst offender I've had so far was a shader with ~240 permutations. The build time for it was about 2 minutes, which didn't require any of the above strategies besides the first ;)
But even still, isn't the act of changing shaders a very expensive operation?
All state-changes have some kind of cost associated with them, which you should try to minimise, sure. Back on GeForce7 (or earlier) hardware, changing shaders or shader-constants was very expensive CPU-wise, but I wouldn't be so afraid of it that you ruin your performance by forcing everything to use the same general-purpose shader.
Even if switching shaders is expensive, it's a cost you pay per object. On the other hand, a shader-branch, or some unused shader code can be a cost you pay per pixel (i.e. up to a million times per object). As with any optimisation trade-off, it depends on your circumstances.

This topic is closed to new replies.
