Frostbite rendering architecture question.


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
76 replies to this topic

#1 rapt0r   Members   -  Reputation: 111

Posted 25 June 2011 - 03:44 AM

Hello.
I was reading an old DICE presentation from GDC 07 about the Frostbite engine's rendering architecture (slideshare). Most of it is easy to understand, but I can't work out how they designed the high-level states for the render system: how they pass a dynamic number of state combinations to the shading system, and how they find a solution for those combinations efficiently (page 35).

* User queues up render blocks
      Geometry & high-level state combinations
* Looks up solutions for the state combinations
      Pipeline created these offline
* Blocks dispatched by backend to D3D/GCM
      Blocks are sorted (category & depth)
      Backend sets platform-specific states and shaders
            Determined by pipeline for that solution
            Thin & dumb
      Draw

Can anyone help me with some pseudocode that implements this? Many thanks.

#2 Hodgman   Moderators   -  Reputation: 32049

Posted 25 June 2011 - 05:51 AM

I don't know how Dice does it, but to represent high-level states like that, I use a 64-bit bitfield.
For the look-up, to find a shader that satisfies some particular combination of states, I use a linear search.
ShaderPrograms* FindProgramsForStates( u64 stateBitfield )
{
  for( int i=0; i != m_permutations.size(); ++i )
  {
    u64 shaderStates = m_permutations[i]->supportedStates;
    if( (stateBitfield & shaderStates) == shaderStates )
      return m_permutations[i]->shaderPrograms;
  }
  //a valid permutation should always be found, this code should never be reached.
  assert(false);
  return NULL;
}
For example, let's say we've got two shader options, and particular bits associated with them:
u64 F_DiffuseTexture  = 1<<0;
u64 F_VertexColours   = 1<<1;
We've also got a simple HLSL program that supports these options:
float4 ps_main( Interpolator IN ) : COLOR
{
  float4 diffuse = float4(1,1,1,1);

#ifdef F_DiffuseTexture
  diffuse *= tex2D( Diffuse, IN.uv );
#endif

#ifdef F_VertexColours
  diffuse *= IN.color;
#endif

  return diffuse;
}
For every combination of these options, a "shader permutation" will be compiled. Each permutation has a compiled shader program and a bit-mask saying which options it was compiled with:
permutations[0].supportedStates = 3 (F_DiffuseTexture|F_VertexColours)
permutations[0].shaderProgram = (the above code compiled with "/D F_DiffuseTexture /D F_VertexColours")

permutations[1].supportedStates = 2 (F_VertexColours)
permutations[1].shaderProgram = (the above code compiled with "/D F_VertexColours")

permutations[2].supportedStates = 1 (F_DiffuseTexture)
permutations[2].shaderProgram = (the above code compiled with "/D F_DiffuseTexture")

permutations[3].supportedStates = 0
permutations[3].shaderProgram = (the above code compiled with no defines)

In order for the linear search to work properly, the permutations have to be sorted in a particular order: permutations with more options set need to come before ones with fewer options set. Also, there must always be a permutation with zero options set (which will be the last in the sorted list) so that a compatible permutation will always be found by the for loop (because the loop's if condition will always be true for "(stateBitfield & 0) == 0").
e.g. The "F_DiffuseTexture|F_VertexColours" permutation comes first, because it's got two options set. The "F_VertexColours" permutation and the "F_DiffuseTexture" permutation come next, because they're tied for having one option set. The "no states" one always comes last, because it's got zero options set.
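
The sorting rule and the lookup can be put together in one small, self-contained sketch. This is only an illustration of the idea described in this post, not the poster's actual code: the `Permutation` struct, `CountBits`, `SortPermutations`, `FindPermutation` and `MakeExamplePermutations` are hypothetical names, and a string stands in for the compiled shader program.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using u64 = std::uint64_t;

// Hypothetical option bits, mirroring the two options in the post.
const u64 F_DiffuseTexture = u64(1) << 0;
const u64 F_VertexColours  = u64(1) << 1;

struct Permutation
{
  u64 supportedStates;        // options this permutation was compiled with
  std::string shaderProgram;  // stand-in for the compiled program
};

// Number of option bits set in a mask.
inline int CountBits( u64 mask )
{
  return static_cast<int>( std::bitset<64>( mask ).count() );
}

// Sort so that permutations with more options set come first.
// stable_sort keeps the incoming order among ties.
inline void SortPermutations( std::vector<Permutation>& permutations )
{
  std::stable_sort( permutations.begin(), permutations.end(),
    []( const Permutation& a, const Permutation& b )
    { return CountBits( a.supportedStates ) > CountBits( b.supportedStates ); } );
}

// Return the first permutation whose options are all present in the request.
inline const Permutation* FindPermutation( const std::vector<Permutation>& sorted, u64 stateBitfield )
{
  for( const Permutation& p : sorted )
  {
    if( (stateBitfield & p.supportedStates) == p.supportedStates )
      return &p;
  }
  assert( false && "the list must contain a zero-option permutation" );
  return nullptr;
}

// Builds the four permutations from the post, deliberately out of order, then sorts them.
inline std::vector<Permutation> MakeExamplePermutations()
{
  std::vector<Permutation> v = {
    { 0, "none" },
    { F_DiffuseTexture, "diffuse" },
    { F_DiffuseTexture | F_VertexColours, "diffuse+colours" },
    { F_VertexColours, "colours" },
  };
  SortPermutations( v );
  return v;
}
```

With this ordering, a request that carries a bit no permutation knows about still resolves to the permutation covering the largest subset of the requested options, and the zero-option entry catches everything else.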

#3 osmanb   Crossbones+   -  Reputation: 1628

Posted 25 June 2011 - 07:53 AM

Yes, except I think you have your array backwards, Hodgman. The actual fallback case would be an instance that supports *everything* (i.e. every bit is on), and you want the permutations with fewer bits flipped on to come first. Pick those first, because fewer bits suggests that the shader is a closer match to the incoming draw call (and probably faster). Edit: That would also alter the if condition of your loop: you should be checking whether the result of the bitwise-and is the incoming value, not the one stored in the permutation.

#4 Hodgman   Moderators   -  Reputation: 32049

Posted 25 June 2011 - 10:16 AM

You've misunderstood the workings of that loop.

If your fallback is a shader that implements all features then your rendering will be corrupt. The selected program must only implement features that the user asked for, and no more.
e.g. if the user hasn't bound a diffuse texture, but you select a program that was compiled with F_DiffuseTexture defined, then the shader will try to sample from a texture which has not been bound!

As it is, the loop already selects the shader that is the closest match to the incoming draw-call.
If you sort in the other direction (fewer bits on first), then the loop will always terminate on the first iteration. If you also change the if condition as you suggest, then it will fail to select a program at all in cases where there is no permutation which exactly matches the current render-state (which is a common occurrence).
e.g. If the user has enabled alpha-testing, but the current shader is opaque (constant alpha value), then it will not contain any permutations for the alpha-testing option.
If you fix this by requiring every shader to support all 64 possible options (in order to guarantee an exact match to the current state), then you'll be searching through a list of ~10^19 permutations.

#5 rapt0r   Members   -  Reputation: 111

Posted 26 June 2011 - 02:55 AM

Now it is clear. Many thanks!

#6 maya18222   Members   -  Reputation: 191

Posted 26 June 2011 - 10:17 AM

One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a prepass over all the shader combinations that are actually required to solve this?

But even still, isn't the act of changing shaders a very expensive operation?

Whilst sorting to limit the number of changes is going to help, I can see it being problematic, as you're generally not just sorting by shader but also by other things like geometry, depth, transparency, etc.

#7 TiagoCosta   Crossbones+   -  Reputation: 2487

Posted 26 June 2011 - 10:23 AM

@Hodgman

-Is there a straightforward way to add more shader options and their possible combinations? Or do I have to "calculate" them one by one myself?

-Using a u64 it's possible to have 64 shader options that can be turned on/off, right?

Edited by TiagoCosta, 26 June 2011 - 02:51 PM.


#8 rapt0r   Members   -  Reputation: 111

Posted 26 June 2011 - 10:27 AM

One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive.


Their system generates solutions offline. See page 34.

#9 MJP   Moderators   -  Reputation: 11850

Posted 26 June 2011 - 01:01 PM

One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a prepass over all the shader combinations that are actually required to solve this?


Indeed, I would recommend using a distributed build system for anyone going down this route. Unless you like waiting hours for shaders to compile. :P

But even still, isn't the act of changing shaders a very expensive operation?


It has some associated CPU and GPU overhead, sure. But you'll also save on GPU performance by precompiling the variations of your shader, rather than relying on branching at runtime. I know they also attempt to use instancing whenever they can in BF3, which helps reduce overheads.

#10 Hodgman   Moderators   -  Reputation: 32049

Posted 27 June 2011 - 01:38 AM

-Is there a straightforward way to add more shader options and their possible combinations? Or do I have to "calculate" them one by one myself?

Generating permutations is a pretty old problem, there's a lot of algorithms.

The most basic one is:
for( int permutation = 0; permutation != 1<<(maxOption+1); ++permutation )
{
  // do something with the permutation:
  if( permutation & 1<<0 )//has option 0 enabled
    ...
  if( permutation & 1<<1 )//has option 1 enabled
    ...
}

-Using a u64 it's possible to have 64 shader options that can be turned on/off, right?

Yes, or alternatively, you can use several bits to represent an integer option.
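
As a rough sketch of the "several bits for an integer option" idea: the layout below (two boolean options in bits 0-1, a 3-bit light count in bits 2-4) is purely an assumption for illustration, as are the names `SetLightCount` and `GetLightCount`.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Hypothetical layout: bits 0-1 are boolean options, bits 2-4 hold a light count.
const u64 F_DiffuseTexture = u64(1) << 0;
const u64 F_VertexColours  = u64(1) << 1;
const int kLightCountShift = 2;
const int kLightCountBits  = 3;
const u64 kLightCountMask  = ((u64(1) << kLightCountBits) - 1) << kLightCountShift;

// Write an integer option into its bit range, leaving the other bits untouched.
inline u64 SetLightCount( u64 state, unsigned count )
{
  assert( count < (1u << kLightCountBits) );  // must fit in the reserved bits
  return (state & ~kLightCountMask) | (u64(count) << kLightCountShift);
}

// Read the integer option back out of the bitfield.
inline unsigned GetLightCount( u64 state )
{
  return static_cast<unsigned>( (state & kLightCountMask) >> kLightCountShift );
}
```

One caveat: a subset test of the form `(state & supported) == supported` treats every bit independently, so a request for 5 lights (binary 101) would also "match" a permutation compiled for 4 or 1. An integer field would presumably need an exact compare on its masked range; the thread does not say how that case is handled.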

One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a prepass over all the shader combinations that are actually required to solve this?

Yeah, it's an asset-build-time cost, not a runtime cost. At runtime you usually get better performance by eliminating branches from your shader programs.

The number of permutations can be massive. If a graphics programmer uses 32 boolean options in the one file, they'll end up with 4 billion permutations! If they use 64 boolean options, that blows out to more permutations than there are grains of sand on the earth!


There's a number of ways to mitigate this:
* Be aware that adding an option can double your build time. Don't go crazy with them.
* As assets are built, cache them on your network so other team-members don't also have to build them.
* Allow options to form a hierarchy. E.g. perhaps the 'parallax' option is nested inside the 'normal-map' option. This stops you from generating permutations that will never be used.
* If you have integer options, allow a valid range to be specified. e.g. if you support 1-to-6 lights, that requires 3 bits, which a naive solution would treat as having 8 combinations (instead of 6).
* Make your asset builder smarter by inspecting the dependency chain. By looking at the materials that use a shader, you can determine which options have actually been used by the artists. E.g., maybe your shader has a "has diffuse texture" option, but it turns out that this option is always enabled -- in that case, you don't need to generate the "no diffuse texture" permutations.
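
Two of the mitigations in the list above (nested options and valid integer ranges) can be sketched as a filter over the basic enumeration loop. The option names, the bit layout and the 1-to-6 light range are assumptions chosen to match the examples in the list, not a description of any real asset pipeline.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Hypothetical option layout for one shader file.
const u64 F_NormalMap = u64(1) << 0;
const u64 F_Parallax  = u64(1) << 1;        // nested: only meaningful with F_NormalMap
const int kLightShift = 2;                  // 3-bit light count stored in bits 2-4
const u64 kLightMask  = u64(7) << kLightShift;
const int kTotalBits  = 5;

// Rejects combinations that should never be compiled.
inline bool IsValidPermutation( u64 p )
{
  // Hierarchy: parallax requires normal-map.
  if( (p & F_Parallax) && !(p & F_NormalMap) )
    return false;
  // Range: only 1 to 6 lights are supported, although 3 bits can encode 0 to 7.
  const u64 lights = (p & kLightMask) >> kLightShift;
  return lights >= 1 && lights <= 6;
}

// Walks every raw bit pattern and keeps only the valid ones.
inline std::vector<u64> EnumerateValidPermutations()
{
  std::vector<u64> result;
  for( u64 p = 0; p != (u64(1) << kTotalBits); ++p )
  {
    if( IsValidPermutation( p ) )
      result.push_back( p );
  }
  return result;
}
```

Under these assumptions the naive loop would visit 32 bit patterns, while the filter keeps 18 (3 valid normal-map/parallax combinations times 6 light counts).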

In practice, the worst offender I've had so far was a shader with ~240 permutations. The build time for it was about 2 minutes, which didn't require any of the above strategies besides the first.

But even still, isn't the act of changing shaders a very expensive operation?

All state changes have some kind of cost associated with them, which you should try to minimise, sure. Back on GeForce7 (or earlier) hardware, changing shaders or shader-constants was very expensive CPU-wise, but I wouldn't be so afraid of it that you ruin your performance by forcing everything to use the same general-purpose shader.
Even if switching shaders is expensive, it's a cost you pay per object. On the other hand, a shader-branch, or some unused shader code can be a cost you pay per pixel (i.e. up to a million times per object). As with any optimisation trade-off, it depends on your circumstances.

#11 David Neubelt   Members   -  Reputation: 794

Posted 27 June 2011 - 07:54 PM

In practice, the worst offender I've had so far was a shader with ~240 permutations. The build time for it was about 2 minutes, which didn't require any of the above strategies besides the first.


Is your build distributed? 2 mins sounds fast for 240 shader compiles.



Graphics Programmer - Ready At Dawn Studios

#12 Krypt0n   Crossbones+   -  Reputation: 2686

Posted 28 June 2011 - 09:07 AM


In practice, the worst offender I've had so far was a shader with ~240 permutations. The build time for it was about 2 minutes, which didn't require any of the above strategies besides the first.


Is your build distributed? 2 mins sounds fast for 240 shader compiles.

It depends on the compiler you use. DX9 compiles take about 200ms on average and DX10/11 about 1000ms on average; console compilers can apply very aggressive optimization and take minutes to compile.
Crysis (1) seems to have more than 64 bits of flags, but they obviously don't use all permutations. I saw fxc in my task manager while running it, so there is probably some runtime compilation if anything is missing.
Once you reach 16 bits of shader options, it's smart to cache the results. Our shader DB (server) handles ~2000 requests/s and 99.9% of the shaders are already in the DB; only those not found are recompiled. The nightly build takes about 15min; without a DB it would be 4h+ for 16-bit combinations.
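
A compile-on-miss cache of this kind could look roughly like the sketch below. It is only an assumption about the general shape: an in-memory map stands in for the shader DB server, the key is simply (source name, option bits), and `ShaderCache`, `CompileFn` and `Get` are hypothetical names. A real system would presumably also key on the source contents and compiler version.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

using u64 = std::uint64_t;

// Hypothetical cache: an in-memory map standing in for the shader DB server.
class ShaderCache
{
public:
  using Key       = std::pair<std::string, u64>;  // (source name, option bits)
  using CompileFn = std::function<std::string( const std::string& source, u64 options )>;

  explicit ShaderCache( CompileFn compile ) : m_compile( std::move( compile ) ) {}

  // Returns the cached binary, compiling and storing it only on a miss.
  const std::string& Get( const std::string& source, u64 options )
  {
    const Key key( source, options );
    auto it = m_entries.find( key );
    if( it != m_entries.end() )
    {
      ++m_hits;
      return it->second;
    }
    ++m_misses;
    return m_entries.emplace( key, m_compile( source, options ) ).first->second;
  }

  int Hits() const   { return m_hits; }
  int Misses() const { return m_misses; }

private:
  CompileFn m_compile;
  std::map<Key, std::string> m_entries;  // stand-in for persistent storage
  int m_hits = 0;
  int m_misses = 0;
};
```

The compile callback is injected so that the same lookup path can sit in front of whichever compiler a platform uses; only requests that miss ever reach it.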

#13 rapt0r   Members   -  Reputation: 111

Posted 28 June 2011 - 04:59 PM

One more question. It is now very popular to build multi-threaded renderers with command buffers (render blocks in Frostbite, draw parts in Killzone 2, etc.).
In short, the algorithm is as follows:
* Create a render block (with geometry, shader, constants, texture list, flags, etc.)
* Add it to the render queue
Later:
* Sort, set and draw
The problem is that there are so many objects that are rendered in different ways; it's not as simple as just passing an index and vertex buffer. What happens with all the data for an object that needs to be rendered? Maybe somebody has implemented a similar system. Thanks.

#14 Hodgman   Moderators   -  Reputation: 32049

Posted 28 June 2011 - 08:22 PM

Is your build distributed? 2 mins sounds fast for 240 shader compiles.

No, and there wasn't a network cache either. As in Krypt0n's experience, each permutation only took about a second.
Once you took compiling for multiple platforms into account, the number got higher! I really would've loved to have had a distributed build system, or at least a network cache at the time though...

In short the algorithm is as follows:
* Create render block (with geometry, shader, constants, texture list, flags,etc)
* Add to render queue
The problem is that there are so many objects that are rendered in different ways; it's not as simple as just passing an index and vertex buffer. What happens with all the data for an object that needs to be rendered?

As you just wrote, you don't just pass an index and vertex buffer, you also pass shaders, constants, textures, flags, etc...

I've posted in some other threads here and here, but the approach I've taken is to create a "state group", which is a container that stores a (variable) number of render-states, such as blend modes, cbuffer bindings, texture bindings, high-level flags (like discussed in this thread), etc...

When you submit a draw-call, you also submit a stack of these state-groups. If the same state is set by multiple groups in the stack, then the version that's higher up in the stack will be used.
Typically the stack looks like:
* Per-instance group -- binds cbuffer containing world matrix, etc.
* Material group -- binds textures and material cbuffers.
* Shader group -- binds shader and cbuffers containing default values (which are usually overridden by material group).

When processing a group of submitted draw-calls, there's also a default group, which is implicitly put at the bottom of every stack. This group contains sensible defaults, like "AlphaBlend = false".
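
To make the stack resolution concrete, here is a minimal sketch under several assumptions: states are identified by a small enum, every value is a plain `int` standing in for a handle or setting, and the stack is passed top-first. None of these names (`StateId`, `StateGroup`, `ResolveStates`) come from the engine described in this post.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical state slots; a real renderer would have many more.
enum StateId { State_AlphaBlend, State_Texture0, State_CBuffer0, State_Shader, State_Count };

struct StateValue { StateId id; int value; };  // value stands in for a handle or setting

// A state group: a variable number of states, fixed after creation.
using StateGroup = std::vector<StateValue>;

struct ResolvedStates
{
  std::array<int, State_Count>  value{};
  std::array<bool, State_Count> isSet{};
};

// stack[0] is the top of the stack. The first group to set a state wins;
// the default group fills in anything that no group in the stack set.
inline ResolvedStates ResolveStates( const std::vector<const StateGroup*>& stack,
                                     const StateGroup& defaults )
{
  ResolvedStates out;
  auto apply = [&out]( const StateGroup& group )
  {
    for( const StateValue& s : group )
    {
      if( !out.isSet[s.id] )
      {
        out.value[s.id] = s.value;
        out.isSet[s.id] = true;
      }
    }
  };
  for( const StateGroup* group : stack )
    apply( *group );
  apply( defaults );  // implicitly at the bottom of every stack
  return out;
}
```

Because lower groups only fill slots that are still empty, a material group can override a shader group's default cbuffer without either group knowing about the other.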

So I guess my "render blocks" are a draw-call paired with a stack of state-groups (and a sorting key, etc).

Generally, the pattern for drawing anything looks like:
At creation time: create any required cbuffers, create any required render-groups to bind them and set states.
At draw-time: copy new state into cbuffers, submit draw-calls paired with render-groups.
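
The "queue now, sort and draw later" part can be sketched as follows. The key layout (category in the high bits, quantised depth in the low 24 bits) is an assumption inspired by the "category & depth" line on the slide; the thread does not specify one. `RenderBlock`, `MakeSortKey` and `RenderQueue` are hypothetical names, the draw id stands in for the draw-call plus its state-group stack, and threading is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical render block: a sort key plus a stand-in for the draw data.
struct RenderBlock
{
  std::uint64_t sortKey;
  int           drawId;
};

// Category occupies the bits above bit 24; depth in [0,1] is quantised to 24 bits.
inline std::uint64_t MakeSortKey( std::uint8_t category, float depth01 )
{
  const float clamped = std::min( 1.0f, std::max( 0.0f, depth01 ) );
  const std::uint64_t depthBits = static_cast<std::uint64_t>( clamped * 16777215.0f );
  return (static_cast<std::uint64_t>( category ) << 24) | depthBits;
}

class RenderQueue
{
public:
  void Add( std::uint8_t category, float depth01, int drawId )
  {
    m_blocks.push_back( RenderBlock{ MakeSortKey( category, depth01 ), drawId } );
  }

  // Sorts the queued blocks, returns the draw ids in submission order, and empties the queue.
  std::vector<int> SortAndFlush()
  {
    std::stable_sort( m_blocks.begin(), m_blocks.end(),
      []( const RenderBlock& a, const RenderBlock& b ) { return a.sortKey < b.sortKey; } );
    std::vector<int> order;
    for( const RenderBlock& block : m_blocks )
      order.push_back( block.drawId );
    m_blocks.clear();
    return order;
  }

private:
  std::vector<RenderBlock> m_blocks;
};
```

Sorting on a single integer keeps the comparison cheap, and the relative priority of category versus depth is decided entirely by where each field sits in the key.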


Depending on how you implement this stuff, it can either be very efficient or very slow. At an old job, we spent about 30% of our CPU time inside state-group related functions... In my current engine, it's about 1.5%. In my current implementation, I've made state-groups immutable, which makes management a lot simpler -- even though state-groups are variable size, they can't be resized after they've been created.

#15 rapt0r   Members   -  Reputation: 111

Posted 29 June 2011 - 12:18 AM

Thank you for your reply,
I don't understand what the draw-call structure is in your system:
* a draw-call (e.g. a structure that maps to DrawIndexedPrimitive etc, and an enum saying which one it's mapping to)
Why do you map the draw functions?

#16 Krypt0n   Crossbones+   -  Reputation: 2686

Posted 29 June 2011 - 01:54 AM

Thank you for your reply,
I don't understand what the draw-call structure is in your system:

* a draw-call (e.g. a structure that maps to DrawIndexedPrimitive etc, and an enum saying which one it's mapping to)
Why do you map draw functions?


Something worth reading: http://realtimecollisiondetection.net/blog/?p=86 :)

#17 rdragon1   Crossbones+   -  Reputation: 1200

Posted 29 June 2011 - 01:55 AM

But even still, isnt the act of changing shaders a very expensive operation?


Why do so many people think that switching shaders is some horrible thing to do? GPUs can overlap lots of different kinds of work at different stages of the pipeline with different costs. Fragment programs can be prefetched, and can be a virtually free operation, depending on how long the previous draw call takes to complete (if it's really fast, there's less opportunity to hide the time to load the new shader).

#18 TiagoCosta   Crossbones+   -  Reputation: 2487

Posted 29 June 2011 - 05:05 AM

I found this rendering architecture post really interesting, but I can't think of a real use case.
In my projects I only use 3 or 4 shaders: one for transparent objects, one for opaque objects and one for terrain (plus some other shaders for HDR tone-mapping, shadow mapping, etc.), because every object has a diffuse texture and a normal/spec map.
Can someone please give me some examples of objects that need different shader permutations?
In this thread we are talking about using 32/64 boolean options, and I can only think of a few, like diffuse, normal and spec maps, parallax mapping and reflectivity...

#19 Hodgman   Moderators   -  Reputation: 32049

Posted 29 June 2011 - 06:08 AM

I don't understand what the draw-call structure is in your system.

In the first post, you mention the "render blocks -- Geometry & high--level state combinations".
"Geometry" isn't just a vertex buffer and an index buffer, because you can't draw buffers by themselves -- you also need the data that goes into calls like DrawIndexedPrimitives. e.g. at what offset do you start reading the buffer, how many items do you read from the buffer, what kind of primitives are you constructing, etc...

I treat the index/vertex buffers like all other states (they get put into state-groups), and then the "drawable" item that gets put into the queue is a structure that holds the arguments to DrawIndexedPrimitives/other draw calls.
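
A minimal sketch of such a "drawable" item, assuming just two kinds of draw call. The `FakeDevice` below is a hypothetical stand-in that records what would be issued; its methods are not the real D3D or GCM signatures, and the field names in `DrawCall` are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Which underlying draw function this item maps to.
enum class DrawType : std::uint8_t { Indexed, NonIndexed };

// The queued "drawable": only the arguments of the draw call itself.
// Buffers, shaders and textures are assumed to live in state-groups instead.
struct DrawCall
{
  DrawType      type;
  std::uint32_t start;           // first index (Indexed) or first vertex (NonIndexed)
  std::uint32_t primitiveCount;  // how many primitives to build
  std::int32_t  baseVertex;      // added to each index; unused for NonIndexed
};

// Hypothetical stand-in for the platform API that logs each call as text.
struct FakeDevice
{
  std::vector<std::string> log;

  void DrawIndexed( std::uint32_t startIndex, std::uint32_t primCount, std::int32_t baseVertex )
  {
    log.push_back( "DrawIndexed(" + std::to_string( startIndex ) + "," +
                   std::to_string( primCount ) + "," + std::to_string( baseVertex ) + ")" );
  }
  void Draw( std::uint32_t startVertex, std::uint32_t primCount )
  {
    log.push_back( "Draw(" + std::to_string( startVertex ) + "," +
                   std::to_string( primCount ) + ")" );
  }
};

// The backend only switches on the enum and forwards the stored arguments.
inline void Submit( FakeDevice& device, const DrawCall& dc )
{
  switch( dc.type )
  {
  case DrawType::Indexed:    device.DrawIndexed( dc.start, dc.primitiveCount, dc.baseVertex ); break;
  case DrawType::NonIndexed: device.Draw( dc.start, dc.primitiveCount ); break;
  }
}
```

Storing the arguments as plain data is what lets the item be queued and sorted before anything touches the graphics API.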

Can someone please give me some examples of objects that need different shader permutations?

In a forward rendered game, you might have options for the number of point lights, number of spot lights, number of directional lights, etc... e.g. a draw-call might want to find a permutation specifically compiled for 1 directional light, 2 spot lights and 5 point lights, and perform all that lighting in one pass.

If your objects have to be drawn several times in the pipeline, e.g. light-pre-pass which requires the attribute pass and the material pass, you can generate permutations from both, from the same source file.

You can even write a shader that supports a forward-lighting pipeline, a deferred shading pipeline, and a light-pre-pass pipeline (with permutations for each pipeline being generated from the same source) -- the sample Horde3D shaders support multiple different pipelines like this.

If you use the same materials on skinned and non-skinned objects, then it's convenient to simply write one shader, and let the skinned objects automatically choose the permutation that includes the animation code.

If not every object in your world needs normal maps, it's convenient to let your artists use the same shader on normal-mapped and non-normal-mapped objects, and have the right permutation be automatically chosen based on whether they've assigned a normal-map texture or not. Same goes for any other features your artists want to use from time to time, like per-vertex colouring, without either (a) forcing them to use it all the time, or (b) making them choose from a list of feature permutations manually.

If you start adding special behaviours or effects to objects, then it's very simple with a system like this in place. Say you want to make certain objects have a yellow highlight when the player selects them -- just add a shader option for a 'highlight' state. Then the game code can simply turn on and off a flag in the object's per-instance buffer, instead of manually swapping its shader temporarily.

Out of my 64-bits, some are hard-coded to certain states (e.g. is there a vertex-colour stream, is there a 2nd UV stream, is a particular sampler bound, etc...), while the rest can be defined by shaders/game-play code however they want.
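
As an illustration of mixing hard-coded and freely defined bits, the sketch below reserves the low 16 bits for engine-derived states and leaves the rest to game code. The 16/48 split, the `MeshInfo` struct and all flag names are assumptions made up for this example; the post does not say how the bits are actually divided.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Assumed split: low 16 bits are engine-reserved, the rest are free for shaders/game code.
const u64 kEngineMask        = (u64(1) << 16) - 1;
const u64 F_HasVertexColours = u64(1) << 0;   // engine: vertex-colour stream present
const u64 F_HasSecondUV      = u64(1) << 1;   // engine: 2nd UV stream present
const u64 F_Highlight        = u64(1) << 16;  // game-defined: selected-object highlight

// Hypothetical description of what a mesh provides.
struct MeshInfo { bool hasVertexColours; bool hasSecondUV; };

// Engine bits are derived from the data that is actually bound.
inline u64 EngineBitsFor( const MeshInfo& mesh )
{
  u64 bits = 0;
  if( mesh.hasVertexColours ) bits |= F_HasVertexColours;
  if( mesh.hasSecondUV )      bits |= F_HasSecondUV;
  return bits;
}

// Turns a single flag on or off without disturbing the others.
inline u64 SetFlag( u64 flags, u64 flag, bool enabled )
{
  return enabled ? (flags | flag) : (flags & ~flag);
}

// Combines both sources into the bitfield used for the permutation lookup.
inline u64 BuildStateBitfield( const MeshInfo& mesh, u64 gameFlags )
{
  assert( (gameFlags & kEngineMask) == 0 );  // game code must stay out of the reserved bits
  return EngineBitsFor( mesh ) | gameFlags;
}
```

Game code toggling the highlight then only calls `SetFlag` on its own flags, and the matching permutation is picked up by the normal lookup on the next draw.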

#20 Hiyar   Members   -  Reputation: 130

Posted 29 June 2011 - 06:23 AM

Also, if you want to distribute different materials on your terrain in single pass (parallax occlusion for cliffs, simple textures for grass etc), you will have shader permutations.
I think they explain this in that paper.



