
Frostbite rendering architecture question.


_rapt0r    111
Hello.
I was reading an old DICE presentation from GDC '07 about the Frostbite engine's rendering architecture ([url="http://www.slideshare.net/repii/frostbite-rendering-architecture-and-realtime-procedural-shading-texturing-techniques-presentation?from=ss_embed"]slideshare[/url]). Most of it is pretty easy to understand, but I can't work out how they designed the high-level states for the render system: how do they pass a dynamic number of state combinations to the shading system, and how do they find solutions for those combinations efficiently? (page 35)

[code]
* User queues up render blocks
  Geometry & high-level state combinations
* Looks up solutions for the state combinations
  Pipeline created these offline
* Blocks dispatched by backend to D3D/GCM
  Blocks are sorted (category & depth)
  Backend sets platform-specific states and shaders
    Determined by pipeline for that solution
  Thin & dumb
* Draw
[/code]

Can anyone help me with some pseudocode that implements this? Many thanks.
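To make the question more concrete, here is my rough guess at how that flow could look. All the names below are mine, not from the slides, so please correct me if I've misread it:

[code]
// My rough reading of slide 35 -- hypothetical names, not Frostbite code.
#include <algorithm>
#include <cstdint>
#include <vector>

struct RenderBlock {
    uint64_t stateBits;   // high-level state combination as bit flags
    uint64_t sortKey;     // category & depth packed together
    int      geometryId;  // VB/IB + draw arguments, looked up elsewhere
};

struct Solution {
    int shaderId;       // platform-specific shaders, built offline by the pipeline
    int renderStateId;  // platform-specific state block
};

// Built offline by the pipeline and loaded at runtime; stubbed here.
Solution lookUpSolution(uint64_t stateBits) { (void)stateBits; return {0, 0}; }

void dispatchFrame(std::vector<RenderBlock>& queue)
{
    // 1. user code queued up 'queue' during scene traversal
    // 2. blocks are sorted (category & depth)
    std::sort(queue.begin(), queue.end(),
              [](const RenderBlock& a, const RenderBlock& b) { return a.sortKey < b.sortKey; });

    // 3. thin & dumb backend: set the solution's states/shaders, then draw via D3D/GCM
    for (const RenderBlock& block : queue) {
        Solution s = lookUpSolution(block.stateBits);
        // setRenderState(s.renderStateId);
        // setShaders(s.shaderId);
        // draw(block.geometryId);
    }
}
[/code]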

osmanb    2082
Yes, except I think you have your array backwards, Hodgman. The actual fallback case would be an instance that supports *everything* (i.e. every bit is on), and you want the permutations with fewer bits flipped on first: picking those first means the shader is a closer match to the incoming draw call (and probably faster). Edit: that would also alter the if-condition of your loop; you should be checking whether the result of the bitwise AND equals the incoming value, not the value stored in the permutation.
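Something like this is what I mean -- a minimal sketch with made-up names, assuming the permutation table is kept sorted by the number of set bits:

[code]
// Minimal sketch of the lookup described above -- names are made up.
#include <algorithm>
#include <bit>      // std::popcount, C++20
#include <cstdint>
#include <vector>

struct Permutation {
    uint64_t bits;     // features this compiled shader supports
    int      shaderId; // handle to the compiled program
};

// Keep permutations sorted by how many bits are set, fewest first, so the
// first match is the closest fit. The last entry has every bit set and acts
// as the fallback that supports everything.
void sortByFeatureCount(std::vector<Permutation>& perms)
{
    std::sort(perms.begin(), perms.end(),
              [](const Permutation& a, const Permutation& b)
              { return std::popcount(a.bits) < std::popcount(b.bits); });
}

int findShader(const std::vector<Permutation>& sortedPerms, uint64_t requested)
{
    for (const Permutation& p : sortedPerms) {
        // the permutation must support every requested feature:
        // AND the two together and compare against the *incoming* value
        if ((p.bits & requested) == requested)
            return p.shaderId;
    }
    return -1; // unreachable if the all-bits fallback is present
}
[/code]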

maya18222    194
One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a pre-pass to find the shader combinations that are actually required?

But even so, isn't changing shaders a very expensive operation?

While sorting to limit the number of changes will help, I can see it being problematic, as you're generally not just sorting by shader but also by other things like geometry, depth, transparency, etc.

Aqua Costa    3691
@Hodgman

- Is there a straightforward way to add more shader options and their possible combinations? Or do I have to enumerate them one by one myself?

- Using a u64, it's possible to have 64 shader options that can be turned on/off, right?

_rapt0r    111
[quote name='maya18222' timestamp='1309105022' post='4827924']
One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive.
[/quote]

Their system generates solutions offline. See page 34.

MJP    19753
[quote name='maya18222' timestamp='1309105022' post='4827924']
One thing I've never understood about this approach is why it doesn't have a drastic negative impact on performance. Taking Hodgman's example, the number of permutations that have to be compiled could be massive. I assume you do a pre-pass to find the shader combinations that are actually required?
[/quote]

Indeed, I would recommend using a distributed build system for anyone going down this route. Unless you like waiting hours for shaders to compile. :P

[quote name='maya18222' timestamp='1309105022' post='4827924']
But even so, isn't changing shaders a very expensive operation?
[/quote]

It has some associated CPU and GPU overhead, sure. But you'll also save GPU time by precompiling the variations of your shader rather than relying on branching at runtime. I know they also try to use instancing whenever they can in BF3, which helps reduce the overhead.

David Neubelt    866
[quote name='Hodgman' timestamp='1309160306' post='4828140']
In practice, the worst offender I've had so far was a shader with ~240 permutations. The build time for it was about 2 minutes, which didn't require any of the above strategies besides the first ;)
[/quote]

Is your build distributed? 2 mins sounds fast for 240 shader compiles.


Krypt0n    4721
[quote name='David Neubelt' timestamp='1309226075' post='4828501']
[quote name='Hodgman' timestamp='1309160306' post='4828140']
In practice, the worst offender I've had so far was a shader with ~240 permutations. The build time for it was about 2 minutes, which didn't require any of the above strategies besides the first ;)
[/quote]

Is your build distributed? 2 mins sounds fast for 240 shader compiles.
[/quote]It depends on the compiler you use: DX9 shaders compile in ~200 ms on average, DX10/11 in ~1000 ms, and console compilers can have crazy optimization passes and take minutes per shader.
Crysis (1) seems to have more than 64 bits of flags, but they obviously don't use all the permutations; I saw fxc in my task manager while it was running, so there is probably some runtime compilation when anything is missing.
Once you reach 16 bits of shader flags, it's smart to cache them. Our shader DB (server) handles ~2000 requests/s; 99.9% of the shaders are found in the DB, and only the missing ones are recompiled. The nightly build takes about 15 minutes; without the DB it would take 4+ hours for the 16-bit combinations.
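The caching itself doesn't have to be fancy; a local, in-process version of the idea could look roughly like the sketch below (hypothetical names -- our real one is a networked DB):

[code]
// Sketch of a local shader cache keyed by (source hash, permutation flags).
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytecode = std::vector<unsigned char>;

struct CacheKey {
    uint64_t sourceHash; // hash of the shader source text
    uint64_t flags;      // permutation bits it was compiled with
    bool operator<(const CacheKey& o) const {
        if (sourceHash != o.sourceHash) return sourceHash < o.sourceHash;
        return flags < o.flags;
    }
};

class ShaderCache {
public:
    const Bytecode& get(const CacheKey& key, const std::string& source) {
        auto it = cache_.find(key);
        if (it != cache_.end())
            return it->second;                           // hit: the 99.9% case
        return cache_[key] = compile(source, key.flags); // miss: recompile and store
    }
private:
    static Bytecode compile(const std::string&, uint64_t) {
        // call fxc / D3DCompile / the console compiler here
        return {};
    }
    std::map<CacheKey, Bytecode> cache_;
};
[/code]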

_rapt0r    111
One more question. It's now very popular to build multi-threaded renderers around command buffers (render blocks in Frostbite, draw parts in Killzone 2, etc.).
In short, the algorithm is as follows:
* Create a render block (with geometry, shader, constants, texture list, flags, etc.)
* Add it to the render queue
Later:
* Sort, set states and draw
The problem is that there are so many objects rendered in so many different ways that it's not as simple as just passing an index buffer and a vertex buffer. What happens to all the other data an object needs in order to be rendered? Maybe somebody here has implemented a similar system. Thanks.
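To show what I mean, this is roughly the kind of self-contained block I'm imagining (the names are mine, not from Frostbite or Killzone); my question is whether everything really has to be flattened into it like this:

[code]
// Rough sketch of a self-contained render block -- hypothetical names.
#include <cstdint>

struct RenderBlock {
    // geometry
    uint32_t vertexBufferId;
    uint32_t indexBufferId;
    uint32_t startIndex;
    uint32_t indexCount;
    // shading
    uint32_t shaderId;
    uint32_t textureIds[4];
    uint32_t constantBufferId; // per-instance constants: world matrix, colour, ...
    // flags / sorting
    uint64_t sortKey;          // category, depth, ...
};

// Per frame:
//   1. traversal/visibility fills a queue of RenderBlocks (possibly from many threads)
//   2. the queue is sorted by sortKey
//   3. a backend loop walks the sorted queue, sets states and issues the draws
[/code]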

_rapt0r    111
Thank you for your reply.
I didn't understand: what is the draw-call structure in your system?
[code]
* a draw-call (e.g. a structure that maps to DrawIndexedPrimitive etc, and an enum saying which one it's mapping to)
[/code]
Why do you map draw functions?

Krypt0n    4721
[quote name='rapt0r' timestamp='1309328318' post='4828966']
Thank you for your reply.
I didn't understand: what is the draw-call structure in your system?
[code]
* a draw-call (e.g. a structure that maps to DrawIndexedPrimitive etc, and an enum saying which one it's mapping to)
[/code]
Why do you map draw functions?
[/quote]

Something worth reading: http://realtimecollisiondetection.net/blog/?p=86 :)
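The gist of that post, as a rough sketch (the field widths here are made up; pick whatever your renderer needs):

[code]
// Pack all sort criteria into one integer so a single compare orders draws by
// layer, then blend mode, then depth, then shader.
#include <cstdint>

inline uint64_t makeSortKey(uint8_t layer, uint8_t blendMode,
                            uint32_t depth24, uint16_t shaderId)
{
    // depth is front-to-back for opaque draws; invert it for transparent layers
    return (uint64_t(layer)               << 56) |
           (uint64_t(blendMode)           << 48) |
           (uint64_t(depth24 & 0xFFFFFFu) << 24) |
            uint64_t(shaderId);                    // bits 16..23 left spare
}
[/code]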

RDragon1    1205
[quote name='maya18222' timestamp='1309105022' post='4827924']
But even so, isn't changing shaders a very expensive operation?
[/quote]

Why do so many people think that switching shaders is some horrible thing to do? GPUs can overlap lots of different kinds of work at different stages of the pipeline with different costs. Fragment programs can be prefetched, and can be a virtually free operation, depending on how long the previous draw call takes to complete (if it's really fast, there's less opportunity to hide the time to load the new shader).

Aqua Costa    3691
I found this rendering architecture thread really interesting, but I can't think of a real use case. :blink:
In my projects I only use about 3 or 4 shaders: one for transparent objects, one for opaque objects and one for terrain (plus some others for HDR tone-mapping, shadow mapping, etc.), because every object has a diffuse texture and a normal/spec map.
Can someone please give me some examples of objects that need different shader permutations?
In this thread we're talking about using 32/64 boolean options, and I can only think of a few, like diffuse, normal and spec maps, parallax mapping and reflectivity... :huh:

Hodgman    51220
[quote name='rapt0r' timestamp='1309328318' post='4828966']I didn't understand: what is the draw-call structure in your system?[/quote]In the first post, you mention the "[i]render blocks -- Geometry & high-level state combinations[/i]".
"Geometry" isn't just a vertex buffer and an index buffer, because you can't draw buffers by themselves -- you also need the data that goes into calls like DrawIndexedPrimitives, e.g. at what offset you start reading the buffer, how many items you read from it, what kind of primitives you're constructing, etc.

I treat the index/vertex buffers like all other states (they get put into state-groups), and then the "drawable" item that gets put into the queue is a structure that holds the arguments to DrawIndexedPrimitives/other draw calls.
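As a rough sketch (the exact fields differ, but this is the idea):

[code]
// Sketch of the queued item -- the buffers are bound via state-groups like any
// other state, and the drawable just stores the eventual draw arguments.
#include <cstdint>

struct StateGroup; // shaders, textures, cbuffers, VB/IB bindings, ...

enum class DrawType : uint8_t {
    DrawPrimitive,
    DrawIndexedPrimitive,
    DrawIndexedPrimitiveInstanced,
};

struct DrawCall {
    DrawType type;           // which API function this maps to
    uint32_t topology;       // triangle list/strip/...
    uint32_t startItem;      // first index (or vertex, for non-indexed draws)
    uint32_t primitiveCount;
    uint32_t baseVertex;
    uint32_t instanceCount;  // only used by the instanced variant
};

struct Drawable {
    const StateGroup* states; // everything needed to set up the pipeline
    DrawCall          draw;   // everything needed to issue the call itself
    uint64_t          sortKey;
};
[/code]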
[quote name='TiagoCosta' timestamp='1309345540' post='4829017']Can someone please give me some examples of objects that need different shader permutations?[/quote]In a forward rendered game, you might have options for the number of point lights, number of spot lights, number of directional lights, etc... e.g. a draw-call might want to find a permutation specifically compiled for 1 directional light, 2 spot lights and 5 point lights, and perform all that lighting in one pass.

If your objects have to be drawn several times in the pipeline, e.g. light-pre-pass, which requires an attribute pass and a material pass, you can generate the permutations for both passes from the same source file.

You can even write a shader that supports a forward-lighting pipeline, a deferred shading pipeline, and a light-pre-pass pipeline ([i]with permutations for each pipeline being generated from the same source[/i]) -- the sample [url="http://horde3d.org/"]Horde3D[/url] shaders support multiple different pipelines like this.

If you use the same materials on skinned and non-skinned objects, then it's convenient to simply write one shader and let the skinned objects automatically choose the permutation that includes the animation code.

If not every object in your world needs normal maps, it's convenient to let your artists use the same shader on normal-mapped and non-normal-mapped objects, and have the right permutation be chosen automatically based on whether they've assigned a normal-map texture or not. The same goes for any other features your artists want to use from time to time, like per-vertex colouring, without either (a) forcing them to use it all the time, or (b) making them choose from a list of feature permutations manually.

If you start adding special behaviours or effects to objects, it's very simple with a system like this in place. Say you want certain objects to have a yellow highlight when the player selects them -- just add a shader option for a 'highlight' state. Then the game code can simply turn a flag on and off in the object's per-instance buffer, instead of manually swapping its shader temporarily.

Out of my 64 bits, some are hard-coded to certain states ([i]e.g. is there a vertex-colour stream, is there a 2nd UV stream, is a particular sampler bound, etc...[/i]), while the rest can be defined by shaders/gameplay code however they want.
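For illustration only, a made-up split of the 64 bits along those lines:

[code]
// Made-up layout: a handful of bits are reserved and filled in automatically
// from bound resources/streams; the rest are free for shaders and game code.
#include <cstdint>

enum EngineOptionBits : uint64_t {
    kHasVertexColourStream = 1ull << 0,
    kHasSecondUvStream     = 1ull << 1,
    kNormalMapBound        = 1ull << 2,
    kIsSkinned             = 1ull << 3,
    // ... bits 4..15 reserved for other engine-detected states ...
};

constexpr uint32_t kFirstFreeBit = 16; // bits 16..63 allocated by shaders/gameplay

// e.g. a gameplay-defined flag, registered at runtime:
constexpr uint64_t kHighlightSelected = 1ull << kFirstFreeBit;
[/code]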

Hiyar    130
Also, if you want to blend different materials across your terrain in a single pass (parallax occlusion for cliffs, simple textures for grass, etc.), you will have shader permutations.
I think they explain this in that paper.

Krypt0n    4721
[quote name='RDragon1' timestamp='1309334136' post='4828980']
[quote name='maya18222' timestamp='1309105022' post='4827924']
But even so, isn't changing shaders a very expensive operation?
[/quote]

Why do so many people think that switching shaders is some horrible thing to do? GPUs can overlap lots of different kinds of work at different stages of the pipeline with different costs. Fragment programs can be prefetched, and can be a virtually free operation, depending on how long the previous draw call takes to complete (if it's really fast, there's less opportunity to hide the time to load the new shader).
[/quote]

Back then, when the first 3D GPUs arrived, even texture switching was expensive, and the same issues carried over to shaders.

The GPU pipeline is split into different sub-pipelines; each one works on its "jobs", which are neither draw calls nor primitives -- they are simply jobs. If you draw (or, back then, drew) two objects with exactly the same settings, it is (was) very likely that they share the same sub-pipelines to some degree.

If you change the setup of some sub-pipeline, then because the GPU doesn't track which job belongs to which draw call or setup (that would be very expensive for little gain), that part of the pipeline, and sometimes the whole pipeline, has to be flushed (or in some lucky cases just a fence is added, which flushes partially). So the GPU-side cost isn't really the switching of some resource; what costs is the stall caused by the flush, which leaves the sub-pipelines idle.

On the CPU side, switching shaders often means they need to be prepared: not all features you see on the API side are real hardware features, they are often just patched shaders. So something that looks to you like a simple shader switch might actually be a recompilation, because you have some "weird" texture set or a vertex format that the shader has to "emulate".

As an example, D3D10/11 hardware does not have alpha test, which is why the API does not support it either; but you can still run DX9 software, which then obviously needs a new (patched) shader.

@Tiago

There are tons of possibilities; I think most people rather struggle to limit their 'bits' :)

- shadows on/off
- light source type (point/directional/spot)
- fog on/off
- in forward rendering you might have n lights
- skinning
- morphing
- detail layer
- vertex shading (instead of per-pixel, for some distance LODs?)
- parallax mapping
- switching between normal map and bump map
- back lighting, like on thin sails, flags, vegetation, paper
- some sine-based swaying, like vegetation underwater or in the wind
- rim lighting
- cubemap lighting
- clip (DX10+ does not have alpha test hardware)
- ...

I'm not saying you have to have them all, but some engines do.

_rapt0r    111
[quote name='Hodgman' timestamp='1309349297' post='4829030']
In the first post, you mention the "[i]render blocks -- Geometry & high-level state combinations[/i]".
"Geometry" isn't just a vertex buffer and an index buffer, because you can't draw buffers by themselves -- you also need the data that goes into calls like DrawIndexedPrimitives, e.g. at what offset you start reading the buffer, how many items you read from it, what kind of primitives you're constructing, etc.

I treat the index/vertex buffers like all other states (they get put into state-groups), and then the "drawable" item that gets put into the queue is a structure that holds the arguments to DrawIndexedPrimitives/other draw calls.
[/quote]

How do you convert high-level '[i]Drawables[/i]' to RenderInstance objects? Is it a common structure for all entities, or does every entity type have its own structure that the low-level render system knows about?

Aqua Costa    3691
So this is how I'm designing my rendering architecture:

-> ShaderProgram class (contains the shaders and cbuffers);
-> Material class (contains the textures, the blend/rasterizer/depth-stencil states and other booleans used by this material, and chooses the correct shader permutation based on the booleans and textures provided);
-> Actor class (contains the vertex/index buffers, instance buffers (if needed), pointers to the materials used by this actor, the world/bone matrices and the bounding boxes).

Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn, or should the renderer get pointers to the buffers and call the DrawIndexed() functions itself?

I'm totally open to suggestions :wink:

Also, for the G-Buffer pass I sort the objects in front-to-back order; in the second geometry/material pass, how should I order the objects? By shader programs, blend states?

Quat    568
Sorry to ask a new question in this thread -- I can make a new topic if that's better.

How do you create all your shader variations? Right now I'm using the effects framework with compile-time flags to switch things on and off, and I literally compile the shaders with the flags set to the options I want enabled. Obviously I only need to type this out once, but it still seems like there should be a better way than (pseudocode):

[code]
TwoLightsTexReflect = CompileShader(2, true, false, false, true);
OneLightsTexAlphaTestFog = CompileShader(1, true, true, true, false);
....
[/code]
ugh
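What I'm picturing instead is a loop over the flag bits, something like the sketch below (CompileShader stands in for my existing compile call), but I'm not sure that's how people actually do it:

[code]
// Sketch: enumerate the permutations in a loop instead of typing them out.
#include <cstdint>
#include <map>

struct ShaderHandle { int id = -1; };

ShaderHandle CompileShader(int numLights, bool tex, bool alphaTest, bool fog, bool reflect)
{
    // ... the existing effects-framework compile with the matching flags / #defines ...
    (void)numLights; (void)tex; (void)alphaTest; (void)fog; (void)reflect;
    return {};
}

std::map<uint32_t, ShaderHandle> gShaders;

void compileAllPermutations()
{
    // 4 boolean options + 2 bits for the light count = 6 bits, 64 permutations
    for (uint32_t bits = 0; bits < (1u << 6); ++bits) {
        bool tex       = (bits & 1) != 0;
        bool alphaTest = (bits & 2) != 0;
        bool fog       = (bits & 4) != 0;
        bool reflect   = (bits & 8) != 0;
        int  numLights = int((bits >> 4) & 3);
        gShaders[bits] = CompileShader(numLights, tex, alphaTest, fog, reflect);
    }
}
[/code]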

Krypt0n    4721
[quote name='TiagoCosta' timestamp='1309384242' post='4829255']
So this is how I'm designing my rendering architecture:

-> ShaderProgram class (contains the shaders and cbuffers);
-> Material class (contains the textures, the blend/rasterizer/depth-stencil states and other booleans used by this material, and chooses the correct shader permutation based on the booleans and textures provided);
-> Actor class (contains the vertex/index buffers, instance buffers (if needed), pointers to the materials used by this actor, the world/bone matrices and the bounding boxes).

Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn, or should the renderer get pointers to the buffers and call the DrawIndexed() functions itself?
[/quote]
All my draw calls are sent from one tight loop that has pointers to all the needed VBs/IBs/CBs/shaders/textures; there is no need nowadays for specialized drawing functions in every type of entity.
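Roughly like this (hypothetical handles; the commented lines are where the API calls would go):

[code]
// Sketch of the one tight submission loop -- everything a draw needs is in a
// flat item, and the API is only touched when a state actually changes.
#include <cstdint>
#include <vector>

struct DrawItem {
    uint64_t sortKey;
    uint32_t shader, vertexBuffer, indexBuffer, constantBuffer;
    uint32_t textures[4];
    uint32_t startIndex, indexCount;
};

void submit(const std::vector<DrawItem>& sortedItems)
{
    uint32_t curShader = ~0u, curVB = ~0u, curIB = ~0u;
    for (const DrawItem& item : sortedItems) {
        if (item.shader       != curShader) { /* SetShader(...) */       curShader = item.shader; }
        if (item.vertexBuffer != curVB)     { /* SetVertexBuffer(...) */ curVB = item.vertexBuffer; }
        if (item.indexBuffer  != curIB)     { /* SetIndexBuffer(...) */  curIB = item.indexBuffer; }
        // bind the constant buffer & textures, then:
        // DrawIndexed(item.indexCount, item.startIndex, 0);
    }
}
[/code]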

[quote]
Also, for the G-Buffer pass I sort the objects in front-to-back order; in the second geometry/material pass, how should I order the objects? By shader programs, blend states?
[/quote]

I sort the objects once, not per pass, to avoid tiny differences that could result in different rendering orders (e.g. if you dynamically detect cases where instancing could be a win, your G-Buffer pass without instancing might write different z-values than your geometry/material pass). I also sort the list with a stable sort, as you could otherwise get objects flickering because the draw order changes between frames (e.g. two decals on a wall that swap draw order would be noticeable).

The sorting order should be chosen so that you have as few state switches as possible. If you had two mesh types and 100 different shaders to draw them, it wouldn't be smart to give the shaders the higher priority: you'd switch shaders 100 times and probably meshes 200 times, whereas if you sorted by mesh first you'd have 2 mesh switches and then up to 100 shader switches per mesh. This is also hardware-, driver- and platform-dependent; there is no general order that always works best. But if you sort and organize the pipeline like this, you can't do worse than vanilla immediate rendering, and it will usually be a win.
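As a sketch, with the mesh given higher priority than the shader as in the example above, and a stable sort so that equal items keep their relative order between frames:

[code]
// Stable sort: ties keep their previous relative order, so two decals on the
// same wall never swap draw order and flicker between frames.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Item {
    uint32_t meshId;   // VB/IB pair -- only a few of these, so sort on it first
    uint32_t shaderId; // then group by shader within each mesh
    // ... plus everything else needed to draw ...
};

void sortForSubmission(std::vector<Item>& items)
{
    std::stable_sort(items.begin(), items.end(),
        [](const Item& a, const Item& b) {
            if (a.meshId != b.meshId) return a.meshId < b.meshId;
            return a.shaderId < b.shaderId;
        });
}
[/code]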




