# Frostbite rendering architecture question.


## Recommended Posts

Sorry to ask a new question in this thread -- I can make a new topic if that's better.

How do you create all your shader variations? Right now I'm using the effects framework with compile-time flags to switch things on and off, and I literally compile the shaders once per combination of options I want enabled. Obviously I only need to type this out once, but it still seems like there should be a better way than (pseudocode):

TwoLightsTexReflect = CompileShader(2, true, false, false, true);
OneLightsTexAlphaTestFog = CompileShader(1, true, true, true, false);
....
ugh

how about:

[source lang=cpp]for(int a = 0; a < (1 << bits); ++a)
{
	std::string Flags;
	if(a & 1) Flags += "TwoSideLight";
	if(a & 2) Flags += "CheeseTexture";
	if(a & 4) Flags += "GameOfLifeTexture";
	...
	Compile(..., Flags, ...);
}[/source]

you don't really want to write out 64k permutations by hand, even with a lot of spare time ;)
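That loop sketch can be made concrete. Here's a minimal, self-contained version; the flag names and the `-D` define format are made up for illustration, and a real engine would feed these strings to its shader compiler:

```cpp
#include <string>
#include <vector>

// Hypothetical option names -- bit i of the permutation index enables kFlagNames[i].
static const char* kFlagNames[] = { "TWO_SIDED_LIGHT", "CHEESE_TEXTURE", "GAME_OF_LIFE_TEXTURE" };
static const int kNumFlags = sizeof(kFlagNames) / sizeof(kFlagNames[0]);

// Build the preprocessor-define list for one permutation of the shader.
std::vector<std::string> DefinesForPermutation(unsigned mask)
{
    std::vector<std::string> defines;
    for (int i = 0; i < kNumFlags; ++i)
        if (mask & (1u << i))
            defines.push_back(std::string("-D") + kFlagNames[i] + "=1");
    return defines;
}

// Enumerating all (1 << kNumFlags) masks and calling DefinesForPermutation for
// each gives every permutation without writing them out by hand.
```

Note that compiling all `1 << kNumFlags` permutations up front gets expensive fast, which is one reason engines often compile only the permutations actually referenced by materials.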

##### Share on other sites
The older (DX9 and DX10) versions of the effect framework support arrays of shaders. So you could make an array of N pixel shaders for N light sources, then in your app code set an integer specifying the number of lights, and the framework would select the correct version of the shader to use. The skinning sample in the SDK does this.
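Stripped of the effect-framework machinery, the core idea is just an array lookup. A toy sketch (the names are invented; the actual SDK sample uses effect annotations rather than a helper like this):

```cpp
#include <vector>

struct PixelShader { int numLights; }; // stand-in for a compiled shader object

// table[i] is the shader compiled for i+1 light sources; clamp the request
// into the range the table actually covers.
PixelShader* SelectShaderForLights(std::vector<PixelShader>& table, int numLights)
{
    if (table.empty()) return 0;
    if (numLights < 1) numLights = 1;
    if (numLights > (int)table.size()) numLights = (int)table.size();
    return &table[numLights - 1];
}
```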

##### Share on other sites

How do you convert High-level 'Drawables' to RenderInstance objects? Is it common structure for all entities or every entity type has its one structure, that low level render system knows about?
So I've basically got: DrawCalls, Resources (cbuffers, vertex/index buffers, shaders) and StateGroups --- you can use these primitives to compose layers of functionality.
It might be easier to describe with some pseudo-code. For example, we could have a PlayerEntity, which has a ModelInstance, which has a Geometry, which has a Mesh, which has a Material, which has a Shader:[source lang=cpp]
//binds the shader program and its default cbuffers
struct ShaderRes
{
StateGroup* state;
vector<const CBuffer*> defaults;

ShaderRes( ShaderPrograms* programs )
{
defaults = programs->GetDefaultCBuffers();

state = new StateGroup();
for( i=0; i<defaults.size(); ++i )
state->Add( Bind(defaults[i]) );//pseudo: bind each default cbuffer
}
}

//binds some useful shader values, textures, etc
struct MaterialRes
{
StateGroup* state;
vector<CBuffer*> cbuffers;

MaterialRes( ShaderRes* s, vector<CBuffer*>& v )
{
cbuffers = v;

state = new StateGroup();
for( i=0; i<cbuffers.size(); ++i )
state->Add( Bind(cbuffers[i]) );//pseudo: bind each material cbuffer
}
}

//a draw-call (i.e. "sub-mesh") paired with a material
struct MeshRes
{
DrawCall* draw;
MaterialRes* material;
}

//binds the index/vertex buffers
struct GeometryRes
{
StateGroup* state;
VertexBuffer* vb;
IndexBuffer* ib;
vector<MeshRes*> meshes;

GeometryRes()
{
state = new StateGroup();
}
}

//an actual object in the world. Links to the above resources, and binds per-instance data, like a world-matrix.
struct ModelInstance
{
StateGroup* state;
CBuffer* constants;
GeometryRes* model;

ModelInstance()
{
constants = new CBuffer<InstanceData>();
constants->SetProperty( "WorldMatrix", Identity );
}

void Draw( RenderQueue& queue )
{
for( i=0; i!=model->meshes.size(); ++i )
{
MeshRes* mesh = model->meshes[i];
//build the state-stack for this draw-call
StateGroup* stateMaterial = mesh->material->state;
StateGroup* stateGeometry = model->state;
StateGroup* stateInstance = state;
StateGroup* stateShader = mesh->material->shader->state;//assuming MaterialRes keeps a pointer to its ShaderRes

StateGroup* stateStack[4] = { stateInstance, stateGeometry, stateMaterial, stateShader };
DrawCall* draw = mesh->draw;
queue.Submit( draw, stateStack, 4 ); //<-- here is the part where we actually submit something for drawing
}
}
}

//a higher-level game object made up of several models
struct PlayerEntity
{
ModelInstance* body;
ModelInstance* gun;

void Draw( RenderQueue& queue )
{
body->Draw(queue);
gun->Draw(queue);
}
}[/source](n.b. completely made-up code to try and get some ideas about composition across)
[font="arial, verdana, tahoma, sans-serif"]

Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn or get pointers to the buffers and call the DrawIndexed() functions in the renderer?
An actor might be made up of several draw calls, and those draw-calls might need to be drawn at different stages of the pipeline -- e.g. if part of the actor is opaque and part is translucent.

To easily deal with this, I would have each actor submit its meshes/drawables/whatever to the renderer, and have the renderer call the actual "Draw" functions at the appropriate times. [/font]
[font="arial, verdana, tahoma, sans-serif"]As Krypt0n mentioned, you might not even want the Actor to be responsible for this submission though -- you could have the actor 'register' its "drawables" with some kind of rendering manager in advance, and let that manager object perform the submission on behalf of the actor (this way the actor doesn't have a draw function at all).[/font]

##### Share on other sites
@Hodgman

Should the class StateGroup look like this?
[source lang=cpp]class StateGroup
{
public:
	Add(BindShaderCommand command);
	//Lots of different Add methods
	//Lots of different Get methods
private:
	ShaderPrograms* program;
	vector<CBuffer*> cbuffers;
	VertexBuffer* vBuffer;
	IndexBuffer* iBuffer;
	//etc, etc, etc
};[/source]

What should the DrawCall and various Bind*Something*Command structs look like?

What stops you from using a single StateGroup for the whole hierarchy? I guess in the MaterialRes you could get the StateGroup from the ShaderRes and so on.

##### Share on other sites
Should the class StateGroup look like this? What should the DrawCall and various Bind*Something*Command structs look like?
Conceptually, mine looks more like[source lang=cpp]class StateGroup
{
public:
typedef std::vector<RenderState*> StateVec;

void Add(RenderState* s) { states.push_back(s); }
StateVec::const_iterator Begin() { return states.begin(); }
StateVec::const_iterator End() { return states.end(); }
private:
StateVec states;
};

class RenderCommand
{
public:
virtual ~RenderCommand(){}
virtual void Execute( RenderDevice& ) = 0;
};

class DrawCall : public RenderCommand {};
class RenderState : public RenderCommand
{
enum StateType
{
BlendMode,
VertexBuffer,
CBuffer0,
CBuffer1,
/*etc*/
};
virtual StateType GetType() const = 0;
};

//Dx9 implementation
class BindVertexBuffer : public RenderState
{
public:
void Execute(RenderDevice&);
StateType GetType() const { return VertexBuffer; }
private:
IDirect3DVertexBuffer9* buffer;
};
class DrawIndexedPrimitives : public DrawCall
{
public:
void Execute(RenderDevice&);
private:
D3DPRIMITIVETYPE Type;
INT BaseVertexIndex;
UINT MinIndex;
UINT NumVertices;
UINT StartIndex;
UINT PrimitiveCount;
};[/source]In practice though, for performance reasons there's no std::vectors of pointers or virtual functions -- the state-group is a blob of bytes that looks something like:

| size in bytes | bitfield of states contained | number of states contained | state #0 type enum | state #0 data | state #1 type enum | state #1 data | ... |
[quote]What stops you from using a single StateGroup and use it in the whole hierarchy? I guess in the MaterialRes you could get the StateGroup from the ShaderRes and so on[/quote]Nothing, it's perfectly valid to merge groups together like that if you want to.
However, in this case, the instance-group might be shared between a couple of draw-calls (the number that make up a particular model), the geometry group might be shared between dozens of draw-calls (that model times the number of instances of that model), the material group might be shared between hundreds of draw-calls (if the same material is used by different models) and the shader group might be shared between thousands (if the same shader is used by different materials).
The 'stack' kinda forms a pyramid of specialization/sharing, where the bottom layers are more likely to be shared between items, and the top layers are more likely to be specialized for a particular item.

##### Share on other sites
To use this architecture with deferred lighting (a.k.a. light pre-pass), I've added an extra StateGroup to the ShaderRes, called gBufferState, that contains the shader program for drawing the objects into the G-Buffer, while the original state var contains the shader program for drawing the object in the second geometry pass. Or should I create a new PassRes struct? I guess it doesn't make much difference.

Regarding shader permutations: I can use the same constant buffer struct in all shader permutations, and then each shader permutation uses the constants that it needs, right?

##### Share on other sites
[quote name='Krypt0n']as an example, D3D10/11 hardware does not have alphatest, that's why the api also does not support it, but you can run dx9 software, that need obviously a new shader).[/quote]

The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader. In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)

##### Share on other sites

[quote name='Krypt0n' timestamp='1309350513' post='4829038']as an example, D3D10/11 hardware does not have alphatest, that's why the api also does not support it, but you can run dx9 software, that need obviously a new shader).

The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader.[/quote] for the hardware I know, it does cause recompilation; and even on some DX9 hardware, depending on the framebuffer configuration, alpha test is done in the shader and not in the ROP.

[quote]In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)

[/quote]it's not a mistake, removing it from the pipeline just reflects in a better way how the hardware works, and this allows better shaders. when the driver "recompiles" shaders, it's actually just patching them by NOPing out some area or adding some assembly snippets that e.g. reject pixels based on the alpha. if you embed a "clip"/"kill" into your shader instead, then it's included in the optimization process at all stages of compilation (be it fxc or the driver frontend or backend): the pixel removal is moved to the front, so all unnecessary computations are avoided by an early out, or at least by disabling unneeded work (e.g. the texture units don't cause any memory traffic for pixels that have been clipped).

I wish all the ROP computations were programmable, and the texture units too -- at least the decompression into the L1 texture caches.

##### Share on other sites
Yeah I don't see the point in having a fixed-function feature in the pipeline when a programmable one is available, particularly if the fixed-function one is going to be more limiting without having any better performance. But we're getting off-topic here.

##### Share on other sites
[quote name='Krypt0n']for the HW I know it causes recompilation[/quote]

What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?

##### Share on other sites
Great discussion, I'm learning a lot from this.
I have some questions that I hope can be answered:
1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)
2. With this system, it seems all programs are based on a single source file. How would users be able to use their own shader code and not just the supplied one?
3. Can we have 2 programs with the same shader flags?

It seems like we have a single list where all programs are stored.
However, what if we want more than one type of shader source from which programs are created?
For instance, we might have one source for lighting, one for terrain, and one with user-defined code.

Basically, I'm thinking of how to make this work in a data-driven way, where the user can change the source code for the shaders, choose what the shader flag bits mean, and supply the #define strings per shader flag.

##### Share on other sites

1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)

to be really flexible, you create a second file with the flags used in your shader; in my case the two files have the same name, just different extensions:

toon.hlsl
toon.flags

the flags file is just a list of flags; each line corresponds to a bit.

my material files (xml files) allow you to set flags, then I match the material's flag-bits to the ones a shader sets. that works completely without recompilation.
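A guess at what that matching might look like mechanically (not the actual code described above): each line of the .flags file names one bit, and a material matches a shader permutation when every flag the material sets is one the permutation was compiled with:

```cpp
#include <map>
#include <string>
#include <vector>

// Each line of a .flags file names one bit, in order: line 0 -> bit 0, etc.
std::map<std::string, unsigned> ParseFlagFile(const std::vector<std::string>& lines)
{
    std::map<std::string, unsigned> bits;
    for (unsigned i = 0; i < lines.size(); ++i)
        bits[lines[i]] = 1u << i;
    return bits;
}

// A material matches a shader permutation when every flag the material
// sets is one the shader permutation was compiled with (subset test).
bool Matches(unsigned materialFlags, unsigned shaderFlags)
{
    return (materialFlags & ~shaderFlags) == 0;
}
```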

[quote]2. With this system, it seems all programs are based on a single source file. How would users be able to use their own shader code and not just the supplied one?[/quote]

it was just an example with a single file, but you can use it on as many files as you want. I reference the shader name in my material files, but you could also use some bit-mask to index into a shader array; it's up to you. I don't really see a limitation.

[quote]3. Can we have 2 programs with the same shader flags?[/quote]why not?

[quote]Seems like we have a single list where all programs are stored.
However, what if we want more than one type of shader source from which programs are created?
For instance, we might have one source for lighting, one for terrain, and one with user-defined code.
How can we cope with this? Would there be shader flags for LIGHTING_SHADER, TERRAIN_SHADER?[/quote]that's up to your implementation, but as you run short on flags (usually I do, at least), I would recommend not using flags where they're not needed. those are anyway different shaders for different materials. flags should be used when you have a specific type of shader that you permute, e.g. you provide a vertex stream with bitangent & binormal or just with a normal, or you provide a bumpmap or a normalmap. those could be flags where you wouldn't want to write a special shader.

but you are also free to use "#include" in your shaders: have some high-level shaders like "terrain" and "toon" and "skin", and those can include common things like the usual vertex streams, texture sets, and modifications (e.g. a sin-wave flag), which are applied independent of the high-level things (like blending terrain layers, or subsurface scattering for skin, etc.).

##### Share on other sites

[quote name='Eric Lengyel' timestamp='1309666676' post='4830523']In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)

it's not a mistake, removing it from the pipeline just reflects in a better way how the hardware works, and this allows better shaders. when the driver "recompiles" shaders, it's actually just patching them by NOPing out some area or adding some assembly snippets that e.g. reject pixels based on the alpha. if you embed a "clip"/"kill" into your shader, then it's included in the optimization process at all stages of compilation (be it fxc or the driver frontend or backend): the pixel removal is moved to the front, so all unnecessary computations are avoided by an early out, or at least by disabling unneeded work (e.g. the texture units don't cause any memory traffic for pixels that have been clipped).[/quote]

I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders, but using that option or the alpha test should be a choice left to the programmer. The hardware still has dedicated alpha-testing capabilities that would be faster in some cases. Removing the alpha test actually makes the API a worse reflection of the true hardware functionality.

##### Share on other sites

[quote]To use this architecture with deferred lighting a.k.a. light pre-pass I've added an extra StateGroup to the ShaderRes called gBufferState that contains the shader program to draw the objects to the G-Buffer and the original state var contains the shader program to draw the object in the second geometry pass. Or should I create a new PassRes struct? I guess it doesn't make much difference.[/quote]
In my 'shader' objects (similar to what Microsoft calls an 'Effect' or a 'Technique') I've actually got multiple passes defined. Each pass then has a list of permutations.
Depending on which part of the pipeline you're rendering for (shadow, g-buffer, etc), a different pass is selected from the shader (and a permutation is then selected from that pass).
[quote]Regarding shader permutations: I can use the same constant buffer struct in all shader permutations and then each shader permutation uses the constants that it needs right?[/quote]Yeah, there's no way for the game-side code to reliably predict which permutation will be chosen, so you shouldn't change your cbuffer layouts for different permutations. You can however have certain cbuffers that are used by some perm's and ignored by others.

[quote]1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)[/quote]The engine itself doesn't have to know what the options are -- it only knows that there's 64 bits' worth of options.
[font="arial, verdana, tahoma, sans-serif"]In Horde3D, they give you 32-bits worth of options, and to make a new option, you just put a new [/font][font="Courier New"]#ifdef[/font] into your shader code. They use a naming convention where if a pre-processor token starts with [font="Courier New"]_F_##[/font] (where ## is 0 to 31), e.g. if your shader contains [font="Courier New"]#ifdef _F_06_NormalMapping[/font], then if someone enables option #6, the engine will select a permutation that was compiled with the normal-mapping code.
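Parsing that convention is mechanical. A sketch, assuming tokens of the exact form [font="Courier New"]_F_##_Name[/font] with a two-digit bit index as described above:

```cpp
#include <cctype>
#include <string>

// Extract the option bit from a token like "_F_06_NormalMapping".
// Returns -1 if the token doesn't follow the _F_##_ convention.
int OptionBitFromToken(const std::string& token)
{
    if (token.size() < 5 || token.compare(0, 3, "_F_") != 0) return -1;
    if (!isdigit((unsigned char)token[3]) || !isdigit((unsigned char)token[4])) return -1;
    int bit = (token[3] - '0') * 10 + (token[4] - '0');
    return bit < 32 ? bit : -1; // only 32 bits of options
}
```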

At work, we actually use a modified version of HLSL where we can write something like: [font="Courier New"]option normalMapping : 6; ... if( normalMapping ) { ... }[/font]
[quote]2. With this system, it seems all programs are based on a single source file. Seems like we have a single list where all programs are stored.[/quote]No, all permutations of a single source-file come from that source file. If you have a 2nd source file, it has its own list of permutations.
[quote name='Eric Lengyel']What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?[/quote]Is there any way to know??
[quote name='Eric Lengyel']I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders[/quote]Doesn't texkill/clip/discard just set a bit indicating that the ROP should discard, and not actually skip the shader instructions that come after it? Or has this been improved on newer cards?

##### Share on other sites
[quote name='Eric Lengyel' timestamp='1309832790' post='4831213']What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?[/quote]
[quote]Is there any way to know??[/quote]

Yes, as a matter of fact, there is. It is not too difficult to reverse-engineer the command buffer by stepping through assembly code with the Visual Studio debugger and see exactly what information the driver is sending to the hardware. Once you know how to locate the command buffer, the extraction of hardware register data can be automated. You can learn many interesting things by doing this, and it will change the way you think about the hardware. (Also, AMD has actually published their register specs for hardware up to R700.) I can tell you the register numbers and formats for the alpha test functionality on any GPU that I have physical access to.
[quote name='Eric Lengyel' timestamp='1309915357' post='4831584']I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders[/quote]
[quote]Doesn't texkill/clip/discard just set a bit indicating that the ROP should discard, and not actually skip the shader instructions that come after it? Or has this been improved on newer cards?[/quote]

Generally, yes, but that bit can also be used to suppress texture fetches later in the shader, saving memory bandwidth, and GPUs have done this since at least 2004.

##### Share on other sites
@Hodgman
1. You can ask the GPU guys, but then you wouldn't be allowed to tell anyone -- which might explain why Krypt0n doesn't say anything specific.
2. You can develop for consoles; then you might get a little insight, depending on the console.
3. You can also check the open GPU specifications that ATI/AMD and Intel have released. For ATI, as an example:
http://developer.amd.com/documentation/guides/Pages/default.aspx#open_gpu
You will see that the R3xx family of GPUs has "Alpha Functions", which refers to alphatest; the various HD2x00 graphics cards have alphablend, but I can't find any alphatest information anymore. I've seen a linux driver mailing list where some guys were wondering how to handle that in their driver; it makes some things quite complicated.
I think the PowerVR chips used in Atom chipsets support D3D10.1, but it's no secret that, due to the deferred pipeline, all the computations are done on the chip; there is no real ROP. Even if you output antialiasing and use alpha blending, it's all done in the shader units, down to the point where the AA samples are merged into one final pixel, which is the only moment the part you could call a "ROP" does something, by converting the color to the final format.

The compiler in the driver can decide on that. Branching is usually free; you don't waste any performance in that case. You are right that it still needs to set the masking bits and the ROPs need to merge all fragment streams, but it seems like they have no way to compare, just masking pixels based on the bitmask. But that's what they do anyway all the time, be it due to the fine raster mask or alpha-to-coverage.

##### Share on other sites

[quote name='Hodgman']Conceptually, mine looks more like [...] In practice though, for performance reasons there's no std::vectors of pointers or virtual functions -- the state-group is a blob of bytes [...] The 'stack' kinda forms a pyramid of specialization/sharing, where the bottom layers are more likely to be shared between items, and the top layers are more likely to be specialized for a particular item.[/quote]

I'm quite interested in learning more about how you've created a setup that avoids virtual functions and vectors. I've hardly slept last night trying to figure out how I would do that -- a system like the conceptual one above would kill performance, with each render command requiring a virtual call plus a lot of vector iteration. I've read your blog post regarding the blobs, but I'm having a hard time figuring out how that fits into this. Would you just have the renderer that receives the commands switch on type and reinterpret_cast the memory?

##### Share on other sites

Fantastic, Hodgman, thank you very much for the answer. I was also wondering how you go about sorting the resulting command buffer, as it consists of several connected commands which cannot be reordered independently?

##### Share on other sites
On the previous page, I mentioned submitting draw/state pairs... Let's call them [font="Courier New"]RenderInstance[/font]s:[source lang=cpp]struct RenderInstance
{
	u32 sortingKey;
	DrawCall* draw;
	vector<StateGroup*> states;//not really a vector ;)
};[/source]It's the queue of [font="Courier New"]RenderInstance[/font]s which gets sorted (not the command buffers). The sorted [font="Courier New"]RenderInstance[/font] queue is then used to generate a stream of commands.
Afterwards, another job takes the sorted instances and submits their commands to either the device or to a command buffer. Something like:
[source lang=cpp]submit instances:
	sort instances
	for each instance
		for each state-group
			for each state
				if state is not redundant
					submit state
		submit draw-call[/source]The [font="Courier New"]submit[/font] part is either switching on the type to execute the command then and there, or it's copying it into a buffer that can be executed later.

To sort the instances, I let the "submitter" specify a 32-bit number, which can be anything. The lower level rendering systems don't care what the numbers mean, they're just used to sort items into the right order.
The higher level rendering systems might put material-hashes in there, or depth values, or a combination of both, with some bits specifying layers, some specifying depth, some specifying a material ID, etc....
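As a purely illustrative example of such a packing (the point above is that the low-level code never interprets these bits), here's one way the high-level code might build a key that sorts by layer, then depth, then material ID; the field widths are arbitrary:

```cpp
#include <cstdint>

// Pack: [ layer:4 | depth:12 | materialId:16 ] -- higher bits sort first,
// so items sort by layer, then by quantized depth within a layer, then by material.
uint32_t MakeSortKey(uint32_t layer, uint32_t depth12, uint32_t materialId)
{
    return ((layer & 0xFu) << 28) | ((depth12 & 0xFFFu) << 16) | (materialId & 0xFFFFu);
}
```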

##### Share on other sites
Thanks for information guys, this has really cleared things up.

##### Share on other sites
[source lang=cpp]
class CommandBindVAO
{
private:
    uint m_uiVAO;
public:
    void Execute(Context* pkContext) const { pkContext->BindVAO(m_uiVAO); }
};

class CommandUnbindVAO
{
public:
    void Execute(Context* pkContext) const { pkContext->UnbindVAO(); }
};

class CommandBindProgram
{
private:
    RFShaderProgram* m_pkProgram;
public:
    void Execute(Context* pkContext) const { pkContext->BindProgram(m_pkProgram); }
};

class CommandSetRenderState
{
private:
    RFRenderState* m_pkState;
public:
    void Execute(Context* pkContext) const { pkContext->ApplyRenderState(m_pkState); }
};

class CommandGroup
{
public:
    enum ECmdType
    {
        ST_BIND_VAO          = 1 << 0,
        ST_UNBIND_VAO        = 1 << 1,
        ST_SET_PASS_UNIFORMS = 1 << 2,
        ST_BIND_PROGRAM      = 1 << 3,
        ST_SET_RENDERSTATE   = 1 << 4
    };
private:
    size_t m_szCmdsSize;
    uint64 m_uiCmdFlags;
    uint   m_uiCmdCount;
    void*  m_pvCmd;
public:
    size_t      GetCmdSize()  const { return m_szCmdsSize; }
    uint64      GetCmdFlags() const { return m_uiCmdFlags; }
    uint        GetCmdCount() const { return m_uiCmdCount; }
    const void* GetCmds()     const { return m_pvCmd; }
};

////////////////////////////////////////////////////////

void Renderer::Render()
{
    // Create sort list
    uint uiIndex = 0;
    for (RenderQueue::InstanceVector::const_iterator kIter = m_pkQueue->Begin();
         kIter != m_pkQueue->End(); ++kIter)
    {
        m_kSortList.push_back(SortListItem((*kIter).GetSortKey(), uiIndex));
        uiIndex++;
    }

    // Sort render queue
    std::stable_sort(m_kSortList.begin(), m_kSortList.end(), QueueSorter);

    // Iterate render instances in sorted order
    for (std::vector<SortListItem>::const_iterator kIter = m_kSortList.begin();
         kIter != m_kSortList.end(); ++kIter)
    {
        const RenderInstance& kInstance = m_pkQueue->Get(kIter->m_uiIndex);

        // Iterate command groups
        uint64 uiUsedCommands = 0;
        for (RenderInstance::CommandGroupVector::const_iterator kCmdIter = kInstance.Begin();
             kCmdIter != kInstance.End(); ++kCmdIter)
        {
            const CommandGroup* pkCmdGroup = *kCmdIter;

            // Iterate commands and execute on context
            const void* pvCmds = pkCmdGroup->GetCmds();
            uint uiCmdCount = pkCmdGroup->GetCmdCount();
            for (uint ui = 0; ui < uiCmdCount; ++ui)
            {
                // Get command type
                const CommandGroup::ECmdType eType = *reinterpret_cast<const CommandGroup::ECmdType*>(pvCmds);
                pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandGroup::ECmdType));

                // Apply only if this command type was not already applied earlier (higher) in the stack
                const bool bApply = (uiUsedCommands & eType) == 0;

                // Remember type
                uiUsedCommands |= eType;

                // Handle command type correctly
                switch (eType)
                {
                case CommandGroup::ST_BIND_VAO:
                    {
                        // Execute command
                        if (bApply)
                        {
                            const CommandBindVAO& kCmd = *reinterpret_cast<const CommandBindVAO*>(pvCmds);
                            kCmd.Execute(m_pkContext);
                        }
                        // Offset command stream
                        pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandBindVAO));
                    }
                    break;
                case CommandGroup::ST_UNBIND_VAO:
                    {
                        // Execute command
                        if (bApply)
                        {
                            const CommandUnbindVAO& kCmd = *reinterpret_cast<const CommandUnbindVAO*>(pvCmds);
                            kCmd.Execute(m_pkContext);
                        }
                        // Offset command stream
                        pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandUnbindVAO));
                    }
                    break;
                case CommandGroup::ST_BIND_PROGRAM:
                    {
                        // Execute command
                        if (bApply)
                        {
                            const CommandBindProgram& kCmd = *reinterpret_cast<const CommandBindProgram*>(pvCmds);
                            kCmd.Execute(m_pkContext);
                        }
                        // Offset command stream
                        pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandBindProgram));
                    }
                    break;
                case CommandGroup::ST_SET_RENDERSTATE:
                    {
                        // Execute command
                        if (bApply)
                        {
                            const CommandSetRenderState& kCmd = *reinterpret_cast<const CommandSetRenderState*>(pvCmds);
                            kCmd.Execute(m_pkContext);
                        }
                        // Offset command stream
                        pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandSetRenderState));
                    }
                    break;
                }
            }
        }

        // Switch on drawcall and execute
        const DrawCall* pkDrawCall = kInstance.GetDrawCall();
        switch (pkDrawCall->GetType())
        {
        case DrawCall::DCT_DRAW_ARRAYS:
            static_cast<const DrawCallDrawArrays*>(pkDrawCall)->Execute(m_pkContext);
            break;
        }
    }
}
[/source]

CommandGroups are what you would call StateGroups - as that is exactly what they are: commands to change state, as far as I understand it.

Right now I'm manually iterating the command groups; this would obviously be done with a proper iterator when the time comes. The same goes for the use of vectors.

Just a quick mockup of a Renderer::Render method. Am I completely on the wrong track? My framework is written against OpenGL, though that shouldn't change much. Context is a context proxy which keeps track of which VAO / state is currently set, etc.

I'm having a hard time figuring out which commands I could define, as all I could come up with were the five I've shown. I'm also a bit in doubt as to why you would make a separate DrawCall class instead of having it as a command.

The uniforms are causing me problems as well. In my setup a material contains N techniques, each of which contains N passes, which contain N uniforms (default values / auto values set by the framework) and a shader program. Each MeshRes (in the sense you're using it) contains a pointer to a material. Come command-queue execution, I have to apply / update these uniforms after having bound the shader program. Would that result in a new command type? And would this defeat the purpose of having a highly compacted in-memory command queue, since it would require me to jump to the MaterialPass and iterate over all its uniforms, updating / uploading them to the GPU?

The following is basically what I think I need to do:
for each MeshInstance in MeshInstanceList
{
    for each SubMesh in MeshInstance
    {
        Store ShaderProgram    // Which program to render using
        Store UniformDefaults  // Material-pass-defined uniform defaults
        Store UniformAuto      // Material-pass-defined auto-filled uniforms using context state (view, viewprojection, time, etc.)
        Store TextureDefaults  // Material-pass-defined textures - set in material definition
        Store UniformInstance  // Submesh-instance-defined uniforms
        Store TextureInstance  // Submesh-instance-defined textures
        Store VAO              // Submesh buffer data binding
        Store DrawCall         // Encapsulated
        Add RenderInstance to queue
    }
}

Sort renderqueue
Submit renderqueue to renderer

for each RenderInstance in renderqueue
{
    Update UniformAuto from context
    Find and apply WorldTransform on context (used for auto uniforms)
    Apply ShaderProgram on context
    Apply UniformDefaults
    Apply UniformAuto
    Apply UniformInstance
    Bind TextureDefaults
    Bind TextureInstance
    Bind VAO
    Dispatch DrawCall
    Unbind VAO
}

Is the above sensible and would it make sense in the context of what Hodgman has proposed?

Oh and thank you very much for all the help you've given me - And the community!

##### Share on other sites
[quote]Just a quick mockup of a renderer::render method. Am I completely on the wrong track?[/quote]
Yeah that looks similar to what I'm used to. I use something analogous to your "[font="Courier New"]uiUsedCommands[/font]/[font="Courier New"]bApply[/font]" code to ensure commands at the top of the stack take precedence over commands of the same type lower in the stack.

My [font="Courier New"]bApply[/font] test is a bit more complicated though, as it also checks if the command being inspected was already set by the previous render-instance. i.e. if two consecutive render instances use the same material, then all the states from the material's state-group can usually be ignored when drawing the 2nd instance.

My "Iterate render instances" loop is also passed a "default" state-group, which is conceptually put at the bottom of every state-stack. If an instance doesn't set a particular state and the default group contains that state, then the default value will be used.
If you don't do this, then you end up with behaviours like -- one object enables alpha blending, and then all following objects also end up being alpha-blended, because they didn't manually specify a "disable alpha blending" command.

Also, with the way your code is at the moment, only a single [font="Courier New"]SetRenderState[/font] command will be applied per instance. If you want to set two different render-states, only the first one will actually be set at the moment (the second will be ignored). For this reason, I have every different render-state as a different command ID.
[quote]I'm having a hard time figuring out which commands I could define, as all I could come up with were the five I've shown. I'm also a bit in doubt as to why you would make a separate DrawCall class instead of having it as a command.[/quote]As above, I've got commands for each different render-state. I've also got commands for each different CBuffer slot and each texture-binding slot (for each type of shader).

I've limited myself to 14 CBuffer slots each for the vertex and pixel shader, so, there's actually 28 different IDs that are associated with the "bind cbuffer" command.

My draw-calls are actually a command, just like state-changes. However, I split commands into 3 different categories -- general state-changes, draw-calls, and per-pass state-changes.
State-groups can only contain general state-changes. Actual render-instances must use a draw-call command (not a state-change command).
The 3rd category are stored in something similar to a state-group, which is used to set up an entire "pass" of the rendering pipeline -- commands such as binding render-targets, depth-buffers, viewports, scissor tests, etc go into this category.
[quote]Come command queue execution I have to apply / update these uniforms after having bound the shader program. Would that result in a new command type?[/quote]There are a bunch of different abstractions for how uniforms are set, depending on your API... GL uses the model you're familiar with, where you set the uniforms on the currently bound program... DX9 uses a model where there's a set of ~200 global registers, and any changes made to them persist from one shader to the next... DX10/11 are similar to 9, but you've got a set of bound CBuffers instead of individually bound uniforms.

So, I looked at these abstractions and decided that the cbuffer approach made the most sense to me. No matter what the back-end rendering API actually is, my renderer deals with cbuffers -- and as above, I've got 14 cbuffer binding slots/commands per shader type.

The way this is generally used is that a "shader" state-group at the bottom of the stack contains commands to bind cbuffers holding default values. The "material" and "object/instance" state-groups then contain commands to bind their own cbuffers (which override the "default" commands).

On APIs that don't actually use the cbuffer abstraction, then yes, there's a step that looks at the currently bound cbuffers and sets all of the individual uniforms. I do this step prior to every draw call (with a whole bunch of optimisations to skip unnecessary work).
Regarding memory layout, I allocate all my cbuffer blocks (which are blobs containing uniforms) from a separate linear allocator.