_rapt0r

Frostbite rendering architecture question.


[quote name='Quat' timestamp='1309397714' post='4829357']
Sorry to ask a new question in this thread--I can make a new topic if that is better.

How do you create all your shader variations? Right now I am using the effects framework with compile-time flags to switch things on and off, and I literally compile the shaders with the flags set to the options I want enabled. Obviously I only need to type this out once, but it still seems like there is a better way than (pseudocode):

TwoLightsTexReflect = CompileShader(2, true, false, false, true);
OneLightsTexAlphaTestFog = CompileShader(1, true, true, true, false);
....
ugh
[/quote]
how about

[code]
// iterate over every combination of the flag bits, compiling one permutation each
for (int a = 0; a < (1 << bits); ++a)
{
	std::string flags;
	if (a & 1) flags += "TwoSidedLight ";
	if (a & 2) flags += "CheeseTexture ";
	if (a & 4) flags += "GameOfLifeTexture ";
	...
	Compile(..., flags, ...);
}
[/code]

you don't really want to write out 64k permutations by hand, even with a lot of spare time ;)
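
If the flag strings end up as HLSL preprocessor defines, the same loop can feed them straight to the compiler. A sketch using the D3D_SHADER_MACRO list of the D3DCompile API -- the flag names, bit assignments, and the src/srcSize variables are invented for illustration:

[code]
#include <d3dcompiler.h> // link d3dcompiler.lib
#include <vector>

const char* flagNames[] = { "TWO_SIDED_LIGHT", "CHEESE_TEXTURE", "GAME_OF_LIFE_TEXTURE" };
const int bits = 3;

for (int a = 0; a < (1 << bits); ++a)
{
	// turn each set bit into a #define for this permutation
	std::vector<D3D_SHADER_MACRO> macros;
	for (int b = 0; b < bits; ++b)
		if (a & (1 << b))
			macros.push_back({ flagNames[b], "1" });
	macros.push_back({ NULL, NULL }); // the macro list must be NULL-terminated

	ID3DBlob* code = NULL;
	ID3DBlob* errors = NULL;
	D3DCompile(src, srcSize, "shader.hlsl", macros.data(), NULL,
	           "main", "ps_5_0", 0, 0, &code, &errors);
	// store 'code' in a table indexed by the bit pattern 'a'
}
[/code]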




The older (DX9 and DX10) versions of the effect framework support arrays of shaders. So you could make an array of N pixel shaders for N light sources, and then in your app code set an integer to specify the number of lights, and the framework would select the correct version of the shader to use. The skinning sample in the SDK does this.
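
For reference, a rough sketch of what that pattern looks like inside an .fx file, loosely in the spirit of the SDK skinning sample -- the shader names and light counts here are invented:

[code]
int g_numLights = 0; // set from app code through the effect interface

// one pixel shader per light count, all compiled up front
PixelShader psLit[4] =
{
	CompileShader( ps_4_0, PS_Lit(1) ),
	CompileShader( ps_4_0, PS_Lit(2) ),
	CompileShader( ps_4_0, PS_Lit(3) ),
	CompileShader( ps_4_0, PS_Lit(4) ),
};

technique10 Render
{
	pass P0
	{
		SetVertexShader( CompileShader( vs_4_0, VS() ) );
		SetGeometryShader( NULL );
		SetPixelShader( psLit[g_numLights] ); // framework picks the right variant
	}
}
[/code]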
[quote name='rapt0r' timestamp='1309351056' post='4829043']
How do you convert high-level '[i]Drawables[/i]' to RenderInstance objects? Is it a common structure for all entities, or does every entity type have its own structure that the low-level render system knows about?[/quote]So I've basically got: DrawCalls, Resources (cbuffers, vertex/index buffers, shaders) and StateGroups --- you can use these primitives to compose layers of functionality.
It might be easier to describe with some pseudo-code. For example, we could have a PlayerEntity, which [i]has a[/i] ModelInstance, which [i]has a[/i] Geometry, which [i]has a[/i] Mesh, which [i]has a[/i] Material, which [i]has a[/i] Shader:[source lang=cpp]
//binds the shader programs and default shader values
struct ShaderRes
{
StateGroup* state;
ShaderPrograms* programs;
vector<const CBuffer*> defaults;

ShaderRes(const char* name)
{
programs = Load(name);
defaults = programs->GetDefaultCBuffers();

state = new StateGroup();
state->Add( new BindShaderCommand(programs) );
for( size_t i=0; i<defaults.size(); ++i )
state->Add( new BindCBufferCommand(defaults[i]) );
}
};


//binds some useful shader values, textures, etc
struct MaterialRes
{
StateGroup* state;
ShaderRes* shader;
vector<CBuffer*> cbuffers;

MaterialRes( ShaderRes* s, vector<CBuffer*>& v )
{
shader = s;
cbuffers = v;

state = new StateGroup();
for( size_t i=0; i<cbuffers.size(); ++i )
state->Add( new BindCBufferCommand(cbuffers[i]) );
}
};


//a draw-call (i.e. "sub-mesh") paired with a material
struct MeshRes
{
DrawCall* draw;
MaterialRes* material;
};


//binds the index/vertex buffers
struct GeometryRes
{
StateGroup* state;
VertexBuffer* vb;
IndexBuffer* ib;
vector<MeshRes*> meshes;

GeometryRes()
{
state = new StateGroup();
state->Add( new BindVertexBufferCommand(vb) );
state->Add( new BindIndexBufferCommand(ib) );
}
};


//an actual object in the world. Links to the above resources, and binds per-instance data, like a world-matrix.
struct ModelInstance
{
StateGroup* state;
CBuffer* constants;
GeometryRes* model;

ModelInstance()
{
constants = new CBuffer<InstanceData>();
constants->SetProperty( "WorldMatrix", Identity );
}

void Draw( RenderQueue& queue )
{
for( size_t i=0; i!=model->meshes.size(); ++i )
{
MeshRes* mesh = model->meshes[i];
//build the state-stack for this draw-call
StateGroup* stateShader = mesh->material->shader->state;
StateGroup* stateMaterial = mesh->material->state;
StateGroup* stateGeometry = model->state;
StateGroup* stateInstance = state;

StateGroup* stateStack[4] = { stateInstance, stateGeometry, stateMaterial, stateShader };
DrawCall* draw = mesh->draw;
queue.Submit( draw, stateStack, 4 ); //<-- here is the part where we actually submit something for drawing
}
}
};


//a higher-level game object made up of several models
struct PlayerEntity
{
ModelInstance* body;
ModelInstance* gun;

void Draw( RenderQueue& queue )
{
body->Draw(queue);
gun->Draw(queue);
}
};[/source](n.b. completely made-up code to try and get some ideas about composition across)
[quote name='TiagoCosta' timestamp='1309384242' post='4829255']
Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn, or get pointers to the buffers and call the DrawIndexed() functions in the renderer?[/quote]An actor might be made up of several draw calls, and those draw-calls might need to be drawn at different stages of the pipeline -- e.g. if part of the actor is opaque and part is translucent.

To deal with this easily, I would have each actor submit its meshes/drawables/whatever to the renderer, and have the renderer call the actual "Draw" functions at the appropriate times.
As Krypt0n mentioned, you might not even want the Actor to be responsible for this submission though -- you could have the actor 'register' its "drawables" with some kind of rendering manager in advance, and let that manager object perform the submission on behalf of the actor (this way the actor doesn't have a draw function at all).
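
A minimal sketch of that 'register in advance' idea -- all names here are invented, not part of the pseudo-code above:

[source lang=cpp]
class RenderWorld
{
public:
	void Register( ModelInstance* m )   { models.push_back(m); }
	void Unregister( ModelInstance* m ) { /* swap-and-pop, etc */ }

	// called once per frame by the renderer -- the actors never draw themselves
	void SubmitAll( RenderQueue& queue )
	{
		for( size_t i = 0; i < models.size(); ++i )
			models[i]->Draw( queue );
	}
private:
	std::vector<ModelInstance*> models;
};[/source]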
@Hodgman

Should the class StateGroup look like this?
[code]
class StateGroup
{
public:
void Add(BindShaderCommand command);
//Lots of different Add methods

//Lots of different Get methods

private:
ShaderPrograms* program;
vector<CBuffer*> cbuffers;
VertexBuffer* vBuffer;
IndexBuffer* iBuffer;
//etc, etc, etc
};
[/code]

What should the DrawCall and various Bind*Something*Command structs look like?

What stops you from using a single StateGroup and using it for the whole hierarchy? I guess in the MaterialRes you could get the StateGroup from the ShaderRes, and so on.
To use this architecture with deferred lighting a.k.a. light pre-pass, I've added an extra StateGroup to the ShaderRes called gBufferState that contains the shader program used to draw the objects to the G-Buffer, while the original state var contains the shader program used to draw the object in the second geometry pass. Or should I create a new PassRes struct? I guess it doesn't make much difference.

Regarding shader permutations: I can use the same constant buffer struct in all shader permutations, and then each shader permutation uses the constants that it needs, right?
[quote name='Krypt0n' timestamp='1309350513' post='4829038']as an example, D3D10/11 hardware does not have alpha test; that's why the API also does not support it, but you can run dx9 software (which obviously needs a new shader).[/quote]

The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader. In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)
[quote name='Eric Lengyel' timestamp='1309666676' post='4830523']
[quote name='Krypt0n' timestamp='1309350513' post='4829038']as an example, D3D10/11 hardware does not have alpha test; that's why the API also does not support it, but you can run dx9 software (which obviously needs a new shader).[/quote]

The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader.[/quote]for the HW I know, it does cause recompilation, and even on some dx9 hardware, depending on the framebuffer configuration, it's done in shaders and not in the ROP.




[quote]In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)[/quote]it's not a mistake; removing it from the pipeline just reflects better how the hardware works, and this allows better shaders. when the driver "recompiles" shaders, it's actually just patching them, by NOPing out some area or adding some assembly snippets that e.g. reject pixels based on alpha. if you embed a "clip"/"kill" into your shader instead, it's included in the optimization process at all stages of compilation (be it fxc or the driver frontend or backend): the pixel removal is moved to the front, so all unnecessary computations are avoided by an early-out, or at least by disabling unneeded work (e.g. the texture units don't cause any memory traffic for pixels that have already been clipped).

I wish all the ROP computations were programmable, and the texture units too -- at least the decompression into the L1 texture caches :)
Yeah I don't see the point in having a fixed-function feature in the pipeline when a programmable one is available, particularly if the fixed-function one is going to be more limiting without having any better performance. But we're getting off-topic here. :P
[quote name='Krypt0n' timestamp='1309778277' post='4830894']for the HW I know, it does cause recompilation[/quote]

What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?
Great discussion -- I'm learning a lot from this.
I have some questions that I hope can be answered:
1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)
2. With this system, it seems all programs are based on a single source file. How would users be able to use their own shader code and not just the supplied one?
3. Can we have 2 programs with the same shader flags?

Seems like we have a single list where all programs are stored.
However, what if we want more than one type of shader source from which programs are created?
For instance, we might have source for lighting, one for terrain, and one for user-defined code.
How can we cope with this? Would there be shader flags for LIGHTING_SHADER and TERRAIN_SHADER?

Basically, I'm thinking of how to make this work in a data-driven way, where the user can change the source code for the shaders, choose what the shader flag bits mean, and supply the #define strings per shader flag.
[quote name='Andrew Kabakwu' timestamp='1309874832' post='4831352']
1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)[/quote]

to be really flexible, you create a second file listing the flags used by your shader; in my case they have the same name, just different extensions:


[code]

toon.hlsl

toon.flags

[/code]

the .flags file is just a list of flags; each line corresponds to a bit.

my material files (xml files) allow you to set flags, and then I match the material's flag bits to the ones the shader defines. that works completely without recompiling engine code.
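
For illustration, hypothetical contents of toon.flags could be:

[code]
NORMAL_MAP
ALPHA_TEST
TWO_SIDED
SIN_WAVE
[/code]

Line N corresponds to bit N, so a material xml that enables NORMAL_MAP and TWO_SIDED would map to the bit mask 0x5 for this shader.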

[quote]

2. With this system, it seems all programs are based on a single source file. How would users be able to use their own shader code and not just the supplied one?[/quote]


it was just an example with a single file, but you can use it on as many files as you want. I reference the shader by name in my material files, but you could also use some bit-mask to index into a shader array; it's up to you. I don't really see a limitation.

[quote]

3. Can we have 2 programs with the same shader flags?[/quote]why not?
[quote]


Seems like we have a single list where all programs are stored.
However, what if we want more than one type of shader source from which programs are created?
For instance, we might have source for lighting, one for terrain, and one for user-defined code.
How can we cope with this? Would there be shader flags for LIGHTING_SHADER and TERRAIN_SHADER?[/quote]that's up to your implementation, but as you run short on flags (usually I do, at least), I would recommend not using flags where they're not needed -- those are different shaders for different materials anyway. flags should be used when you have a specific type of shader that you permute, e.g. you provide a vertex stream with tangent & binormal or just with a normal, or you provide a bumpmap or a normalmap. those should be flags, where you wouldn't want to write a special shader by hand.

but you are also free to use "#include" in your shaders, having some high-level shaders like "terrain" and "toon" and "skin"; those can include common things like the usual vertex streams, texture sets, and modifications (e.g. a sin-wave flag), which are applied independently of the high-level features (like blending terrain layers, or subsurface scattering for skin, etc.).
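
A hypothetical illustration of that composition (the file and function names are invented):

[code]
// skin.hlsl -- a high-level shader built from shared includes
#include "common_vertex_streams.hlsl" // permuted: tangent & binormal vs. normal-only
#include "common_texture_sets.hlsl"   // permuted: bumpmap vs. normalmap
#include "common_modifiers.hlsl"      // permuted: e.g. the sin-wave flag

float4 main( VertexOutput input ) : SV_Target
{
	float4 color = SampleTextureSet( input );         // from the common include
	return ApplySubsurfaceScattering( color, input ); // the skin-specific part
}
[/code]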
[quote name='Krypt0n' timestamp='1309778277' post='4830894']
[quote name='Eric Lengyel' timestamp='1309666676' post='4830523']In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)[/quote]

it's not a mistake; removing it from the pipeline just reflects better how the hardware works, and this allows better shaders. when the driver "recompiles" shaders, it's actually just patching them, by NOPing out some area or adding some assembly snippets that e.g. reject pixels based on alpha. if you embed a "clip"/"kill" into your shader instead, it's included in the optimization process at all stages of compilation (be it fxc or the driver frontend or backend): the pixel removal is moved to the front, so all unnecessary computations are avoided by an early-out, or at least by disabling unneeded work (e.g. the texture units don't cause any memory traffic for pixels that have already been clipped).[/quote]

I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders, but using that option or the alpha test should be a choice left to the programmer. The hardware still has dedicated alpha-testing capabilities that would be faster in some cases. Removing the alpha test actually makes the API a worse reflection of the true hardware functionality.
[quote name='TiagoCosta' timestamp='1309611183' post='4830328']
To use this architecture with deferred lighting a.k.a. light pre-pass, I've added an extra StateGroup to the ShaderRes called gBufferState that contains the shader program used to draw the objects to the G-Buffer, while the original state var contains the shader program used to draw the object in the second geometry pass. Or should I create a new PassRes struct? I guess it doesn't make much difference.[/quote]In my 'shader' objects ([i]similar to what Microsoft calls an 'Effect' or a 'Technique'[/i]) I've actually got multiple passes defined. Each pass then has a list of permutations.
Depending on which part of the pipeline you're rendering for ([i]shadow, g-buffer, etc[/i]), a different pass is selected from the shader ([i]and a permutation is then selected from that pass[/i]).[quote]Regarding shader permutations: I can use the same constant buffer struct in all shader permutations, and then each shader permutation uses the constants that it needs, right?[/quote]Yeah, there's no way for the game-side code to reliably predict which permutation will be chosen, so you shouldn't change your cbuffer layouts for different permutations. You can however have certain cbuffers that are used by some perms and ignored by others.
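
A sketch of how that lookup might be structured -- the layout and helper names here are invented, not Hodgman's actual code:

[source lang=cpp]
struct ShaderPass { std::vector<ShaderPrograms*> permutations; };
struct Shader     { ShaderPass passes[NUM_PIPELINE_STAGES]; }; // shadow, g-buffer, ...

ShaderPrograms* Select( Shader& shader, int stage, uint64 optionBits )
{
	ShaderPass& pass = shader.passes[stage];              // pass picked by pipeline stage
	int index = FindPermutationIndex( pass, optionBits ); // permutation picked by options
	return pass.permutations[index];
}[/source]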


[quote name='Andrew Kabakwu' timestamp='1309874832' post='4831352']1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)[/quote]The engine itself doesn't have to know what the options are -- it only knows that there's 64 bits' worth of options.

In [url="http://horde3d.org/"]Horde3D[/url], they give you 32 bits' worth of options, and to make a new option you just put a new #ifdef into your shader code. They use a naming convention where a pre-processor token starts with _F_## ([i]where ## is 0 to 31[/i]) -- e.g. if your shader contains #ifdef _F_06_NormalMapping, then if someone enables option #6, the engine will select a permutation that was compiled with the normal-mapping code.
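
Following that convention, the shader-side code might look like this (the normal-mapping details are invented):

[code]
#ifdef _F_06_NormalMapping
	float3 n = UnpackNormal( tex2D(normalMap, input.uv), input.tangentBasis );
#else
	float3 n = normalize( input.normal );
#endif
[/code]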

At work, we actually use a modified version of HLSL where we can write something like:[code]option normalMapping : 6;
...
if( normalMapping ) {
	...
}[/code][quote]2. With this system, it seems all programs are based on a single source file. Seems like we have a single list where all programs are stored.[/quote]No, all permutations of a single source file come from that source file. If you have a 2nd source file, it has its own list of permutations.
[quote name='Eric Lengyel' timestamp='1309832790' post='4831213']What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?[/quote]Is there any way to know??[quote name='Eric Lengyel' timestamp='1309915357' post='4831584']I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders[/quote]Doesn't texkill/clip/discard just set a bit indicating that the ROP should discard, and not actually skip the shader instructions that come after it? Or has this been improved on newer cards?
[quote name='Hodgman' timestamp='1309917613' post='4831593'][quote name='Eric Lengyel' timestamp='1309832790' post='4831213']What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?[/quote]Is there any way to know??[/quote]

Yes, as a matter of fact, there is. It is not too difficult to reverse-engineer the command buffer by stepping through assembly code with the Visual Studio debugger and seeing exactly what information the driver is sending to the hardware. Once you know how to locate the command buffer, the extraction of hardware register data can be automated. You can learn many interesting things by doing this, and it will change the way you think about the hardware. (Also, AMD has actually published their register specs for hardware up to R700.) I can tell you the register numbers and formats for the alpha test functionality on any GPU that I have physical access to.

[quote name='Hodgman' timestamp='1309917613' post='4831593'][quote name='Eric Lengyel' timestamp='1309915357' post='4831584']I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders[/quote]Doesn't texkill/clip/discard just set a bit indicating that the ROP should discard, and not actually skip the shader instructions that come after it? Or has this been improved on newer cards?[/quote]

Generally, yes, but that bit can also be used to suppress texture fetches later in the shader, saving memory bandwidth, and GPUs have done this since at least 2004.
@Hodgman
1. You can ask the GPU guys, but then you wouldn't be allowed to tell anyone -- which might explain why Krypt0n doesn't say anything specific.
2. You can develop for consoles; then you might get a little insight, depending on the console.
3. You can also check the open GPU specifications that ATI/AMD and Intel released. For ATI, as an example:
http://developer.amd.com/documentation/guides/Pages/default.aspx#open_gpu
you will see that the R3xx family of GPUs has "Alpha Functions", which refers to alpha test; the various HD2x00 graphics cards have alpha blend, but I can't find any alpha test information anymore. I've seen a linux driver mailing list where some guys were wondering how to handle that in their driver -- it makes some things quite complicated.
I think the PowerVR chips used in Atom chipsets support D3D10.1, and it's no secret that, due to the deferred pipeline, all the computations are done on the chip. there is no real ROP; even if you output antialiasing and use alpha blending, it's all done in the shader units, down to the point where the AA samples are merged into one final pixel -- which is the only moment the part you could call a "ROP" does something, by converting the color to the final format.

Regarding the alpha test mask:
The compiler in the driver can decide on that. branching is usually free, so you don't waste any performance in that case. You are right, it still needs to set the masking bits and the ROPs need to merge all fragment streams; it seems like they have no way to compare, just to mask pixels based on the bitmask. But that's what they do all the time anyway, be it due to the fine raster mask or alpha-to-coverage.
[quote name='Hodgman' timestamp='1309479814' post='4829824']
Conceptually, mine looks more like[source lang=cpp]class StateGroup
{
public:
typedef std::vector<RenderState*> StateVec;

void Add(RenderState* s) { states.push_back(s); }
StateVec::const_iterator Begin() { return states.begin(); }
StateVec::const_iterator End() { return states.end(); }
private:
StateVec states;
};

class RenderCommand
{
public:
virtual ~RenderCommand(){}
virtual void Execute( RenderDevice& ) = 0;
};

class DrawCall : public RenderCommand {};
class RenderState : public RenderCommand
{
public:
enum StateType
{
BlendMode,
VertexBuffer,
CBuffer0,
CBuffer1,
/*etc*/
};
virtual StateType GetType() const = 0;
};

//Dx9 implementation
class BindVertexBuffer : public RenderState
{
public:
void Execute(RenderDevice&);
StateType GetType() { return VertexBuffer; }
private:
IDirect3DVertexBuffer9* buffer;
};
class DrawIndexedPrimitives : public DrawCall
{
public:
void Execute(RenderDevice&);
private:
D3DPRIMITIVETYPE Type;
INT BaseVertexIndex;
UINT MinIndex;
UINT NumVertices;
UINT StartIndex;
UINT PrimitiveCount;
};[/source]In practice though, for performance reasons there's no std::vectors of pointers or virtual functions -- the state-group is a [url="http://bitsquid.blogspot.com/2010/02/blob-and-i.html"]blob[/url] of bytes that looks something like:[code]|size |bitfield |number |state #0|state #0|state #1|state #1|...
|in |of states|of states|type |data |type |data |...
|bytes|contained|contained|enum | |enum | |...[/code]
[quote]What stops you from using a single StateGroup and using it for the whole hierarchy? I guess in the MaterialRes you could get the StateGroup from the ShaderRes and so on[/quote]Nothing -- it's perfectly valid to merge groups together like that if you want to ;)
However, in this case, the instance group might be shared between a couple of draw-calls ([i]the number that make up a particular model[/i]), the geometry group might be shared between dozens of draw-calls ([i]that model times the number of instances of that model[/i]), the material group might be shared between hundreds of draw-calls ([i]if the same material is used by different models[/i]), and the shader group might be shared between thousands ([i]if the same shader is used by different materials[/i]).
The 'stack' kind of forms a pyramid of specialization/sharing, where the bottom layers are more likely to be shared between items, and the top layers are more likely to be specialized for a particular item.
[/quote]

I'm quite interested in learning more about how you've created a setup that avoids virtual functions and vectors. I hardly slept last night trying to figure out how I would do that -- a naive version of the system you propose would kill performance, with each render command requiring a virtual call plus a lot of vector iteration. I've read your blog post about blobs, but I'm having a hard time figuring out how that fits in here. Would you just have the renderer that receives the commands switch on the type and reinterpret_cast the memory?
Fantastic, Hodgman -- thank you very much for the answer. I was also wondering how you go about sorting the resulting command buffer, as it consists of several connected commands which can't be moved around independently?
On the previous page, I mentioned submitting draw/state pairs... Let's call them RenderInstances:[code]struct RenderInstance
{
u32 sortingKey;
DrawCall* draw;
vector<StateGroup*> states;//not really a vector ;)
};[/code]It's the queue of RenderInstances which gets sorted (not the command buffers). The sorted RenderInstance queue is then used to generate a stream of commands.
Afterwards, another job takes the sorted instances and submits their commands to either the device or to a command buffer. Something like:[code]submit instances
sort instances
for each instance
for each state-group
for each state
if state is not redundant
submit state
submit draw-call[/code]The submit part either switches on the type to execute the command then and there, or copies it into a buffer that can be executed later.

To sort the instances, I let the "submitter" specify a 32-bit number, which can be anything. The lower-level rendering systems don't care what the numbers mean; they're just used to sort items into the right order.
The higher-level rendering systems might put material hashes in there, or depth values, or a combination of both -- some bits specifying layers, some specifying depth, some specifying a material ID, etc.
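
A hypothetical packing of such a 32-bit key (the field widths here are invented):

[code]
// the low-level renderer never interprets these bits -- only the
// high-level code that packs them decides what they mean
u32 MakeSortKey( u32 layer, u32 depth, u32 materialId )
{
	return ((layer      & 0xF   ) << 28)  //  4 bits: pipeline layer/stage
	     | ((depth      & 0xFFF ) << 16)  // 12 bits: quantized view depth
	     | ((materialId & 0xFFFF)      ); // 16 bits: material hash/ID
}
[/code]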
[code]
class CommandBindVAO
{
private:
uint m_uiVAO;

public:
void Execute(Context* pkContext) const
{
pkContext->BindVAO(m_uiVAO);
}
};

class CommandUnbindVAO
{
public:
void Execute(Context* pkContext) const
{
pkContext->UnbindVAO();
}
};

class CommandBindProgram
{
private:
RFShaderProgram* m_pkProgram;

public:
void Execute(Context* pkContext) const
{
pkContext->BindProgram(m_pkProgram);
}
};

class CommandSetRenderState
{
private:
RFRenderState* m_pkState;

public:
void Execute(Context* pkContext) const
{
pkContext->ApplyRenderState(m_pkState);
}
};

class CommandGroup
{
public:
enum ECmdType
{
ST_BIND_VAO = 1 << 0,
ST_UNBIND_VAO = 1 << 1,
ST_SET_PASS_UNIFORMS = 1 << 2,
ST_BIND_PROGRAM = 1 << 3,
ST_SET_RENDERSTATE = 1 << 4
};

private:
size_t m_szCmdsSize;
uint64 m_uiCmdFlags;
uint m_uiCmdCount;
void* m_pvCmd;

public:
size_t GetCmdSize() const { return m_szCmdsSize; }
uint64 GetCmdFlags() const { return m_uiCmdFlags; }
uint GetCmdCount() const { return m_uiCmdCount; }
const void* GetCmds() const { return m_pvCmd; }
};

////////////////////////////////////////////////////////

void Renderer::Render()
{
// Create sort list (clearing last frame's entries first)
m_kSortList.clear();
uint uiIndex = 0;
for (RenderQueue::InstanceVector::const_iterator kIter = m_pkQueue->Begin();
kIter != m_pkQueue->End(); ++kIter)
{
m_kSortList.push_back(SortListItem((*kIter).GetSortKey(), uiIndex));
uiIndex++;
}

// Sort render queue
std::stable_sort(m_kSortList.begin(), m_kSortList.end(), QueueSorter);

// Iterate render instances in sorted order
for (std::vector<SortListItem>::const_iterator kIter = m_kSortList.begin(); kIter != m_kSortList.end(); ++kIter)
{
const RenderInstance& kInstance = m_pkQueue->Get(kIter->m_uiIndex);

// Iterate command groups
uint64 uiUsedCommands = 0;
for (RenderInstance::CommandGroupVector::const_iterator kCmdIter = kInstance.Begin();
kCmdIter != kInstance.End(); ++kCmdIter)
{
const CommandGroup* pkCmdGroup = *kCmdIter;

// Iterate commands and execute on context
const void* pvCmds = pkCmdGroup->GetCmds();
uint uiCmdCount = pkCmdGroup->GetCmdCount();

for (uint ui = 0; ui < uiCmdCount; ++ui)
{
// Get command type
const CommandGroup::ECmdType eType = *reinterpret_cast<const CommandGroup::ECmdType*>(pvCmds);
pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandGroup::ECmdType));

// Check whether this command type was already applied earlier in the stack;
// commands nearer the top of the stack take precedence, so only apply it if not
bool bApply = (uiUsedCommands & eType) == 0;

// Remember type
uiUsedCommands |= eType;

// Handle command type correctly
switch (eType)
{
case CommandGroup::ST_BIND_VAO:
{
// Execute command
if (bApply)
{
const CommandBindVAO& kCmd = *reinterpret_cast<const CommandBindVAO*>(pvCmds);
kCmd.Execute(m_pkContext);
}

// Offset command stream
pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandBindVAO));
}
break;

case CommandGroup::ST_UNBIND_VAO:
{
// Execute command
if (bApply)
{
const CommandUnbindVAO& kCmd = *reinterpret_cast<const CommandUnbindVAO*>(pvCmds);
kCmd.Execute(m_pkContext);
}

// Offset command stream
pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandUnbindVAO));
}
break;

case CommandGroup::ST_BIND_PROGRAM:
{
// Execute command
if (bApply)
{
const CommandBindProgram& kCmd = *reinterpret_cast<const CommandBindProgram*>(pvCmds);
kCmd.Execute(m_pkContext);
}

// Offset command stream
pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandBindProgram));
}
break;

case CommandGroup::ST_SET_RENDERSTATE:
{
// Execute command
if (bApply)
{
const CommandSetRenderState& kCmd = *reinterpret_cast<const CommandSetRenderState*>(pvCmds);
kCmd.Execute(m_pkContext);
}

// Offset command stream
pvCmds = static_cast<const void*>(static_cast<const char*>(pvCmds) + sizeof(CommandSetRenderState));
}
break;
}
}
}

// Switch on drawcall and execute
const DrawCall* pkDrawCall = kInstance.GetDrawCall();
switch (pkDrawCall->GetType())
{
case DrawCall::DCT_DRAW_ARRAYS:
static_cast<const DrawCallDrawArrays*>(pkDrawCall)->Execute(m_pkContext);
break;
}
}
}
[/code]

CommandGroups are what you would call StateGroups -- as that's what they are: commands to change state, as far as I understand.

Right now I'm manually iterating the command groups, which would obviously be done using a proper iterator when the time comes. Same goes for the use of vectors. :)

Just a quick mockup of a Renderer::Render method. Am I completely on the wrong track? Obviously my framework is written in OpenGL, though that shouldn't change much. Context is a context proxy which keeps track of which VAO / state is set, etc.

I'm having a hard time figuring out which commands I could define, as all I could come up with were the 5 I've shown. I'm also a bit in doubt about why you would make a separate DrawCall class instead of having it as a command.

The uniforms are causing me problems as well. In my setup a material contains x techniques, which in turn contain x passes, which contain x uniforms (default values / auto values set by the framework) and a shader program. Each MeshRes (in the sense you're using it) contains a pointer to a material. Come command-queue execution, I have to apply / update these uniforms after having bound the shader program. Would that result in a new command type? And would it defeat the purpose of having this highly compacted command queue in memory, since it would require me to jump to the MaterialPass and iterate all the uniforms, updating / uploading them to the GPU?

The following is basically what I think I need to do:
[code]
for each MeshInstance in MeshInstanceList
{
for each SubMesh in MeshInstance
{
Store ShaderProgram // Which program to render using
Store UniformDefaults // Material pass defined uniform defaults
Store UniformAuto // Material pass defined auto-filled uniforms using context state (view, viewprojection, time etc.)
Store TextureDefaults // Material pass defined textures - Set in material definition
Store UniformInstance // Submesh Instance defined uniforms
Store TextureInstance // Submesh Instance defined textures
Store VAO // Submesh buffer data binding
Store DrawCall // Encapsulated

Add RenderInstance to queue
}
}

Sort renderqueue

Submit renderqueue to renderer

for each RenderInstance in renderqueue
{
Update UniformAuto from context

Find and apply WorldTransform on context (used for auto uniforms)

Apply ShaderProgram on context

Apply UniformDefaults
Apply UniformAuto
Apply UniformInstance

Bind TextureDefaults
Bind TextureInstance

Bind VAO
Dispatch DrawCall
Unbind VAO
}
[/code]

Is the above sensible, and does it make sense in the context of what Hodgman has proposed?

Oh, and thank you very much for all the help you've given me -- and the community!
[quote name='elurahu' timestamp='1309966306' post='4831827']Just a quick mockup of a renderer::render method. Am I completely on the wrong track?[/quote]Yeah, that looks similar to what I'm used to. I use something analogous to your "uiUsedCommands/bApply" code to ensure commands at the top of the stack take precedence over commands of the same type lower in the stack.

My bApply test is a bit more complicated though, as it also checks whether the command being inspected was already set by the previous render-instance. i.e. if two consecutive render instances use the same material, then all the states from the material's state-group can usually be ignored when drawing the 2nd instance.

My "Iterate render instances" loop is also passed a "default" state-group, which is conceptually put at the bottom of every state-stack. If an instance [i]doesn't [/i]set a particular state [i]and[/i] the default group contains that state, then the default value will be used.
If you don't do this, then you end up with behaviours like -- one object enables alpha blending, and then all following objects also end up being alpha-blended, because they didn't manually specify a "disable alpha blending" command.

Also, with the way your code is at the moment, only a single SetRenderState command will be applied per instance. If you want to set two [i]different[/i] render-states, only the first one will actually be set (the second will be ignored). For this reason, I have every different render-state as a different command ID.
[quote]I'm having a hard time figuring out which commands I could define, as all I could come up with were the 5 I've shown. I'm also a bit in doubt about why you would make a separate DrawCall class instead of having it as a command.[/quote]As above, I've got commands for each different render-state. I've also got commands for each different CBuffer slot and each texture-binding slot (for each type of shader).

I've limited myself to 14 CBuffer slots each for the vertex and pixel shader, so, there's actually 28 different IDs that are associated with the "bind cbuffer" command.

My draw-calls [i]are[/i] actually a command, just like state-changes. However, I split commands into 3 different categories -- general state-changes, draw-calls, and per-pass state-changes.
State-groups can only contain general state-changes. Actual render-instances must use a draw-call command (not a state-change command).
The 3rd category are stored in something similar to a state-group, which is used to set up an entire "pass" of the rendering pipeline -- commands such as binding render-targets, depth-buffers, viewports, scissor tests, etc go into this category.[quote]Come command queue execution I have to apply / update these uniforms after having bound the shader program. Would that result in a new command type?[/quote]There's a bunch of different abstractions for how uniforms are set, depending on your API... GL uses this model you're familiar with, you set the uniforms on the currently bound program... DX9 uses a model where there's a set of ~200 global registers, and any changes made to them persist from one shader to the next... DX10/11 are similar to 9, but you've got a set of bound CBuffers instead of individually bound uniforms.

So, I looked at these abstractions, and decided that the cbuffer approach made the most sense to me. No matter what the back-end rendering API actually is, my renderer deals with cbuffers -- and as as above, I've got 14 cbuffer binding slots/commands per shader type.

The way this is generally used is that a "shader" state-group on the bottom of the stack contains commands to bind cbuffers holding default values. The "material" and "object/instance" groups then contain commands to bind their own cbuffers ([i]which override the "default" commands[/i]).

On APIs that don't actually use the cbuffer abstraction, then yes, there's a step that looks at the currently bound cbuffers and sets all of the individual uniforms. I do this step prior to every draw call ([i]with a whole bunch of optimisations to skip unnecessary work[/i]).
Regarding memory layout, I allocate all my cbuffer blocks (which are blobs containing uniforms) from a separate linear allocator.
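
As an illustration of that last point, a minimal per-frame linear allocator could look like this -- a sketch of the general technique, not Hodgman's actual implementation:

[code]
class LinearAllocator
{
public:
	explicit LinearAllocator( size_t size )
		: buffer( new char[size] ), capacity( size ), offset( 0 ) {}
	~LinearAllocator() { delete[] buffer; }

	void* Alloc( size_t size, size_t align = 16 )
	{
		size_t aligned = (offset + align - 1) & ~(align - 1); // align the cursor
		if( aligned + size > capacity ) return 0;             // out of space this frame
		offset = aligned + size;
		return buffer + aligned;
	}

	// called once per frame: "frees" every cbuffer blob at once
	void Reset() { offset = 0; }

private:
	char*  buffer;
	size_t capacity;
	size_t offset;
};
[/code]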
