Frostbite rendering architecture question.



#21 Krypt0n   Crossbones+   -  Reputation: 2572


Posted 29 June 2011 - 06:28 AM


But even still, isn't the act of changing shaders a very expensive operation?


Why do so many people think that switching shaders is some horrible thing to do? GPUs can overlap lots of different kinds of work at different stages of the pipeline, with different costs. Fragment programs can be prefetched, so switching them can be a virtually free operation, depending on how long the previous draw call takes to complete (if it's really fast, there's less opportunity to hide the time needed to load the new shader).


Back then, when the first 3D GPUs arrived, even texture switching was expensive, and the same concerns carried over to shaders.


The GPU pipeline is split into different sub-pipelines, each working on its own "jobs" -- which are not draw calls and not primitives, just jobs. If you draw two objects with exactly the same settings, it is very likely that they share the same pipelines to some degree.

If you change the setup of some sub-pipeline, then because the GPU doesn't track which job belongs to which draw call or setup (that would be very expensive for little gain), that part of the pipeline, and sometimes the whole pipeline, has to be flushed (or in some lucky cases just a fence is added, which flushes partially). So the GPU-side cost isn't really the switching of some resource; it's the stall caused by the flush, which leaves the sub-pipelines idle.

On the CPU side, switching shaders often means they need to be prepared. Not all features you see on the API side are real hardware features; many are just patched shaders, so something that looks to you like a simple shader switch might actually be a recompilation, because you have some "weird" texture set or a vertex format that the shader has to "emulate".

As an example, D3D10/11 hardware does not have alpha test, which is why the API doesn't expose it either; but you can still run DX9 software, which obviously then needs a new shader.


@Tiago

There are tons of possibilities; I think most people struggle more to limit their 'bits' :)

- shadows on/off
- light source type (point/directional/spot)
- fog on/off
- in forward rendering you might have n lights
- skinning
- morphing
- detail layer
- vertex shading (instead of per-pixel, for some distant LODs?)
- parallax mapping
- switching between normal map and bump map
- back lighting, like on thin sails, flags, vegetation, paper
- some sine-wave swaying, like vegetation underwater or in wind
- rim lighting
- cubemap lighting
- clip (DX10+ does not have alpha-test hardware)
...

I'm not saying you have to have all of them, but some engines do.
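
To make the 'bits' idea concrete, here is a minimal C++ sketch of packing such feature toggles into a permutation key. All names and members are hypothetical, not taken from any engine in this thread:
[source lang=cpp]
#include <cstdint>

// Hypothetical feature bits -- one bit per toggle from the list above.
enum ShaderFeature : uint32_t
{
    FEATURE_SHADOWS    = 1u << 0,
    FEATURE_FOG        = 1u << 1,
    FEATURE_SKINNING   = 1u << 2,
    FEATURE_NORMAL_MAP = 1u << 3,
    FEATURE_ALPHA_CLIP = 1u << 4
    // ...and so on
};

// Assumed material description, purely for illustration.
struct MaterialDesc
{
    bool receivesShadows, fogEnabled, isSkinned, hasNormalMap, alphaClip;
};

// Pack the toggles into a mask; the renderer uses the mask to look up
// (or lazily compile) the matching shader permutation.
uint32_t BuildFeatureMask(const MaterialDesc& m)
{
    uint32_t mask = 0;
    if (m.receivesShadows) mask |= FEATURE_SHADOWS;
    if (m.fogEnabled)      mask |= FEATURE_FOG;
    if (m.isSkinned)       mask |= FEATURE_SKINNING;
    if (m.hasNormalMap)    mask |= FEATURE_NORMAL_MAP;
    if (m.alphaClip)       mask |= FEATURE_ALPHA_CLIP;
    return mask;
}
[/source]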


#22 rapt0r   Members   -  Reputation: 111


Posted 29 June 2011 - 06:37 AM

In the first post, you mention the "render blocks -- Geometry & high-level state combinations".
"Geometry" isn't just a vertex buffer and an index buffer, because you can't draw buffers by themselves -- you also need the data that goes into calls like DrawIndexedPrimitives, e.g. at what offset do you start reading the buffer, how many items do you read from the buffer, what kind of primitives are you constructing, etc...

I treat the index/vertex buffers like all other states (they get put into state-groups), and then the "drawable" item that gets put into the queue is a structure that holds the arguments to DrawIndexedPrimitives/other draw calls.


How do you convert high-level 'Drawables' to RenderInstance objects? Is it a common structure for all entities, or does every entity type have its own structure that the low-level render system knows about?

#23 TiagoCosta   Crossbones+   -  Reputation: 2206


Posted 29 June 2011 - 03:50 PM

So this is how I'm designing my rendering architecture:

-> ShaderProgram class (contains the shaders and cbuffers);
-> Material class (contains the textures, blend/rasterizer/depth-stencil states and other booleans used by this material, and chooses the correct shader permutation based on the booleans and textures provided);
-> Actor class (contains the vertex/index buffers, instance buffers (if needed), pointers to the materials used by this actor, the world/bone matrices and the bounding boxes);
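
As a rough C++ sketch of that layout (purely illustrative -- the members and type names below are assumptions, not an actual implementation):
[source lang=cpp]
#include <cstdint>
#include <vector>

// Forward declarations for types assumed to exist elsewhere in the engine.
struct VertexShader;  struct PixelShader;     struct CBuffer;  struct Texture;
struct BlendState;    struct RasterizerState; struct DepthStencilState;
struct VertexBuffer;  struct IndexBuffer;
struct Matrix { float m[16]; };
struct AABB   { float min[3], max[3]; };

struct ShaderProgram              // the shaders and cbuffers
{
    VertexShader* vs;
    PixelShader*  ps;
    std::vector<CBuffer*> cbuffers;
};

struct Material                   // textures, render states, permutation choice
{
    std::vector<Texture*> textures;
    BlendState*        blend;
    RasterizerState*   rasterizer;
    DepthStencilState* depthStencil;
    uint32_t           featureMask;   // the material's booleans packed into bits
    ShaderProgram*     permutation;   // chosen from featureMask + bound textures
};

struct Actor                      // geometry and per-object data
{
    VertexBuffer* vb;
    IndexBuffer*  ib;
    VertexBuffer* instanceBuffer;     // optional
    std::vector<Material*> materials;
    Matrix world;
    std::vector<Matrix> bones;
    AABB  bounds;
};
[/source]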

Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn or get pointers to the buffers and call the DrawIndexed() functions in the renderer?

I'm totally open to suggestions :wink:

Also, for the G-Buffer pass I sort the objects in front-to-back order; in the second geometry/material pass, how should I order the objects? By shader programs, blend states?

#24 Quat   Members   -  Reputation: 404


Posted 29 June 2011 - 07:35 PM

Sorry to ask a new question in this thread--I can make a new topic if that is better.

How do you create all your shader variations? Right now I am using the effects framework with compile-time flags to switch things on and off, and I literally compile the shaders with the flags set to the options I want enabled. Obviously I only need to type this out once, but it still seems like there is a better way than (pseudocode):

TwoLightsTexReflect = CompileShader(2, true, false, false, true);
OneLightsTexAlphaTestFog = CompileShader(1, true, true, true, false);
....
ugh
-----Quat

#25 Krypt0n   Crossbones+   -  Reputation: 2572


Posted 29 June 2011 - 08:02 PM

So this is how I'm designing my rendering architecture:

-> ShaderProgram class (contains the shaders and cbuffers);
-> Material class (contains the textures, blend/rasterizer/depth-stencil states and other booleans used by this material, and chooses the correct shader permutation based on the booleans and textures provided);
-> Actor class (contains the vertex/index buffers, instance buffers (if needed), pointers to the materials used by this actor, the world/bone matrices and the bounding boxes);

Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn or get pointers to the buffers and call the DrawIndexed() functions in the renderer?

All my draw calls are sent from one tight loop that has pointers to all the needed VB/IB/CB/shaders/textures; there is no need nowadays to have specialized drawing functions in every type of entity.

Also, for the G-Buffer pass I sort the objects in front-to-back order; in the second geometry/material pass, how should I order the objects? By shader programs, blend states?



I sort the objects once, not per pass, to avoid tiny differences that could result in different rendering orders (e.g. if you dynamically detect cases where instancing could be a win and your g-pass without instancing is setting different z-values than your geometry/material-pass). I also sort the list with a stable-sort, as you could otherwise encounter object flickering due to changing draw order in-between frames (e.g. two decals on a wall that change draw order would be noticeable).

The sorting order should be chosen so that you have as few state switches as possible. If you had two mesh types and 100 different shaders to draw them, it wouldn't be smart to give the shaders a higher priority, as you'd switch shaders 100 times and probably meshes (VB/IB) 200 times; if you sorted by mesh instead, you'd have 2 mesh switches and then up to 100 shader switches per mesh. This is also hardware-, driver- and platform-dependent, so there is no general ordering that always works best. But if you sort and organize the pipeline like this, you can't do worse than vanilla immediate rendering, and usually it's a win.
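
As a deliberately simplified C++ sketch of that idea (the key layout below is just one possible priority order, matching the mesh-before-shader example above, not a universal rule):
[source lang=cpp]
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawItem
{
    uint16_t meshId;       // VB/IB pair
    uint16_t shaderId;
    uint16_t materialId;
    uint64_t sortKey;
    // ...pointers to the actual state groups / draw-call arguments would live here
};

// Most significant bits = the switch you most want to avoid.
// Here mesh outranks shader, which outranks material, per the example above.
uint64_t MakeSortKey(const DrawItem& d)
{
    return (uint64_t(d.meshId)   << 32) |
           (uint64_t(d.shaderId) << 16) |
            uint64_t(d.materialId);
}

void SortDrawList(std::vector<DrawItem>& items)
{
    for (DrawItem& d : items)
        d.sortKey = MakeSortKey(d);

    // stable_sort keeps the relative order of items with equal keys, so draw
    // order doesn't flicker between frames (e.g. two overlapping decals).
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) { return a.sortKey < b.sortKey; });
}
[/source]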






#26 Krypt0n   Crossbones+   -  Reputation: 2572


Posted 29 June 2011 - 08:06 PM

Sorry to ask a new question in this thread--I can make a new topic if that is better.

How do you create all your shader variations? Right now I am using the effects framework with compile-time flags to switch things on and off, and I literally compile the shaders with the flags set to the options I want enabled. Obviously I only need to type this out once, but it still seems like there is a better way than (pseudocode):

TwoLightsTexReflect = CompileShader(2, true, false, false, true);
OneLightsTexAlphaTestFog = CompileShader(1, true, true, true, false);
....
ugh

How about:



for(int a=0; a<(1<<bits); a++)
{
    std::string Flags;
    if(a&1) Flags += "TwoSideLight ";
    if(a&2) Flags += "CheeseTexture ";
    if(a&4) Flags += "GameOfLifeTexture ";
    ...
    Compile(..., Flags, ...);
}


You don't really want to write out 64k permutations by hand, even with a lot of spare time ;)






#27 MJP   Moderators   -  Reputation: 11380


Posted 29 June 2011 - 11:18 PM

The older (DX9 and DX10) versions of the effect framework support arrays of shaders. So you could make an array of N pixel shaders for N light sources, and then in your app code set an integer to specify the number of lights, and the framework would select the correct version of the shader to use. The skinning sample in the SDK does this.


#29 Hodgman   Moderators   -  Reputation: 30388


Posted 30 June 2011 - 07:36 AM

How do you convert high-level 'Drawables' to RenderInstance objects? Is it a common structure for all entities, or does every entity type have its own structure that the low-level render system knows about?

So I've basically got: DrawCalls, Resources (cbuffers, vertex/index buffers, shaders) and StateGroups --- you can use these primitives to compose layers of functionality.
It might be easier to describe with some pseudo-code. For example, we could have a PlayerEntity, which has a ModelInstance, which has a Geometry, which has a Mesh, which has a Material, which has a Shader:
[source lang=cpp]
//binds the shader programs and default shader values
struct ShaderRes
{
    StateGroup* state;
    ShaderPrograms* programs;
    vector<const CBuffer*> defaults;
    ShaderRes(const char* name)
    {
        programs = Load(name);
        defaults = programs->GetDefaultCBuffers();
        state = new StateGroup();
        state->Add( new BindShaderCommand(programs) );
        for( i=0; i<defaults.size(); ++i )
            state->Add( new BindCBufferCommand(defaults[i]) );
    }
};

//binds some useful shader values, textures, etc
struct MaterialRes
{
    StateGroup* state;
    ShaderRes* shader;
    vector<CBuffer*> cbuffers;
    MaterialRes( ShaderRes* s, vector<CBuffer*>& v )
    {
        shader = s;
        cbuffers = v;
        state = new StateGroup();
        for( i=0; i<cbuffers.size(); ++i )
            state->Add( new BindCBufferCommand(cbuffers[i]) );
    }
};

//a draw-call (i.e. "sub-mesh") paired with a material
struct MeshRes
{
    DrawCall* draw;
    MaterialRes* material;
};

//binds the index/vertex buffers
struct GeometryRes
{
    StateGroup* state;
    VertexBuffer* vb;
    IndexBuffer* ib;
    vector<MeshRes*> meshes;
    GeometryRes()
    {
        state = new StateGroup();
        state->Add( new BindVertexBufferCommand(vb) );
        state->Add( new BindIndexBufferCommand(ib) );
    }
};

//an actual object in the world. Links to the above resources, and binds per-instance data, like a world-matrix.
struct ModelInstance
{
    StateGroup* state;
    CBuffer* constants;
    GeometryRes* model;
    ModelInstance()
    {
        constants = new CBuffer<InstanceData>();
        constants->SetProperty( "WorldMatrix", Identity );
    }
    void Draw( RenderQueue& queue )
    {
        for( i=0; i!=model->meshes.size(); ++i )
        {
            MeshRes* mesh = model->meshes[i];
            //build the state-stack for this draw-call
            StateGroup* stateShader   = mesh->material->shader->state;
            StateGroup* stateMaterial = mesh->material->state;
            StateGroup* stateGeometry = model->state;
            StateGroup* stateInstance = state;
            StateGroup* stateStack[4] = { stateInstance, stateGeometry, stateMaterial, stateShader };
            DrawCall* draw = mesh->draw;
            queue.Submit( draw, stateStack, 4 ); //<-- here is the part where we actually submit something for drawing
        }
    }
};

//a higher-level game object made up of several models
struct PlayerEntity
{
    ModelInstance* body;
    ModelInstance* gun;
    void Draw( RenderQueue& queue )
    {
        body->Draw(queue);
        gun->Draw(queue);
    }
};
[/source]
(n.b. completely made-up code to try and get some ideas about composition across)

Should I implement drawing functions in the Actor class and call them when the actor needs to be drawn or get pointers to the buffers and call the DrawIndexed() functions in the renderer?

An actor might be made up of several draw calls, and those draw-calls might need to be drawn at different stages of the pipeline -- e.g. if part of the actor is opaque and part is translucent.

To easily deal with this, I would have each actor submit its meshes/drawables/whatever to the renderer, and have the renderer call the actual "Draw" functions at the appropriate times.

As Krypt0n mentioned, you might not even want the Actor to be responsible for this submission though -- you could have the actor 'register' its "drawables" with some kind of rendering manager in advance, and let that manager object perform the submission on behalf of the actor (this way the actor doesn't have a draw function at all).
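
One possible shape for that 'register with a manager' approach, as a sketch with invented names (the RenderQueue here is just a stand-in for whatever submission interface the renderer exposes):
[source lang=cpp]
#include <algorithm>
#include <vector>

struct Drawable;          // bundles a draw-call with its state groups (defined elsewhere)

struct RenderQueue        // stand-in for the renderer's submission interface
{
    void Submit(Drawable* d);   // defined elsewhere
};

// Actors register their drawables once (on spawn / when visibility changes);
// the manager performs submission, so actors need no Draw() function at all.
class RenderManager
{
public:
    void Register(Drawable* d)   { drawables.push_back(d); }
    void Unregister(Drawable* d)
    {
        drawables.erase(std::remove(drawables.begin(), drawables.end(), d),
                        drawables.end());
    }

    void SubmitAll(RenderQueue& queue)
    {
        for (Drawable* d : drawables)
        {
            // visibility culling against the camera frustum would go here
            queue.Submit(d);
        }
    }

private:
    std::vector<Drawable*> drawables;
};
[/source]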

#30 TiagoCosta   Crossbones+   -  Reputation: 2206


Posted 30 June 2011 - 12:10 PM

@Hodgman

Should the class StateGroup look like this?
class StateGroup
{
public:
    void Add(BindShaderCommand command);
    //Lots of different Add methods

    //Lots of different Get methods

private:
    ShaderPrograms* program;
    vector<CBuffer*> cbuffers;
    VertexBuffer* vBuffer;
    IndexBuffer* iBuffer;
    //etc, etc, etc
};

What should the DrawCall and various Bind*Something*Command structs look like?

What stops you from using a single StateGroup and using it across the whole hierarchy? I guess in the MaterialRes you could get the StateGroup from the ShaderRes, and so on.

#31 Hodgman   Moderators   -  Reputation: 30388


Posted 30 June 2011 - 06:23 PM

Should the class StateGroup look like this? What should the DrawCall and various Bind*Something*Command structs look like?

Conceptually, mine looks more like:
[source lang=cpp]
class StateGroup
{
public:
    typedef std::vector<RenderState*> StateVec;
    void Add(RenderState* s) { states.push_back(s); }
    StateVec::const_iterator Begin() { return states.begin(); }
    StateVec::const_iterator End()   { return states.end(); }
private:
    StateVec states;
};

class RenderCommand
{
public:
    virtual ~RenderCommand(){}
    virtual void Execute( RenderDevice& ) = 0;
};

class DrawCall : public RenderCommand {};

class RenderState : public RenderCommand
{
public:
    enum StateType { BlendMode, VertexBuffer, CBuffer0, CBuffer1, /*etc*/ };
    virtual StateType GetType() const = 0;
};

//Dx9 implementation
class BindVertexBuffer : public RenderState
{
public:
    void Execute(RenderDevice&);
    StateType GetType() { return VertexBuffer; }
private:
    IDirect3DVertexBuffer9* buffer;
};

class DrawIndexedPrimitives : public DrawCall
{
public:
    void Execute(RenderDevice&);
private:
    D3DPRIMITIVETYPE Type;
    INT BaseVertexIndex;
    UINT MinIndex;
    UINT NumVertices;
    UINT StartIndex;
    UINT PrimitiveCount;
};
[/source]
In practice though, for performance reasons there are no std::vectors of pointers or virtual functions -- the state-group is a blob of bytes that looks something like:
|size |bitfield |number   |state #0|state #0|state #1|state #1|...
|in   |of states|of states|type    |data    |type    |data    |...
|bytes|contained|contained|enum    |        |enum    |        |...
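
Purely as a guess at how that blob might be walked in code (the field sizes below are invented; only the layout idea -- a small header followed by packed type/data pairs -- comes from the diagram above):
[source lang=cpp]
#include <cstddef>
#include <cstdint>

class RenderDevice;

// Assumed helpers: how big each state type's payload is, and how to bind it.
size_t SizeOfStateData(uint8_t stateType);
void   ApplyState(RenderDevice& device, uint8_t stateType, const uint8_t* data);

// Header guessed from the diagram: size in bytes, bitfield of contained state
// types, number of states, followed by tightly packed (type, data) pairs.
struct StateGroupBlob
{
    uint16_t sizeInBytes;
    uint32_t stateBitfield;   // lets you ask "does this group touch state X?" without iterating
    uint16_t numStates;
    // (type enum, state data) * numStates follows the header
};

void ApplyStateGroup(const StateGroupBlob* group, RenderDevice& device)
{
    const uint8_t* cursor = reinterpret_cast<const uint8_t*>(group) + sizeof(StateGroupBlob);
    for (uint16_t i = 0; i < group->numStates; ++i)
    {
        uint8_t type = *cursor++;             // state #i type enum
        ApplyState(device, type, cursor);     // state #i data
        cursor += SizeOfStateData(type);
    }
}
[/source]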

What stops you from using a single StateGroup and use it in the whole hierarchy? I guess in the MaterialRes you could get the StateGroup from the ShaderRes and so on

Nothing -- it's perfectly valid to merge groups together like that if you want to.
However, in this case, the instance group might be shared between a couple of draw-calls (the number that make up a particular model), the geometry group might be shared between dozens of draw-calls (that model times the number of instances of that model), the material group might be shared between hundreds of draw-calls (if the same material is used by different models) and the shader group might be shared between thousands (if the same shader is used by different materials).
The 'stack' kinda forms a pyramid of specialization/sharing, where the bottom layers are more likely to be shared between items, and the top layers are more likely to be specialized for a particular item.

#32 TiagoCosta   Crossbones+   -  Reputation: 2206


Posted 02 July 2011 - 06:53 AM

To use this architecture with deferred lighting (a.k.a. light pre-pass), I've added an extra StateGroup to the ShaderRes, called gBufferState, that contains the shader program used to draw the objects into the G-Buffer, while the original state var contains the shader program used to draw the object in the second geometry pass. Or should I create a new PassRes struct? I guess it doesn't make much difference.

Regarding shader permutations: I can use the same constant buffer struct in all shader permutations, and then each permutation uses only the constants it needs, right?

#33 Eric Lengyel   Crossbones+   -  Reputation: 2327


Posted 02 July 2011 - 10:17 PM

As an example, D3D10/11 hardware does not have alpha test, which is why the API doesn't expose it either; but you can still run DX9 software, which obviously then needs a new shader.


The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader. In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)

#34 Krypt0n   Crossbones+   -  Reputation: 2572


Posted 04 July 2011 - 05:17 AM

As an example, D3D10/11 hardware does not have alpha test, which is why the API doesn't expose it either; but you can still run DX9 software, which obviously then needs a new shader.


The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader.

For the hardware I know, it causes recompilation, and even on some DX9 hardware, depending on the framebuffer configuration, alpha test is done in the shader and not in the ROP.




In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)


It's not a mistake; removing it from the pipeline reflects better how the hardware works, and this allows better shaders. When the driver "recompiles" shaders, it's actually just patching them by NOPing out some area or adding assembly snippets that e.g. reject pixels based on alpha. If you embed a "clip"/"kill" into your shader instead, it's included in the optimization process, and at all stages of compilation (be it fxc or the driver front-end or back-end) the pixel rejection is moved to the front, so all unnecessary computations are avoided by an early-out, or at least by disabling unneeded work (e.g. the texture units don't cause any memory traffic for pixels that have already been clipped).

I wish all the ROP computations were programmable, and the texture units too -- at least the decompression into the L1 texture caches :)

#35 MJP   Moderators   -  Reputation: 11380


Posted 04 July 2011 - 01:27 PM

Yeah I don't see the point in having a fixed-function feature in the pipeline when a programmable one is available, particularly if the fixed-function one is going to be more limiting without having any better performance. But we're getting off-topic here. :P

#36 Eric Lengyel   Crossbones+   -  Reputation: 2327


Posted 04 July 2011 - 08:26 PM

For the hardware I know, it causes recompilation


What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?

#37 Andrew Kabakwu   Members   -  Reputation: 728


Posted 05 July 2011 - 08:07 AM

Great discussion, I'm learning a lot from this.
I have some questions that I hope can be answered:
1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)
2. With this system, it seems all programs are based on a single source file. How would users be able to use their own shader code and not just the supplied one?
3. Can we have 2 programs with the same shader flags?

It seems like we have a single list where all programs are stored.
However, what if we want more than one type of shader source from which programs are created?
For instance, we might have source for lighting, one for terrain, and one for user-defined code.
How can we cope with this? Would there be shader flags like LIGHTING_SHADER, TERRAIN_SHADER?

Basically, I'm thinking of how to make this work in a data-driven way, where the user can change the source code for the shaders, choose what the shader flag bits mean, and supply the #define strings per shader flag.

#38 Krypt0n   Crossbones+   -  Reputation: 2572


Posted 05 July 2011 - 06:27 PM

1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)


To be really flexible, you create a second file with the flags used in your shader; in my case they have the same name, just different extensions:



toon.hlsl

toon.flags


The .flags file is just a list of flags; each line corresponds to a bit.

My material files (XML files) allow you to set flags, and then I match the material flag bits against the ones the shader defines. That works completely without recompiling engine code.
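
A small sketch of how that matching could work (the .flags format follows the description above; the code itself and all names in it are just an illustration):
[source lang=cpp]
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Read e.g. "toon.flags": one flag name per line, where line N corresponds to bit N.
std::unordered_map<std::string, int> LoadFlagBits(const std::string& path)
{
    std::unordered_map<std::string, int> bits;
    std::ifstream file(path);
    std::string line;
    int bit = 0;
    while (std::getline(file, line))
        if (!line.empty())
            bits[line] = bit++;
    return bits;
}

// Convert the flag names set in a material (e.g. parsed from its XML) into the
// bitmask used to pick the shader permutation; unknown flags are simply ignored.
uint64_t MaterialMask(const std::vector<std::string>& materialFlags,
                      const std::unordered_map<std::string, int>& shaderBits)
{
    uint64_t mask = 0;
    for (const std::string& flag : materialFlags)
    {
        auto it = shaderBits.find(flag);
        if (it != shaderBits.end())
            mask |= (1ull << it->second);
    }
    return mask;
}
[/source]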

2. With this system, it seems all programs are based on a single source file. How would users be able to use their own shader code and not just the supplied one?



It was just an example using a single file, but you can apply it to as many files as you want. I reference the shader by name in my material files, but you could also use some bitmask to index into a shader array; it's up to you. I don't really see a limitation.

3. Can we have 2 programs with the same shader flags?

why not?

Seems like we have a single list where all progams are stored.
However, what if we want more that one type of shader source from which programs are created.
For instance, we might have source for ligthing, one for terrain, and one user defined code.
How can we cope with this? Would there be shader flag for LIGHTING_SHADER, TERRAIN_SHADER?

That's up to your implementation, but since you tend to run short on flags (I usually do, at least), I would recommend not using flags where they aren't needed -- those are simply different shaders for different materials anyway. Flags should be used for a specific type of shader that you permute, e.g. you provide a vertex stream with bitangent & binormal or just with a normal, or you provide a bump map or a normal map. Those are the cases where you wouldn't want to write a special shader by hand.

But you are also free to use "#include" in your shaders: have some high-level shaders like "terrain", "toon" and "skin", and let those include common things like the usual vertex streams, texture sets and modifications (e.g. a sine-wave flag), which are applied independently of the high-level features (like blending terrain layers, or subsurface scattering for skin, etc.).

#39 Eric Lengyel   Crossbones+   -  Reputation: 2327


Posted 05 July 2011 - 07:22 PM

In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)


It's not a mistake; removing it from the pipeline reflects better how the hardware works, and this allows better shaders. When the driver "recompiles" shaders, it's actually just patching them by NOPing out some area or adding assembly snippets that e.g. reject pixels based on alpha. If you embed a "clip"/"kill" into your shader instead, it's included in the optimization process, and at all stages of compilation (be it fxc or the driver front-end or back-end) the pixel rejection is moved to the front, so all unnecessary computations are avoided by an early-out, or at least by disabling unneeded work (e.g. the texture units don't cause any memory traffic for pixels that have already been clipped).


I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders, but using that option or the alpha test should be a choice left to the programmer. The hardware still has dedicated alpha-testing capabilities that would be faster in some cases. Removing the alpha test actually makes the API a worse reflection of the true hardware functionality.

#40 Hodgman   Moderators   -  Reputation: 30388


Posted 05 July 2011 - 08:00 PM

To use this architecture with deferred lighting (a.k.a. light pre-pass), I've added an extra StateGroup to the ShaderRes, called gBufferState, that contains the shader program used to draw the objects into the G-Buffer, while the original state var contains the shader program used to draw the object in the second geometry pass. Or should I create a new PassRes struct? I guess it doesn't make much difference.

In my 'shader' objects (similar to what Microsoft calls an 'Effect' or a 'Technique') I've actually got multiple passes defined. Each pass then has a list of permutations.
Depending on which part of the pipeline you're rendering for (shadow, g-buffer, etc), a different pass is selected from the shader (and a permutation is then selected from that pass).
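
A loose C++ sketch of that composition (the type names are hypothetical; the point is just "effect -> passes -> permutations"):
[source lang=cpp]
#include <cstdint>
#include <unordered_map>

struct ShaderPrograms;    // a compiled VS/PS set, defined elsewhere

enum PassId { PASS_SHADOW, PASS_GBUFFER, PASS_MATERIAL, PASS_COUNT };

// One 'shader' (effect/technique): several passes, each with its own permutation list.
struct ShaderEffect
{
    // permutation option-mask -> compiled programs, stored per pass
    std::unordered_map<uint64_t, ShaderPrograms*> permutations[PASS_COUNT];

    ShaderPrograms* Select(PassId pass, uint64_t optionMask) const
    {
        const auto& perms = permutations[pass];
        auto it = perms.find(optionMask);
        return (it != perms.end()) ? it->second : nullptr;  // or compile on demand / fall back
    }
};
[/source]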

Regarding shader permutations: I can use the same constant buffer struct in all shader permutations, and then each permutation uses only the constants it needs, right?

Yeah, there's no way for the game-side code to reliably predict which permutation will be chosen, so you shouldn't change your cbuffer layouts between permutations. You can, however, have certain cbuffers that are used by some permutations and ignored by others.


1. How would this system allow users to add their own shader options and defines for these options? (without recompiling engine code)

The engine itself doesn't have to know what the options are -- it only knows that there's 64 bits worth of options.



In Horde3D, they give you 32 bits worth of options, and to make a new option you just put a new #ifdef into your shader code. They use a naming convention where a pre-processor token starting with _F_## (where ## is 0 to 31) marks an option -- e.g. if your shader contains #ifdef _F_06_NormalMapping, then when someone enables option #6, the engine will select a permutation that was compiled with the normal-mapping code.

At work, we actually use a modified version of HLSL where we can write something like:
option normalMapping : 6;
...
if( normalMapping ) {
...
}
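
To illustrate the "engine only sees bits" part, here is a guess at the kind of glue that could sit behind a Horde3D-style naming convention (this is not Horde3D's or anyone's actual code; the helper names are invented):
[source lang=cpp]
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Scan shader source for tokens of the form _F_##_Name and record which bit each claims.
// The engine never needs to know what an option means -- only which bit it occupies.
std::vector<std::pair<int, std::string>> FindOptionTokens(const std::string& source)
{
    std::vector<std::pair<int, std::string>> options;
    size_t pos = 0;
    while ((pos = source.find("_F_", pos)) != std::string::npos)
    {
        if (pos + 5 <= source.size() &&
            std::isdigit(static_cast<unsigned char>(source[pos + 3])) &&
            std::isdigit(static_cast<unsigned char>(source[pos + 4])))
        {
            int bit = (source[pos + 3] - '0') * 10 + (source[pos + 4] - '0');
            size_t end = source.find_first_of(" \t\r\n(", pos);
            options.emplace_back(bit, source.substr(pos, end - pos));
        }
        pos += 3;
    }
    return options;
}

// Given the 32 bits of enabled options, build the #define list for one permutation.
std::vector<std::string> BuildDefines(const std::string& source, uint32_t enabledMask)
{
    std::vector<std::string> defines;
    for (const auto& option : FindOptionTokens(source))
        if (enabledMask & (1u << option.first))
            defines.push_back(option.second);   // e.g. "_F_06_NormalMapping"
    return defines;
}
[/source]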

2. With this system, it seems all programs are based on a single source file. Seems like we have a single list where all programs are stored.

No, all permutations of a single source file come from that source file. If you have a 2nd source file, it has its own list of permutations.


What mainstream GPU, specifically, do you believe doesn't have alpha test capabilities outside the pixel shader?

Is there any way to know??

I agree that having the kill instruction early in a shader can provide a performance increase for complex shaders

Doesn't texkill/clip/discard just set a bit indicating that the ROP should discard, and not actually skip the shader instructions that come after it? Or has this been improved on newer cards?




