
I love shader permutations! (not sure if ironic or not)



#1 DwarvesH   Members   -  Reputation: 471


Posted 15 March 2014 - 11:35 AM

So I decided to support most reasonable lighting setups. And I also want them to perform at maximum speed, so run-time shader dynamic branching is out of the question.

 

So I came up with the concept of render profiles. Each combination of render profiles results in a unique pixel shader. All permutations are automatically generated.

 

Render profiles support the following settings for now:

  • Ambient mode. Can be off, flat color modulation, spherical harmonics or environment map ambient lighting. For metallic or mixed objects, "off" ambient lighting is treated the same as flat, because of reasons and PBR.
  • Baked ambient AO. On or off. Baked AO only gets displayed in the ambient component because you shouldn't have AO in strong direct light.
  • SSAA mode: three settings. Off, medium and super duper ultra for now.
  • HDR mode. Currently only off and a single tone-mapping operator is supported. I'll add things like filmic later.
  • Material nature: metallic or dielectric. Or mixed, where you can lerp between metallic and dielectric.

This is pretty comprehensive. I found that a handful of render profiles are enough to render scenes.

 

The only problem is the number of permutations. With this limited setup there are 120 different techniques. I can easily see this getting over 1000. They are autogenerated so not a big problem, but I was wondering how others do this.
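One way to sanity-check the 120 figure: the five settings listed above give 4 x 2 x 3 x 2 x 3 = 144 raw combinations, and if "off" ambient is folded into "flat" for metallic and mixed materials (as described for the ambient mode), the distinct techniques come to exactly 120. The Python sketch below enumerates them; the option names and the technique-naming scheme are hypothetical, and it is an assumption that this is the only collapse the generator applies.

```python
from itertools import product

# Options per render-profile setting, as listed in the post (names are illustrative).
AMBIENT = ("off", "flat", "sh", "envmap")
BAKED_AO = (False, True)
SSAA = ("off", "medium", "ultra")
HDR = (False, True)
NATURE = ("dielectric", "metallic", "mixed")


def canonical(ambient, baked_ao, ssaa, hdr, nature):
    """Fold combinations that render identically into one key.

    Assumed rule: for metallic or mixed materials, "off" ambient is treated
    as "flat", so both share a technique.
    """
    if nature in ("metallic", "mixed") and ambient == "off":
        ambient = "flat"
    return (ambient, baked_ao, ssaa, hdr, nature)


def technique_name(key):
    """Hypothetical naming scheme for one generated technique."""
    ambient, baked_ao, ssaa, hdr, nature = key
    return f"PS_{ambient}_AO{int(baked_ao)}_SSAA{ssaa}_HDR{int(hdr)}_{nature}"


def unique_techniques():
    """Enumerate every combination and keep only the distinct techniques."""
    keys = {canonical(*combo) for combo in product(AMBIENT, BAKED_AO, SSAA, HDR, NATURE)}
    return sorted(technique_name(key) for key in keys)
```

Here `len(unique_techniques())` is 120, against 144 raw combinations: 4 ambient modes for dielectrics plus 3 each for metallic and mixed, times 2 x 3 x 2 for AO, SSAA and HDR.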

 

Manual shader source management is out of the question. Even a custom-tailored solution that only works with exactly what I want to render, with shaders compiled for my setup only, will have dozens of permutations, so generated seems to win. The 120 techniques occupy 100 KiB of source code and take 10 seconds to compile under FXC, but the precompiled versions load very fast.

 

So my question for more experienced folk: is this a good approach? Half-Life 2 uses almost 2000 permutations, so I'm not the only one doing this. And pretty much everyone uses permutations to handle different light setups and light counts. Unless they write those fancy shaders with for loops.




#2 Jason Z   Crossbones+   -  Reputation: 5283


Posted 15 March 2014 - 01:07 PM

I'm not a game developer, so take my advice for what it's worth :)  You have already made the case for why you need to use permutations in the beginning of your post.  Max performance, and support for a number of different lighting techniques pretty much mandates that you have a bunch of different permutations of your shaders.  I would also suggest that if you can auto generate your shaders, then absolutely you should - with one caveat.  If you want max performance, then you should allow special customization for cases where you find that the generated code isn't all that efficient.

 

Other than that, you should be pre-compiling your shaders anyways, so really that doesn't affect your end product.  It seems to me that you have already made a strong, logical case for going the way that you have, so I agree with your approach!



#3 Juliean   GDNet+   -  Reputation: 2720


Posted 15 March 2014 - 01:08 PM


> And I also want them to perform at maximum speed, so run-time shader dynamic branching is out of the question.

 

Ironically, on modern hardware, you are likely going to shoot yourself in the foot with this mindset. Switching shaders, from what I recall, can be almost as expensive as dynamic branching, if not more. Especially when, as with permutations, all branches would take the same path for all pixels of the same mesh.

 

Regarding your actual question, it appears to me that you are putting too much weight on those "shader profiles". From the number of techniques you stated, am I correct to assume all those settings are part of one uber-shader? In my own implementation, stuff like SSAO is done by separate shaders applied in different stages. I am using a deferred renderer, so my implementation might differ, but especially HDR (and maybe SSAO too) can be done as a post-process effect, and therefore be its own shader. This not only reduces the number of permutations but also removes some: there is no need for an "off" permutation, since off simply means not rendering the pass.



#4 DwarvesH   Members   -  Reputation: 471


Posted 15 March 2014 - 01:58 PM

> I'm not a game developer, so take my advice for what it's worth :)  You have already made the case for why you need to use permutations in the beginning of your post.  Max performance, and support for a number of different lighting techniques pretty much mandates that you have a bunch of different permutations of your shaders.  I would also suggest that if you can auto generate your shaders, then absolutely you should - with one caveat.  If you want max performance, then you should allow special customization for cases where you find that the generated code isn't all that efficient.

 

> Other than that, you should be pre-compiling your shaders anyways, so really that doesn't affect your end product.  It seems to me that you have already made a strong, logical case for going the way that you have, so I agree with your approach!

 

OK, thanks for the feedback! One good thing I noticed is that my prototyping time is way down. I can write a new sub-shader, run my generation and try it out with it affecting all permutations.

 

The generated code for an individual shader is not that short, but it is very readable. So I can track down bugs, make changes in the generated version and, if needed, feed them back to the generator.

 

So, what are you then? You have a huge rep...

 

 

 


> > And I also want them to perform at maximum speed, so run-time shader dynamic branching is out of the question.

 

> Ironically, on modern hardware, you are likely going to shoot yourself in the foot with this mindset. Switching shaders, from what I recall, can be almost as expensive as dynamic branching, if not more. Especially when, as with permutations, all branches would take the same path for all pixels of the same mesh.

 

> Regarding your actual question, it appears to me that you are putting too much weight on those "shader profiles". From the number of techniques you stated, am I correct to assume all those settings are part of one uber-shader? In my own implementation, stuff like SSAO is done by separate shaders applied in different stages. I am using a deferred renderer, so my implementation might differ, but especially HDR (and maybe SSAO too) can be done as a post-process effect, and therefore be its own shader. This not only reduces the number of permutations but also removes some: there is no need for an "off" permutation, since off simply means not rendering the pass.

 

 

No uber shaders; each permutation/render profile only accesses the relevant settings. I had uber shaders in the dynamic branching version. The permutations vary greatly in size, with the flat ambient shader being one line of code and the mixed metal/dielectric lerp environment-mapped lighter being quite long. The former only depends on the material ambient color set as a pixel shader constant, while the latter has a ton of material properties and sampler dependencies.

 

Every permutation can be easily copied and pasted into any FX file as long as the 200 lines of code of common structure and variables are included.

 

I am using forward rendering, so I pretty much need to change either the shader technique or the lighting constants for every single object, but the total pool of techniques is small. One disadvantage is that you can't use instancing, but you can't really use instancing with forward rendering except with simple lighting schemes.

 

I wrote the whole thing today, so it is not perfect yet. Some things shouldn't be there.

 

HDR shouldn't result in a permutation, and once I add it to the post processing pipeline I will reduce my permutations by half. But today is the first time I wrote an HDR shader, so I went with the simplest solution.

 

SSAO will again be added to the post processor, so no further permutations. I first need to figure out how the hell to do SSAO in a forward renderer. I tried before but it is very grainy.

 

SSAA is a weird beast. I'm using fixed buffer size SSAA, so it must run in every single pixel shader. To keep the permutations down I only included three of the 6 settings I have implemented. The final version will have off, almost medium and high, since it is so expensive, leaving out things like the super duper ultra high version (11x SSAA). That is only there as a reference when I need to compare against the maximum achievable quality.

 

And if I decide to drop SSAA I will reduce permutations by 66%, but I do hope to keep it. I don't care about surface aliasing, but SSAA makes the render rock solid when objects are moving around. I use it for its temporal aliasing properties and for reducing shimmering in high-frequency data.
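The post doesn't show its SSAA code. As a generic illustration only, in-shader supersampling evaluates the shading function at several sub-pixel positions and averages them, which is what suppresses shimmering from high-frequency detail. This sketch assumes a regular n x n grid (the 11x mode above must use some other pattern) and is written in Python rather than HLSL:

```python
def grid_offsets(n):
    """Offsets of an n x n regular grid of sub-samples, relative to the pixel
    centre, each in [-0.5, 0.5)."""
    step = 1.0 / n
    return [(-0.5 + (i + 0.5) * step, -0.5 + (j + 0.5) * step)
            for j in range(n) for i in range(n)]


def shade_supersampled(shade, px, py, offsets):
    """Evaluate `shade` at several points inside pixel (px, py) and average.

    `shade(x, y)` is any function of continuous screen position; in a real
    pixel shader this loop would re-run the lighting code per sub-sample.
    """
    cx, cy = px + 0.5, py + 0.5  # pixel centre
    total = 0.0
    for dx, dy in offsets:
        total += shade(cx + dx, cy + dy)
    return total / len(offsets)
```

With a stripe pattern finer than a pixel, the single centre sample (`grid_offsets(1)`) snaps between 0 and 1 as the pattern moves, while the 2 x 2 average stays at 0.5, which is the temporal stability the post is after.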

 

And I plan to keep it around for a secret screenshot mode, with even more hardcore options. Maybe 30x SSAA...



#5 kalle_h   Members   -  Reputation: 1563


Posted 16 March 2014 - 06:44 AM

Kill all permutations that are not really needed. More options only confuse artists and make debugging harder. Also, when you have a minimal number of options, it's easier to optimize them.



#6 DwarvesH   Members   -  Reputation: 471


Posted 17 March 2014 - 03:17 AM

> Kill all permutations that are not really needed. More options only confuse artists and make debugging harder. Also, when you have a minimal number of options, it's easier to optimize them.

 

I would love to do that, but there are 3 things preventing me from doing so:

1. My render models are not done yet. They are good enough if I wished to ship a game tomorrow, but I'm doing active research on rendering. You know how that goes. I will probably still be doing research 20 years from now if I don't leave the field.

2. I have personally no idea how I want the final render setup to be. Having it parametrized and with live response to my changes and lighting conditions will allow me to determine the setup I want to use by trial and error.

3. I am the engine writer (and an "artist"). I need to worry about all the options. Once I have them figured out, other artists will only see a small % of the options and use templates anyway.

 

But you are right about too many permutations. Once I figure out which permutations nobody would need, they go. I wrote the second version of the system and it is a lot better, but the number of permutations is getting out of hand.

 

So my first design was really bad. You should never write something like this after a few-hours-long sprint through SIGGRAPH slides. I was heavily biased towards PBR.

 

My first mistake was making render profiles correct your input to achieve better PBR. See the ambient correction I was talking about. From now on render profiles are guaranteed to obey 100% of the input requirements. Did you create a nonsensical profile? It will render as such. Want PBR rendering? Make sure you set up your profile correctly for that.

 

The second mistake was making the render profiles too abstract and aware of high-level concepts like metals and dielectrics. Nature is one thing, rendering is another. In rendering, metallic is not an absolute, indivisible property of a render profile. It just happens that a certain combination of simple low-level profile parameters will give the metallic look.

 

So from now on render profiles obey all input and have a lot more low level inputs and no high-level abstract concepts.

 

They also don't exist anymore. Logically they still exist, but practically I split them into two. Render profiles have been neutered and only have a few general properties that won't really change during a scene render, like HDR and SSAA.

 

The rest of the parameters are stored in another input, called the Material. I know, very surprising.

 

OK, this makes much more sense. I don't know why I did not come up with this layout before. Maybe you should not create important designs on the weekend. Now you have a handful of render profiles, for some games even only one. This profile gets changed in the options menu.

 

Then you have as many materials as you need. The properties of the material that tell you what to render are linked to permutations and are called a material template. The rest are only shader inputs, like intensity and color, and are called material properties. Material properties are the traditional concept of a material that you find in other shaders. Currently they are: surface gloss, ambient modulate, specular color, diffuse modulate and specular modulate. I'll try to keep this list as short as possible.
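A minimal sketch of the template/properties split, using the ambient-related template flags and the five material properties listed in the post; the Python field names and the define naming are hypothetical. Freezing the template makes it hashable, so materials that differ only in properties (like the four gloss levels in the renders) share one permutation:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MaterialTemplate:
    """Settings that select a permutation (hypothetical names, ambient part only)."""
    ambient_type: str = "diffuse"  # "none", "color" or "diffuse"
    baked_ao: bool = False
    sh_modulate: bool = False
    environment_map: bool = False
    reflections: bool = False

    def defines(self):
        """Preprocessor defines identifying this template's permutation."""
        result = {"AMBIENT_" + self.ambient_type.upper(): "1"}
        for flag in ("baked_ao", "sh_modulate", "environment_map", "reflections"):
            if getattr(self, flag):
                result[flag.upper()] = "1"
        return result


@dataclass
class MaterialProperties:
    """Plain shader constants; changing them never changes the permutation."""
    surface_gloss: float = 0.5
    ambient_modulate: float = 1.0
    specular_color: tuple = (1.0, 1.0, 1.0)
    diffuse_modulate: float = 1.0
    specular_modulate: float = 1.0


@dataclass
class Material:
    template: MaterialTemplate
    properties: MaterialProperties = field(default_factory=MaterialProperties)
```

Four materials built from the same template with `surface_gloss` set to 0.2, 0.4, 0.6 and 0.8 collapse to a single entry in a set of templates.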

 

Material templates, on the other hand, need to be resolved. The good news is that they can be resolved once after shader load or a device reset/options change and are then good to go. The bad news is that all of them need to be resolved, and a lighting change might necessitate a fresh resolve. Deciding whether a lighting change needs to trigger a resolve is probably more expensive and error-prone than doing it every time. It is not a slow process by any means; currently I'm resolving every template each frame, and I'll continue to do so to test performance. With any luck this approach holds up to very high batch counts. And if your lighting setup is mostly static, you are set anyway.
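A sketch of the per-frame resolve with a cache keyed on (template, lighting state), so resolving every template every frame costs a dictionary lookup once a combination has been seen. The `load_technique` callback and the shape of the lighting key are assumptions, not part of the post:

```python
class TemplateResolver:
    """Resolve (material template, lighting state) to a technique, with caching.

    `load_technique(template, lighting_key)` is a hypothetical callback that
    builds or looks up the matching permutation; templates and lighting keys
    must be hashable.
    """

    def __init__(self, load_technique):
        self._load = load_technique
        self._cache = {}

    def resolve(self, template, lighting_key):
        key = (template, lighting_key)
        if key not in self._cache:
            self._cache[key] = self._load(template, lighting_key)
        return self._cache[key]

    def resolve_all(self, templates, lighting_key):
        """Re-resolve every template, e.g. once per frame; cheap after the first hit."""
        return {t: self.resolve(t, lighting_key) for t in templates}
```

A lighting change simply produces a new key, so the "does this change need a resolve?" decision disappears; the cost is that the cache grows with the number of distinct lighting setups.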

 

Material templates have a lot of properties and the list is only getting bigger, but here are the main properties for the ambient component:

  • Type: can be none, color or diffuse. None is rarely used, but I can imagine uses for it. You may have small side-levels like dark and atmospheric caves where you want 100% of your lighting to come from diffuse and specular direct lighting. No indirect lighting, so no ambient component. Color ambient lighting does not sample the diffuse map, so it is mostly useful for metals and looks horrible without IBL. Diffuse ambient is identical to color, only multiplied by the diffuse map RGB.
  • Baked AO: on or off.
  • SH modulate: on or off.
  • Environment map: on or off.
  • Reflections: on or off.

I have yet to add a few more very important properties, the two key ones being:

  • Specular color modulate: on or off.
  • Fresnel modulate: on or off.

But even with these settings, I managed to do some interesting renders. Here is the same physically implausible metal rendered at 4 different gloss levels:

 

 
This is a true "metal" setup, so the diffuse component of the map is ignored. The only reason the "albedo" check box is checked is because the alpha-component has the baked AO. If I uncheck the albedo map, the baked AO option won't work because the alpha channel will be 1.
 
So how do those renders look?
 
And more importantly, is my new model better? I think it is. It is more flexible and customizable, while at the same time being low level and having no built-in logic. It will render exactly what you tell it, not decide what to render based on what you tell it.
 
There are some downsides too. The old smart model would interpret your settings and, for a very small set of metal input configurations, render things in a more complex way than the current model can. There is a solution in the current model: layering. A few key configurations, all of them shiny metal, will require a 2-layer material. Or, in next-gen, it will be common to render everything with 2-3 layers. I'll cross that bridge when I reach it. Layering is theoretically simple (layer 1 color + layer 2 color), but I'm reading up on the Oren-Nayar model. That looks fun and approximable in real time.
 
The second problem is the number of permutations. The above renders all use a single permutation, i.e. a single material template. The only material parameter that changes is the glossiness.
 
This doesn't change the fact that the shader has almost 200 permutations.
 
So I'm introducing the 1k rule. When I reach 1k permutations, I'll improve the system to go under 1k. The first time this happens will be simple: get rid of PS HDR and move it to post processing. That will reduce the number of permutations by 50%. The second time it happens, it will be just as easy: get rid of the baked AO permutations. Instead, with 2-3 extra ALU ops, this can be computed at run time. This will again reduce the number of permutations by 50%. After this it is going to get hard: reductions will be single-digit % and really tax the template resolve phase. But this gives me 4k permutations to experiment with. All auto-generated. Currently all of them are as well optimized as if I wrote them by hand and look identical to human-written shaders.
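The 1k rule as a sketch: the permutation count is the product of the option counts, so taking a two-option feature out of the permutation key halves it. The dimension sizes in the example are made up for illustration; only the removal order (HDR first, then baked AO) comes from the post.

```python
from math import prod


def permutation_count(dimensions):
    """dimensions maps each permutation feature to its number of options."""
    return prod(dimensions.values())


def apply_1k_rule(dimensions, removal_order, limit=1000):
    """Drop features from the permutation key, in the given order, until the
    count is under `limit`. Returns the remaining dimensions and what was dropped.

    A dropped feature is assumed to move elsewhere (post-processing, or a few
    ALU ops at run time) rather than disappear.
    """
    dims = dict(dimensions)  # don't mutate the caller's table
    removed = []
    for feature in removal_order:
        if permutation_count(dims) < limit:
            break
        if feature in dims:
            del dims[feature]
            removed.append(feature)
    return dims, removed
```

For example, with hypothetical dimensions `{"hdr": 2, "baked_ao": 2, "ambient": 3, "ssaa": 3, "other": 30}` (1080 permutations), `apply_1k_rule(dims, ["hdr", "baked_ao"])` drops only `"hdr"` and leaves 540.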


#7 Stainless   Members   -  Reputation: 1060


Posted 17 March 2014 - 03:17 AM

I would take the generated shaders and run them through a shader optimiser to see how efficient your code ends up.

 

I have found that the shader compilers vary hugely from platform to platform. I had a major problem with one shader.

 

I ran it on a Mali device and compilation failed because the compiled shader ended up as 527 instructions. (512 limit)

 

So I compiled it with the Mali offline shader compiler, 372 instructions. A massive difference just because of the quality of the shader compiler on the device.

 

Just to make things really nice, the device does not support binary shaders. I can't use the offline compiler and prebuild the shaders.

 

I ran the shader through a source code shader optimiser. The differences between what I had written and the output of the optimiser were trivial; I really didn't expect it to have any effect on the instruction count at all. 509 instructions.



#8 DwarvesH   Members   -  Reputation: 471


Posted 17 March 2014 - 03:42 AM

> I would take the generated shaders and run them through a shader optimiser to see how efficient your code ends up.
>
> I have found that the shader compilers vary hugely from platform to platform. I had a major problem with one shader.
>
> I ran it on a Mali device and compilation failed because the compiled shader ended up as 527 instructions. (512 limit)
>
> So I compiled it with the Mali offline shader compiler, 372 instructions. A massive difference just because of the quality of the shader compiler on the device.
>
> Just to make things really nice, the device does not support binary shaders. I can't use the offline compiler and prebuild the shaders.
>
> I ran the shader through a source code shader optimiser. The differences between what I had written and the output of the optimiser were trivial; I really didn't expect it to have any effect on the instruction count at all. 509 instructions.

 

There is more than one shader compiler out there? This is new to me! I have never used anything other than FXC in fx_2_0 mode. And I know about Cg.

 

But I did notice that it is not perfect. When I first implemented SSAA, the compiler lost its ability to keep track of temporary values and I would run out of registers in code where it had no business doing so.

 

And FX is slow as hell. I'm thinking of implementing ghetto constant buffers.

 

Here's the thing: FX under DirectX 10+ behaves weirdly and different enough from DirectX 9. I'm having massive problems with it. And I heard that from 11 on the FX framework is deprecated. I won't be using DirectX 9 forever. Even diehardness has an expiration date.

 

But from 10 on constant buffers are used. You can still use FX variables, but something weird is going on.

 

And maintaining a constant buffer version and a FX version is too much work and no fun.

 

So here's my idea: create the constant buffer layout on the CPU side. On DX10, use normal constant buffers. On DX9, ignore the FX framework and set shader constants directly: set the whole constant range from c0 to cN from your aligned CPU-side constant buffer in one call, taking extra care that the layouts match.
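A minimal sketch of the CPU side of such "ghetto constant buffers", assuming the HLSL cbuffer packing rule that a member may not straddle a 16-byte (float4) boundary, so one layout can feed both a real D3D10 cbuffer and a single D3D9 `SetPixelShaderConstantF(0, data, registerCount)` call. Arrays are not handled, and the field names are hypothetical. On D3D9 the shader must declare matching float4 registers (for example with `register(c0)`) and read packed scalars through swizzles; sticking to float4-sized members is the simplest way to keep both paths in sync.

```python
import struct

FLOATS_PER_REGISTER = 4  # one float4 constant register = 16 bytes


def layout(fields):
    """Assign a float offset to each (name, float_count) field.

    Follows the HLSL cbuffer rule that a member may not straddle a 16-byte
    boundary; members larger than a float4 (e.g. a float4x4, 16 floats)
    start on a fresh register. Arrays are not handled.
    """
    offsets = {}
    cursor = 0
    for name, count in fields:
        used = cursor % FLOATS_PER_REGISTER
        if used and (count > FLOATS_PER_REGISTER or used + count > FLOATS_PER_REGISTER):
            cursor += FLOATS_PER_REGISTER - used  # pad to the next register
        offsets[name] = cursor
        cursor += count
    registers = -(-cursor // FLOATS_PER_REGISTER)  # ceiling division
    return offsets, registers


def pack(fields, values):
    """Build the little-endian float32 blob for registers c0..c(registers-1)."""
    offsets, registers = layout(fields)
    data = [0.0] * (registers * FLOATS_PER_REGISTER)
    for name, count in fields:
        value = values[name]
        floats = [float(value)] if isinstance(value, (int, float)) else [float(v) for v in value]
        if len(floats) != count:
            raise ValueError(f"{name}: expected {count} floats, got {len(floats)}")
        data[offsets[name]:offsets[name] + count] = floats
    # On D3D9 this blob would go to SetPixelShaderConstantF(0, data, registers);
    # on D3D10+ it is the contents of the matching constant buffer.
    return struct.pack(f"<{len(data)}f", *data), registers
```

With fields `LightDir` (3 floats), `Gloss` (1), `Color` (3) and `World` (16), `Gloss` fills the `.w` of the first register, `Color` starts the second, and `World` is padded to start at the third, for 6 registers in total.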

 

Would something like this work?



#9 Hodgman   Moderators   -  Reputation: 31781


Posted 17 March 2014 - 05:33 AM

> The only problem is the number of permutations. With this limited setup there are 120 different techniques. I can easily see this getting over 1000. They are autogenerated so not a big problem, but I was wondering how others do this.

Regarding the problem in the OP - the permutations should be based on the needs of a particular game, not every possible game that could be made with the engine.

Your permutation system should be flexible and data-driven, so that you can add/remove shader options quickly and easily.

For one game, you'll probably pick one ambient lighting solution (e.g. SH diffuse and IBL specular for everything), one AO solution, etc...
Or if you've got multiple AO solutions, they don't need to result in multiple permutations. e.g. on the last game I worked on, we rendered stencil-shadows, pre-baked shadows, directionally-traced "SSAO" shadows and shadow-mapped shadows all into a shared 720p screen-space buffer. The forward rendering shaders then just took a single texture sample from this buffer to determine their shadow & occlusion data. We could mix and match techniques and iterate on ideas over the project without having to touch the forward lighting shader!
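A sketch of that idea, assuming each technique writes a visibility value (1 = lit, 0 = fully occluded) and the layers are merged with a per-pixel minimum (a MIN blend, for example); the post does not say which combine operator was actually used. The forward shader then needs only one lookup, whatever produced the buffer:

```python
def combine_occlusion(layers):
    """Merge visibility layers (lists of rows; 1.0 = lit, 0.0 = occluded)
    with a per-pixel minimum, like a MIN blend into one screen-space buffer."""
    height, width = len(layers[0]), len(layers[0][0])
    return [[min(layer[y][x] for layer in layers) for x in range(width)]
            for y in range(height)]


def forward_shade(direct_light, shadow_buffer, x, y):
    """The forward shader only needs one lookup, whatever filled the buffer."""
    return direct_light * shadow_buffer[y][x]
```

Adding or swapping a shadowing technique changes only what gets rendered into the buffer, not the number of forward-lighting permutations.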
 
For SSAA, have you benchmarked your shader-based super-sampling against just rendering at a higher resolution? If it's an "uber" detail setting, aimed at high-spec PC users only, they might both be equivalent. Or, if it's a "beyond uber" setting for generating screenshots for magazines, then performance doesn't matter and it may be better to implement the simplest solution (p.s. almost every engine I've used has had a mode like this for generating print-quality screenshots. One rendered at 15360 x 8640 for that mode :D).
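For the render-at-higher-resolution alternative, the resolve is just a box filter over each block of samples. 15360 x 8640 is 8 times 1920 x 1080 along each axis, i.e. 64 samples per output pixel. A minimal Python sketch on a greyscale image stored as a list of rows (an assumed representation):

```python
def downsample_box(image, factor):
    """Average each factor x factor block of a high-resolution greyscale image
    (a list of rows) into one output pixel."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of the downsample factor")
    out = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            block = [image[y][x] for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

A 4 x 4 checkerboard of 0s and 1s downsampled by 2 becomes a 2 x 2 image of 0.5, the same averaging an in-shader SSAA loop performs per pixel.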
 
Metallic just means spec-mask is ~>0.2 and may have 3 colour channels, and that albedo is black. Non-metallic is the opposite (albedo ~>0.04 && ~<0.9, and spec-mask greyscale and usually ~=0.03).
There are two common workflows:
Traditional: Give the artists a (coloured) spec-map and an albedo map. They can make real materials and unreal materials.
Metal-map: Give the artists a colour map and a (greyscale) metal map. Albedo is lerp(color, black, metal) and spec-mask is lerp(0.03, color, metal). This forces realistic materials.
(both of the above also have a roughness/glossiness/spec-power map)
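The metal-map workflow written out per channel, as a Python sketch (the 0.03 dielectric spec value comes from the text above; colors are assumed to be linear RGB in [0, 1]):

```python
def lerp(a, b, t):
    return a + (b - a) * t


def metal_map_workflow(color, metal):
    """Derive albedo and spec-mask from a colour map and a greyscale metal map.

    albedo    = lerp(color, black, metal)
    spec_mask = lerp(0.03, color, metal)
    """
    albedo = tuple(lerp(c, 0.0, metal) for c in color)
    spec_mask = tuple(lerp(0.03, c, metal) for c in color)
    return albedo, spec_mask
```

At `metal = 0` the colour map is the albedo and the spec is a greyscale 0.03; at `metal = 1` the albedo is black and the colour map becomes the (coloured) spec, so artists cannot author an unrealistic mix.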
 
The workflow that I chose in my current game is a mixture of both. The artists get a colour map and a spec-mask map.
  • Metal (not used in the BRDF, just in the following two) is saturate( specMap*2-1 ),
  • Spec-mask is lerp( saturate(specMap*2)*0.2, color, metal ),
  • Albedo is lerp( color, black, metal )
i.e. spec values below 0.5 are remapped to 0-0.2 (dielectric range) and are greyscale, whereas spec values above 0.5 behave like a metal map.
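And the hybrid remapping just described, in the same style (a sketch; `spec_map` is the artist's greyscale value in [0, 1] and `color` is assumed linear RGB):

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    return a + (b - a) * t


def hybrid_workflow(color, spec_map):
    """Colour map + spec-mask map workflow.

    spec_map below 0.5 -> greyscale dielectric spec in 0..0.2, albedo = color;
    spec_map above 0.5 -> blends towards a metal (coloured spec, black albedo).
    """
    metal = saturate(spec_map * 2.0 - 1.0)
    dielectric_spec = saturate(spec_map * 2.0) * 0.2
    spec_mask = tuple(lerp(dielectric_spec, c, metal) for c in color)
    albedo = tuple(lerp(c, 0.0, metal) for c in color)
    return metal, albedo, spec_mask
```

For example, `spec_map = 0.25` gives `metal = 0`, a spec of 0.1 and the unchanged colour as albedo, while `spec_map = 0.75` is already half metal.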
 

The engine shouldn't force a particular choice here. If a new game wants a new workflow, they should be able to make those changes without having to edit the engine code.
 

> Manual shader source management is out of the question. Even a custom-tailored solution that only works with exactly what I want to render, with shaders compiled for my setup only, will have dozens of permutations, so generated seems to win. The 120 techniques occupy 100 KiB of source code and take 10 seconds to compile under FXC, but the precompiled versions load very fast.
 
> So my question for more experienced folk: is this a good approach?

Yeah you can't hand-write them all. Either you stitch them together from code-fragments using a generator, or you write uber-shaders with compile-time branching (e.g. #ifdef), and then use a tool to compile all the permutations for you.
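For the uber-shader route, the tool side can be as small as enumerating every combination of defines and emitting one FXC command per permutation. This sketch uses FXC's `/T`, `/E`, `/D` and `/Fo` options; the option table, source file name, entry point `main`, `ps_3_0` profile and output naming are all assumptions:

```python
from itertools import product


def permutation_defines(options):
    """options maps a define name to its possible values; yields one dict per combination."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def fxc_command(source, defines, entry="main", profile="ps_3_0", out_dir="bin"):
    """Build an FXC command line for one permutation (not executed here)."""
    tag = "_".join(f"{name}{value}" for name, value in sorted(defines.items()))
    cmd = ["fxc", "/nologo", "/T", profile, "/E", entry]
    for name, value in sorted(defines.items()):
        cmd += ["/D", f"{name}={value}"]
    cmd += ["/Fo", f"{out_dir}/{tag}.cso", source]
    return cmd
```

Each command list could then be handed to `subprocess.run(cmd, check=True)` in an offline build step, so only precompiled blobs ship.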
 
It's absolutely standard to pre-compile all your permutations and ship a large number of them.
...except in OpenGL-land, where there's no ahead-of-time compiler (in OpenGL, the compiler is implemented by your graphics driver, and varies vendor to vendor...).
 

> Ironically, on modern hardware, you are likely going to shoot yourself in the foot with this mindset. Switching shaders, from what I recall, can be almost as expensive as dynamic branching, if not more. Especially when, as with permutations, all branches would take the same path for all pixels of the same mesh.

That's apples and oranges, e.g. whether "getting into and out of your car is slower than walking to the shops" depends on how far you live from the shops ;)
Switching shaders is a CPU-side API operation, so there'll be some CPU cost as you interact with the API, it validates your actions and generates a stream of actual commands for the GPU. It's also a GPU-front-end operation, where it will have to move the shader program into the L2 cache, synchronize the completion of earlier draw-calls and schedule the work of the new draw-calls that use this shader.
GPUs like to work on large data-sets at once -- if you switch shaders and then only draw 100 verts/pixels with each shader, then you'll likely get horrible performance. However, the driver/GPU can likely almost completely hide these GPU-side switching costs as long as you draw enough pixels. e.g. maybe if you draw 1000 pixels, then while they're processing, the GPU can be pre-fetching the next shader program, and there's enough individual pixels in flight to ensure that all the ALU units are busy without stalls...
You definitely want to minimize state-changes, but don't go overboard.
A dynamic branch on the other hand is a cost that you pay repeatedly for every pixel. The correct answer is an optimization problem, which as always is situation dependent, so can only be answered by profiling and experimenting with that particular situation. A good framework should allow you to experiment!
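To make "minimize state changes, but don't go overboard" concrete, here is a sketch that counts shader switches in a draw list and sorts draws by a (shader, material) key. The draw-record fields are hypothetical:

```python
def count_shader_switches(draws):
    """Number of times the bound shader changes while submitting `draws` in order."""
    switches, current = 0, None
    for draw in draws:
        if draw["shader"] != current:
            switches += 1
            current = draw["shader"]
    return switches


def sort_draws(draws):
    """Group draws by shader, then by material so constant updates are grouped too."""
    return sorted(draws, key=lambda d: (d["shader"], d["material"]))
```

Sorting purely by shader can fight front-to-back ordering for early depth rejection, so one compromise is a sort key that mixes shader, material and a coarse depth bucket; which part goes first is exactly the kind of thing to decide by profiling.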
 

> One disadvantage is that you can't use instancing, but you can't really use instancing with forward rendering except with simple lighting schemes.

You can definitely use instancing... You'll just need to get creative with how you send the lighting information to the shaders.
 

> I have personally no idea how I want the final render setup to be. Having it parametrized and with live response to my changes and lighting conditions will allow me to determine the setup I want to use by trial and error.

I'd instead focus on being able to recompile and reload your shaders / models / textures / materials while the game is running. You'll be able to experiment with more things, quicker. It also helps in full production where all the artists can iterate on their work.
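A minimal polling-based sketch of the reload loop in Python, assuming the engine provides an `on_change` hook (hypothetical) that recompiles whatever depends on the changed file and swaps it in between frames:

```python
import os


class ShaderWatcher:
    """Polls source files' modification times and reports edits.

    `on_change` is a hypothetical hook, e.g. recompile the permutations that
    include this file and swap them in at the start of the next frame.
    """

    def __init__(self, paths, on_change):
        self._on_change = on_change
        self._mtimes = {path: os.path.getmtime(path) for path in paths}

    def poll(self):
        changed = []
        for path, old in list(self._mtimes.items()):
            try:
                new = os.path.getmtime(path)
            except FileNotFoundError:
                continue  # editors often delete and rewrite files on save
            if new != old:
                self._mtimes[path] = new
                changed.append(path)
                self._on_change(path)
        return changed
```

Calling `poll()` once per frame (or every few frames) is cheap for a few hundred files; an OS file-change notification API would avoid the polling but is platform-specific.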
 

> Material templates have a lot of properties and the list is only getting bigger, but here are the main properties for the ambient component

This stuff should not be hard-coded into the engine. Every game has different rendering requirements. If for every game you've got to go and edit the engine to remove unwanted parameters and add new ones into the fixed material class, then it's not very flexible. These templates should be derived from the shaders, and be able to be set automatically from data provided by the artists (e.g. from a collada file, etc, or your own material editor if you take that path).
 

> Is my new model better? I think it is. It is more flexible and customizable, while at the same time being low level and having no built-in logic.

It seems to have a built in set of data channels though, which is just as bad; it restricts the kinds of logic that can be implemented.
 

> There is more than one shader compiler out there? This is new to me! I have never used anything other than FXC in fx_2_0 mode. And I know about Cg.

He's talking about GL. In GL, every graphics driver has its own compiler built in, and you've got no choice but to ship your GLSL source code to the users (no pre-compiling). On these platforms, it's standard to run your GL code through a program that compiles it, optimizes it, and then decompiles it back into 'optimal' GLSL code... :( A terrible situation. One reason why GL isn't more popular right there!
 

> And FX is slow as hell. I'm thinking of implementing ghetto constant buffers.
> Here's the thing: FX under DirectX 10+ behaves weirdly and different enough from DirectX 9. I'm having massive problems with it. And I heard that from 11 on the FX framework is deprecated. I won't be using DirectX 9 forever. Even diehardness has an expiration date.
> But from 10 on constant buffers are used. You can still use FX variables, but something weird is going on.
> And maintaining a constant buffer version and a FX version is too much work and no fun.

The FX framework is very outdated, and a left-over from D3D9... In D3D11, they released the source code for it so you can keep using it if you like, or you can migrate away from it or customize it... Internally, it just makes a big "globals" cbuffer per shader, which is very inefficient. e.g. if 99% of the shader variables don't change, but 1% do, then the entire "globals" cbuffer has to be updated anyway.
You should definitely structure your renderer around the concept of cbuffers instead of individual shader variables if it's going to exist into the future past D3D9.
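A sketch of why update frequency matters, assuming a hypothetical `upload` callback standing in for the real API update: each buffer tracks a dirty flag, so a per-frame buffer is re-uploaded every frame while a per-material buffer that never changes is uploaded once. A single "globals" buffer would instead be re-uploaded whenever anything in it changed.

```python
class ConstantBuffer:
    """CPU-side mirror of one cbuffer with a dirty flag.

    `upload(name, values)` in flush() stands in for the real API update
    (e.g. Map/UpdateSubresource on D3D11, or a constant-register call on D3D9).
    """

    def __init__(self, name, **initial):
        self.name = name
        self._values = dict(initial)
        self._dirty = True  # first use always uploads
        self.uploads = 0

    def set(self, key, value):
        if self._values.get(key) != value:
            self._values[key] = value
            self._dirty = True

    def flush(self, upload):
        if self._dirty:
            upload(self.name, dict(self._values))
            self._dirty = False
            self.uploads += 1
```

Over three frames where only the time changes, a `PerFrame` buffer is uploaded three times and a `PerMaterial` buffer once.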
I've got a post here where I describe how I emulate cbuffers on D3D9 (which ended up being more efficient than using fx files on D3D9 for me :D): http://www.gamedev.net/topic/618167-emulating-cbuffers/
 
Some cbuffers are set by the game, dynamically, e.g. ones containing dynamic lights, or the camera, etc...
Others are set by the artists and don't change at runtime -- these ones I actually construct ahead of time and save in binary form, by inspecting the shader files for the cbuffer structure and the COLLADA models for the material values. The set of material values also aren't hard-coded -- if I add a new variable to one of my material cbuffers, then it shows up in the artists' model editing program (Maya/XSI/etc), and they can set it there and re-export their COLLADA file.



#10 DwarvesH   Members   -  Reputation: 471


Posted 17 March 2014 - 08:39 AM

Wow, a lot to think about and implement.

 

I can't handle them all at once so let's go one by one.

 

Especially since I unceremoniously managed to reach exactly 2316 permutations. I have estimated that the total number of permutations will be at least 20844 if I go all-inclusive. Which I won't.

 

 

> For SSAA, have you benchmarked your shader-based super-sampling against just rendering at a higher resolution? If it's an "uber" detail setting, aimed at high-spec PC users only, they might both be equivalent. Or, if it's a "beyond uber" setting for generating screenshots for magazines, then performance doesn't matter and it may be better to implement the simplest solution (p.s. almost every engine I've used has had a mode like this for generating print-quality screenshots. One rendered at 15360 x 8640 for that mode :D).

 

 

It is going to be an option for uber PCs. And it is not quite as expensive as I thought it was going to be, but I've barely started testing. From my limited testing, if I create an ugly game it might even work on Intel HD below 720p :).

 

I believe that the problem of aliased geometry edges is more than solved nowadays, but anti-aliasing is far from done. I predict that the future holds either better temporal cross-frame solutions than we have today, or super-sampling. We now have so much brute force at our disposal, and pretty much the only super expensive player in the house is shadows.

 

But SSAA triples my number of permutations.

 

To solve this I have identified 3 common scenarios:

  1. You don't want SSAA in your game because of reasons.
  2. You want SSAA. The user selects a profile, which can be off or some setting. You only use one setting at a time.
  3. You want SSAA, but you want to go all fancy. The user selects a profile, which can be off or some setting. If it is not off, you render things with that setting, while selectively rendering some things without SSAA, like distant objects. If you have an open world game, once fog or depth of field kicks in at a distance, SSAA provides minimal benefit, so you may turn it off there.

So generally you'll have only one setting loaded, sometimes two.

 

So I'll split up the shaders based on SSAA and only load the profiles that are needed. When the user changes profile, the old setting is discarded and the new shaders are loaded. The setting will be in the graphics options dialog, so users are used to this taking a bit.
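A minimal sketch of that loading policy, including scenario 3 where the "off" set stays loaded for distant objects. `ShaderSetManager`, its load/unload callbacks and the fog-distance rule are hypothetical placeholders for whatever the engine actually does.

```python
class ShaderSetManager:
    """Keeps only the shader sets for the SSAA settings currently in use."""

    def __init__(self, load_set, unload_set):
        # Hypothetical callbacks: load_set(setting) creates every pixel shader
        # for one SSAA setting, unload_set(handle) releases them again.
        self._load = load_set
        self._unload = unload_set
        self.loaded = {}

    def select_profile(self, ssaa, keep_off_set=False):
        wanted = {ssaa}
        if keep_off_set:
            wanted.add("off")  # scenario 3: distant objects skip SSAA
        for setting in sorted(self.loaded):
            if setting not in wanted:
                self._unload(self.loaded.pop(setting))
        for setting in sorted(wanted):
            if setting not in self.loaded:
                self.loaded[setting] = self._load(setting)

    def setting_for_object(self, selected, distance, fog_start):
        # Past the fog/DOF start SSAA adds little, so use the "off" shaders if loaded.
        if selected != "off" and distance >= fog_start and "off" in self.loaded:
            return "off"
        return selected


released = []
manager = ShaderSetManager(lambda s: f"shaders[{s}]", released.append)
manager.select_profile("medium", keep_off_set=True)
print(sorted(manager.loaded))                               # ['medium', 'off']
print(manager.setting_for_object("medium", 500.0, 200.0))  # off
manager.select_profile("ultra")
print(sorted(manager.loaded), released)
```

Changing the profile in the options dialog maps to one `select_profile` call, which is where the short loading pause happens.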

 

This way I'll only have a third of the permutations to worry about at any time. I may even limit SSAA to just off and a medium setting.

 

So the official number of permutations that I have right now is 772 (with the SSAA adjustment, i.e. 2316 / 3).
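For illustration, here is how such counts can be enumerated, using the five render-profile settings from the first post and its rule that "off" ambient behaves like "flat" for metallic or mixed materials. Under those assumptions this yields 120 unique shaders, matching the 120 techniques mentioned in the first post, and 40 per SSAA setting. The larger 2316/772 figures come from options not listed in this thread, so they are not reproduced here.

```python
from itertools import product

# Settings as described in the first post of the thread.
AMBIENT = ["off", "flat", "sh", "envmap"]
BAKED_AO = [False, True]
SSAA = ["off", "medium", "ultra"]
HDR = [False, True]
MATERIAL = ["dielectric", "metallic", "mixed"]

def canonical(ambient, ao, ssaa, hdr, material):
    # For metallic or mixed materials "off" ambient is the same as "flat",
    # so both combinations map to one shader.
    if material != "dielectric" and ambient == "off":
        ambient = "flat"
    return (ambient, ao, ssaa, hdr, material)

def unique_permutations():
    return {canonical(*combo) for combo in product(AMBIENT, BAKED_AO, SSAA, HDR, MATERIAL)}

def permutations_for_ssaa(perms, setting):
    # Only the shaders for the selected SSAA setting need to be loaded.
    return [p for p in perms if p[2] == setting]

perms = unique_permutations()
print(len(perms))                                   # 120
print(len(permutations_for_ssaa(perms, "medium")))  # 40
```

Every extra two-way option doubles the total, which is how the count climbs into the thousands so quickly.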

 

Another serious offender is HDR. 

 

I may or may not remove it from the pixel shader. Removing it would halve the permutations.

 

There are two options for HDR:

  1. Post processing. The standard way. Advantages: easy adaptation. Disadvantages: more memory and a floating-point render target. Just switching to floating point can cause a considerable performance loss.
  2. PS HDR. The last statement in the PS performs tone mapping. Advantages: I did extensive testing, and with a well-behaved depth-wise render order you can barely measure the performance loss. Disadvantages: adaptation is difficult, and it is problematic with transparent objects.

I won't decide on this right now. Maybe some further input can sway me in the right direction.

 

What I need to decide instead is whether to HDR or not to HDR:

  1. It seems that HDR is pretty important given the limited dynamic range of current screens. And I do routinely get values over 1, especially with metals. So I'm guessing that HDR is a must?
  2. What kind of tone mapping should I use? I hacked together a "filmic" one based on that Uncharted 2 slideshow presentation and it seems to work pretty well. I have really nothing to compare it with.
  3. But it does make things darker. It also changes contrast, and I'm at least 55% pro / 45% against the darkening as an improvement in looks. Maybe 60%/40% :). What I'm trying to say is that HDR needs a different lighting parametrization than non-HDR. One setup just won't work for both. You can't switch HDR on or off and expect the same lighting scheme to work just as well, even if you don't get values over 1.

So if I go HDR, I go all in. This won't change the shaders at all, but when I'm thinking about which light to put where, with what intensity, and when preparing IBL, I will always be in an HDR mindset.
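For reference, a sketch of the widely published Uncharted 2 filmic curve, using the constants from John Hable's "Filmic Tonemapping Operators" write-up (the hacked-together version mentioned above may differ). It also shows where the darkening comes from: a linear input of 1.0 comes out at roughly 0.49, before gamma encoding.

```python
# Constants as published by John Hable for the Uncharted 2 filmic curve.
A, B, C, D, E, F = 0.15, 0.50, 0.10, 0.20, 0.02, 0.30
W = 11.2              # linear white point
EXPOSURE_BIAS = 2.0

def hable(x):
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F

def filmic(x, exposure=1.0):
    """Tone-map a linear HDR value; the result is still linear, so gamma-encode it afterwards."""
    mapped = hable(EXPOSURE_BIAS * exposure * x) / hable(W)
    return min(mapped, 1.0)  # values brighter than the white point saturate

print(filmic(1.0))  # about 0.49: a linear 1.0 comes out at roughly half brightness
print(filmic(5.6))  # 1.0: 2.0 * 5.6 hits the white point W
```

In a "PS HDR" setup the same function would be the last statement of the pixel shader; in the post-processing setup it runs once per pixel over the floating-point buffer.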

 

Does this sound sane?

 

And one more point:

  1. I mean 4. The forum won't let me reset the counter. This is about bloom and not strictly related to HDR. I was wrongly doing bloom on inputs as if they were in linear space when they were actually gamma-encoded. I switched over, and now they should be correct, and they are more powerful. And weird. I'm not sure they really are correct. But I do get the feeling that my bloom is like four generations old. Is there a better, more expensive way to do bloom as a post-process than: scale down to 1/4 while blocking out pixels below a threshold, horizontal Gaussian blur, vertical Gaussian blur, and combine?
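For comparison, the baseline pipeline described in point 4, written as CPU-side Python on single-channel images for readability. On the GPU each step would be its own pixel shader pass over a render target. The parameter values (threshold, kernel radius, sigma, strength) are arbitrary, and the input is assumed to already be linear, not gamma-encoded.

```python
import math

def gaussian_weights(radius, sigma):
    # Normalized 1D kernel; the 2D blur is two 1D passes (separable).
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]

def bright_pass(img, threshold):
    # Keep only the energy above the threshold (img must be linear).
    return [[max(v - threshold, 0.0) for v in row] for row in img]

def downscale(img, factor):
    # Box-filter factor x factor blocks.
    h, w = len(img) // factor, len(img[0]) // factor
    n = factor * factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) / n
             for x in range(w)] for y in range(h)]

def blur_1d(row, weights):
    r = len(weights) // 2
    last = len(row) - 1
    # Clamp-to-edge addressing, like a CLAMP sampler.
    return [sum(wk * row[min(max(i + k - r, 0), last)] for k, wk in enumerate(weights))
            for i in range(len(row))]

def separable_blur(img, weights):
    horizontal = [blur_1d(row, weights) for row in img]
    vertical_cols = [blur_1d(list(col), weights) for col in zip(*horizontal)]
    return [list(row) for row in zip(*vertical_cols)]

def bloom(img, threshold=1.0, factor=4, radius=3, sigma=1.5, strength=1.0):
    small = separable_blur(downscale(bright_pass(img, threshold), factor),
                           gaussian_weights(radius, sigma))
    sh, sw = len(small), len(small[0])
    # Nearest-neighbour upsample of the blurred buffer, added to the original.
    return [[v + strength * small[min(y // factor, sh - 1)][min(x // factor, sw - 1)]
             for x, v in enumerate(row)] for y, row in enumerate(img)]

img = [[0.5] * 8 for _ in range(8)]
img[3][3] = 9.0                  # one very bright (HDR) pixel
out = bloom(img)
print(out[3][3] > img[3][3], out[0][0] > img[0][0])  # True True
```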

Oh, and thank you all very much for the inputs. They are greatly appreciated!



#11 DwarvesH   Members   -  Reputation: 471


Posted 18 March 2014 - 07:49 AM

OK, I have reached the conclusion that permutations work extremely well when you know what you want to render and need the appropriate shaders for that and only that. With my template system you'd need at most a few dozen permutations, and you really don't need to generate or load any more. So no more hundreds or thousands of permutations.

 

But if you are writing an engine with a material editor that should support a lot of options out of the box without any shader editing, then even if you intentionally limit the options to keep the permutations low, you still end up with thousands. And if you really don't limit them and go wild, a theoretical count of over 500,000 is achievable.

 

So permutations are out of the question for the engine editor. 

 

I started experimenting with a new solution. I'm experimenting a lot right now, before the engine gets too advanced, because later it is going to be much more difficult to change anything. I based my attempt on:

 


The FX framework is very outdated, and a left-over from D3D9... In D3D11, they released the source code for it so you can keep using it if you like, or you can migrate away from it or customize it... Internally, it just makes a big "globals" cbuffer per shader, which is very inefficient. e.g. if 99% of the shader variables don't change, but 1% do, then the entire "globals" cbuffer has to be updated anyway.
You should definitely structure your renderer around the concept of cbuffers instead of individual shader variables if it's going to exist into the future past D3D9.
I've got a post here where I describe how I emulate cbuffers on D3D9 (which ended up being more efficient than using fx files on D3D9 for me)  http://www.gamedev.net/topic/618167-emulating-cbuffers/
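A minimal sketch of the idea in that quote: split constants into blocks by update frequency and re-upload only the blocks that actually changed. The class and the `upload` callback are hypothetical; in native D3D9 the upload would map onto `IDirect3DDevice9::SetPixelShaderConstantF`, which takes a start register and a count of float4 vectors.

```python
class ConstantBlock:
    """An emulated cbuffer: a range of float4 registers uploaded as one unit."""

    def __init__(self, name, start_register, num_registers):
        self.name = name
        self.start_register = start_register        # first cN register of the block
        self.data = [0.0] * (4 * num_registers)     # 4 floats per register
        self.dirty = True                           # upload at least once

    def set(self, offset, values):
        """Write floats at a float offset; mark dirty only if something changed."""
        values = list(values)
        end = offset + len(values)
        if offset < 0 or end > len(self.data):
            raise IndexError(f"{self.name}: write [{offset}, {end}) out of range")
        if self.data[offset:end] != values:
            self.data[offset:end] = values
            self.dirty = True


def flush(blocks, upload):
    """Call upload(start_register, floats) for changed blocks only."""
    uploaded = []
    for block in blocks:
        if block.dirty:
            upload(block.start_register, block.data)
            block.dirty = False
            uploaded.append(block.name)
    return uploaded


calls = []

def record(start_register, floats):
    calls.append((start_register, len(floats)))

per_frame = ConstantBlock("PerFrame", 0, 4)       # e.g. a view-projection matrix
per_material = ConstantBlock("PerMaterial", 4, 2)
print(flush([per_frame, per_material], record))   # ['PerFrame', 'PerMaterial']
per_frame.set(0, [1.0, 0.0, 0.0, 0.0])
per_material.set(0, [0.0, 0.0, 0.0, 0.0])         # same values: stays clean
print(flush([per_frame, per_material], record))   # ['PerFrame']
```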

 

I had extremely limited experience with shaders without FX, but the DX10 constant buffer solution still seemed like a great idea. So I tried to implement something like you described there for DX9.

 

I had to overcome a few hurdles. For the first hour I just couldn't render anything in my hello world test. It was my first time using raw shaders, and I did not know about the matrix packing order discrepancy: FX and raw shaders seem to use different conventions.
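A small model of the likely cause: HLSL packs uniform matrices `column_major` by default, so register c holds column c. Copying a row-major CPU matrix straight into the registers therefore makes the shader see its transpose; the D3DX effect framework handles this for you. Without it, the usual fixes are transposing on the CPU, declaring the matrix `row_major` in HLSL, or compiling with fxc's `/Zpr` switch. The helper names below are illustrative only.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def raw_upload(m):
    # Copying a row-major matrix byte-for-byte fills register i with row i.
    return [list(row) for row in m]

def shader_view_column_major(registers):
    # HLSL default packing: register c holds column c of the matrix.
    return [[registers[c][r] for c in range(4)] for r in range(4)]

M = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(shader_view_column_major(raw_upload(M)) == transpose(M))  # True: shader sees the transpose
print(shader_view_column_major(raw_upload(transpose(M))) == M)  # True: transposing first fixes it
```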

 

The second obstacle was constant initialization. I guess that without the FX framework, something like "float gloss: register(c6) = 0.1;" will leave gloss as zero. And no more preshaders, but who needs them anyway :).

 

But after I fixed these I started rendering again. The solution is far from elegant. The "ghetto" constant buffer structure is hopefully compatible with both DX9 and DX10, but I did have to add a lot of filler members to get the alignment right. Then there is the problem of SharpDX/C#: SetVertexShaderConstant and SetPixelShaderConstant really don't take very convenient parameters. I did not want to go with some marshaling or array solution, so I'm passing a pointer to these functions. The only overload that takes a pointer takes a *Matrix. So I'm making sure the structure is Matrix-aligned, and I'm casting its pointer to a Matrix pointer in unsafe code. Ugly...
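A sketch of the layout checks such a "ghetto" cbuffer struct needs, modelled here with Python's `ctypes` rather than C#. The member names are hypothetical; the assumptions are that every member starts on a 16-byte float4 register and that the total size must be a whole number of 64-byte matrices for the Matrix-pointer overload.

```python
import ctypes

FLOAT4 = ctypes.c_float * 4
MATRIX_SIZE = 64  # bytes in a 4x4 float matrix

class MaterialConstants(ctypes.Structure):
    # Hypothetical layout: every member starts on a 16-byte float4 register,
    # with explicit filler so a scalar does not share its register.
    _fields_ = [
        ("albedo", FLOAT4),              # c0
        ("gloss", ctypes.c_float),       # c1.x
        ("_pad0", ctypes.c_float * 3),   # c1.yzw (filler)
        ("fresnel", FLOAT4),             # c2
        ("_pad1", FLOAT4),               # c3 (filler up to a whole matrix)
    ]

def register_of(struct_type, field_name):
    """Index of the float4 register a member lands in, relative to the block start."""
    return getattr(struct_type, field_name).offset // 16

def matrix_count(struct_type):
    """How many matrix-sized chunks to pass when the struct is cast to a matrix pointer."""
    size = ctypes.sizeof(struct_type)
    if size % MATRIX_SIZE:
        raise ValueError(f"{struct_type.__name__} is {size} bytes; pad it to a multiple of {MATRIX_SIZE}")
    return size // MATRIX_SIZE

constants = MaterialConstants()
constants.gloss = 0.1  # shader-side "= 0.1" initializers apparently aren't applied without FX
print(ctypes.sizeof(MaterialConstants), matrix_count(MaterialConstants))  # 64 1
print(register_of(MaterialConstants, "fresnel"))                          # 2
```

Checking offsets and size like this once at startup catches a missing filler member before it turns into a silently wrong shader constant.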

 

The way I'm using this right now is based on the material templates. When you resolve one, a pixel shader for just that template is now generated and compiled on the fly.

 

There are two scenarios:

  • In game: you will have several active material templates, so compiling them is slowish but not that bad. The good news is that you can cache them. There is no need to compile every time, because a template does not really need to change.
  • In editor mode, when editing the templates: now, this is problematic. I trimmed the shaders as low as I could, but the compiler is still very slow. The best case scenario is 30 ms, but some more complex templates take 300 ms. And these numbers are just going to go up.
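A minimal sketch of the in-game caching, keyed by a hash of the generated source plus the target profile so an unchanged template is never recompiled. `compile_fn` stands in for the real FXC call and is hypothetical; the same keys could name bytecode files in an on-disk cache.

```python
import hashlib

class ShaderCache:
    """Compile each generated pixel shader once, keyed by a hash of profile + source."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # hypothetical: (source, profile) -> bytecode
        self._entries = {}
        self.compile_count = 0

    @staticmethod
    def key(source, profile):
        return hashlib.sha256(f"{profile}\n{source}".encode("utf-8")).hexdigest()

    def get(self, source, profile="ps_3_0"):
        k = self.key(source, profile)
        if k not in self._entries:
            self._entries[k] = self._compile(source, profile)
            self.compile_count += 1
        return self._entries[k]

def fake_compile(source, profile):
    return f"{profile}:{len(source)} chars".encode()

cache = ShaderCache(fake_compile)
src = "float4 main() : COLOR { return 1; }"
cache.get(src)
cache.get(src)              # same template: served from the cache
cache.get(src + " ")        # edited template: compiled again
print(cache.compile_count)  # 2
```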

I'll add multi-threaded resolve to editor mode so that your GUI is not frozen, plus a helpful message in the corner like "FXC is shitting itself compiling 20 lines of pixel shader that calls 2 functions. Please wait...", and leave it at that.
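A sketch of the multi-threaded resolve, assuming the editor polls once per frame: compiles run on a small thread pool, `busy()` drives the "Please wait..." message, and a newer edit of a template supersedes a pending one. `BackgroundCompiler` and its `compile_fn` are hypothetical names, not an existing API.

```python
from concurrent.futures import ThreadPoolExecutor

class BackgroundCompiler:
    """Compile templates off the GUI thread; the editor polls for results."""

    def __init__(self, compile_fn, workers=2):
        self._compile = compile_fn   # hypothetical: source -> bytecode (e.g. wraps FXC)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}

    def submit(self, template_name, source):
        # A newer edit of the same template replaces the older request.
        old = self._pending.get(template_name)
        if old is not None:
            old.cancel()  # no effect if already running; its result is just dropped
        self._pending[template_name] = self._pool.submit(self._compile, source)

    def busy(self):
        """True while any compile is still running: show the 'Please wait...' message."""
        return any(not f.done() for f in self._pending.values())

    def poll(self):
        """Return {template_name: bytecode} for finished jobs and forget them."""
        done = {name: f for name, f in self._pending.items() if f.done()}
        for name in done:
            del self._pending[name]
        return {name: f.result() for name, f in done.items() if not f.cancelled()}

    def shutdown(self):
        self._pool.shutdown(wait=True)


compiler = BackgroundCompiler(lambda source: source.upper())
compiler.submit("gold", "ps source")
compiler.shutdown()      # in this example, just wait for the job to finish
results = compiler.poll()
print(results)           # {'gold': 'PS SOURCE'}
```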

 

I need to continue my engine.

 

But someday I'll revisit this subject. I was thinking of either writing my own compiler (I'm actually good at compiler writing) or an easier solution: binary adjustment and concatenation of permutation building blocks. Computing a point light is always the same, with only some input constants different, yet FXC struggles and probably evaluates the same pixel shader function as many times as it finds it. I'm sure some clever trick can speed up simple permutation-based pixel shader compilation by at least an order of magnitude.



#12 Jason Z   Crossbones+   -  Reputation: 5283


Posted 18 March 2014 - 08:25 PM


So, what are you then? You have a huge rep...

I am a developer in the automotive industry, where I build data manipulation and visualization tools.  I'm also a co-author of the book in my signature below as well as contributing to a few others, and I have been fortunate enough to get the Microsoft MVP award in DirectX and more recently in C++.  But I have been around these forums for quite some time, and I really enjoy trying to help others learn graphics programming the same way I did!





