The only problem is the number of permutations. With this limited setup there are 120 different techniques. I can easily see this getting over 1000. They are autogenerated so not a big problem, but I was wondering how others do this.
Regarding the problem in the OP - the permutations should be based on the needs of a particular game, not every possible game that could be made with the engine.
Your permutation system should be flexible and data-driven, so that you can add/remove shader options quickly and easily.
For one game, you'll probably pick one ambient lighting solution (e.g. SH diffuse and IBL specular for everything), one AO solution, etc...
Or if you've got multiple AO solutions, they don't need to result in multiple permutations. e.g. on the last game I worked on, we rendered stencil-shadows, pre-baked shadows, directionally-traced "SSAO" shadows and shadow-mapped shadows all into a shared 720p screen-space buffer. The forward rendering shaders then just took a single texture sample from this buffer to determine their shadow & occlusion data. We could mix and match techniques and iterate on ideas over the project without having to touch the forward lighting shader!
For SSAA, have you benchmarked your shader-based super-sampling against just rendering at a higher resolution? If it's an "uber" detail setting, aimed at high-spec PC users only, the two might perform equivalently. Or, if it's a "beyond uber" setting for generating screenshots for magazines, then performance doesn't matter and it may be better to implement the simplest solution (p.s. almost every engine I've used has had a mode like this for generating print-quality screenshots. One rendered at 15360 x 8640 for that mode ).
Metallic just means the spec-mask is ~>0.2 (and may have 3 colour channels) and the albedo is black. Non-metallic is the opposite (albedo ~>0.04 && ~<0.9, and spec-mask greyscale and usually ~=0.03).
There are two common workflows:
Traditional: Give the artists a (coloured) spec-map and an albedo map. They can make real materials and unreal materials.
Metal-map: Give the artists a colour map and a (greyscale) metal map. Albedo is lerp(color, black, metal) and spec-mask is lerp(0.03, color, metal). This forces realistic materials.
(both of the above also have a roughness/glossiness/spec-power map)
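The metal-map derivation above is just two lerps per texel. Here's a minimal sketch of that math in Python (the function names and tuple representation are mine, not from any engine):

```python
def lerp(a, b, t):
    """Linear interpolation, as in HLSL's lerp()."""
    return a + (b - a) * t

def metal_map_to_pbr(color, metal):
    """Derive albedo and spec-mask from a colour map + greyscale metal map,
    per the metal-map workflow above. Inputs are per-texel values in [0, 1];
    'color' is an (r, g, b) tuple."""
    albedo = tuple(lerp(c, 0.0, metal) for c in color)   # metals have black albedo
    spec   = tuple(lerp(0.03, c, metal) for c in color)  # dielectrics get ~0.03 spec
    return albedo, spec
```

A fully metallic texel ends up with black albedo and a coloured spec-mask; a fully dielectric texel keeps its colour as albedo and gets the fixed 0.03 spec value, which is what forces realistic materials.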
The workflow that I chose in my current game is a mixture of both. The artists get a colour map and a spec-mask map.
Metal (not used in the BRDF, just in the following two) is saturate( specMap*2-1 ),
Spec-mask is lerp( saturate(specMap*2)*0.2, color, metal ),
Albedo is lerp( color, black, metal ).
i.e. spec values below 0.5 are remapped to 0-0.2 (dielectric range) and are greyscale, whereas spec values above 0.5 behave like a metal map.
The engine shouldn't force a particular choice here. If a new game wants a new workflow, they should be able to make those changes without having to edit the engine code.
Manual shader source management is out of the question. Even a custom-tailored solution that only works with exactly what I want to render, with the shader compiled for my setup only, will have dozens of permutations, so generated seems to win. The 120 techniques occupy 100 KiB of source code and take 10 seconds to compile under FXC, but the precompiled shaders load very fast.
So my question for more experienced folk: is this a good approach?
Yeah you can't hand-write them all. Either you stitch them together from code-fragments using a generator, or you write uber-shaders with compile-time branching (e.g. #ifdef), and then use a tool to compile all the permutations for you.
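The #ifdef approach needs a tool that enumerates every option combination and prepends the matching defines before compiling each variant. A minimal sketch of such a generator (the option names and lists here are made up; the real ones come from your engine's feature set):

```python
from itertools import product

# Hypothetical shader options -- replace with your engine's real feature list.
OPTIONS = {
    "NUM_LIGHTS":  [0, 1, 2, 4],
    "USE_SHADOWS": [0, 1],
    "USE_FOG":     [0, 1],
}

def permutations():
    """Yield (name, preamble) pairs, one per shader permutation. The
    preamble of #defines is prepended to the uber-shader source before
    handing each variant to the offline compiler (e.g. fxc)."""
    keys = list(OPTIONS)
    for values in product(*(OPTIONS[k] for k in keys)):
        name = "_".join(f"{k}{v}" for k, v in zip(keys, values))
        defines = "\n".join(f"#define {k} {v}" for k, v in zip(keys, values))
        yield name, defines
```

The permutation count is just the product of the option-list lengths (here 4 x 2 x 2 = 16), which is why it explodes so quickly as options are added.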
It's absolutely standard to pre-compile all your permutations and ship a large number of them.
...except in OpenGL-land, where there's no ahead-of-time compiler (in OpenGL, the compiler is implemented by your graphics driver, and varies vendor to vendor...).
Ironically, on modern hardware, you are likely going to shoot yourself in the foot with this mindset. Switching shaders, from what I recall, can be almost as expensive as dynamic branching, if not more so. Especially when, as with permutations, all branches take the same path for all pixels of the same mesh.
That's an apples-and-oranges comparison -- e.g. whether "getting into and out of your car is slower than walking to the shops" depends on how far you live from the shops.
Switching shaders is a CPU-side API operation, so there'll be some CPU cost as you interact with the API, it validates your actions and generates a stream of actual commands for the GPU. It's also a GPU-front-end operation, where it will have to move the shader program into the L2 cache, synchronize the completion of earlier draw-calls and schedule the work of the new draw-calls that use this shader.
GPUs like to work on large data-sets at once -- if you switch shaders and then only draw 100 verts/pixels with each shader, then you'll likely get horrible performance. However, the driver/GPU can likely almost completely hide these GPU-side switching costs as long as you draw enough pixels. e.g. maybe if you draw 1000 pixels, then while they're processing, the GPU can be pre-fetching the next shader program, and there's enough individual pixels in flight to ensure that all the ALU units are busy without stalls...
You definitely want to minimize state-changes, but don't go overboard.
A dynamic branch on the other hand is a cost that you pay repeatedly for every pixel. The correct answer is an optimization problem, which as always is situation dependent, so can only be answered by profiling and experimenting with that particular situation. A good framework should allow you to experiment!
One disadvantage is that you can't use instancing -- though with forward rendering you can't really use instancing anyway, except with simple lighting schemes.
You can definitely use instancing... You'll just need to get creative with how you send the lighting information to the shaders.
Personally, I have no idea what I want the final render setup to be. Having it parametrized, with live response to my changes and lighting conditions, will let me determine the setup I want by trial and error.
I'd instead focus on being able to recompile and reload your shaders / models / textures / materials while the game is running. You'll be able to experiment with more things, quicker. It also helps in full production where all the artists can iterate on their work.
Material templates have a lot of properties, and the list is only getting bigger, but here are the main properties for the ambient component.
This stuff should not be hard-coded into the engine. Every game has different rendering requirements. If for every game you've got to go and edit the engine to remove unwanted parameters and add new ones into the fixed material class, then it's not very flexible. These templates should be derived from the shaders, and be able to be set automatically from data provided by the artists (e.g. from a collada file, etc, or your own material editor if you take that path).
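One way to derive templates from the shaders is to scrape the material parameter block out of the shader source itself. A minimal sketch, assuming a cbuffer named `MaterialParams` (both the block name and the example shader text are made up for illustration):

```python
import re

# Hypothetical shader source -- in practice you'd read the real .fx/.hlsl file.
SHADER_SRC = """
cbuffer MaterialParams
{
    float3 SpecColor;
    float  Roughness;
    float  AmbientScale;
};
"""

def material_template(src):
    """Return {member_name: hlsl_type} for every member of the
    MaterialParams cbuffer, in declaration order. This dict is the
    material template: add a member to the shader and it appears here
    automatically, with no engine code edit."""
    block = re.search(r"cbuffer\s+MaterialParams\s*{(.*?)}", src, re.S).group(1)
    members = re.findall(r"(\w+)\s+(\w+)\s*;", block)
    return {name: type_ for type_, name in members}
```

A real implementation would use the shader compiler's reflection API rather than a regex, but the principle is the same: the shader is the single source of truth for what a material contains.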
Is my new model better? I think it is. It is more flexible and customizable, while at the same time being low-level and having no built-in logic.
It seems to have a built in set of data channels though, which is just as bad; it restricts the kinds of logic that can be implemented.
There is more than one shader compiler out there? This is new to me! I've never used anything other than FXC in fx_2_0 mode. And I know about Cg.
He's talking about GL. In GL, every graphics driver has its own compiler built in, and you've got no choice but to ship your GLSL source code to the users (no pre-compiling). On these platforms, it's standard to run your GLSL code through a program that compiles it, optimizes it, and then decompiles it back into 'optimal' GLSL code... A terrible situation. One reason why GL isn't more popular right there!
And FX is slow as hell. I'm thinking of implementing ghetto constant buffers.
Here's the thing: FX under DirectX 10+ behaves differently enough from DirectX 9 that I'm having massive problems with it. And I've heard that from 11 on, the FX framework is deprecated. I won't be using DirectX 9 forever. Even diehardness has an expiration date.
But from 10 on constant buffers are used. You can still use FX variables, but something weird is going on.
And maintaining a constant buffer version and a FX version is too much work and no fun.
The FX framework is very outdated, and a left-over from D3D9... In D3D11, they released the source code for it so you can keep using it if you like, or you can migrate away from it or customize it... Internally, it just makes a big "globals" cbuffer per shader, which is very inefficient. e.g. if 99% of the shader variables don't change, but 1% do, then the entire "globals" cbuffer has to be updated anyway.
You should definitely structure your renderer around the concept of cbuffers instead of individual shader variables if it's going to exist into the future past D3D9.
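The key design idea is to group constants into whole buffers by update frequency and only upload a buffer when something in it changed, instead of setting individual named variables. A sketch of that structure (the class, names, and the list-append stand-in for a GPU upload are all illustrative, not a real API):

```python
# Constants grouped into whole buffers by update frequency, with dirty
# tracking so unchanged buffers are never re-uploaded.
class CBuffer:
    def __init__(self, **values):
        self.values = dict(values)
        self.dirty = True                        # must upload at least once

    def set(self, name, value):
        if self.values[name] != value:
            self.values[name] = value
            self.dirty = True

    def flush(self, uploads):
        if self.dirty:
            uploads.append(dict(self.values))    # stand-in for a GPU upload
            self.dirty = False

per_frame  = CBuffer(view_proj=0, time=0.0)      # changes once per frame
per_object = CBuffer(world=0, tint=(1, 1, 1))    # changes per draw-call
```

This directly avoids the FX problem described above: if only the per-object tint changes, only the small per-object buffer is re-uploaded, not one giant "globals" blob.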
I've got a post here where I describe how I emulate cbuffers on D3D9 (which ended up being more efficient than using fx files on D3D9 for me) http://www.gamedev.net/topic/618167-emulating-cbuffers/
Some cbuffers are set by the game, dynamically, e.g. ones containing dynamic lights, or the camera, etc...
Others are set by the artists and don't change at runtime -- these ones I actually construct ahead of time and save in binary form, by inspecting the shader files for the cbuffer structure and the COLLADA models for the material values. The set of material values also aren't hard-coded -- if I add a new variable to one of my material cbuffers, then it shows up in the artists' model editing program (Maya/XSI/etc), and they can set it there and re-export their COLLADA file.
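Baking those artist-set buffers ahead of time amounts to packing the material values into the cbuffer's binary layout. A simplified sketch of that packing (the member list and values are made up, and this only models one of D3D's packing rules: a member starts a new 16-byte register if it would otherwise straddle one):

```python
import struct

# Hypothetical material layout, as would be parsed from the shader source.
LAYOUT = [("SpecColor", "float3"), ("Roughness", "float")]
SIZES  = {"float": 4, "float2": 8, "float3": 12, "float4": 16}

def bake(values):
    """Pack artist-set material values into the binary cbuffer image
    described by LAYOUT, ready to be saved and loaded as-is at runtime."""
    out = b""
    for name, type_ in LAYOUT:
        size = SIZES[type_]
        # pad to the next float4 register if this member would straddle one
        if (len(out) % 16) + size > 16:
            out += b"\x00" * (16 - len(out) % 16)
        floats = values[name] if isinstance(values[name], tuple) else (values[name],)
        out += struct.pack(f"<{len(floats)}f", *floats)
    return out
```

Here the float3 and the following float share one 16-byte register, so the whole buffer is a single 16-byte blob that can be memcpy'd straight into the GPU buffer at load time.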