IIRC in Stalker they use material IDs to index a dot(n,l) texture. If you are referring to this value, then with different shaders you are forced to run multiple passes, hoping dynamic branching (and locality) will help you. That sounds like a performance killer, as no stenciling/z-cull can help you discard pixels before executing the pixel shader, because the reference value is stored in the G-buffer. Do you expect an ubershader with huge switches to perform better, or do you think a multipass approach would be desirable?
latest hardware seems to be able to do this fast enough ... I haven't implemented it but a friend at NVIDIA did ... obviously I prefer the Light Pre-Pass renderer design, which does not pose this challenge. I wouldn't be so sure about Stalker. The article was written three years before the game shipped, and I would expect a lot of changes in the meantime ... I think some of the people who wrote the article also left the team. The idea to build a material system with a 3D texture does not sound reasonable to me :-) With G80 and newer hardware it seems to be an option. I assume it is the same for ATI hardware because they have had fast branching for a while. You would have to implement both 2D texture arrays and this, and see which is faster. Because you will need the 2D texture support anyway on lower-end hardware, you won't lose much time.
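As a CPU-side illustration of the material-ID lookup idea being discussed, here is a minimal sketch (not Stalker's actual shader; the table contents and names are made up) of sampling a per-material lighting ramp by dot(n,l), with nearest lookup on the material axis and linear filtering along the n·l axis, mimicking how a GPU would filter such a texture:

```python
def sample_material_lut(lut, n_dot_l, material_id):
    """Sample a per-material lighting lookup table.

    lut: list indexed by material ID; each entry is a list of intensity
    samples over n_dot_l in [0, 1]. Nearest lookup on the material axis,
    linear interpolation along the n_dot_l axis.
    """
    row = lut[material_id]
    x = max(0.0, min(1.0, n_dot_l)) * (len(row) - 1)
    i = int(x)
    if i >= len(row) - 1:
        return row[-1]
    frac = x - i
    return row[i] * (1.0 - frac) + row[i + 1] * frac

# Two hypothetical materials: a plain linear ramp and a toon-style ramp.
lut = [
    [0.0, 0.5, 1.0],       # material 0: behaves like plain dot(n, l)
    [0.2, 0.2, 1.0, 1.0],  # material 1: hard two-band toon ramp
]
```

The point of the question above is that the material ID sits in the G-buffer, so the shader cannot know it before running; the lookup itself is cheap, but dispatching different shading *code* per ID is what forces either branching or multiple passes.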
to keep saying they took a wrong path is useless and arrogant.
to make up for my arrogance, here are a few ideas on how to design a system like this.

1. It depends on your game. Whatever your game is about will define which kinds of materials you use, and whatever materials you use, you will want to implement different combinations of shaders. So I would ask myself this question first.

2. It depends on how you implement lights. You won't achieve many fully dynamic lights with a traditional renderer design. If you go with a deferred lighting model, you will mainly have three shaders, one for each of the light types, and then probably a few for things like shadows, postfx, reflections etc., and maybe a few changes to the main three shaders for the restricted set of materials that a deferred lighting scheme can handle on DX9 hardware (if you have really powerful hardware you might think about an index value in the G-Buffer that references different shaders and switches the material per screen-space pixel). With a Light Pre-Pass you will have your three light shaders for the pre-pass and then a variety of shaders for the main rendering path. The variety is restricted by the light properties you store in the Light Buffer. If you have really powerful hardware you can store an index and switch shaders per screen-space pixel ...

3. You might also think about the granularity of such a system. If you implement Oren-Nayar, Ashikhmin-Shirley, Cook-Torrance and other lighting models, you want to think about how to combine those. Will each of those lighting models represent diffuse or specular, or will there be a finer granularity?

4. You might also think about different normal data quality levels, like tangent-space normals, derivative normal maps, and height maps for normal blending, and how you let artists pick one of those while at the same time making sure that the source asset is really in place.

5. For PostFX, shadows, reflections and other separable high-level graphics like a dynamic sky system you do not want to generate those shaders, because you only need one of each and optimizing them is a tough task.
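To make the granularity point concrete: one way to look at points 3 and 4 is as a combinatorial explosion of shader permutations. A minimal sketch (all model and quality-level names here are just placeholders, not a real system) of counting the variants a generator would have to produce:

```python
from itertools import product

# Hypothetical granularity: a material picks one diffuse model, one
# specular model, and one normal-data quality level; every combination
# becomes a shader variant that has to be generated and maintained.
DIFFUSE = ("lambert", "oren_nayar")
SPECULAR = ("blinn_phong", "cook_torrance", "ashikhmin_shirley")
NORMALS = ("tangent_space", "derivative_map", "height_blend")

def variant_key(diffuse, specular, normals):
    """Build a unique key identifying one shader permutation."""
    return f"{diffuse}|{specular}|{normals}"

variants = [variant_key(*combo) for combo in product(DIFFUSE, SPECULAR, NORMALS)]
# 2 diffuse * 3 specular * 3 normal options -> 18 permutations already
```

Even this toy setup yields 18 variants; that is why the hand-written one-off shaders in point 5 (PostFX, shadows, sky) are better kept outside the generator.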
There will be a ShaderX7 article in which I will describe a slight improvement to Michal's approach. Michal picks the right shadow map with a rather cool trick. Mine is a bit different, but it might be more efficient. To pick the right map, I send down the sphere that is constructed for each light view frustum. I then check if the pixel is in the sphere. If it is, I pick that shadow map; if it isn't, I go to the next sphere. I also early-out by returning white if the pixel is in no sphere. At first sight it does not look like a trick, but if you think about the spheres lined up along the view frustum and the way they intersect, it is actually pretty efficient and fast. On my target platforms, especially the one that Michal likes a lot, this makes a difference.
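The selection loop described above can be sketched like this (a CPU-side illustration with made-up cascade data; the real test runs per pixel in the shader):

```python
def pick_shadow_map(pos, spheres):
    """Return the index of the first cascade whose bounding sphere
    contains pos, or None -- in which case the shader early-outs
    and returns white (fully lit)."""
    for i, (cx, cy, cz, r) in enumerate(spheres):
        dx, dy, dz = pos[0] - cx, pos[1] - cy, pos[2] - cz
        if dx * dx + dy * dy + dz * dz <= r * r:
            return i
    return None

# Spheres lined up along the view axis (here z), growing with distance,
# roughly as they would enclose successive slices of the view frustum.
cascades = [(0, 0, 5, 6), (0, 0, 15, 10), (0, 0, 35, 25)]
```

Because the spheres are tested near-to-far, a pixel close to the camera exits the loop after one cheap dot-product test, which is where the efficiency comes from.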
They dropped the renderer design that they described in GPU Gems 3 at the last minute. Their main renderer was a Z pre-pass renderer, and it was so much faster on their target hardware that they dropped their deferred idea.
3D volume texture
just try it at some point: take a common graphics card that the target group of your game is using and throw a 3D texture on it :-) ... Stalker: what was described in the articles and what shipped in Stalker need not be the same thing. They had a very scalable renderer that switched a lot of features on and off ... obviously not rendering data into an MRT was one of them. Depending on the underlying hardware it just went back and forth. They had a very flexible renderer design that just did the right thing: switching features on and off.
I am excited about this game. I love what those guys are doing and -as you probably know- I was "close" to the project for a while when we got support from some of those guys. Michal Valient is doing great things ...
The volume texture is just a lookup table... it's not a "texture" per se so you don't need high precision ("HDR") storage.
You want enough distinct values in there to make lighting look good. Linearly interpolating the values does not give you enough variety, so you end up with a 3D volume texture that is too big for the target hardware.
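Some quick back-of-the-envelope arithmetic on why such a table blows up (RGBA8 texels assumed; the dimensions are illustrative):

```python
def volume_texture_bytes(dim, bytes_per_texel=4):
    """Memory footprint of a dim x dim x dim volume texture."""
    return dim ** 3 * bytes_per_texel

# Doubling the resolution per axis multiplies the footprint by 8:
# 32^3 RGBA8  -> 128 KiB
# 64^3 RGBA8  -> 1 MiB
# 256^3 RGBA8 -> 64 MiB
```

So once linear filtering can't hide the coarseness and you push the resolution up, the cubic growth eats video memory very quickly on the low-end cards discussed below.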
Let me repeat why the deferred renderer idea did not work out in games:
1. You build your renderer following the graphics hardware design, not the other way around ... otherwise you lose performance. Most hardware was built with a Z pre-pass renderer in mind. Let's say Intel releases Larrabee: you will want to build the renderer in a way that squeezes the highest performance out of Larrabee. This will be very interesting, because it might require a renderer design that is different from other GPUs, and it won't be what we now call deferred or forward ... it will be different :-)

2. Most currently available graphics cards do not have enough bandwidth to run a deferred renderer ... this is also true for the PS3 :-) ... so you lose large parts of the market, and that makes it rather unattractive to build games like this.

3. MSAA on DX9 hardware is not really easy ... on console platforms neither ... it costs additional cycles to get done ... more expensive than with a Z pre-pass renderer.

4. There is no way to implement a halfway decent material system with a deferred renderer. A decent character setup with skin, hair, cloth, eye and eye-lash shaders is just not possible, so your games will look much worse.
Please do not forget that an NVIDIA 8800 GTS is probably 10 times faster than an 8200, or whatever they call their low-end model, but makes up < 1% of the market and probably also of your target market. Even high-end console graphics chips are much slower than this card ... about comparable to a NVIDIA 7600 GT with a 128-bit bus. So from a financial and a quality standpoint it is not a good idea to do this :-)
perspective shadow map approaches increase depth aliasing errors. You want to use an orthographic projection with cascaded shadow maps or any other multi-frustum approach. The amount of depth aliasing and the very view-dependent quality of perspective shadow maps make them unusable in games with a time-of-day feature, where you need full-screen shadows.