If anyone has any other strategies for managing shader parameters in D3D9 / OpenGL, please share them below as well!
[hr]
Part 0 - why?
D3D9 (without the Microsoft Effect framework, or the CGFX framework) only exposes a single global set of "constant registers". IMHO, the D3D10/11 abstraction of having a global set of "buffer registers" (each of which contains a number of constants) is not only easier to manage as a user of a graphics API, but also allows for much more efficient rendering systems.
Also IMHO, the GL abstraction of "shader instances" -- an object that contains a link to a program plus all of the constant values to be used with that program -- is the most inefficient abstraction to build an engine on top of, and is hard to manage due to different groups of constants being produced by different sources with different frequencies. The CBuffer abstraction solves all these issues elegantly.
Part 1 - describing your cbuffers.
If we're using SM2/3 on D3D9, then we don't have the cbuffer syntax available, so we need an alternate way to describe them when writing our shaders.
One option I've used professionally is to make your own shader language that cross-compiles into HLSL, which gives you the option to use whatever syntax you want -- this requires a large time-investment to get up and running though.
A simple option is to use a naming convention, and lots of manual register specification. Because I want to be able to set the values of an entire CBuffer with one call to Set*ShaderConstantF, we need all variables within a "cbuffer" to be in contiguous registers (hence the manual register allocation).
This option might lead you to write code like the snippet below; you can then extract the cbuffer layouts by parsing the code.
//cbuffer material : register(b2) { float4 diffuse; float4 specular; }
float4 cb2_material_diffuse : register(c0);
float4 cb2_material_specular : register(c1);
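To make the "extract the layouts by parsing the code" step a bit more concrete, here is a minimal C++ sketch (the tools in this post are C#; C++ is used here only because that's what the engine side would be written in). It assumes every declaration follows exactly the cb<slot>_<buffer>_<member> : register(cN) pattern from the snippet above, and that buffer names contain no underscores. ConventionMember and ParseConventionLine are hypothetical names made up for this example.

```cpp
#include <regex>
#include <string>

// One shader constant declared with the naming convention (hypothetical type).
struct ConventionMember {
    std::string type;    // e.g. "float4"
    int         slot;    // emulated cbuffer slot, the N in "cbN_"
    std::string buffer;  // e.g. "material"
    std::string member;  // e.g. "diffuse"
    int         reg;     // the 'c' register named in register(cN)
};

// Returns true and fills 'out' if 'line' is a declaration such as
//   float4 cb2_material_diffuse : register(c0);
// Lines that don't match (comments, ordinary code) return false.
bool ParseConventionLine(const std::string& line, ConventionMember& out) {
    static const std::regex pattern(
        R"(^\s*(\w+)\s+cb(\d+)_([A-Za-z0-9]+)_(\w+)\s*:\s*register\s*\(\s*c(\d+)\s*\)\s*;)");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) return false;
    out.type   = m[1].str();
    out.slot   = std::stoi(m[2].str());
    out.buffer = m[3].str();
    out.member = m[4].str();
    out.reg    = std::stoi(m[5].str());
    return true;
}
```

Grouping the parsed members by slot, and taking the lowest and highest register in each group, would then give you the register range that each emulated cbuffer occupies.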
The option that I personally prefer is to embed a small piece of Lua code at the beginning of the shader file, enclosed in a comment. All my content-pipeline tools are written in C#, which is dead-simple to integrate with Lua, thanks to LuaInterface.
/*[FX]
cbuffer( 2, 'Material', {
{ diffuse = float4 },
{ specular = float4 },
{ foo = float },
{ bar = float2 },
{ baz = float },
})
*/
...
Before compiling a shader file with FXC, a C# tool extracts and executes the above Lua code (which as well as describing CBuffers, also describes techniques/passes/permutation-options, which are used to determine how many times to run FXC, and with what arguments). After running the Lua code, the C# tool has a description of the desired CBuffer layouts, which it can translate back into HLSL to produce a new temporary shader file, such as the snippet below. One advantage of this approach is it allows you to implement modern CBuffer packing rules, as long as you're ok with the scoping violations caused by #defines:
float4 diffuse : register(c0);
float4 specular : register(c1);
float4 _packed0_ : register(c2);
#define foo _packed0_.x
#define bar _packed0_.yz
#define baz _packed0_.w
#line 1 "D:\blah\test2.hlsl"
... original file contents here
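For anyone wondering how the packed declarations above could be generated, here is a minimal C++ sketch of the idea (again, the real tool in this post is C#, and this is not its code). It assumes only scalar and vector float members, i.e. no matrices, arrays or structs, and applies just one packing rule: a member may not straddle a float4 register boundary. Member, Placement, PackMembers and EmitHlsl are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Member {
    std::string name;
    int components;  // 1..4, i.e. float .. float4
};

struct Placement {
    int reg;        // absolute 'c' register index
    int component;  // 0..3, i.e. x..w
};

// Places members in declaration order; a member that would straddle a
// float4 boundary starts a new register instead.
std::vector<Placement> PackMembers(const std::vector<Member>& members, int baseRegister) {
    std::vector<Placement> out;
    int reg = baseRegister;
    int used = 0;  // components already occupied in the current register
    for (const Member& m : members) {
        assert(m.components >= 1 && m.components <= 4);
        if (used + m.components > 4) { ++reg; used = 0; }
        out.push_back({reg, used});
        used += m.components;
    }
    return out;
}

// Emits float4 declarations with explicit registers, plus #defines that
// map the smaller members onto swizzles of shared _packedN_ registers.
std::string EmitHlsl(const std::vector<Member>& members, int baseRegister) {
    const std::vector<Placement> placed = PackMembers(members, baseRegister);
    std::string hlsl;
    int packedCount = 0;     // number of _packedN_ registers emitted so far
    int lastPackedReg = -1;  // register backing the most recent _packedN_
    for (std::size_t i = 0; i < members.size(); ++i) {
        const Member& m = members[i];
        const Placement& p = placed[i];
        if (m.components == 4) {
            hlsl += "float4 " + m.name + " : register(c" + std::to_string(p.reg) + ");\n";
            continue;
        }
        if (p.reg != lastPackedReg) {
            hlsl += "float4 _packed" + std::to_string(packedCount++) +
                    "_ : register(c" + std::to_string(p.reg) + ");\n";
            lastPackedReg = p.reg;
        }
        const std::string swizzle = std::string("xyzw").substr(
            static_cast<std::size_t>(p.component), static_cast<std::size_t>(m.components));
        hlsl += "#define " + m.name + " _packed" + std::to_string(packedCount - 1) +
                "_." + swizzle + "\n";
    }
    return hlsl;
}
```

Feeding it the Material members from the Lua description (diffuse, specular, foo, bar, baz) with a base register of 0 places foo, bar and baz in c2 as .x, .yz and .w. The real HLSL packing rules cover more cases than this, so treat it as a starting point only.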
Part 2 - determining used CBuffers
Once you've got the above temporary HLSL files, you'll compile them as many times as is required by your techniques/passes/permutations - not all of the resulting binaries will use every CBuffer.
Below is my function for calling FXC - note that it uses both the /Fo option and the /Fc option, to output a shader binary and a text file that we can parse to get some information about the binary. Sure, you can load up the binary and use the D3D API to reflect on it, but I found this simpler, and I like KISS. The resulting binary code is returned, and the text file is parsed to fill in the usage parameter.
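// Requires: using System; using System.Collections.Generic; using System.IO;
public byte[] CompileShader(string inputFile, ShaderProfile profile, string entry, Dictionary<string, string> defines, ShaderUsage usage)
{
    string exe = Path.Combine(m_project.engineDirectory, "tools/fxc.exe");
    string args = "/nologo /O3 ";
    args += String.Format("/E{0} ", entry);
    switch (profile)
    {
        case ShaderProfile.Pixel:  args += "/Tps_3_0 "; break;
        case ShaderProfile.Vertex: args += "/Tvs_3_0 "; break;
        default: /* error handling */ break; // C# does not allow falling out of a switch section
    }
    // Pass each permutation option to FXC as /DNAME or /DNAME=VALUE.
    foreach (var define in defines)
    {
        if (string.IsNullOrEmpty(define.Value))
            args += String.Format("/D{0} ", define.Key);
        else
            args += String.Format("/D{0}={1} ", define.Key, define.Value);
    }
    // /Fo writes the shader binary, /Fc writes the text listing that gets parsed below.
    string tempOutBin = Path.GetTempFileName();
    string tempOutAsm = Path.GetTempFileName();
    try
    {
        args += String.Format("/Fo\"{0}\" /Fc\"{1}\" \"{2}\"", tempOutBin, tempOutAsm, inputFile);
        // Process here is presumably the project's own helper for running a tool and
        // capturing its output; System.Diagnostics.Process has no Run/Output members like this.
        Process.Output output = Process.Run(exe, args);
        foreach (var msg in output.stderr)
        {/* error handling */}
        byte[] code = File.ReadAllBytes(tempOutBin);
        string[] info = File.ReadAllLines(tempOutAsm);
        ParseFxcResults(info, usage); // fills in 'usage' from the "// Registers:" section
        return code;
    }
    finally
    {
        // Clean up the temporary files even if compiling or parsing threw.
        File.Delete(tempOutBin);
        File.Delete(tempOutAsm);
    }
}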
In the text file, you can search for "// Registers:" to find the section describing the register allocations actually used by the binary. I then use the Regex of:
//\s+(?<name>[^\s]+)\s+(?<reg>[^\s]+)\s+(?<size>[^\s]+)
(i.e. // some space, name=everything until next space, more space, reg=until next space, more space, size=until next space)
...to extract data from each line into the named-captures of name, reg and size. All you really need to do here is collect all of the 'c' registers that are used and compare them against the cbuffer descriptions from earlier -- if a used register lies in the range that you allocated for a cbuffer, then that cbuffer is used by this binary.
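The comparison described above can be sketched in C++ roughly as follows (the post's ParseFxcResults is C# and isn't shown, so this is not that function). It assumes the listing rows after "// Registers:" look like "//   name   c2   1", which is the shape the regex above expects. std::regex has no named captures, so numbered groups are used, and the pattern is narrowed to match only 'c' registers. CBufferRange, UsedRegisters, ParseConstantRegisters and FindUsedCBuffers are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

struct CBufferRange {
    int slot;           // emulated 'b' slot
    int firstRegister;  // first 'c' register reserved for this cbuffer
    int registerCount;  // number of registers reserved
};

struct UsedRegisters {
    int first;  // first 'c' register used by one shader constant
    int count;  // number of registers it occupies
};

// Collects the 'c' register rows from the "// Registers:" section of an
// FXC /Fc listing. Sampler rows, headers and rulers are skipped because
// they don't match the pattern.
std::vector<UsedRegisters> ParseConstantRegisters(const std::vector<std::string>& listing) {
    static const std::regex row(R"(//\s+(\S+)\s+c(\d+)\s+(\d+)\s*)");
    std::vector<UsedRegisters> used;
    bool inSection = false;
    for (const std::string& line : listing) {
        if (!inSection) {
            inSection = line.find("// Registers:") != std::string::npos;
            continue;
        }
        if (line.compare(0, 2, "//") != 0) break;  // the comment block has ended
        std::smatch m;
        if (std::regex_match(line, m, row))
            used.push_back({std::stoi(m[2].str()), std::stoi(m[3].str())});
    }
    return used;
}

// Returns the slots of every cbuffer whose reserved register range
// overlaps at least one register that the binary actually uses.
std::vector<int> FindUsedCBuffers(const std::vector<UsedRegisters>& used,
                                  const std::vector<CBufferRange>& cbuffers) {
    std::vector<int> slots;
    for (const CBufferRange& cb : cbuffers) {
        assert(cb.registerCount > 0);
        for (const UsedRegisters& u : used) {
            const bool overlaps = u.first < cb.firstRegister + cb.registerCount &&
                                  cb.firstRegister < u.first + u.count;
            if (overlaps) { slots.push_back(cb.slot); break; }
        }
    }
    return slots;
}
```

For a pixel shader whose listing only mentions specular at c1, with Material reserved at c0-c2 in slot 2, this reports slot 2 and nothing else.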
You can use this information to create a mask for your engine, so your engine knows which cbuffers need to be set prior to drawing something using this binary (to avoid useless shader parameter setting).
This allows you, for example, to have a "Transform" cbuffer, which is only referenced by your vertex-shader, and a "Material Colours" cbuffer, which is only referenced by your pixel shader -- the user of your API can naively bind their CBuffers to both the pixel and vertex shader cbuffer slots (i.e. 'b registers'), but your engine can avoid setting the transform data into the pixel constant registers, and avoid setting the material colours into the vertex constant registers.
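On the engine side, the per-stage filtering could look something like this minimal C++ sketch. It assumes the usage mask has bit N set when the shader binary references cbuffer slot N, and it takes the device call as a callback so the example compiles without the D3D9 headers. In a real engine that callback would forward to IDirect3DDevice9::SetVertexShaderConstantF or SetPixelShaderConstantF, which take a start register, a float pointer and a count of float4 vectors. BoundCBuffer, SetConstantsFn and ApplyCBuffers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// CPU-side contents of one emulated cbuffer (hypothetical type).
struct BoundCBuffer {
    int firstRegister;        // first 'c' register reserved for this cbuffer
    std::vector<float> data;  // 4 floats per register
};

// Stand-in for Set*ShaderConstantF(StartRegister, pConstantData, Vector4fCount).
using SetConstantsFn =
    std::function<void(int startRegister, const float* data, int vec4Count)>;

// Uploads only the bound cbuffers that the current shader binary uses.
// 'slots' is indexed by cbuffer slot; unbound slots are nullptr.
// Returns the number of upload calls made.
int ApplyCBuffers(const std::vector<const BoundCBuffer*>& slots,
                  std::uint32_t usedMask,
                  const SetConstantsFn& setConstants) {
    int calls = 0;
    for (std::size_t slot = 0; slot < slots.size() && slot < 32; ++slot) {
        const BoundCBuffer* cb = slots[slot];
        if (cb == nullptr || (usedMask & (1u << slot)) == 0) continue;
        assert(cb->data.size() % 4 == 0);
        // One call per cbuffer, which is why the members need contiguous registers.
        setConstants(cb->firstRegister, cb->data.data(),
                     static_cast<int>(cb->data.size() / 4));
        ++calls;
    }
    return calls;
}
```

You'd call this once with the vertex shader's mask and a callback wrapping SetVertexShaderConstantF, and once with the pixel shader's mask and SetPixelShaderConstantF, so a Transform cbuffer bound to both stages only gets uploaded to the stage that reads it.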