Emulating CBuffers

10 comments, last by Hodgman 12 years, 2 months ago
I was sent a PM asking how I emulate CBuffers on D3D9, but unfortunately the message was lost in the great GDNet server crash of 2012, so I've forgotten the exact question. It's probably of general interest anyway, so I'm replying to the PM with a post.

If anyone has any other strategies for managing shader parameters in D3D9 / OpenGL, please share them below as well!
Part 0 - why?
Without the Microsoft Effect framework or the CgFX framework, D3D9 only exposes a global set of "constant registers". IMHO, the D3D10/11 abstraction of a global set of "buffer registers" (each of which contains a number of constants) is not only easier to manage as a user of a graphics API, but also allows for much more efficient rendering systems.

Also IMHO, the GL abstraction of "shader instances" -- an object that contains a link to a program plus all of the constant values to be used with that program -- is the least efficient abstraction to build an engine on top of, and it's hard to manage because different groups of constants are produced by different sources at different frequencies. The CBuffer abstraction solves all these issues elegantly.

Part 1 - describing your cbuffers.
If we're using SM2/3 on D3D9, then we don't have the cbuffer syntax available, so we need an alternate way to describe them when writing our shaders.
One option I've used professionally is to make your own shader language that cross-compiles into HLSL, which lets you use whatever syntax you want -- this requires a large time investment to get up and running, though.

A simple option is to use a naming convention and lots of manual register specification. Because I want to be able to set the values of an entire CBuffer with one call to Set*ShaderConstantF, all variables within a "cbuffer" need to be in contiguous registers (hence the manual register allocation).
This option might lead you to write code like the snippet below, from which you can extract the cbuffer layouts by parsing the code.
//cbuffer material : register(b2) { float4 diffuse; float4 specular; }
float4 cb2_material_diffuse : register(c0);
float4 cb2_material_specular : register(c1);
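To make "extract the cbuffer layouts by parsing the code" concrete, here is a minimal C++ sketch of that parser. It assumes every variable follows the `cb<slot>_<cbuffer>_<variable> : register(c<N>)` pattern above, that cbuffer names contain no underscores, that there is one declaration per line, and that each variable is a float4 occupying one register. `CBufferLayout`, `ParseCBufferConvention` and `IsContiguous` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One variable found in the HLSL source, e.g. "cb2_material_diffuse : register(c0)".
struct CBufferVariable {
    std::string name;  // "diffuse"
    int reg;           // first constant register, e.g. 0 for c0
};

// All variables sharing the same cb<slot>_<name>_ prefix (hypothetical type).
struct CBufferLayout {
    int slot = -1;     // the emulated 'b' register
    std::string name;  // "material"
    std::vector<CBufferVariable> vars;
};

// Hypothetical helper: scans HLSL text for declarations that follow the naming convention.
// Commented-out lines such as "//cbuffer material : ..." do not match the pattern.
std::map<int, CBufferLayout> ParseCBufferConvention(const std::string& hlsl) {
    static const std::regex decl(
        R"(float\d?\s+cb(\d+)_([A-Za-z0-9]+)_(\w+)\s*:\s*register\s*\(\s*c(\d+)\s*\))");
    std::map<int, CBufferLayout> layouts;
    std::istringstream lines(hlsl);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (!std::regex_search(line, m, decl)) continue;
        const int slot = std::stoi(m[1].str());
        CBufferLayout& cb = layouts[slot];
        cb.slot = slot;
        cb.name = m[2].str();
        cb.vars.push_back({m[3].str(), std::stoi(m[4].str())});
    }
    return layouts;
}

// Checks the property the convention exists for: a cbuffer's registers are contiguous,
// so it can be uploaded with one Set*ShaderConstantF call.
// Assumes vars appear in register order and each occupies exactly one register.
bool IsContiguous(const CBufferLayout& cb) {
    for (std::size_t i = 1; i < cb.vars.size(); ++i)
        if (cb.vars[i].reg != cb.vars[i - 1].reg + 1) return false;
    return true;
}
```

Fed the three lines above, this yields one layout for slot 2 named "material" with `diffuse` at c0 and `specular` at c1.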
The option that I personally prefer is to embed a small piece of Lua code at the beginning of the shader file, enclosed in a comment. All my content-pipeline tools are written in C#, which is dead-simple to integrate with Lua, thanks to LuaInterface.
/*[FX]
cbuffer( 2, 'Material', {
{ diffuse = float4 },
{ specular = float4 },
{ foo = float },
{ bar = float2 },
{ baz = float },
})
*/
...
Before compiling a shader file with FXC, a C# tool extracts and executes the above Lua code (which, as well as describing CBuffers, also describes techniques/passes/permutation-options, which are used to determine how many times to run FXC, and with what arguments). After running the Lua code, the C# tool has a description of the desired CBuffer layouts, which it can translate back into HLSL to produce a new temporary shader file, such as the snippet below. One advantage of this approach is that it allows you to implement modern CBuffer packing rules, as long as you're OK with the scoping violations caused by #defines:
float4 diffuse : register(c0);
float4 specular : register(c1);
float4 _packed0_ : register(c2);
#define foo _packed0_.x
#define bar _packed0_.yz
#define baz _packed0_.w
#line 1 "D:\blah\test2.hlsl"
... original file contents here
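The packing step that produced `_packed0_` can be sketched as follows. This is a simplified take on the D3D10-style rule, assuming only `float` to `float4` members (no matrices, arrays or structs): members are placed in declaration order, and a member that would straddle a float4 register boundary starts a new register. `Member`, `PackedMember`, `PackCBuffer` and `EmitHlsl` are hypothetical names, and the emitter only mimics the shape of the generated HLSL shown above.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Member { std::string name; int components; };  // 1..4 floats (hypothetical)

struct PackedMember {
    std::string name;
    int reg;         // register relative to the cbuffer's first register
    int component;   // first component inside that register (0 = x)
    int components;
};

// Packs members in order; a member never straddles a float4 register boundary.
std::vector<PackedMember> PackCBuffer(const std::vector<Member>& members) {
    std::vector<PackedMember> out;
    int reg = 0, comp = 0;
    for (const Member& m : members) {
        if (comp + m.components > 4) { ++reg; comp = 0; }  // would straddle: move to next register
        out.push_back({m.name, reg, comp, m.components});
        comp += m.components;
        if (comp == 4) { ++reg; comp = 0; }
    }
    return out;
}

// Full float4 members are declared directly; partial members become #defines that
// swizzle a shared "_packedN_" register, as in the generated file above.
std::string EmitHlsl(const std::vector<PackedMember>& packed, int baseReg) {
    static const char* kSwizzle = "xyzw";
    std::string out, packedName;
    int packedCount = 0, currentPackedReg = -1;
    for (const PackedMember& p : packed) {
        const std::string reg = "c" + std::to_string(baseReg + p.reg);
        if (p.components == 4) {
            out += "float4 " + p.name + " : register(" + reg + ");\n";
            continue;
        }
        if (p.reg != currentPackedReg) {  // first partial member in this register
            currentPackedReg = p.reg;
            packedName = "_packed" + std::to_string(packedCount++) + "_";
            out += "float4 " + packedName + " : register(" + reg + ");\n";
        }
        out += "#define " + p.name + " " + packedName + "." +
               std::string(kSwizzle + p.component, p.components) + "\n";
    }
    return out;
}
```

For the `Material` description in the Lua block (diffuse, specular, foo, bar, baz) with a base register of 0, `EmitHlsl` reproduces the six declaration lines shown above.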
Part 2 - determining used CBuffers
Once you've got the above temporary HLSL files, you'll compile them as many times as is required by your techniques/passes/permutations - not all of the resulting binaries will use every CBuffer.

Below is my function for calling FXC - note that it uses both the /Fo option and the /Fc option, to output a shader binary and a text file that we can parse to get some information about the binary. Sure, you can load up the binary and use the D3D API to reflect on it, but I found this simpler, and I like KISS. The resulting binary code is returned, and the text file is parsed to fill in the usage parameter.
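using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Method of the C# content tool; m_project, ShaderProfile, ShaderUsage and
// ParseFxcResults belong to that tool.
public byte[] CompileShader(string inputFile, ShaderProfile profile, string entry, Dictionary<string, string> defines, ShaderUsage usage)
{
  string exe = Path.Combine(m_project.engineDirectory, "tools/fxc.exe");
  string args = "/nologo /O3 ";
  args += String.Format("/E{0} ", entry);
  switch (profile)
  {
    case ShaderProfile.Pixel: args += "/Tps_3_0 "; break;
    case ShaderProfile.Vertex: args += "/Tvs_3_0 "; break;
    default: throw new ArgumentException("Unsupported shader profile: " + profile);
  }
  foreach (var define in defines)
  {
    if (string.IsNullOrEmpty(define.Value))
      args += String.Format("/D{0} ", define.Key);
    else
      args += String.Format("/D{0}={1} ", define.Key, define.Value);
  }
  // /Fo writes the shader binary, /Fc writes the assembly listing that ParseFxcResults reads.
  string tempOutBin = Path.GetTempFileName();
  string tempOutAsm = Path.GetTempFileName();
  args += String.Format("/Fo\"{0}\" /Fc\"{1}\" \"{2}\"", tempOutBin, tempOutAsm, inputFile);
  try
  {
    var startInfo = new ProcessStartInfo(exe, args)
    {
      UseShellExecute = false,
      RedirectStandardError = true,
      CreateNoWindow = true
    };
    using (var fxc = Process.Start(startInfo))
    {
      // Read stderr before waiting, so a full pipe can't deadlock the child process.
      string errors = fxc.StandardError.ReadToEnd();
      fxc.WaitForExit();
      if (fxc.ExitCode != 0)
        throw new InvalidOperationException("fxc failed on " + inputFile + ":\n" + errors);
    }
    byte[] code = File.ReadAllBytes(tempOutBin);
    string[] info = File.ReadAllLines(tempOutAsm);
    ParseFxcResults(info, usage); // fills in 'usage' from the listing
    return code;
  }
  finally
  {
    // Clean up the temp files even if compilation failed.
    File.Delete(tempOutBin);
    File.Delete(tempOutAsm);
  }
}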
In the text file, you can search for "// Registers:" to find the section describing the register allocations actually used by the binary. I then use the following regex:
//\s+(?<name>[^\s]+)\s+(?<reg>[^\s]+)\s+(?<size>[^\s]+)
(i.e. // some space, name=everything until next space, more space, reg=until next space, more space, size=until next space)
...to extract data from each line into the named captures name, reg and size. All you really need to do here is collect all of the 'c' registers that are used and compare them against the cbuffer descriptions from earlier -- if a register lies in the range that you allocated for a cbuffer, then that cbuffer is used by this binary.
You can use this information to create a mask for your engine, so your engine knows which cbuffers need to be set prior to drawing something using this binary (to avoid useless shader parameter setting).
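A C++ version of the register-table scan and the mask computation might look like the sketch below. `std::regex` (ECMAScript grammar) has no named captures, so numbered groups stand in for `name`, `reg` and `size`. It assumes the table rows look like `//   diffuse      c0       1`, that the table ends at the first line not starting with `//`, and that each cbuffer was given one contiguous register range. `CBufferRange` and `ComputeCBufferMask` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Register range the content tool allocated for one emulated cbuffer (hypothetical type).
struct CBufferRange { int slot; int firstReg; int numRegs; };

// Returns a mask with bit 'slot' set for every cbuffer whose registers the compiled
// shader actually uses, according to the "// Registers:" table of the /Fc listing.
std::uint32_t ComputeCBufferMask(const std::string& listing,
                                 const std::vector<CBufferRange>& cbuffers) {
    // Same shape as the C# regex: "//", then name, reg and size separated by whitespace.
    static const std::regex row(R"(//\s+(\S+)\s+(\S+)\s+(\S+))");
    static const std::regex constReg(R"(c(\d+))");
    std::istringstream in(listing);
    std::string line;
    bool inTable = false;
    std::uint32_t mask = 0;
    while (std::getline(in, line)) {
        if (!inTable) {
            inTable = line.find("// Registers:") != std::string::npos;
            continue;
        }
        if (line.compare(0, 2, "//") != 0) break;  // end of the comment header
        std::smatch m;
        if (!std::regex_search(line, m, row)) continue;
        const std::string reg = m[2].str();
        const std::string size = m[3].str();
        std::smatch r;
        if (!std::regex_match(reg, r, constReg)) continue;  // skips header, dashes, samplers
        const int first = std::stoi(r[1].str());
        const int count = std::stoi(size);
        for (const CBufferRange& cb : cbuffers) {
            // Overlap of [first, first+count) with [firstReg, firstReg+numRegs).
            if (first < cb.firstReg + cb.numRegs && cb.firstReg < first + count)
                mask |= 1u << cb.slot;
        }
    }
    return mask;
}
```

The resulting mask would be stored alongside each compiled permutation, so the renderer can test `mask & (1u << slot)` before uploading a cbuffer.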

This allows you, for example, to have a "Transform" cbuffer that is only referenced by your vertex shader, and a "Material Colours" cbuffer that is only referenced by your pixel shader. The user of your API can naively bind their CBuffers to both the pixel and vertex shader cbuffer slots (i.e. 'b registers'), but your engine can avoid setting the transform data into the pixel constant registers, and avoid setting the material colours into the vertex constant registers.
We used to do constant buffer emulation back in the dark ages when we used D3D9. It was really simple: we just split up the constants by update frequency and kept track of which register each group started at. Then we'd author a corresponding C++ struct with matching parameters (and matching alignment), along with a templated class that would set the constant data by passing a pointer to the struct to SetVertexShaderConstantF/SetPixelShaderConstantF at the proper offset. For per-draw constants we had multiple structs with the same register offset, and you'd only use one of them in a particular shader.
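The struct-plus-template approach just described could look roughly like this. It is a sketch, not the poster's code: `PerFrameConstants` and `ShaderConstants` are hypothetical, and the template is written against any device type that exposes D3D9's `SetVertexShaderConstantF(UINT StartRegister, const float* pConstantData, UINT Vector4fCount)` and its pixel-shader counterpart, so it can be exercised without a real `IDirect3DDevice9`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ mirror of one group of constants. It must match the HLSL register
// layout: whole float4 registers, no compiler-inserted padding.
struct PerFrameConstants {
    float viewProj[16];    // c0..c3
    float eyePosition[4];  // c4
};

// Uploads a whole constant group with a single call, starting at a fixed register.
// 'Device' is assumed to provide the D3D9-style Set*ShaderConstantF methods.
template <typename T, unsigned StartRegister>
struct ShaderConstants {
    static_assert(sizeof(T) % (4 * sizeof(float)) == 0,
                  "constant groups must be a whole number of float4 registers");
    static constexpr unsigned kNumRegisters = sizeof(T) / (4 * sizeof(float));

    template <typename Device>
    static void SetVS(Device* device, const T& data) {
        device->SetVertexShaderConstantF(StartRegister,
                                         reinterpret_cast<const float*>(&data),
                                         kNumRegisters);
    }
    template <typename Device>
    static void SetPS(Device* device, const T& data) {
        device->SetPixelShaderConstantF(StartRegister,
                                        reinterpret_cast<const float*>(&data),
                                        kNumRegisters);
    }
};

// Per-frame data lives at c0; several per-draw structs could share one later offset.
using PerFrame = ShaderConstants<PerFrameConstants, 0>;
```

With a real device, `PerFrame::SetVS(device, constants)` becomes one `SetVertexShaderConstantF(0, ..., 5)` call.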

Thank god we switched to D3D11 and I don't have to deal with that crap anymore. :)
Thanks for taking the time to discuss this topic :)

I hadn't thought of using Lua to describe the FX section, but I'll look into it. At the moment I'm going down the path of having my own HLSL-like syntax, which I would parse. This FX section would basically define all the constant buffers and techniques/contexts for the shader - very similar to Horde3D, except with cbuffer support I suppose.

The main problem I'm having is with the part where you talk about a mask so the engine knows which buffers to set before drawing. With the design above, the cbuffers defined in the FX section are forced to have unique names so that a user can look up a cbuffer for a particular shader type. Also, how does the mask fit in with your state-group idea, where you have cbuffer bind commands? Are these commands necessary if you already have a mask to know what has to be set?

This is an example of the structure/usage I was going for:

Effect:


[FX]
cbuffer cbPerObjectVS : register( b0 )
{
matrix g_mWorldViewProjection;
matrix g_mWorld;
};

cbuffer cbPerObjectPS : register( b0 )
{
float4 g_vObjectColor;
};


context SIMPLE
{
VertexShader = compile VS_SIMPLE;
PixelShader = compile FS_SIMPLE;
}


[VS_SIMPLE]

cbuffer cbPerObjectVS : register( b0 )
{
matrix g_mWorldViewProjection : packoffset( c0 );
matrix g_mWorld : packoffset( c4 );
};

...


[PS_SIMPLE]
cbuffer cbPerObjectPS : register( b0 )
{
float4 g_vObjectColor : packoffset( c0 );
};

...


Usage:


perObjectVSIndex = model->Effect()->FindCBuffer("cbPerObjectVS");
perObjectVSData = model->Effect()->CloneBuffer(perObjectVSIndex);


This would mean I'd need to have a lookup table to know what shader type (GS/VS/PS/etc) defines the CBuffer the user is asking for. In the case above it would map to the vertex shader in the effect. The CBuffer defined in the FX section is mainly just for creating the lookup table, the actual CBuffer layout is created in each shader type via reflection.

I'm not really sure this is the best way to go about it so I'm open to suggestions as I'm still in design/thinking stages :)
By the way, seeing as you use C# for your content pipeline, does that mean you only target windows for your engine or do you have bindings for other languages too?
With the design above, it forces the cbuffers defined in the FX section to have unique names so that a user could look up a cbuffer for a particular shader type.
Yeah at the moment, I force shader authors to give every cbuffer both a name and an ID.

Also, how does the mask fit in with your state group idea where you have cbuffer bind commands, are these commands necessary if you already have a mask to know what has to set?
The mask is separate from the binding commands.
After the user has issued a bunch of items to draw, it ends up as a list of state-changes (e.g. bind commands) and draw-calls. In the D3D9 renderer, a bind command simply writes some data into a cache and sets a dirty bit. When a draw-call is issued, this cbuffer cache is used to perform the actual Set*ShaderConstantF calls just prior to the Draw*Primitive call.

Before setting shader constants (before a draw-call), the renderer first has to select a shader permutation, which has one of these cbuffer masks. Only the cbuffers in the cache which are specified in this mask will be flushed through to D3D. Note that this is just a slight optimisation though, not a required feature.
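The bind/flush split can be sketched like this, with one cache per shader stage. Names (`CBufferCache`, `Bind`, `FlushForDraw`, `UploadFn`) are hypothetical; the upload callback stands in for the actual `Set*ShaderConstantF` call, and `usedMask` is the per-permutation cbuffer mask. Only cbuffers that are both dirty and in the mask are uploaded.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-stage cache (one for VS constants, one for PS constants).
class CBufferCache {
public:
    static constexpr unsigned kMaxSlots = 14;  // D3D10/11 expose 14 cbuffer slots per stage

    struct Slot {
        const float* data = nullptr;  // bound cbuffer contents, as float4 registers
        unsigned firstRegister = 0;   // where this cbuffer lives in the constant file
        unsigned numRegisters = 0;
    };
    // Stand-in for Set{Vertex,Pixel}ShaderConstantF(start, data, count).
    using UploadFn = std::function<void(unsigned, const float*, unsigned)>;

    // A bind command only records the binding and marks the slot dirty.
    void Bind(unsigned slot, const float* data, unsigned firstRegister, unsigned numRegisters) {
        assert(slot < kMaxSlots);
        slots_[slot] = {data, firstRegister, numRegisters};
        dirty_ |= 1u << slot;
    }

    // Called just before Draw*Primitive with the selected permutation's mask.
    // Bound-but-unused cbuffers stay dirty, so a later permutation that does use
    // them still gets them uploaded.
    void FlushForDraw(std::uint32_t usedMask, const UploadFn& upload) {
        const std::uint32_t toUpload = dirty_ & usedMask;
        for (unsigned slot = 0; slot < kMaxSlots; ++slot) {
            if (!(toUpload & (1u << slot))) continue;
            const Slot& s = slots_[slot];
            upload(s.firstRegister, s.data, s.numRegisters);
        }
        dirty_ &= ~toUpload;
    }

private:
    Slot slots_[kMaxSlots];
    std::uint32_t dirty_ = 0;
};
```

Leaving unused slots dirty rather than clearing them is the design choice that makes the mask "just an optimisation": skipping an upload never loses data.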

There are a few reasons why the user may have issued a bind-cbuffer command that isn't actually needed. Perhaps the cbuffer is used by most permutations, but there's one permutation where it's not used. e.g. maybe your scene has a "flash of lightning" effect, where you simply disable lighting calculations for that frame. The user might bind a light cbuffer regardless, but then some other layered render-state causes a "no lighting" permutation to be selected. In this situation, the mask allows the author of the "lightning" effect to quickly implement their idea without changing any of the "calculate/bind lighting cbuffers" code.

Another possibility is that you've got some global cbuffers which you simply always bind out of convenience. e.g. maybe you define cbuffer #12 as holding fog/atmosphere settings, and bind this data to that slot by default (unless a particular renderable overrides that state with its own binding). In this case, every draw-call would have something bound to cbuffer slot #12, but some shaders/permutations might ignore fog, and hence not need that buffer to be bound.

The CBuffer defined in the FX section is mainly just for creating the lookup table, the actual CBuffer layout is created in each shader type via reflection.
I'm not really sure this is the best way to go about it so I'm open to suggestions as I'm still in design/thinking stages
My only objection to that format is that you're repeating yourself (declaring cbuffers twice), which adds an extra place where mistakes can be made.

Would it be possible to automatically generate one of these sets of information from the other? (e.g. generate the HLSL variables from the FX, or generate the FX by parsing/reflecting the HLSL?)

perObjectVSIndex = model->Effect()->FindCBuffer("cbPerObjectVS");
perObjectVSData = model->Effect()->CloneBuffer(perObjectVSIndex);
I've got an API like this, but for many cases I don't have to use it. For most engine-provided data, I can instead do:
perObjectVSData = CBuffer::Create( sizeof(StructThatIPromiseMatchesMyHLSL) );

And for debugging:
CBufferInfo* cb = model->Effect()->FindCBuffer("cbPerObjectVS");
ASSERT( cb->SizeInBytes() == sizeof(StructThatIPromiseMatchesMyHLSL) );
ASSERT( OFFSETOF(StructThatIPromiseMatchesMyHLSL::foo) == cb->OffsetOf("foo") );
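The debugging checks above can be written with the standard `offsetof` macro (the post uses its own `OFFSETOF`). This sketch assumes a hypothetical reflection record, `CBufferInfo`, holding the size and per-variable byte offsets that the content tools extracted; it is filled in by hand here purely for illustration, and `PerObjectVS` / `MatchesReflection` are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical reflection data for one cbuffer, as exported by the content tools.
struct CBufferInfo {
    std::size_t sizeInBytes = 0;
    std::map<std::string, std::size_t> offsets;  // variable name -> byte offset

    std::size_t SizeInBytes() const { return sizeInBytes; }
    // Returns (size_t)-1 if the variable doesn't exist in this cbuffer.
    std::size_t OffsetOf(const std::string& name) const {
        auto it = offsets.find(name);
        return it == offsets.end() ? static_cast<std::size_t>(-1) : it->second;
    }
};

// C++ mirror of cbPerObjectVS: two float4x4 matrices at c0 and c4.
struct PerObjectVS {
    float worldViewProjection[16];
    float world[16];
};

// True if the C++ struct agrees with the reflected HLSL layout; meant for debug ASSERTs.
bool MatchesReflection(const CBufferInfo& cb) {
    return cb.SizeInBytes() == sizeof(PerObjectVS) &&
           cb.OffsetOf("g_mWorldViewProjection") == offsetof(PerObjectVS, worldViewProjection) &&
           cb.OffsetOf("g_mWorld") == offsetof(PerObjectVS, world);
}
```

In a debug build this would be used as `ASSERT( MatchesReflection(*cb) )` right after the `FindCBuffer` lookup.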

By the way, seeing as you use C# for your content pipeline, does that mean you only target windows for your engine or do you have bindings for other languages too?
The engine is multi-platform, but the content tools are only designed to work on a Windows PC, because that's what we use for development ;)

There are a few options I've personally used for integrating Windows-only content tools with a cross-platform engine:
1) Have some kind of "editor" build of the engine, which does more stuff than any of the single-platform builds (e.g. contains multiple different platform-specific data structures - so you can serialize your data for each different platform).
2) Link your content tools against your Windows build of your engine, and use formats that serialize identically for all platforms.
3) Don't directly link your content tools to your engine at all. Instead, define a specification for the input/output data formats, and implement that spec once in the tools (as a producer) and once in the engine (as a consumer).

I currently use method #3 -- the C# tools use easy-to-use-but-bloated data structures internally, and have hand-written binary serialisation code for outputting data to the engine. The C++ engine loads these data files into byte-arrays and casts them to hand-written structs that match the expected data layouts.
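On the engine side, method #3 amounts to reading a file into bytes and viewing it through structs that match the agreed spec. A minimal sketch follows; the `ShaderFileHeader` layout, magic value and version are hypothetical. It copies with `std::memcpy` instead of casting the byte pointer (a swap from the cast described above), which sidesteps alignment and aliasing issues at the cost of a small copy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical file header agreed on by the C# producer and the C++ consumer.
// Fixed-size fields and no implicit padding, so the layout is identical on both sides.
struct ShaderFileHeader {
    std::uint32_t magic;
    std::uint32_t version;
    std::uint32_t numPermutations;
    std::uint32_t reserved;
};
static_assert(sizeof(ShaderFileHeader) == 16, "header must match the file spec exactly");

constexpr std::uint32_t kShaderMagic = 0x52444853;  // bytes 'S','H','D','R' on a little-endian host
constexpr std::uint32_t kShaderVersion = 1;

// Copies the header out of a loaded file and validates it. Assumes the tool wrote
// the file in the target platform's native byte order.
bool ReadHeader(const std::vector<unsigned char>& file, ShaderFileHeader* out) {
    if (file.size() < sizeof(ShaderFileHeader)) return false;
    std::memcpy(out, file.data(), sizeof(ShaderFileHeader));
    return out->magic == kShaderMagic && out->version == kShaderVersion;
}
```

The version check is what keeps the producer and consumer honest: bump it on either side whenever the spec changes.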

I've got an API like this, but for many cases I don't have to use it. For most engine-provided data, I can instead do:
perObjectVSData = CBuffer::Create( sizeof(StructThatIPromiseMatchesMyHLSL) );

And for debugging:
CBufferInfo* cb = model->Effect()->FindCBuffer("cbPerObjectVS");
ASSERT( cb->SizeInBytes() == sizeof(StructThatIPromiseMatchesMyHLSL) );
ASSERT( OFFSETOF(StructThatIPromiseMatchesMyHLSL::foo) == cb->OffsetOf("foo") );


In my engine setup, all C structures that are required to match HLSL cbuffers are forced at compile time to provide a decl function for matching the alignments.
I have a templated subclass of the buffer object that turns CBuffer::Create( sizeof(StructThatIPromiseMatchesMyHLSL) ); into TCBuffer<StructThatIPromiseMatchesMyHLSL>, which performs validation when compiled with paranoia flags.
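One way to get a compile-time "must provide a decl function" requirement plus optional validation is a template like the sketch below. Everything here is an assumption for illustration, not the poster's code: the `Decl()` member, the `MemberDecl` record, the `TCBuffer` layout and the `CBUFFER_PARANOIA` macro. Calling `T::Decl()` inside the template means a struct without one simply fails to compile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// One entry of a layout declaration: hand-written for C++ structs, reflected for HLSL.
struct MemberDecl { const char* name; std::size_t offset; std::size_t size; };
using ReflectedLayout = std::vector<MemberDecl>;  // hypothetical reflection output

template <typename T>
class TCBuffer {
public:
    explicit TCBuffer(const ReflectedLayout& hlsl) : bytes_(sizeof(T)) {
        const std::vector<MemberDecl> decl = T::Decl();  // compile error if T has no Decl()
#ifdef CBUFFER_PARANOIA
        assert(Validate(decl, hlsl) && "C++ struct does not match the HLSL cbuffer");
#else
        (void)decl;
        (void)hlsl;
#endif
    }

    void Set(const T& value) { std::memcpy(bytes_.data(), &value, sizeof(T)); }
    std::size_t SizeInBytes() const { return bytes_.size(); }
    const unsigned char* Data() const { return bytes_.data(); }

    // Every declared member must exist in the HLSL layout with the same offset and size.
    static bool Validate(const std::vector<MemberDecl>& decl, const ReflectedLayout& hlsl) {
        if (decl.size() != hlsl.size()) return false;
        for (const MemberDecl& d : decl) {
            bool found = false;
            for (const MemberDecl& h : hlsl) {
                if (std::strcmp(d.name, h.name) != 0) continue;
                if (d.offset != h.offset || d.size != h.size) return false;
                found = true;
            }
            if (!found) return false;
        }
        return true;
    }

private:
    std::vector<unsigned char> bytes_;
};

// Example struct providing the required decl function.
struct MaterialConstants {
    float diffuse[4];
    float specular[4];
    static std::vector<MemberDecl> Decl() {
        return {{"diffuse", offsetof(MaterialConstants, diffuse), sizeof(float) * 4},
                {"specular", offsetof(MaterialConstants, specular), sizeof(float) * 4}};
    }
};
```

Keeping the check behind a macro means release builds pay nothing beyond building the small decl vector once per buffer.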

FWIW, I've taken the same approach to declaring structures in this way for vertex stream formats and pixel shader outputs.
Defining the pixel shader outputs this way allows my shader compiler to split a pixel shader with, say, 7 float4 outputs into 2 separate shaders for platforms that only support 4 targets under MRT.
The vertex stream format decl was helpful in getting hassle-free instancing up and running: the Transform CBuffer struct can easily be treated as a secondary vertex stream for DrawInstanced calls.
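The MRT split boils down to partitioning the declared outputs into groups no larger than the platform's render-target limit, then generating one pixel shader per group. A sketch of just the partitioning step; `SplitOutputs` is a hypothetical name, and generating each group's shader is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a pixel shader's declared outputs into groups of at most 'maxTargets'.
// Each group would become its own pixel shader writing COLOR0..COLORn in a separate
// MRT pass. Outputs keep their declaration order.
std::vector<std::vector<std::string>> SplitOutputs(const std::vector<std::string>& outputs,
                                                   std::size_t maxTargets) {
    assert(maxTargets > 0);
    std::vector<std::vector<std::string>> passes;
    for (std::size_t begin = 0; begin < outputs.size(); begin += maxTargets) {
        const std::size_t end = std::min(outputs.size(), begin + maxTargets);
        passes.emplace_back(outputs.begin() + static_cast<std::ptrdiff_t>(begin),
                            outputs.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return passes;
}
```

With 7 outputs and a limit of 4, this gives one pass of 4 outputs and one of 3, matching the example above; each generated shader re-runs whatever work its outputs depend on.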

My only objection to that format is that you're repeating yourself (declaring cbuffers twice), which adds an extra place where mistakes can be made.

Would it be possible to automatically generate one of these sets of information from the other? (e.g. generate the HLSL variables from the FX, or generate the FX by parsing/reflecting the HLSL?)


I can understand your point of view but in both cases I see some potential problems, but maybe you have a way around those. If I specify the CBuffers in the FX, how do you know what shader type and permutation actually require that particular buffer? Or, if you specify them only in the shader code, how can you ensure there will always be the ability to do reflection? I'm only aware of DX having a reflection API for HLSL, is there an equivalent for GLSL?

Having a custom FX section allows me to have non-platform-specific metadata for uniforms, samplers, cbuffers, etc., which would be useful in content generation tools (I don't think GLSL has semantic/annotation data?). That said, I'd probably have to go down the path of generating HLSL from the FX section, as I can leave the generation of the shader code up to the low-level rendering backend: I simply pass it a cbuffer object created from the FX section and it returns the shader-language representation. But as I mentioned above, I'm not sure how to specify which shader type, and in which permutation, actually requires a particular CBuffer so it can be inserted into the shader code and compiled. I'd probably need a more complex FX section which specifies that link/mask, which is what I gather your Lua FX section does?


The engine is multi-platform, but the content tools are only designed to work on a Windows PC, because that's what we use for development ;)

I'm attempting to go on the path of the engine API itself being used for any of the content tools. That way I only have to write things once and it can be used on any platform. Obviously creating the models and art would be done on whatever machine artists use, but there would be exporters for the main tools like 3dsmax and Blender.


When a draw-call is issued, this cbuffer cache is used to perform the actual Set*ShaderConstantF calls just prior to the Draw*Primitive call.

This sort of stems from the same issue I described earlier, but in your system how do you determine what shader type a CBuffer is linked to so you can know what Set*ShaderConstantF function to call?

Thanks, I appreciate your help!

I can understand your point of view but in both cases I see some potential problems, but maybe you have a way around those.
(1) If I specify the CBuffers in the FX, how do you know what shader type and permutation actually require that particular buffer?
(2) Or, if you specify them only in the shader code, how can you ensure there will always be the ability to do reflection?
Yeah, being explicit (repeating yourself) can sometimes be a good thing. For example, it's common to have some variables that only show up when in an editor like Maya, but aren't "officially" part of the shader, so you could omit them from the [FX] part.
For (1), seeing you're writing your own syntax here, you could add a property to your FX description that explicitly states this, e.g.
cbuffer cbPerObjectVS : register( vs_b0 ){ ... };
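If the [FX] syntax gains a stage prefix like `vs_b0`, the tool needs to turn that register string into a stage mask and a slot. A sketch, assuming an unprefixed `b<N>` means "every stage" and that only `vs` and `ps` prefixes exist; `ParseCBufferRegister`, `CBufferBinding` and the stage bits are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stage bits; an unprefixed register applies to all of them.
enum StageBits : unsigned { kVS = 1u << 0, kPS = 1u << 1, kAllStages = kVS | kPS };

struct CBufferBinding { unsigned stages = 0; int slot = -1; };

// Parses "vs_b0", "ps_b3" or "b2". Returns false for anything else.
bool ParseCBufferRegister(const std::string& text, CBufferBinding* out) {
    static const std::regex pattern(R"(\s*(?:(vs|ps)_)?b(\d+)\s*)");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) return false;
    if (!m[1].matched)
        out->stages = kAllStages;
    else
        out->stages = (m[1].str() == "vs") ? kVS : kPS;
    out->slot = std::stoi(m[2].str());
    return true;
}
```

The HLSL generator would then only emit a cbuffer into a stage's source when `binding.stages` has that stage's bit set, which avoids burning PS registers on VS-only data.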

For (2), you can write a full-blown HLSL parser yourself, and convert your HLSL source into an abstract-syntax-tree, which you can reflect over yourself. That's of course quite a bit more complex than the alternative of producing HLSL from your 'reflection' format though ;)
This is what we do at work, and the high investment cost of writing your own parser/translator pays off by allowing you to use any kind of custom syntax you like, to translate your code into multiple output languages, and to do things like converting if/for statements into permutations, etc...

I'm not sure how to specify which shader type and in what permutation actually requires a particular CBuffer so it can be inserted into the shader code and compiled. I'd probably need a more complex FX section which specifies that link/mask which is what I gather your Lua FX section does?
At the moment, my shader format at home is still fairly primitive, so I output all [FX] cbuffers into both the VS and PS. This has the effect of artificially limiting the number of constant registers available to me -- e.g. if a vertex shader cbuffer is bound to c0-c200, then those registers are unusable for PS variables for no good reason.
I'll probably remedy this by adding an explicit description like (1) above, where the [FX] section can say which shader types should include the cbuffer (but still output the cbuffer for every permutation, for simplicity's sake).
This sort of stems from the same issue I described earlier, but in your system how do you determine what shader type a CBuffer is linked to so you can know what Set*ShaderConstantF function to call?
There are three different situations here:
1) The engine/game code is binding a known cbuffer structure -- the person writing that code can hard-code either a SetPsCBuffer/SetVsCBuffer/etc command, because they know which shader they want to set the data to.
2) The content tools are compiling an artist's material -- all shader uniforms (which have been annotated with a "should appear in DCC GUI"-type tag) are available to be set by the artists when they author a model/material. When their models are imported, the content tools take these uniform values and search through every cbuffer (for every shader type) for a variable whose name matches.
When a match is found, an instance of that cbuffer is instantiated (or the existing instance grabbed) and the default value is overwritten with the artist's value.
If that name only shows up in a PS cbuffer, then only a PS cbuffer will be instantiated, and only a PS binding command generated.
3) The engine/game code is binding some cbuffer values, but the structure of the cbuffer isn't hard-coded. In this case, they can use the reflection API to iterate every cbuffer for each shader type, and find which cbuffers (and shader types) contain the variable they're trying to set, and then create the appropriate buffer instances and binding commands. If this is something that was going to happen every frame, you'd instantiate the relevant cbuffers/binding commands once, and store the relevant offsets into those cbuffers where your dynamic data will be written. N.B. in most cases, use-case #1 can be used instead of this #3 use-case.
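Use-cases 2 and 3 both rest on the same reflection query: given a variable name, find every (shader stage, cbuffer slot, byte offset) that contains it, instantiate those cbuffers once, and keep the offsets for later writes. A sketch over a minimal, hypothetical reflection structure (`ReflectedCBuffer`, `VariableLocation`, `FindVariable`, `WriteVariable` are illustrative names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

enum class Stage { Vertex, Pixel };

// Hypothetical reflection data for one cbuffer of one shader stage.
struct ReflectedVariable { std::string name; std::size_t offset; std::size_t size; };
struct ReflectedCBuffer {
    Stage stage;
    int slot;
    std::size_t sizeInBytes;
    std::vector<ReflectedVariable> vars;
};

// Where a variable lives: store this once, then write to it every frame.
struct VariableLocation { Stage stage; int slot; std::size_t offset; std::size_t size; };

// Searches every cbuffer of every shader stage for a variable with this name.
std::vector<VariableLocation> FindVariable(const std::vector<ReflectedCBuffer>& cbuffers,
                                           const std::string& name) {
    std::vector<VariableLocation> found;
    for (const ReflectedCBuffer& cb : cbuffers)
        for (const ReflectedVariable& v : cb.vars)
            if (v.name == name) found.push_back({cb.stage, cb.slot, v.offset, v.size});
    return found;
}

// Writes a value into a cbuffer instance (a plain byte array) at a stored location.
void WriteVariable(std::vector<unsigned char>& instance, const VariableLocation& loc,
                   const void* value, std::size_t size) {
    assert(size <= loc.size && loc.offset + size <= instance.size());
    std::memcpy(instance.data() + loc.offset, value, size);
}
```

If a name only appears in a PS cbuffer, the query returns only PS locations, so only a PS buffer and PS binding command get created, as in use-case 2.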

For (1), seeing you're writing your own syntax here, you could add a property to your FX description that explicitly states this, e.g.
cbuffer cbPerObjectVS : register( vs_b0 ){ ... };


I think I'll definitely go with that approach. It seems pretty elegant and flexible, and like you mentioned, you can easily hide non-"official" shader parameters from content tools simply by not putting them in the FX section. I also think I was overthinking some of this, and outputting the cbuffers to every permutation would probably suffice :)


1) The engine/game code is binding a known cbuffer structure -- the person writing that code can hard-code either a SetPsCBuffer/SetVsCBuffer/etc command, because they know which shader they want to set the data to.

I hadn't actually thought of having separate bind commands for each shader type but it makes sense to do that. Do you think there is a need to actually flag what shader type a cbuffer is meant for, or would a generic cbuffer object suffice? Off the top of my head I can't think of a need why you might want to check if it's a "VS cbuffer" or a "PS cbuffer" etc.
Do you think there is a need to actually flag what shader type a cbuffer is meant for, or would a generic cbuffer object suffice? Off the top of my head I can't think of a need why you might want to check if it's a "VS cbuffer" or a "PS cbuffer" etc.
I don't think there's a need -- my C++-side cbuffer instance structures are very minimal -- they're basically just an array of bytes. They don't even know what their "name" is, what their layout is (i.e. no link to a reflection structure), or which shader/technique they were originally created for.
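To illustrate how little the runtime side needs to know: in this sketch a cbuffer instance is only a byte array, and the shader stage lives in the bind command rather than in the buffer, so the same instance can be bound to both VS and PS slots. The types and the `Execute` helper are hypothetical, not the poster's engine API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// No name, no layout, no link to a shader: just the bytes to upload.
using CBufferInstance = std::vector<std::uint8_t>;

enum class Stage : std::uint8_t { Vertex, Pixel };

// The stage is a property of the command (think SetVsCBuffer vs SetPsCBuffer).
struct BindCBufferCommand {
    Stage stage;
    std::uint8_t slot;
    const CBufferInstance* buffer;
};

// Per-stage record of what is currently bound; a renderer would flush these before a draw.
struct BoundCBuffers {
    const CBufferInstance* vs[14] = {};
    const CBufferInstance* ps[14] = {};
};

void Execute(const BindCBufferCommand& cmd, BoundCBuffers& state) {
    assert(cmd.slot < 14);
    if (cmd.stage == Stage::Vertex)
        state.vs[cmd.slot] = cmd.buffer;
    else
        state.ps[cmd.slot] = cmd.buffer;
}
```

Since nothing in `CBufferInstance` is stage-specific, there is no need to flag a buffer as a "VS cbuffer" or "PS cbuffer".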

