Matching shaders with constants, geometry, etc

4 comments, last by Hodgman 11 years, 2 months ago

Hi,

I'm looking for fast and flexible ways to handle the following problems:

- How to match different types of geometry with the appropriate shader input layout?

- How to update shader constants and pass them to the shader correctly, given that they need to be stored in a specific constant buffer at the correct position?

Currently I don't do any checks on input layouts at run-time (when loading assets). The correct shader is chosen manually at asset creation time, is this "ok"?

An idea I have is to write a function that checks model data/shader input-layout compatibility and integrate it into the model utility (a tool to assign materials to each mesh subset, etc.); this would be done at asset creation time.

For shader constants:

I allocate chunks of data the size of each constant buffer used by that material's shader, and store a table of pairs (constant name, pointer to the data).

This is quite fast since I have a number of pointers to POD equal to the number of constant buffers, so to update the constant buffers I just use those pointers.

To change the value of constants I use the table to find the pointer of the constant I want to update. Maybe this is how the D3D Effects framework handles it.
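
A rough sketch of that scheme in C++ (class and member names are hypothetical, just to illustrate the idea):

#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

struct CBufferShadow
{
    std::vector<uint8_t> data;   // CPU copy, same size as the GPU constant buffer
    bool dirty = false;          // set when a constant changes, cleared on upload
};

struct MaterialConstants
{
    std::vector<CBufferShadow> buffers;                 // one per cbuffer slot
    std::unordered_map<std::string, void*> constants;   // constant name -> location inside a blob

    // Called by gameplay/artist-facing code to set a named constant.
    bool Set(const std::string& name, const void* value, size_t size)
    {
        auto it = constants.find(name);
        if (it == constants.end())
            return false;                 // unknown constant for this shader
        std::memcpy(it->second, value, size);
        // A real implementation would also mark the owning buffer dirty here
        // (e.g. by storing a (bufferIndex, offset) pair instead of a raw pointer).
        return true;
    }
};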

Any suggestions to improve performance/flexibility? How do you handle this?

Thanks.


For mesh data <-> shader inputs I take the fairly naive approach of mandating specific names for known vertex streams ('position', 'normal', 'tangent', etc).

Each mesh is stored as a map of names to data blocks, and at runtime I compare those stream names to the active attributes in the shader, and bind them appropriately.
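
A minimal sketch of that matching step (the VertexStream/ShaderAttribute types and the bind call are placeholders for whatever the engine/API actually uses):

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct VertexStream { std::vector<uint8_t> data; /* plus stride, component count, etc. */ };
struct ShaderAttribute { std::string name; int location; };

// Bind each active shader attribute to the mesh stream with the same name, if present.
void BindMeshToShader(const std::unordered_map<std::string, VertexStream>& meshStreams,
                      const std::vector<ShaderAttribute>& activeAttributes)
{
    for (const ShaderAttribute& attr : activeAttributes)
    {
        auto it = meshStreams.find(attr.name);   // 'position', 'normal', 'tangent', ...
        if (it == meshStreams.end())
            continue;                            // shader wants a stream the mesh doesn't have
        // BindStream(attr.location, it->second); // API-specific bind call goes here
    }
}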

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

The correct shader is chosen manually at asset creation time, is this "ok"?

Often a bit of geometry needs to be drawn by more than one shader -- e.g. a depth-only pass when creating a shadow map, plus the forward-rendering or deferred-GBuffer pass.

At asset creation time, I create lists of vertex structures (the input structures to vertex shaders), and stream layout structures (the way that vertices will be stored in memory) in a Lua config file. Shaders specify their vertex input structure from this file.
This Lua file also contains a series of 'connections', specifying which stream-layouts are allowed to be used with which vertex-structures. I could probably do that automatically by comparing semantics, etc, but I prefer to keep it as a manual step for now... Each of these 'connections' basically represents one D3D InputLayout at runtime.

When compiling a bit of geometry, I look at which shader file (not a singular shader program, a collection of shader techniques, like an fx file) the artist has assigned to it, and then collect the set of vertex-structures that those techniques use.
Then, I find the set of stream-layouts that are compatible with all of these vertex-structures (if a stream-layout is 'connected' with one of the vertex-structures in the above set, but not all of them, it's excluded from the next step).
Then, I compare each of those stream-layouts to the actual attributes that have been exported by the artist, to see if their geometry can be saved in this stream-layout (e.g. if the stream-layout requires normals, did the artist export normals?), and as soon as one is found, I import the mesh using that stream-layout.
It's possible for this process to fail at several stages (e.g. producing empty sets, or finding you're missing attributes -- when this happens the artist simply gets an error and they have to fix their export).
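
A rough sketch of that offline resolution step (identifiers are illustrative, not taken from an actual tool):

#include <set>
#include <string>
#include <utility>
#include <vector>

struct StreamLayout
{
    int id;
    std::set<std::string> requiredAttributes;   // e.g. "Position", "Normal", "Tangent"
};

// Pick a stream-layout that is 'connected' to every vertex-structure used by the
// assigned shader file, and whose required attributes were all exported by the artist.
// Returns nullptr if nothing qualifies (which is when the artist gets an error).
const StreamLayout* ChooseStreamLayout(
    const std::vector<int>& vertexStructureIds,                    // from the assigned shader file
    const std::vector<std::pair<int, int>>& connections,           // (streamLayoutId, vertexStructureId)
    const std::vector<StreamLayout>& allLayouts,
    const std::set<std::string>& exportedAttributes)               // what the artist actually exported
{
    for (const StreamLayout& layout : allLayouts)
    {
        // 1) The layout must be connected to *all* vertex-structures in the set.
        bool compatibleWithAll = true;
        for (int vsId : vertexStructureIds)
        {
            bool connected = false;
            for (const auto& c : connections)
                if (c.first == layout.id && c.second == vsId) { connected = true; break; }
            if (!connected) { compatibleWithAll = false; break; }
        }
        if (!compatibleWithAll)
            continue;

        // 2) The exported mesh must supply every attribute the layout requires.
        bool hasAllAttributes = true;
        for (const std::string& attr : layout.requiredAttributes)
            if (exportedAttributes.count(attr) == 0) { hasAllAttributes = false; break; }

        if (hasAllAttributes)
            return &layout;   // first match wins, as described above
    }
    return nullptr;
}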

Then when drawing something at runtime, I look at the current geometry to see what its stream-layout is, and from this I get a table of possible InputLayout objects. I then iterate over this table, finding the one that matches the current shader's vertex-structure.
e.g. inputLayout = m_inputLayoutPool[geo.streamLayoutId][shader.vertexStructureId]
The contents of this pool are generated by compiling the previously mentioned Lua data file.
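
Something along these lines, as a sketch (the pool class and its members are illustrative):

#include <d3d11.h>
#include <vector>

struct Geometry { int streamLayoutId;     /* ... */ };
struct Shader   { int vertexStructureId;  /* ... */ };

class InputLayoutPool
{
public:
    // 2D table indexed by [streamLayoutId][vertexStructureId], filled in from the
    // compiled Lua data; a null entry means "this layout and structure aren't connected".
    ID3D11InputLayout* Get(const Geometry& geo, const Shader& shader) const
    {
        return m_table[geo.streamLayoutId][shader.vertexStructureId];
    }

private:
    std::vector<std::vector<ID3D11InputLayout*>> m_table;
};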

And store a table of pairs (constant name, pointer to the data).
This is quite fast since I have a number of pointers to POD equal to the number of constant buffers, so to update the constant buffers I just use those pointers.
To change the value of constants I use the table to find the pointer of the constant I want to update.

I do the same thing. However, cbuffer updates usually fall into one of two categories -- (1) you're passing some engine structure to the shader code (e.g. a camera), or (2) an artist or gameplay programmer wants to set some property by name.


The above system works for either, but is especially useful for #2. To keep #1 simple, I often just ensure that I have a C++ structure with the exact same layout as the HLSL cbuffer, and then map the whole thing, casting it to the C++ type. The engine can then just write to that type as usual.
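
A minimal sketch of approach #1, assuming a hypothetical CameraConstants cbuffer and the usual D3D11 Map/Unmap path:

// HLSL side (for reference):
//   cbuffer CameraConstants : register(b0)
//   {
//       float4x4 viewProj;
//       float3   cameraPos;
//       float    padding;     // keep HLSL's 16-byte packing rules in mind
//   };
#include <d3d11.h>
#include <DirectXMath.h>

struct CameraConstants
{
    DirectX::XMFLOAT4X4 viewProj;
    DirectX::XMFLOAT3   cameraPos;
    float               padding;    // mirrors the HLSL packing
};
static_assert(sizeof(CameraConstants) % 16 == 0, "cbuffers are sized in 16-byte chunks");

void UploadCamera(ID3D11DeviceContext* ctx, ID3D11Buffer* cb, const CameraConstants& src)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(ctx->Map(cb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        // Cast the mapped memory to the mirrored C++ type and write it as usual.
        *static_cast<CameraConstants*>(mapped.pData) = src;
        ctx->Unmap(cb, 0);
    }
}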

You definitely want to at least check the return code of CreateInputLayout and report any failures. This will tell you if you messed up, which usually happens if you try to use a vertex buffer that doesn't have an element expected by a vertex shader. If you want to take it further it's pretty easy to use the reflection interface to compare your input layout elements to the input signature of a vertex shader and determine which element is missing using the string names of the semantics.
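
A sketch of both checks, using the D3D11 reflection API (error handling trimmed down for brevity):

#include <d3d11.h>
#include <d3dcompiler.h>   // D3DReflect, ID3D11ShaderReflection
#include <cstdio>

bool CreateAndValidateInputLayout(ID3D11Device* device,
                                  const D3D11_INPUT_ELEMENT_DESC* elements, UINT numElements,
                                  const void* vsBytecode, SIZE_T bytecodeSize,
                                  ID3D11InputLayout** outLayout)
{
    // 1) Always check the return code -- this is where mismatches show up.
    HRESULT hr = device->CreateInputLayout(elements, numElements,
                                           vsBytecode, bytecodeSize, outLayout);
    if (SUCCEEDED(hr))
        return true;

    // 2) On failure, reflect the vertex shader's input signature and report the
    //    semantics it expects, so the missing element can be named.
    ID3D11ShaderReflection* reflection = nullptr;
    if (SUCCEEDED(D3DReflect(vsBytecode, bytecodeSize,
                             IID_ID3D11ShaderReflection, (void**)&reflection)))
    {
        D3D11_SHADER_DESC shaderDesc = {};
        reflection->GetDesc(&shaderDesc);
        for (UINT i = 0; i < shaderDesc.InputParameters; ++i)
        {
            D3D11_SIGNATURE_PARAMETER_DESC param = {};
            reflection->GetInputParameterDesc(i, &param);
            std::printf("VS expects %s%u\n", param.SemanticName, param.SemanticIndex);
            // Compare (SemanticName, SemanticIndex) against 'elements' here to find
            // exactly which one is missing from the vertex buffer layout.
        }
        reflection->Release();
    }
    return false;
}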

When compiling a bit of geometry, I look at which shader file (not a singular shader program, a collection of shader techniques, like an fx file) the artist has assigned to it, and then collect the set of vertex-structures that those techniques use.
Then, I find the set of stream-layouts that are compatible with all of these vertex-structures (if a stream-layout is 'connected' with one of the vertex-structures in the above set, but not all of them, it's excluded from the next step).
Then, I compare each of those stream-layouts to the actual attributes that have been exported by the artist, to see if their geometry can be saved in this stream-layout (e.g. if the stream-layout requires normals, did the artist export normals?), and as soon as one is found, I import the mesh using that stream-layout.
It's possible for this process to fail at several stages (e.g. producing empty sets, or finding you're missing attributes -- when this happens the artist simply gets an error and they have to fix their export).

What about storing vertex attributes in different vertex buffers? Example: Position in one buffer, and normal/tangent/texture coordinates in another, since shadow mapping only needs the position.

I've read on these forums that there's an overhead associated with splitting vertex attributes across multiple vertex buffers that cancels out the speed-up in shadow mapping.

I guess it depends on the project...

For an example of splitting the position into another stream, my Lua config would look like:


StreamFormat("standardStream",
{
    {
        { Float, 3, Position },
    },
    {
        { Float, 3, Normal },
        { Float, 3, Tangent },
        { Float, 2, TexCoord, 0 },
    },
})

VertexFormat("standardVertex",
{
    { "position", float3, Position },
    { "texcoord", float2, TexCoord, 0 },
    { "normal",   float3, Normal },
    { "tangent",  float3, Tangent },
})

InputLayout( "standardStream", "standardVertex" )

Off-topic, but keep in mind that this 2nd stream can exist in the same vertex-buffer at a certain offset, or in another vertex-buffer.
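
As a sketch, binding those two streams could look like this in D3D11 -- either two separate buffers, or one buffer bound twice at a byte offset (strides match the 12-byte position stream and the 32-byte normal/tangent/texcoord stream from the example):

#include <d3d11.h>

// Variant A: each stream lives in its own vertex buffer.
void BindTwoBuffers(ID3D11DeviceContext* ctx,
                    ID3D11Buffer* positionVB, ID3D11Buffer* attributeVB)
{
    ID3D11Buffer* buffers[2] = { positionVB, attributeVB };
    UINT strides[2] = { 12, 32 };   // float3 position / normal+tangent+texcoord
    UINT offsets[2] = { 0, 0 };
    ctx->IASetVertexBuffers(0, 2, buffers, strides, offsets);
}

// Variant B: both streams packed into one buffer, the second starting at a byte offset.
void BindOneBufferTwice(ID3D11DeviceContext* ctx,
                        ID3D11Buffer* vb, UINT stream1ByteOffset)
{
    ID3D11Buffer* buffers[2] = { vb, vb };
    UINT strides[2] = { 12, 32 };
    UINT offsets[2] = { 0, stream1ByteOffset };   // second slot starts partway into the same VB
    ctx->IASetVertexBuffers(0, 2, buffers, strides, offsets);
}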

Yeah, this is an optimization problem, so it depends on the project/platform/etc...
Back in the Dx9 days, Microsoft's advice was to try to use per-stream vertex structures with a size of 32 bytes, because it was common for vertex shaders to only have an instruction for reading from VBs in 32B-sized chunks, at 32B-aligned offsets.
In the above example, Stream#0 (position) is 12 bytes and Stream#1 (the rest) is 32 bytes. This would mean that on the above type of hardware, the vertex shader would always issue exactly one "Load 32 Bytes" instruction to read in each vertex's data from Stream#1, and either 1 or 2 instructions to read in the data from Stream#0, depending on whether the data crosses a 32-byte-aligned boundary or not.

If instead, a vertex just had a F32*3 position and a F32*2 UV, that's a total of 20 bytes. If they were both in one stream, then the vertex-shader will have to issue 1, or maybe 2 "Load 32 Bytes" instructions (depending on alignment). If the data was split into two streams (one for pos, one for UV), then the VS would need between 2 and 4 load instructions.
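
Written down as structs, that byte arithmetic is just:

struct Stream0 { float position[3]; };                              // 12 bytes
struct Stream1 { float normal[3]; float tangent[3]; float uv[2]; }; // 32 bytes
static_assert(sizeof(Stream0) == 12, "1-2 aligned 32B loads, depending on offset");
static_assert(sizeof(Stream1) == 32, "exactly one aligned 32B load per vertex");

struct PosUvInterleaved { float position[3]; float uv[2]; };        // 20 bytes
static_assert(sizeof(PosUvInterleaved) == 20, "1-2 x 32B loads when interleaved");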

So to begin with, you had the problem of optimizing your vertex format for just a single vertex-shader! If you then take multiple vertex-shaders into account, you've got a whole lot of options to weigh up.

It may even be more efficient to have a VB containing pos+UV+normal+etc, and a second VB containing just positions (a 2nd copy of them)!

I've no idea how outdated that rule of thumb is, though. I'm guessing modern GPUs are more flexible and that the vertex-fetching hardware is now merged with the texel-fetching hardware...

