Designing efficient Material/Shader system

Started by
9 comments, last by Anfaenger 11 years, 8 months ago
I'm having trouble designing a new material system for my engine.

My primary goals are:

0) Minimal CPU overhead and memory footprint

1) I'd like to avoid setting shader parameters (uniforms) one by one
(as in 'shader scripts' approach, 'DirectX effects - style': query location by variable name, need to store metadata about uniforms, etc.).
but that means creating custom structs with the matching layout and 'hardcoding' shaders which is bad (?).

2) Improve efficiency via multithreading (issue graphics API calls in a separate 'render' thread, render commands buffering).
I intend on using two command buffers, there'll be a special UpdateConstantBuffer command,
and i'll need to copy entire constant/uniform buffer contents (each shader parameter) into the command buffer (?).
Seems like a lot of data to memcpy() around...

What are your ideas for designing a powerful, flexible and efficient graphics material system ?

Currently, i have a C++ wrapper class for each shader (generated with a tool, map CB, cast to struct, set params, unmap -> no metadata needed),
and several material classes (Phong, Plain, SSS) which i find cumbersome to use.
i'd rather have one material class, but then i don't know how to set proper shaders,
because material will be generic and shaders are hardcoded.
My actual solution was to make a new shading language like Cg that spits out either GLSL or HLSL.
This means I am parsing the entire shader grammatically and can thus extract all the information I want from it, including uniforms/constants, etc.

Shaders keep their own copy of the uniform data so that each individual uniform can be redundancy-checked in the case of DirectX 9 or OpenGL. For DirectX 11, constant buffers are generated and shared between shaders (I recommend 4 of them, not 2) and are only remapped when something inside them changes. Each variable inside a buffer is redundancy-checked only if the buffer’s dirty flag is not already set: the dirty flag is meant to avoid remapping, so once it is set it is faster to just copy the incoming data over the old, even if it is redundant.

In practice this is extremely efficient, but you probably don’t want to make your own language.

You don’t have to go that far but you can still utilize the efficient parts.


Firstly, you do need a system in place that allows you to reduce redundancies as much as possible. That means a local copy of the uniform/constant data, hardcoded if necessary.
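A minimal sketch of the dirty-flag scheme described above, assuming a CPU-side shadow copy per constant buffer. The class name `ShadowConstantBuffer` and the `upload` callback are hypothetical and stand in for whatever map/unmap path the real API wrapper uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU-side shadow of one constant buffer.
class ShadowConstantBuffer {
public:
    explicit ShadowConstantBuffer(std::size_t sizeInBytes)
        : data_(sizeInBytes, 0) {}

    // Writes `size` bytes at `offset`. While the buffer is clean, the incoming
    // bytes are compared against the stored ones first. Once the buffer is
    // dirty, the comparison is skipped and the bytes are simply copied.
    void set(std::size_t offset, const void* src, std::size_t size) {
        assert(offset + size <= data_.size());
        if (!dirty_ && std::memcmp(data_.data() + offset, src, size) == 0) {
            return;  // redundant write, the buffer stays clean
        }
        std::memcpy(data_.data() + offset, src, size);
        dirty_ = true;
    }

    // Calls upload(bytes, size) only if something changed since the last
    // flush. Returns true when an upload actually happened.
    template <typename UploadFn>
    bool flush(UploadFn&& upload) {
        if (!dirty_) return false;
        upload(data_.data(), data_.size());
        dirty_ = false;
        return true;
    }

    bool dirty() const { return dirty_; }

private:
    std::vector<unsigned char> data_;
    bool dirty_ = false;
};
```

In this sketch `flush` would be called once per draw call, just before the buffer is bound, so several `set` calls collapse into at most one upload.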

Secondly whatever you make needs to be as flexible as possible. You are going to hate yourself if new shaders are not easy to add, even if they are hardcoded (against which I also recommend unless you are making an in-house engine for a major studio).


In regards to your #2, multi-threaded rendering is typically a good idea. You don’t need to worry about having a lot of data to ::memcpy() around because it is on another thread.
You also don’t need to do anything special related to updating command buffers. They will use the same system as everything else.


Ultimately, I recommend putting the redundancy-checking system at the lowest level—a wrapper around the API you plan to use.
Then above that use your rendering thread to issue API commands through the wrappers.
This is the best and most stable way to get the best of both worlds. Redundancy checking is also important for more than just shader uniforms/constants; it should be applied to all render states.
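To illustrate redundancy checking at the wrapper level, here is a small sketch that filters repeated render-state changes. `RenderStateCache` and the integer state/value pairs are assumptions made for the example; a real wrapper would forward to the actual API call instead of a `std::function`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper that sits between the renderer and the graphics API.
class RenderStateCache {
public:
    using ApiCall = std::function<void(std::uint32_t state, std::uint32_t value)>;

    explicit RenderStateCache(ApiCall apiCall) : apiCall_(std::move(apiCall)) {}

    // Forwards the call only when the value differs from the last one set.
    // Returns true if the underlying API was actually called.
    bool set(std::uint32_t state, std::uint32_t value) {
        auto it = current_.find(state);
        if (it != current_.end() && it->second == value) {
            return false;  // same value as before, skip the API call
        }
        current_[state] = value;
        apiCall_(state, value);
        return true;
    }

private:
    ApiCall apiCall_;
    std::unordered_map<std::uint32_t, std::uint32_t> current_;
};
```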


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Thank you, L. Spiro, for your reply, this is an excellent solution!


Though it surely takes a very long time to implement properly.



1. Do you also perform semantic checking (otherwise, when you translate into HLSL -> line numbers differ -> error messages useless) in your translator?


2. What parser generator did you use (or did you roll your own hand-written recursive descent) ?

and i'm sure, there's a plethora of optimizations that can be done for generating the most efficient C++ wrapper code.


(i decided to use Direct3D 11 deferred contexts instead of writing a command buffer class,
so my low-level wrappers will work both with the immediate context and deferred contexts on other threads.)



3. Is it fine to create a separate constant buffer for each shader class if needed?
e.g. i have a few global, shared CBs, and i create new CBs for each shader (if it declares its own CB(s)).

4. Is it better to use a universal, generic material class instead of separate CWaterMaterial, CTerrainMaterial, CSkinMaterial ?
And how should material system talk to the low-level shader system? Through hardcoded shader variable semantics (e.g. WORLD_MTX, VIEW_POS, DIFFUSE_COLOR) ?

1. Do you also perform semantic checking (otherwise, when you translate into HLSL -> line numbers differ -> error messages useless) in your translator?

Somewhat. I let the HLSL or GLSL compilers catch things such as undeclared identifiers etc. I only add my own checks when they are related to my own shader-language rules.
You don’t need to worry about mismatching lines in error print-outs. Use #line before every statement, if/else/for/while/do/etc.
You can see actual translated code in the first post of this topic. Click the Spoiler button.
My language also eliminates unreferenced functions and constants, or else my constant buffers would be huge. Eliminated functions is the reason for so many redundant #line directives.
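The #line technique can be sketched as follows, assuming the translator already knows which original line each output statement came from. `TranslatedStatement` and `emitWithLineDirectives` are hypothetical names. The quoted file name matches the HLSL form of the directive; GLSL takes an integer source-string number in that position instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One translated statement plus the line it came from in the original source.
struct TranslatedStatement {
    int sourceLine;
    std::string text;
};

// Emits a #line directive before every statement so that compiler errors
// point back at the original file rather than at the generated code.
std::string emitWithLineDirectives(const std::vector<TranslatedStatement>& statements,
                                   const std::string& sourceFile) {
    std::string out;
    for (const TranslatedStatement& s : statements) {
        out += "#line " + std::to_string(s.sourceLine) + " \"" + sourceFile + "\"\n";
        out += s.text + "\n";
    }
    return out;
}
```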

You should be aware that you also must perform preprocessing manually and so must also write your own preprocessor.
Think about this case:

#define DEF_CONST( NUMBER ) Texture2D g_sTex ## NUMBER:register(t ## NUMBER)
DEF_CONST( 0 );
DEF_CONST( 7 );


Your language needs to know that g_sTex0 and g_sTex7 exist, and the only way to do that is to perform preprocessing. I use a second parser for that. Don’t even think of trying to do this during the real parsing stage.
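As a rough illustration of why a preprocessing pass is needed, the sketch below expands a single-parameter macro body and applies `##`, which is enough to discover the names in the DEF_CONST case. It is a deliberately tiny assumption-laden helper (`expandMacro` is a made-up name, only one parameter, only spaces as whitespace, no nested macros), not a real preprocessor.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

inline bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Expands a one-parameter macro body: substitutes the argument for the
// parameter, then applies ## by joining the tokens on either side of it.
std::string expandMacro(const std::string& body, const std::string& param,
                        const std::string& arg) {
    // Pass 1: replace whole-identifier occurrences of the parameter.
    std::string substituted;
    std::size_t i = 0;
    while (i < body.size()) {
        if (isIdentChar(body[i])) {
            std::size_t start = i;
            while (i < body.size() && isIdentChar(body[i])) ++i;
            std::string word = body.substr(start, i - start);
            substituted += (word == param) ? arg : word;
        } else {
            substituted += body[i++];
        }
    }
    // Pass 2: delete each ## together with the spaces around it.
    std::string pasted;
    for (i = 0; i < substituted.size(); ++i) {
        if (substituted.compare(i, 2, "##") == 0) {
            while (!pasted.empty() && pasted.back() == ' ') pasted.pop_back();
            i += 2;
            while (i < substituted.size() && substituted[i] == ' ') ++i;
            --i;  // the loop increment moves to the first kept character
            continue;
        }
        pasted += substituted[i];
    }
    return pasted;
}
```

Only after this expansion can the main parser see declarations such as `g_sTex0` and `g_sTex7` as ordinary identifiers.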



2. What parser generator did you use (or did you roll your own hand-written recursive descent) ?

Flex and Bison.



3. Is it fine to create a separate constant buffer for each shader class if needed?
e.g. i have a few global, shared CBs, and i create new CBs for each shader (if it declares its own CB(s)).

If by “class” you mean “instance” then it is one way to go. It will not kill your performance, but it uses more memory and is not the most efficient system there is. The most efficient approach would be for shaders to share buffers.



4. Is it better to use a universal, generic material class instead of separate CWaterMaterial, CTerrainMaterial, CSkinMaterial ?
And how should material system talk to the low-level shader system? Through hardcoded shader variable semantics (e.g. WORLD_MTX, VIEW_POS, DIFFUSE_COLOR) ?

Yes. The shader/material systems should not know what those things are. A water object should just use the generic shader system to make shaders for water.
The terrain should use the generic shader system to make terrain shaders.

My engine communicates with shaders using a set of predefined semantics. They allow the engine to update uniforms/constants automatically.
For user-defined uniforms/constants, users can get an engine handle to anything by name and then use that handle to update them. Internally, this is how the engine updates the semantic values. The parser keeps track of what globals are created with what semantics and later uses the names of those constants to get handles to all the known semantic types. The fact that the engine updates them automatically is just an extension of this.
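A possible shape for the handle mechanism described above, as a sketch only. `ShaderUniforms`, `Semantic`, `UniformInfo` and `kInvalidHandle` are hypothetical names, and the uniform table is assumed to come from the shader parser. The semantic lookup goes through the name lookup, mirroring the description that the engine updates semantic values the same way users update their own uniforms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class Semantic { None, WorldMatrix, ViewPosition, DiffuseColor };

// What the parser is assumed to record for each global.
struct UniformInfo {
    std::string name;
    Semantic semantic;
    std::size_t offset;  // byte offset inside the local copy
    std::size_t size;    // size in bytes
};

using UniformHandle = int;
constexpr UniformHandle kInvalidHandle = -1;

class ShaderUniforms {
public:
    explicit ShaderUniforms(std::vector<UniformInfo> uniforms)
        : uniforms_(std::move(uniforms)) {
        std::size_t total = 0;
        for (std::size_t i = 0; i < uniforms_.size(); ++i) {
            byName_[uniforms_[i].name] = static_cast<UniformHandle>(i);
            total = std::max(total, uniforms_[i].offset + uniforms_[i].size);
        }
        storage_.resize(total);
    }

    // String lookup happens once; the handle is reused every frame.
    UniformHandle handleByName(const std::string& name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? kInvalidHandle : it->second;
    }

    // Semantics resolve to handles through the name of the tagged global.
    UniformHandle handleBySemantic(Semantic semantic) const {
        for (const UniformInfo& u : uniforms_) {
            if (u.semantic == semantic) return handleByName(u.name);
        }
        return kInvalidHandle;
    }

    // Copies data into the local uniform storage; rejects bad handles/sizes.
    bool set(UniformHandle handle, const void* data, std::size_t size) {
        if (handle < 0 || static_cast<std::size_t>(handle) >= uniforms_.size()) return false;
        const UniformInfo& u = uniforms_[static_cast<std::size_t>(handle)];
        if (size != u.size) return false;
        std::memcpy(storage_.data() + u.offset, data, size);
        return true;
    }

    const unsigned char* bytes() const { return storage_.data(); }

private:
    std::vector<UniformInfo> uniforms_;
    std::unordered_map<std::string, UniformHandle> byName_;
    std::vector<unsigned char> storage_;
};
```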



If you are thinking about making a language, a shader language is not a terrible way to start because they are fairly simple.
But be aware that making even simple languages requires a great deal of experience in order to have a nice class structure and to be stable/leak-free.
As one example, when you walk the syntax tree in order to generate HLSL output, you can’t use recursion or you will risk stack overflow in large complicated shaders. So you have to have experience working with explicit stacks.
Making an entire shader system revolve around this is a large and risky undertaking. If one thing doesn’t work, the whole system does not work.
Just be aware.
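For the explicit-stack point, here is a minimal sketch of a non-recursive pre-order walk over a syntax tree. The `Node` type is an assumption for the example (it relies on `std::vector` of an incomplete type, which is guaranteed from C++17), and a real code generator would usually also need a post-visit step per node.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical syntax-tree node: some output text plus child nodes.
struct Node {
    std::string text;
    std::vector<Node> children;
};

// Visits nodes in pre-order using a heap-allocated stack instead of the
// call stack, so tree depth is limited by memory rather than stack size.
std::string emitPreOrder(const Node& root) {
    std::string out;
    std::vector<const Node*> pending{&root};
    while (!pending.empty()) {
        const Node* node = pending.back();
        pending.pop_back();
        out += node->text;
        // Push children in reverse so the leftmost child is visited first.
        for (auto it = node->children.rbegin(); it != node->children.rend(); ++it) {
            pending.push_back(&*it);
        }
    }
    return out;
}
```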


L. Spiro


Great, this clears up a lot.

For fast iteration, a system for reloading shaders on-the-fly is a 'must-have'.
But only affected files should be recompiled.

1. Do you track which files get #included into source files and how?
Do you keep some sort of a project file with compile timestamps ?

2. What if some shader doesn't have a particular semantic?
i'd like to avoid checks for invalid handles before making every UpdateShaderConstantByHandle() call.

3. Where are material parameters (e.g. colors, texture layers, etc.) stored?
i assume there're no such inefficient solutions as 'map<string,handle>'.

1. Do you track which files get #included into source files and how?
Do you keep some sort of a project file with compile timestamps ?
My game doesn't compile shaders; an external program compiles all the data files for the game. Yep, this external program maintains a 'project' file of all data that will end up shipping with the game, and maintains timestamps as part of the build process (and dependency graphs, e.g. for #includes). It also listens to OS file-system events in the export directories, so when a file is modified the OS tells the build-system, which highlights the 'rebuild' button in the UI. When the user clicks 'rebuild', it re-runs the build process (which checks the timestamps/dependencies and rebuilds the modified files), and if the game is running, it sends a message to the game with the names of the modified output files so the game can re-load them.
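The timestamp/dependency check could look roughly like this. It is a sketch under assumptions: `FileInfo`, `Project` and `needsRebuild` are invented names, timestamps are plain integers, and the include lists are assumed to have been gathered by an earlier scan of the sources.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Per-file record kept in the hypothetical project file.
struct FileInfo {
    std::int64_t modified;              // last-modified timestamp
    std::vector<std::string> includes;  // files pulled in via #include
};

using Project = std::unordered_map<std::string, FileInfo>;

// Returns true if the source, or anything it includes transitively, changed
// after the output was last built.
bool needsRebuild(const Project& project, const std::string& source,
                  std::int64_t lastBuilt) {
    std::unordered_set<std::string> visited;
    std::vector<std::string> pending{source};
    while (!pending.empty()) {
        std::string file = pending.back();
        pending.pop_back();
        if (!visited.insert(file).second) continue;  // already checked
        auto it = project.find(file);
        if (it == project.end()) return true;  // unknown dependency: rebuild to be safe
        if (it->second.modified > lastBuilt) return true;
        for (const std::string& inc : it->second.includes) pending.push_back(inc);
    }
    return false;
}
```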
3. Where are material parameters (e.g. colors, texture layers, etc.) stored?
In an array of bytes, which correspond to a particular shader cbuffer/UBO layout.

1) I'd like to avoid setting shader parameters (uniforms) one by one
Make sure that when you're writing your shaders, you group uniforms together into cbuffers/UBOs based on who is supposed to set them. E.g. one buffer for camera parameters, one for transform data, one for data set by the artists that will never change at runtime, etc...
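One way to picture this grouping on the C++ side is a struct per buffer, split by who writes it. The struct and member names below are assumptions for illustration, and matrix layout conventions (row- versus column-major) are not addressed. The size checks reflect the Direct3D 11 requirement that constant buffer sizes be multiples of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };
struct Float4x4 { float m[16]; };

// Written by the renderer once per view.
struct CameraConstants {
    Float4x4 view;
    Float4x4 projection;
    Float4 eyePosition;
};

// Written per object by whatever submits the draw call.
struct ObjectConstants {
    Float4x4 world;
};

// Authored by artists and baked offline; not changed at runtime.
struct MaterialConstants {
    Float4 diffuseColor;
    Float4 specularColorAndPower;
};

static_assert(sizeof(CameraConstants) % 16 == 0, "cbuffer size must be a multiple of 16");
static_assert(sizeof(ObjectConstants) % 16 == 0, "cbuffer size must be a multiple of 16");
static_assert(sizeof(MaterialConstants) % 16 == 0, "cbuffer size must be a multiple of 16");
```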
1. How do you update shader constants?
- at runtime, keep shader metadata (info about used variables: name, type, offset, dimension);
- get parameter handle by string name or hardcoded semantic;
- if the handle is valid:
- - - SetParamByHandle() (copy float, matrix, int, bool into the parameter buffer);
- - - see if UniformBuffer::isDirty flag is 'true' and update the corresponding constant buffer;
?

sounds a bit messy and suboptimal, imho.

2. How do you implement support for multiple passes?

3. Could you please sketch your material class in pseudocode?

i guess, materials should only contain data declarations, no actual rendering code.

1. How do you update shader constants?
- at runtime, keep shader metadata (info about used variables: name, type, offset, dimension);
- get parameter handle by string name or hardcoded semantic;
- if the handle is valid:
- - - SetParamByHandle() (copy float, matrix, int, bool into the parameter buffer);
- - - see if UniformBuffer::isDirty flag is 'true' and update the corresponding constant buffer;
?
To show a background analogy for a moment: say you've got a weapon and a player class in your game, and the weapon wants to decrement the player's health, but for some reason the Attack function is passed the player as a byte*.
struct Player { int health; }; struct Weapon { void Attack(byte* player); };
You wouldn't use run-time reflection for this,
e.g. void Weapon::Attack(byte* p) { (*(int*)(p+Reflection<Player>["Health"].Offset))--; }
You'd just make sure the structure layout is known at compile-time,
e.g. void Weapon::Attack(byte* p) { ((Player*)p)->health--; }
And most of the time, just because some code is C and some HLSL, there need not be any difference. You can define the same structure layout in your shader code and your game code, then both of them can read/write the variables in their native fashion.

It's only when parsing material files from your artists, or building a debugging system that you need a reflection system that can get offsets/etc from string-names. For any dynamic data that's generated by the game, you should be able to statically define the interface structure.

I do keep this reflection data around, but it belongs to the shader-code object. I can use it to set a cbuffer parameter by name though this is almost never needed (parsing of artist's material files and their conversion into binary cbuffer structures is done offline, not by the game).
Mostly it's for debugging purposes (e.g. assert( offsetof(Player, health) == shader.Reflection["Player"]["Health"].Offset );)
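A small sketch of that split between native struct access and debug-only reflection. `PlayerConstants`, `ReflectedLayout`, `offsetMatches` and `toBytes` are hypothetical names, and the reflected offsets are assumed to come from the shader compiler's reflection output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <vector>

// C++ mirror of a hypothetical shader-side structure.
struct PlayerConstants {
    int health;
    float speed;
};
static_assert(std::is_trivially_copyable<PlayerConstants>::value,
              "must be copyable as raw bytes");

// Member name -> byte offset, as shader reflection might report it.
using ReflectedLayout = std::unordered_map<std::string, std::size_t>;

// Debug-time sanity check that the C++ layout agrees with the shader layout.
inline bool offsetMatches(const ReflectedLayout& layout, const std::string& member,
                          std::size_t cppOffset) {
    auto it = layout.find(member);
    return it != layout.end() && it->second == cppOffset;
}

// Runtime path: fill the struct natively, then copy it as one blob.
inline std::vector<unsigned char> toBytes(const PlayerConstants& constants) {
    std::vector<unsigned char> bytes(sizeof constants);
    std::memcpy(bytes.data(), &constants, sizeof constants);
    return bytes;
}
```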

Also, shader constant values don't belong to the material instances, they belong to the cbuffer instances. The "material" holds pointers to cbuffers paired with slot #'s.
e.g. if every 'material' needs the camera matrices, you don't want to have to loop through 100 objects setting the same matrix on all of them. You just plug your camera cbuffer into those objects once on-load, and then update the single camera cbuffer once per frame.
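The shared-camera idea can be sketched like this, assuming cbuffers are plain byte arrays referenced by slot. `DrawItem`, `CBufferBinding` and `bindShared` are illustrative names only, and `std::shared_ptr` is just one possible ownership choice.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using CBuffer = std::vector<unsigned char>;

// A cbuffer paired with the slot it should be bound to.
struct CBufferBinding {
    int slot;
    std::shared_ptr<CBuffer> data;
};

// Hypothetical per-object record; it refers to buffers, it does not own values.
struct DrawItem {
    std::vector<CBufferBinding> bindings;
};

// Plugs one shared buffer into a draw item; done once at load time.
inline void bindShared(DrawItem& item, int slot, const std::shared_ptr<CBuffer>& buffer) {
    item.bindings.push_back(CBufferBinding{slot, buffer});
}
```

Updating the single camera buffer once per frame is then visible to every item that references it, with no per-object loop.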
- - - SetParamByHandle() (copy float, matrix, int, bool into the parameter buffer);
- - - see if UniformBuffer::isDirty flag is 'true' and update the corresponding constant buffer
On older APIs like DX9, I emulate cbuffers myself with a 'parameter buffer' in RAM (as I found it the most efficient route to take, though you must ensure that your uniform registers are contiguous per 'cbuffer'). So there's no need to keep two different copies of the data in sync. On newer APIs, whether you need two copies of the data is an implementation detail of the dynamic data mapping system -- most APIs have a "dynamic" flag, which tells the driver to perform double-buffering of your data internally to make mapping/unmapping fast.
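A sketch of the emulated-cbuffer idea for register-based APIs, assuming each buffer owns a contiguous range of float4 registers. `ConstantRegisterSink` is a made-up stand-in for the device; its method is modelled on the argument order of Direct3D 9's SetVertexShaderConstantF (start register, float data, float4 count), but it is not that API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the device's "set N float4 registers" call.
struct ConstantRegisterSink {
    virtual ~ConstantRegisterSink() = default;
    virtual void setConstantsF(unsigned startRegister, const float* data,
                               unsigned vector4Count) = 0;
};

// A parameter buffer in RAM that maps onto a contiguous register range.
class EmulatedCBuffer {
public:
    EmulatedCBuffer(unsigned startRegister, unsigned vector4Count)
        : startRegister_(startRegister),
          values_(static_cast<std::size_t>(vector4Count) * 4, 0.0f) {}

    // Writes one float4 into the local copy only.
    void setVector(unsigned index, const float (&v)[4]) {
        const std::size_t first = static_cast<std::size_t>(index) * 4;
        assert(first + 4 <= values_.size());
        std::memcpy(&values_[first], v, sizeof v);
    }

    // Because the registers are contiguous, the whole buffer goes up in one call.
    void upload(ConstantRegisterSink& sink) const {
        sink.setConstantsF(startRegister_, values_.data(),
                           static_cast<unsigned>(values_.size() / 4));
    }

private:
    unsigned startRegister_;
    std::vector<float> values_;
};
```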
2. How do you implement support for multiple passes?
Do you mean multi-pass shaders, where to draw the object it requires a few successive draw-calls? These types of materials aren't very common these days, but can be implemented by allowing your 'pass' definition to contain a link to the next 'pass'. This is more a detail of the model rendering system, or the part that submits the draw-calls for your meshes.
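The linked-pass idea might be sketched as a simple singly linked chain that the draw submission code follows. `Pass`, `shaderName` and `drawAllPasses` are hypothetical, and the `draw` callback stands in for the real draw-call submission.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pass definition with a link to the next pass, if any.
struct Pass {
    std::string shaderName;
    const Pass* next;
};

// Issues one draw per pass by following the chain. Returns the pass count.
template <typename DrawFn>
int drawAllPasses(const Pass* first, DrawFn&& draw) {
    int count = 0;
    for (const Pass* pass = first; pass != nullptr; pass = pass->next) {
        draw(*pass);
        ++count;
    }
    return count;
}
```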
3. Could you please sketch your material class in pseudocode?
I kept simplifying/generalising until I didn't even have a material class left at run-time. I've just got a "state group" (a collection of GPU states required for a draw-call, such as cbuffer bindings) and cbuffers, which are just an array of bytes. I guess in pseudocode I could say:
typedef vector<char> CBuffer;
struct State {}; struct CBufferBinding : State { int slot; CBuffer* data; };
typedef vector<State*> StateGroup;
N.B. looking at a cbuffer in isolation, there's no way to reflect upon its contents. You need to also know which shader it's intended to be used with, and fetch the reflection data from that shader.
Thanks, Hodgman, this info is super useful!

1. Where do you keep 'shader resources' (such as pointers to material textures) and how do you bind them to the pipeline?

2. Do your shaders carry additional information about allowed render passes, vertex formats, etc. ?
(e.g. shader for filling g-buffer should only be used during SCENE_PASS_FILL_GBUFFER.)

casting to a native C struct is a better solution than hardcoded semantics, imho.
i still can't imagine having no material class (dunno how i would reference materials in assets and set material properties).

i still can't imagine having no material class (dunno how i would reference materials in assets and set material properties).


I think the idea is that you don't set material properties from within the game per se. It sounds like the renderer has no idea what artist-defined parameters it's setting, it just loads a pre-processed cbuffer blob and plugs it into the shader. Likely I'm simplifying :)

Interesting stuff, Hodgman.
