More constant buffers or more techniques?

5 comments, last by L. Spiro 10 years, 5 months ago

Suppose I have different objects in my scene. Some of them use textures, some don't. Some textures need to be clipped, and some don't.

Is it better to create many techniques for the different combinations (and send uniform variables for these "settings"), or to create constant buffers with the settings and update them from the CPU while using one technique?


The first method will generally be faster. It's better (for performance) to have specialised shaders rather than one universal shader with a lot of run-time dynamic branching.

I'm not an expert on this, so take what I say with a grain of salt, but I do believe switching constant buffers is much faster than switching shaders (which is what will happen when using multiple techniques) and that branching on a uniform isn't a big deal (see MJP's post here). So I would say stick with 1 technique and use branching.
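
Just to illustrate what "branching on a uniform" looks like in practice, here's a minimal sketch of such a pixel shader (all the names are made up, and the HLSL is embedded as source in C++ purely for illustration):

```cpp
// A minimal sketch of an uber-shader branching on a constant-buffer flag.
// All names (MaterialSettings, g_useTexture, PsMain) are hypothetical.
const char * kUberPixelShader = R"(
cbuffer MaterialSettings : register( b1 )
{
    float4 g_diffuseColor;
    bool   g_useTexture;    // Toggled from the CPU per material.
};
Texture2D    g_diffuseTex : register( t0 );
SamplerState g_sampler    : register( s0 );

float4 PsMain( float4 pos : SV_Position, float2 uv : TEXCOORD0 ) : SV_Target
{
    float4 color = g_diffuseColor;
    // g_useTexture is uniform across the whole draw call, so every pixel
    // takes the same path and the branch stays coherent.
    if ( g_useTexture )
        color *= g_diffuseTex.Sample( g_sampler, uv );
    return color;
}
)";
```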

Yeah, the matter is probably much more complicated. But even in the cited post, MJP says that it applies to modern hardware (and five years is not THAT much; that's the nVidia 9xxx era).

Maybe someone else has some additional theoretical or practical feedback on this?

Read my and osmanb's comments on the same post.

I've been looking into this subject over the last two days. I have a lighting shader where various parts are toggled using variables in a constant buffer.

Specifically, in my case: lightEnable (times 3), shadowEnabled, specEnabled and translucencyEnabled. By removing all the branches, the number of SM5 instructions went down by 16%. The overall application performance went up by a mere 1.5% - not much, but most of the cycles in my app are spent on post-processing, not the lighting shader.
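
For reference, the usual way to get those branch-free versions out of a single source file is to compile it several times with different preprocessor defines. A rough sketch, assuming the D3DCompile API (the file name, entry point and macro names are hypothetical):

```cpp
#include <d3dcompiler.h>
#pragma comment( lib, "d3dcompiler.lib" )

// Compile one specialised permutation of an uber-shader source. Inside the
// HLSL, run-time "if ( shadowEnabled )" branches become "#if SHADOWS_ENABLED"
// blocks, so disabled features cost zero instructions.
ID3DBlob * CompileVariant( const char * src, size_t len, bool shadows, bool spec )
{
    const D3D_SHADER_MACRO macros[] = {
        { "SHADOWS_ENABLED", shadows ? "1" : "0" },
        { "SPEC_ENABLED",    spec    ? "1" : "0" },
        { nullptr, nullptr }    // D3DCompile requires a null terminator.
    };
    ID3DBlob * code = nullptr;
    ID3DBlob * errors = nullptr;
    HRESULT hr = D3DCompile( src, len, "lighting.hlsl", macros, nullptr,
                             "PsMain", "ps_5_0", 0, 0, &code, &errors );
    if ( errors ) { errors->Release(); }
    return SUCCEEDED( hr ) ? code : nullptr;    // Caller creates the shader from the blob.
}
```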

My advice - use common sense. You'll usually want a shader per material, sharing code through common HLSL headers. If your framework is well written, creating shader objects should be a breeze. There are cases where uber-shaders are useful, but they're the exception, not the rule.

[EDIT] - regardless of performance issues, separate shaders are just a better design.

Btw, the common advice is to use some sort of render queue and sort it before rendering. The first sorting criterion is usually the shader, because switching shaders is expensive (as Telanor said). But of course, even then you have more shader changes with specialised shaders than with a few very general ones ;) Then again, we are suddenly talking about just a FEW more changes.

On the other hand, if you have few shaders, you can sort better by the next criterion, which is probably textures.
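
As an illustration, such a render queue often boils down to sorting on a packed key, shader bits first, texture bits second. A minimal sketch (the field widths and types are made up for the example):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record: the sort criteria are packed into one integer
// so a single std::sort groups draws by shader first, then by texture.
struct RenderItem
{
    uint32_t shaderId;    // Primary criterion: shader switches cost the most.
    uint32_t textureId;   // Secondary criterion: sorted within each shader group.
    uint64_t SortKey() const { return ( uint64_t( shaderId ) << 32 ) | textureId; }
};

void SortQueue( std::vector<RenderItem> & queue )
{
    std::sort( queue.begin(), queue.end(),
        []( const RenderItem & a, const RenderItem & b )
        { return a.SortKey() < b.SortKey(); } );
}
```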

It's all really complex.

I personally use separate effects (shaders in my own system, not the D3DX Effect framework) for individual material (or appearance) types. At the moment, I have one shader for a material with a diffuse texture, another for one without, and so on. The variations inside a single effect are based just on things like colors and other parameters (reflectivity...). And there is NO HLSL branching at all. But that's mostly because I wrote it quite some time ago and I'm still using it on older GPUs, where branching really was a problem - a branch could even be resolved as a lerp between both branches!
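
For anyone curious, that lerp trick evaluates both sides of the branch and blends by a 0/1 factor. A tiny hypothetical fragment (not a complete shader; it assumes g_useTexture is a float that is exactly 0.0 or 1.0):

```cpp
// Sketch of flattening a branch into a lerp, as older compilers/GPUs did.
const char * kFlattenedBranch = R"(
    float4 untextured = g_diffuseColor;
    float4 textured   = g_diffuseColor * g_diffuseTex.Sample( g_sampler, uv );
    // Both "branches" are always computed; lerp just selects one. There is
    // no divergence, but also no saved work - which is why this only made
    // sense on GPUs where real branching was expensive.
    float4 color = lerp( untextured, textured, g_useTexture );
)";
```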

Quote (Telanor): I'm not an expert on this, so take what I say with a grain of salt, but I do believe switching constant buffers is much faster than switching shaders (which is what will happen when using multiple techniques) and that branching on a uniform isn't a big deal (see MJP's post here). So I would say stick with 1 technique and use branching.

It is not that simple.

GPUs shade blocks of pixels at a time, typically around 8×8. If every pixel in the block falls through the same set of branches, the branches will indeed be virtually free; if not, only one branch can be taken at a time while the pixels going down the other branch must wait, and parallelism is lost.

A typical case in which this happens is a translucent object with a branch that discards pixels below a certain alpha threshold.
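
That case looks something like this (a hypothetical fragment, not a complete shader):

```cpp
// Sketch of the divergent case: neighbouring pixels inside one 8x8 block can
// disagree on this test, so the hardware must serialise both outcomes.
const char * kAlphaTestFragment = R"(
    float4 color = g_diffuseTex.Sample( g_sampler, uv );
    if ( color.a < g_alphaThreshold )   // Per-pixel data, not a uniform:
        discard;                        // the branch can diverge within a block.
)";
```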

On the other hand, in most basic renders of solid objects, all pixels will take the same path.

Still, the fastest code is code that is never executed.

Swapping shaders is more costly than updating buffers, but this can be misleading at face value. If you set the shader on every render call (even if it is the same shader set over and over) it will certainly be the slowest option, but via a simple sort on the renderables and a manual record of the last shader set, renderables using the same shaders can be grouped together and the swapping between shaders can be reduced to its bare minimum.
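
In code, that "manual record of the last shader set" can be as small as this (a sketch assuming Direct3D 11; the class name is made up):

```cpp
#include <d3d11.h>

// Minimal redundant-state filter: only call PSSetShader() when the shader
// actually changes. With the queue sorted by shader, runs of identical
// shaders collapse into a single API call.
class ShaderCache
{
    ID3D11PixelShader * m_lastPs = nullptr;
public:
    void SetPixelShader( ID3D11DeviceContext * ctx, ID3D11PixelShader * ps )
    {
        if ( ps != m_lastPs )
        {
            ctx->PSSetShader( ps, nullptr, 0 );
            m_lastPs = ps;
        }
    }
};
```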

Using a single non-permutating uber-shader requires heavier updating of buffers, which approaches the overhead of a shader swap the more that has to be updated, and updating a part of a buffer that won’t even be used by the shader is something you want to avoid.
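
One common way to avoid touching data the shader won’t read is to split the constants by update frequency, for example (a hypothetical layout; each struct mirrors one HLSL cbuffer and must stay padded to 16-byte boundaries):

```cpp
#include <DirectXMath.h>

// Sketch: constants split by update frequency, so per-frame data is uploaded
// once per frame, not per object, and material data only on material change.
struct PerFrameCB       // -> cbuffer at register b0, updated once per frame.
{
    DirectX::XMFLOAT4X4 viewProj;
};
struct PerObjectCB      // -> cbuffer at register b1, updated once per draw.
{
    DirectX::XMFLOAT4X4 world;
};
struct MaterialCB       // -> cbuffer at register b2, updated on material change.
{
    DirectX::XMFLOAT4 diffuseColor;
    float             reflectivity;
    float             pad[ 3 ];   // Keep the size a multiple of 16 bytes.
};
```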

Creating multiple shaders with a modest amount of branching, then sorting by shader and removing redundant shader applies, is the best way to reach a reasonable middle ground, one from which you can lean towards more shaders or more branches later as your own benchmarks properly inform you.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

