What makes PC games compile their shaders at runtime?

Started by
5 comments, last by arly 6 months ago

What makes PC games compile their shaders at runtime? What specific features or requirements make runtime shader compilation necessary on PC? What kinds of edge-case combinations mean shaders have to be compiled specifically for almost every single GPU variant?

I'm not in the so-called AAA industry, so I've never had the privilege of optimizing down to every single graphics card (or even down to driver versions? lol) to the point where I'd need to compile shaders at runtime. The only use case I can come up with is supporting mods without restarting the game. So I just want to understand when runtime compilation is required, and what makes it so necessary. For example, is it situations like “oh crap, we need code specific to the GTX 1080”, or “we need to recompile this shader before the player reaches that door ASAP”, or “alright, let's just have the user compile these shaders during the first launch”? If I may assume those are some of the cases.


Every PC's hardware is different, and that precludes shipping pre-compiled shaders with the game (though this is possible on consoles with fixed hardware). Each GPU has a different driver and different hardware with different needs and capabilities, and by compiling locally on the user's machine you can get a more optimal shader. There is also SPIR-V, which is supposed to make it easier to ship precompiled shader code and reduce runtime compilation costs, but it's not available everywhere.

The most straightforward thing to implement is a “shader cache”, which is basically a map from shader configuration to the already-compiled shaders. The shader configuration may allow the renderer to override settings on the shader via preprocessor defines (e.g. whether shadows/reflections/parallax mapping/etc. are enabled), and those overrides may in turn be changed by the user's graphics settings. The shader cache can be saved to disk to prevent recompiling everything each time the game is launched. For each material that needs to be drawn in a frame, the renderer looks for the relevant shaders in the cache. If they are not there, it needs to compile them (and place them in the cache). This obviously takes some time (e.g. maybe 50 ms per shader), which leads to jerky performance.
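Roughly, such a cache might look like the sketch below. This is just an illustration, not any particular engine's code: the key layout, the `CompiledShader` blob, and the `compileShader` call are all placeholders for whatever backend compiler is actually used.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical compiled-shader blob produced by whatever compiler backend is used.
struct CompiledShader {
    std::vector<uint8_t> bytecode;
};

// Key describing one shader permutation: source file + the boolean options
// (shadows, reflections, parallax, ...) derived from renderer/graphics settings.
struct ShaderKey {
    std::string sourcePath;
    uint64_t    optionBits = 0;   // one bit per boolean option

    bool operator==(const ShaderKey& o) const {
        return optionBits == o.optionBits && sourcePath == o.sourcePath;
    }
};

struct ShaderKeyHash {
    size_t operator()(const ShaderKey& k) const {
        return std::hash<std::string>()(k.sourcePath) ^ std::hash<uint64_t>()(k.optionBits);
    }
};

class ShaderCache {
public:
    // Returns the cached shader, compiling (and caching) it on a miss.
    const CompiledShader& get(const ShaderKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // Slow path: this is what causes mid-game hitches (tens of ms per shader).
            it = cache_.emplace(key, compileShader(key)).first;
        }
        return it->second;
    }

private:
    // Placeholder for the real compiler invocation.
    CompiledShader compileShader(const ShaderKey&) { return {}; }

    std::unordered_map<ShaderKey, CompiledShader, ShaderKeyHash> cache_;
};
```

A real implementation would also serialize this map to disk and key it on driver version, but the lookup-or-compile structure is the core idea.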

I guess that many games do something similar to what I described above. The reasons might include:

  • It's just easier; the development time is better spent on other things.
  • The number of possible shader permutations may be too large to compile in advance (i.e. when launching the game for the first time). In my renderer I have 80+ different boolean shader options; that's 2^80 possible combinations, on the order of 10^24, far more than could ever be compiled up front. Not all of those are likely to be used, but it can be hard to predict which ones will be used on very large projects worked on by many people. (A small sketch of how such boolean options turn into preprocessor defines follows this list.)
  • It may be hard to predict which shader permutations are needed in the game at a given time, especially for special effects that occur at unpredictable times.
  • On older graphics APIs (e.g. OpenGL before 4.1, or OpenGL ES before 3.0), there is no way to save the compiled shader, so it's impossible to compile anything in advance.
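As referenced above, here is a minimal sketch of how boolean renderer options typically become preprocessor defines prepended to the shader source. The option names and the helper are made up for illustration; the point is that each independent option doubles the number of possible permutations, which is where the combinatorial explosion comes from.

```cpp
#include <cstdint>
#include <iterator>
#include <string>

// Illustrative option names only; a real renderer would have many more.
static const char* kOptionNames[] = { "SHADOWS", "REFLECTIONS", "PARALLAX", "FOG" };

// Turn a bitmask of enabled options into the "#define" block handed to the
// shader compiler, prepended to the GLSL/HLSL source before compilation.
std::string buildDefines(uint64_t optionBits) {
    std::string defines;
    for (size_t i = 0; i < std::size(kOptionNames); ++i) {
        if (optionBits & (uint64_t(1) << i)) {
            defines += "#define " + std::string(kOptionNames[i]) + " 1\n";
        }
    }
    return defines;
}
```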

Now it would obviously be better to compile all the shaders when the game first launches, and save those to disk. However, this might take a long time (e.g. 10-20 minutes), which would also cause user frustration. It would have to be repeated each time the user changes certain graphics settings. The feasibility depends on the game and how it handles materials. Some games might have only a handful of materials with limited permutations, in which case it's easy to just compile those. Big AAA games are another story, with potentially thousands of materials/shaders.

So the key point here is to get a more optimal shader for the user's specific hardware? Also, I've heard about SPIR-V, but only for producing pre-compiled shaders (for example, compiling GLSL with glslc to a SPIR-V bytecode binary file); I barely understand its usage beyond that. When you say “it's not available everywhere”, do you mean it's not available on users' computers, and if they do have it, we can let their device compile the SPIR-V the rest of the way?

Speaking of shader caches and shader configurations, in the context of Vulkan for example, is this what VkPipelineCache is all about? Pipeline State Objects (PSOs) get cached when created and, if found in the cache, reused when needed? Also, this happens after we compile the shaders first, right? Correct me if I'm wrong, but there are two things that can happen at runtime (though probably not in this order): compiling the shader modules, and creating the pipelines IF they aren't cached or haven't been created before. So we have 1) shaders to compile and save, and 2) pipelines to cache, whenever they are needed?

So I assume the balance is between trying to get the best-looking game, where the dev throws in all possible shader permutations to get the look the artists want on any possible device at any possible moment in the game, and limiting shader capability to reduce the number of combinations significantly (say, making a 2D game that's all sprites, as an extreme example). Hmm…

The x86 and x64 architectures are so dominant on PC that it is feasible not to support other CPU architectures. For this reason, you can compile at deploy time and distribute binary code.

If you wanted to run your game on truly any CPU a user may have, then you would have to compile on the user's system as well.

With GPUs, the APIs sit at the application level: you specify inputs, you specify a transformation into pixels and colours, and that's it. There is no public API at the instruction level; that layer is internal to the GPU and its driver, and different hardware likely has different internal instruction sets. In other words, for GPUs you are back in the situation of needing to support any architecture that exists, and thus you need to compile the code on the user's system.

arly said:
So the key point here is to get a more optimal shader for the user's specific hardware? Also, I've heard about SPIR-V, but only for producing pre-compiled shaders (for example, compiling GLSL with glslc to a SPIR-V bytecode binary file); I barely understand its usage beyond that. When you say “it's not available everywhere”, do you mean it's not available on users' computers, and if they do have it, we can let their device compile the SPIR-V the rest of the way?

SPIR-V is not machine code that any GPU could execute. Similar to DXIL for DirectX, it is just another ‘programming language’ (actually ‘byte code’) that the GPU driver needs to compile to the final machine instructions on the client.
So the pipeline actually goes like this: transpile the GLSL high-level language to SPIR-V byte code (lower level, hardly readable by humans), then - by the GPU driver - eventually transpile that byte code to some other, proprietary byte code the GPU IHVs use internally, then finally compile the machine code from that.

Notice there are two transpiling steps involved, which means translating from one language to another, but not yet into the machine language the hardware can actually run.
On PC there is no way to avoid this. For example, NVidia does not even document their machine code instructions. So the only way to execute anything on an NV GPU is to use the compiler that comes with their (black-boxed) drivers and software.
That's a very different situation from what we have on the CPU, where the instruction set is public and standardized, and we have many compilers to choose from.
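To make the two halves of that pipeline concrete, here is a small sketch of what the application-visible part looks like with Vulkan. The offline step runs glslc at build time; the runtime step only hands the SPIR-V words to the driver via vkCreateShaderModule, and the driver still does the final compile to proprietary GPU machine code when the pipeline is created. The file path and loader helper are illustrative.

```cpp
// Offline step (build time), e.g.:
//   glslc shader.frag -o shader.frag.spv
// The resulting SPIR-V is what ships with the game.

#include <vulkan/vulkan.h>
#include <cstdint>
#include <fstream>
#include <vector>

// Load a .spv file produced offline (helper and path are illustrative).
std::vector<uint32_t> loadSpirv(const char* path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    size_t bytes = static_cast<size_t>(file.tellg());
    std::vector<uint32_t> words(bytes / sizeof(uint32_t));
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), bytes);
    return words;
}

// Runtime step: the driver takes over from here and eventually produces
// the actual GPU machine code when the pipeline is built.
VkShaderModule createShaderModule(VkDevice device, const std::vector<uint32_t>& spirv) {
    VkShaderModuleCreateInfo info{};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = spirv.size() * sizeof(uint32_t); // size in bytes
    info.pCode    = spirv.data();

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);
    return module;
}
```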

We currently have a similar case with the BVH used for ray tracing, but there it causes a real problem: the BVH data structures are black-boxed and not specified. So we can not access and change the data, nor can we precompute the BVH and stream it from disk.
We must generate the BVH at runtime, which has a high cost. And because we can't change the BVH data, we can not have any fine-grained LOD solution with ray tracing.
Any time a small patch of a model's surface changes detail, we need to regenerate the BVH for the whole model, since the APIs give no option for local BVH updates. And because it's likely that all models change some of their detail as we move through the scene, we would have to rebuild the BVH for the entire scene every frame, which is not possible in realtime. In other words: the raytracing API design was a total failure. It's actually preventing any progress on the still-open LOD problem, which is no less important than lighting.
That's why UE5 Nanite geometry can't be traced, and Epic has to use low-poly proxies without LOD instead. That's likely also why we still see no competitor to Nanite from other engine makers - they'd rather wait until the amateurs at MS and NV fix their broken API design or lift the black boxes.

That's just to illustrate that shader compile times are not that much of a problem. There are much bigger and similar problems, but people don't notice them because they don't know what they're missing.

arly said:
Speaking of shader caches and shader configurations, in the context of Vulkan for example, is this what VkPipelineCache is all about? Pipeline State Objects (PSOs) get cached when created and, if found in the cache, reused when needed? Also, this happens after we compile the shaders first, right? Correct me if I'm wrong, but there are two things that can happen at runtime (though probably not in this order): compiling the shader modules, and creating the pipelines IF they aren't cached or haven't been created before. So we have 1) shaders to compile and save, and 2) pipelines to cache, whenever they are needed?

Yeah, that's right. Maybe worth mentioning that you will likely need to recompile shaders whenever users update their graphics driver.
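A minimal sketch of that flow with VkPipelineCache, under the assumption that you persist the blob yourself (the file path and helpers are made up): the cache is created seeded with the blob from a previous run and written back out at shutdown. Drivers generally reject a blob produced by a different driver or GPU, which is exactly why the recompile after a driver update happens.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <fstream>
#include <vector>

// Create a pipeline cache, seeding it with data saved from a previous run.
VkPipelineCache createPipelineCache(VkDevice device, const std::vector<uint8_t>& previousBlob) {
    VkPipelineCacheCreateInfo info{};
    info.sType           = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
    info.initialDataSize = previousBlob.size();                              // 0 on a cold start
    info.pInitialData    = previousBlob.empty() ? nullptr : previousBlob.data();

    VkPipelineCache cache = VK_NULL_HANDLE;
    vkCreatePipelineCache(device, &info, nullptr, &cache);
    return cache;
}

// At shutdown, retrieve the driver's cache blob and write it to disk so the
// next launch (with the same driver version) can skip most pipeline compiles.
void savePipelineCache(VkDevice device, VkPipelineCache cache, const char* path) {
    size_t size = 0;
    vkGetPipelineCacheData(device, cache, &size, nullptr);
    std::vector<uint8_t> blob(size);
    vkGetPipelineCacheData(device, cache, &size, blob.data());

    std::ofstream(path, std::ios::binary)
        .write(reinterpret_cast<const char*>(blob.data()), static_cast<std::streamsize>(blob.size()));
}
```

Shader-module compilation and pipeline creation remain two separate runtime steps; the pipeline cache only helps with the second one.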

arly said:
So I assume the balance is between trying to get the best-looking game, where the dev throws in all possible shader permutations to get the look the artists want on any possible device at any possible moment in the game, and limiting shader capability to reduce the number of combinations significantly (say, making a 2D game that's all sprites, as an extreme example). Hmm…

One cause of the problem is that shader source code is currently treated like content. Shaders are basically assets: artists generate them on the fly like textures, so their number tends to become large.
That's maybe not how it should be. Alternatively, we could set up a limited number of material options the artists can choose from. Then they have to work with that, without an option to create new materials on demand, needed just for a single model.
However, that's what people often already try to do. We try to find good compromises, but compile times still became a problem.
Personally I look at it this way: we got rid of loading times, but now we have traversal and shader-compilation stutters. Nothing comes for free, and we have to live with that.

Thanks for the further explanation, guys! Also an interesting read on the BVH used for RT.

This topic is closed to new replies.
