I love shader permutations! (not sure if ironic or not)

10 comments, last by Jason Z 10 years, 1 month ago

OK, I have reached the conclusion that permutations work extremely well when you know exactly what you want to render and need the appropriate shaders for that and only that. With my template system you'd need at most a few dozen permutations, and there is no need to generate or load any more. So no more hundreds or thousands of permutations.

But if you are writing an engine with a material editor that should support a lot of options out of the box without any shader editing, then even if you intentionally limit the options to keep the permutation count low, you still end up with thousands. And if you don't limit them and go wild, a theoretical count of over 500,000 is achievable.
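To show how quickly the count grows, here is a rough sketch of the arithmetic: every independent option multiplies the number of shaders. The `MaterialOption` type and any option names or counts fed into it are made up for illustration and are not taken from my engine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One independent material option and how many values it can take.
// Hypothetical type; the names and counts are illustrative only.
struct MaterialOption {
    std::string name;
    std::uint64_t choices;
};

// Each combination of option values needs its own compiled shader,
// so the total is the product of the per-option choice counts.
std::uint64_t countPermutations(const std::vector<MaterialOption>& options) {
    std::uint64_t total = 1;
    for (const MaterialOption& opt : options) {
        assert(opt.choices > 0);  // an option with no values makes no sense
        total *= opt.choices;
    }
    return total;
}
```

Ten on/off switches alone give 1,024 shaders. Multiply that by, say, 8 light counts, 8 shading models and 8 blend modes (all hypothetical numbers) and you are at 524,288.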

So permutations are out of the question for the engine editor.

I started experimenting with a new solution. I'm experimenting a lot right now, before the engine gets too advanced, because later it is going to be much more difficult to change anything. I based my attempt on:


The FX framework is very outdated, and a left-over from D3D9... In D3D11, they released the source code for it so you can keep using it if you like, or you can migrate away from it or customize it... Internally, it just makes a big "globals" cbuffer per shader, which is very inefficient. e.g. if 99% of the shader variables don't change, but 1% do, then the entire "globals" cbuffer has to be updated anyway.
You should definitely structure your renderer around the concept of cbuffers instead of individual shader variables if it's going to exist into the future past D3D9.
I've got a post here where I describe how I emulate cbuffers on D3D9 (which ended up being more efficient than using fx files on D3D9 for me) http://www.gamedev.net/topic/618167-emulating-cbuffers/

I had extremely limited experience with shaders without FX, but the DX10 constant buffer solution still seemed like a great idea. So I tried to implement something like you described there for DX9.
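The core of the idea, as I understand it, is roughly the following: group constants into blocks of float4 registers, mark a block dirty when something in it changes, and upload only the dirty blocks before a draw. This is a minimal C++ sketch and not my actual SharpDX code. `EmulatedCBuffer`, `UploadFn` and `flushDirty` are hypothetical names, and the upload callback just stands in for a call like `IDirect3DDevice9::SetPixelShaderConstantF`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// A block of float4 registers that is always uploaded as a whole.
// Hypothetical type, not part of D3D9 or SharpDX.
struct EmulatedCBuffer {
    unsigned startRegister;   // first cN register the block maps to
    std::vector<float> data;  // 4 floats per register
    bool dirty = true;        // needs an upload before the next draw

    EmulatedCBuffer(unsigned start, std::size_t registerCount)
        : startRegister(start), data(registerCount * 4, 0.0f) {}

    // Write `count` floats starting at float index `offset`.
    void write(std::size_t offset, const float* values, std::size_t count) {
        assert(offset + count <= data.size());
        std::memcpy(data.data() + offset, values, count * sizeof(float));
        dirty = true;
    }
};

// Stand-in for the device call:
// (start register, float data, number of float4 registers).
using UploadFn = std::function<void(unsigned, const float*, unsigned)>;

// Upload only the buffers that changed; returns how many were sent.
int flushDirty(std::vector<EmulatedCBuffer>& buffers, const UploadFn& upload) {
    int uploaded = 0;
    for (EmulatedCBuffer& cb : buffers) {
        if (!cb.dirty) continue;
        upload(cb.startRegister, cb.data.data(),
               static_cast<unsigned>(cb.data.size() / 4));
        cb.dirty = false;
        ++uploaded;
    }
    return uploaded;
}
```

With per-frame, per-object and per-material blocks kept apart, changing one material value touches one small block instead of one big "globals" buffer.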

I had to overcome a few hurdles. For the first hour I just couldn't render anything in my hello world test. It was my first time using raw shaders and I did not know about the matrix packing order discrepancy. FX and raw shaders use different conventions, or so it seems.
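As far as I can tell, the discrepancy is this: HLSL packs matrices column-major by default (one column per constant register), while D3DX/SharpDX matrices are row-major in memory, and the effect framework quietly transposes for you when you set a matrix. With raw constant uploads nobody does that, so the matrix has to be transposed on the CPU first (or declared `row_major` in the shader). A minimal sketch of the CPU-side fix, with `transposeForUpload` being a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Transpose a 4x4 matrix stored as 16 consecutive floats, row by row,
// into `out`, so that each group of 4 floats in `out` holds one column
// of the original matrix.
void transposeForUpload(const float* rowMajor, float* out) {
    assert(rowMajor != nullptr && out != nullptr);
    assert(rowMajor != out);  // this simple loop cannot transpose in place
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t col = 0; col < 4; ++col) {
            out[col * 4 + row] = rowMajor[row * 4 + col];
        }
    }
}
```

The result is what gets passed to the constant upload call; four registers are filled per matrix either way.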

The second obstacle was constant initialization. I guess that without the FX framework, something like "float gloss: register(c6) = 0.1;" will leave gloss as zero. And no more preshaders, but who needs them anyway :).

But after I fixed these I started rendering again. The solution is far from elegant. The "ghetto" constant buffer structure is hopefully compatible with both DX9 and DX10, but I did have to add a lot of filler members to get the alignment right. Then there is the problem of SharpDX/C#: SetVertexShaderConstant and SetPixelShaderConstant don't take very convenient parameters. I did not want to go with some marshaling or array solution, so I'm passing a pointer to these functions. The only overload that takes a pointer takes a Matrix*. So I'm making sure the structure is Matrix aligned, and in unsafe code I'm casting a pointer to it to a Matrix pointer. Ugly...
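To give an idea of what the filler members look like, here is a C++ sketch of such a structure (my real one is a C# struct, and the members here are hypothetical). Every group fills exactly one float4 register, which should also satisfy the DX10 rule that a variable may not straddle a 16-byte boundary. The default values live on the CPU side, since nothing uploads the HLSL initializers for me anymore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical per-material constants. Each group below fills exactly
// one float4 register (16 bytes), so the whole struct maps to c0..c2.
struct MaterialConstants {
    float diffuseColor[4] = {1.0f, 1.0f, 1.0f, 1.0f};  // c0

    float specularColor[3] = {1.0f, 1.0f, 1.0f};       // c1.xyz
    float gloss = 0.1f;                                 // c1.w

    float uvScale[2] = {1.0f, 1.0f};                    // c2.xy
    float pad0[2] = {0.0f, 0.0f};                       // c2.zw, filler
};

static_assert(sizeof(MaterialConstants) % 16 == 0,
              "struct must be a whole number of float4 registers");
static_assert(offsetof(MaterialConstants, specularColor) == 16,
              "specularColor must start at register c1");
static_assert(offsetof(MaterialConstants, uvScale) == 32,
              "uvScale must start at register c2");

// Number of float4 registers the struct occupies.
constexpr unsigned registerCount() {
    return static_cast<unsigned>(sizeof(MaterialConstants) / 16);
}

// Copy the struct into a raw float array, as a constant upload would see it.
void copyToRegisters(const MaterialConstants& src, float* dst) {
    assert(dst != nullptr);
    std::memcpy(dst, &src, sizeof(MaterialConstants));
}
```

The static asserts are the useful part: if I add a member and forget the filler, the build breaks instead of the rendering.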

The way I'm using this right now is based on the material templates. When you resolve one, a pixel shader specific to that template is now generated and compiled on the fly.

There are two scenarios:

  • In game. You will have several active material templates so compiling them is slowish but not that bad. The good news is that you can cache them. There is no need to compile every time because a template does not really need to change.
  • In editor mode, when editing the templates: now, this is problematic. I trimmed the shaders down as far as I could, but the compiler is still very slow. The best case scenario is 30 ms, but some more complex templates take 300 ms. And these numbers are just going to go up.
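The caching for the in-game case can be sketched like this. It is a simplified C++ illustration, not my engine code: `ShaderCache` is a hypothetical class, the compile callback stands in for the real compiler call, and the cache is keyed by the full generated source to keep things short (an on-disk cache would more likely use a hash of it).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytecode = std::vector<unsigned char>;
// Stand-in for the real, slow compiler call.
using CompileFn = std::function<Bytecode(const std::string&)>;

// Hypothetical cache: compiles each distinct shader source only once.
class ShaderCache {
public:
    explicit ShaderCache(CompileFn compile) : compile_(std::move(compile)) {
        assert(compile_);  // an empty callback would throw on first use
    }

    // Returns the cached bytecode for this exact source, compiling on a miss.
    const Bytecode& get(const std::string& source) {
        auto it = cache_.find(source);
        if (it == cache_.end()) {
            it = cache_.emplace(source, compile_(source)).first;
            ++misses_;
        }
        return it->second;
    }

    // How many times the compiler actually had to run.
    int misses() const { return misses_; }

private:
    CompileFn compile_;
    std::unordered_map<std::string, Bytecode> cache_;
    int misses_ = 0;
};
```

Since a resolved template always generates the same source, only the first resolve pays the 30 to 300 ms.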

I'll add multi-threaded resolve to editor mode so that your GUI is not frozen, plus a helpful message in the corner like "FXC is shitting itself compiling 20 lines of pixel shader that calls 2 functions. Please wait...", and leave it at that.
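What I have in mind is roughly the following, sketched in C++ with standard library futures rather than the C# I would actually use. `compileAsync` and `isReady` are hypothetical names, and the compile callback again stands in for the real compiler call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

using Bytecode = std::vector<unsigned char>;
// Stand-in for the real, slow compiler call.
using CompileFn = std::function<Bytecode(const std::string&)>;

// Start compiling on another thread. The GUI keeps the returned future
// and carries on drawing.
std::future<Bytecode> compileAsync(CompileFn compile, std::string source) {
    assert(compile);
    return std::async(std::launch::async, std::move(compile), std::move(source));
}

// Non-blocking check the GUI can call once per frame.
// Returns false for a future whose result was already taken.
bool isReady(const std::future<Bytecode>& f) {
    return f.valid() &&
           f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

While `isReady` returns false the editor shows the "please wait" message and keeps rendering with the previous shader; once it returns true, `get()` hands over the bytecode without blocking.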

I need to continue my engine.

But someday I'll revisit this subject. I was thinking of either writing my own compiler (I'm actually good at compiler writing) or an easier solution: binary adjustment and concatenation of precompiled permutation atoms. Computing a point light is always the same, with only some input constants differing, yet FXC struggles with it and probably evaluates the same pixel shader function as many times as it encounters it. I'm sure some clever trick can speed up simple permutation-based pixel shader compilation by at least an order of magnitude.



So, what are you then? You have a huge rep...

I am a developer in the automotive industry, where I build data manipulation and visualization tools. I'm also a co-author of the book in my signature below as well as contributing to a few others, and I have been fortunate enough to get the Microsoft MVP award in DirectX and more recently in C++. But I have been around these forums for quite some time, and I really enjoy trying to help others learn graphics programming the same way I did!
