I think you're going into a different part of the Hlms than the one I wanted to highlight. You're looking at the template system it uses and how shaders get compiled, while what I wanted to highlight is that:
- There are properties that involve Renderable information (e.g. a mesh without normals cannot use lighting, a mesh without tangents cannot use normal mapping, a material that uses UV set #4 for the diffuse texture cannot be used with a mesh that only has 2 sets of UVs, etc.)
- There are properties that are per Material (e.g. material has a normal map, material uses parallax mapping, material uses 4 textures, uses transparency, etc.)
- There are properties that are per Pass (e.g. this is a shadow mapping pass, this is a pass with shadow mapping and 3 active lights, this is a pass without shadow mapping and one active light, this is the Early-Z pass, this is a deferred rendering pass, this is the light accumulation pass, etc.)
When assigning a Material to a Renderable, the Hlms analyzes both the Material and Renderable and creates a hash (and stores the property combination in a cache). Two different materials and two different renderables could perfectly end up with the same hash (i.e. both meshes have the same vertex formats, both materials use the same features but have different values).
Right before rendering a pass, the Hlms analyzes the Pass and creates a hash (and stores the property combination of the pass in a cache).
While rendering the pass, both the hash stored in each Renderable (which contains Renderable+Material information) and the Pass hash are combined to form the final hash. This final hash is used to pull the actual shaders and PSO needed to render (and if it doesn't exist, one is created).
The key point I wanted to highlight is that you have to make a system that accounts for all 3 sources of information, and that a 64-bit key won't be enough if you use one bit for each setting.
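To make the flow above concrete, here is a minimal C++ sketch of it. None of these names (`PropertySet`, `hashProperties`, `ShaderCache`, `bakeRenderable`, `preparePass`, `getShader`) come from the Hlms; they are made up for illustration. The FNV-1a hash, the way the two 32-bit hashes are packed into one 64-bit key, and the string standing in for "shaders + PSO" are all assumptions, and hash collisions are simply ignored.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical: a property combination as name -> value (ordered, so hashing is stable).
using PropertySet = std::map<std::string, int>;

// Stand-in 32-bit hash (FNV-1a) over every property name and value.
inline std::uint32_t hashProperties(const PropertySet& props) {
    std::uint32_t h = 2166136261u;
    auto mix = [&h](unsigned char byte) { h = (h ^ byte) * 16777619u; };
    for (const auto& kv : props) {
        for (unsigned char c : kv.first) mix(c);
        mix(0);  // separator between name and value
        const std::uint32_t v = static_cast<std::uint32_t>(kv.second);
        for (int shift = 0; shift < 32; shift += 8)
            mix(static_cast<unsigned char>(v >> shift));
    }
    return h;
}

struct ShaderCache {
    std::unordered_map<std::uint32_t, PropertySet> renderableSets;  // Renderable+Material
    std::unordered_map<std::uint32_t, PropertySet> passSets;        // Pass
    std::unordered_map<std::uint64_t, std::string> compiled;        // final hash -> "shader"
    int compileCount = 0;

    // Done once, when a material is assigned to a renderable.
    std::uint32_t bakeRenderable(const PropertySet& renderable, const PropertySet& material) {
        PropertySet merged = renderable;
        merged.insert(material.begin(), material.end());
        const std::uint32_t h = hashProperties(merged);
        renderableSets.emplace(h, merged);  // keeps the first set stored under this hash
        return h;
    }

    // Done once, right before rendering a pass.
    std::uint32_t preparePass(const PropertySet& pass) {
        const std::uint32_t h = hashProperties(pass);
        passSets.emplace(h, pass);
        return h;
    }

    // Done per renderable while rendering: merge both hashes, create on a cache miss.
    const std::string& getShader(std::uint32_t renderableHash, std::uint32_t passHash) {
        const std::uint64_t finalHash =
            (static_cast<std::uint64_t>(renderableHash) << 32) | passHash;
        auto it = compiled.find(finalHash);
        if (it == compiled.end()) {
            ++compileCount;  // a real system would generate and compile shaders here
            PropertySet all = renderableSets.at(renderableHash);
            const PropertySet& pass = passSets.at(passHash);
            all.insert(pass.begin(), pass.end());
            std::string desc;
            for (const auto& kv : all)
                desc += kv.first + "=" + std::to_string(kv.second) + ";";
            it = compiled.emplace(finalHash, desc).first;
        }
        return it->second;
    }
};
```

In this sketch, two different meshes with the same vertex format, using two materials with the same features, produce the same value from `bakeRenderable`, so they share one entry in `compiled` for a given pass. Because properties carry values (UV set index, light count) rather than single bits, the key is a hash of the whole set instead of a bitfield.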
Ah, okay, I think I'm starting to understand it better. If you don't mind me asking, what would you say are the benefits of taking the approach you have with Hlms? If a template class system with a class factory were implemented, you would be able to circumvent the need to add a large amount of markup to your .hlsl file, parse said file, and create a second process for situations in which the preprocessor doesn't work.
The goals were the following:
- Speed up iteration: A C++ class that patches things up wherever the standard preprocessor doesn't work means that when you need to change something, you have to build the exe again, then run the exe, which needs to load all the assets. This has an iteration time of between 20 seconds and 3 minutes depending on complexity. If you made a mistake, you need to modify your code and repeat. Modifying an Hlms shader template and reloading can be done without all that, and takes a couple of milliseconds. This is a massive improvement and a major point.
- Make up for broken compilers: You're developing for HLSL, but in GLES-land on Android, it is a disaster. There is a major vendor whose for loops don't work at all because the dev misinterpreted the GLSL ES 2.0 spec. They've released a fix for Lollipop, but older versions (and there are a lot of KitKat phones out there) still run that unpatched driver. So, the Hlms provides @foreach, which allows us to manually unroll the loops (btw other vendors are really bad at unrolling loops). Some GLSL ES drivers are broken to the point where the only safe thing to do with macros is #ifdef #else #endif. But forget about writing #define DIFFUSE material.xyz * otherValue.x and then using DIFFUSE in place of material.xyz * otherValue.x.
- Reuse snippets as much as possible. Multi-line macros in HLSL/GLSL need a '\' appended at the end of each line. The Hlms templates have @piece for this.
- The generated shader should be relatively efficient. Most approaches to uber-shaders end up with a nice modular system that results in a horribly slow shader, because they delegate modularity to external functions and leave unused code in the file, hoping the compiler will optimize heavily: removing dead code, inlining all functions, and eliminating calculations that were repeated inside each function. Pieces allow us to fine-tune the generated output and avoid redundant calculations.
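As a rough illustration of what "manually unrolling" means in the second point, here is a small C++ sketch that expands a loop body a fixed number of times before the shader source ever reaches the driver. This is not the Hlms parser; `unrollLoop` and the `@n` placeholder token are hypothetical names picked for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: repeat `body` `count` times, replacing every occurrence
// of `token` with the current iteration index (0, 1, 2, ...).
std::string unrollLoop(const std::string& body, int count, const std::string& token) {
    std::string out;
    for (int i = 0; i < count; ++i) {
        std::string line = body;
        if (!token.empty()) {  // an empty token would never stop matching
            const std::string index = std::to_string(i);
            std::size_t pos = 0;
            while ((pos = line.find(token, pos)) != std::string::npos) {
                line.replace(pos, token.size(), index);
                pos += index.size();  // continue searching after the inserted index
            }
        }
        out += line;
    }
    return out;
}
```

For example, `unrollLoop("sum += light[@n];\n", 2, "@n")` yields two plain statements indexing `light[0]` and `light[1]`, so the generated shader contains no for loop for a broken driver to mishandle.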
C++ templates heavily increase compilation time, which makes them a no-no. But even then, your solution assumes these options can be resolved at compile time, while they often have to be evaluated at run time. You're basically moving the problem from an external file (our Hlms shader templates) back into C++, which we wanted to avoid.
Note that the Hlms doesn't actually dictate how you write an implementation. You can just avoid the whole meta-preprocessor we provide (i.e. never use @property, @foreach, @counter, @piece, etc.) and do it the way you want: using HLSL's preprocessor and stitching the leftovers from C++.
--
There is no single way to reach the same result, and the Hlms allows you to do it any way you want.
Like I said, I think you're focusing on the template parsing side, whereas I wanted you to see that, at the design level, you have 3 sources of information (Renderable, Material, Pass): there is information you can bake once when assigning the material to the Renderable, information that needs to be evaluated per Pass, and a bit of work that must be done at render time per Renderable (i.e. merging the baked hash in the Renderable with the Pass hash).
Also you will need some sort of cache system with a 32-bit hash value, instead of a 64-bit bitfield.