Ah, that explains it. Now, speaking of which - to OP, are you sure you are not using an unnecessarily old GLSL version?
Yeah I was just sticking with the terminology already present in the thread.
Quite - I was a bit puzzled myself when GLSL first surfaced. OGL sure has long-lasting developmental issues (stemming from a crippling need for consensus / compatibility / support).
GL2's API for dealing with shaders, uniforms, attributes and varyings is absolutely terrible compared to the equivalents in D3D9, or the more modern APIs of D3D10/GL3...
Did not quite understand what you said here (not sure which parts refer to D3D and which to OGL).
BTW, when using interface blocks for uniforms, the default behaviour in GL is similar to D3D in that all uniforms in the block will be "active" regardless of whether they're used or not - no optimisation to remove unused uniforms is done. It's nice that GL gives you a few options here though (assuming that every GL implementation acts the same way with these options...).
About uniforms in OGL: uniform buffers are not part of the program object and hence are not directly bound to any program. Plain uniforms are bound to the program object - however, it appears that everyone compiles an internal buffer for those under the hood, and the two cases are indistinguishable at the hardware level.
So, in either case, there is no special "loading" code generated for uniforms, and the only optimization of removing unused stuff one can speak of is ... well, just do not use the parts of the uniform buffer you do not use ... duh. As the underlying hardware is the same, D3D is bound to end up doing the exact same thing here (unused uniforms are, in all regards that matter, thrown out - regardless of what is seen/reported on the API side).
Ie. only buffers are bound - the individual uniforms are just offsets in machine code.
Oh, and thank goodness for std140 or my head would explode in agony.
What optimizations (it makes zero difference on the driver/OGL/GPU side)? You mean the CPU side, i.e. filling buffers with data? Yeah, it would be pretty painful not to use std140. IIRC, it was added at the same time as interface blocks - so, if you can use interface blocks then you can always use the fixed layout too ... a bit late here to go digging to check it though.
With the default GL behaviour, the layout of the block isn't guaranteed, which means you can't precompile your buffers either. The choice to allow this optimisation to take place means that you're unable to perform other optimisations.
Yep, got that when I re-read the "separate program objects" extension ( http://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt ) - it uses "mix-and-match" to describe it. It is core in 4.1. Not using it any time soon, but nice to have the option (as most shader programs do not particularly benefit from whole-program optimization).
Once you've created the individual D3D shader programs for each stage (vertex, pixel, etc.), it's assumed that you can use them straight away in a mix-and-match fashion, as long as you're careful to only mix-and-match shaders with interfaces that match exactly, without the runtime doing any further processing/linking.
Ee.. you lost me here :/. I think you implied an argument where there was none. I was conveying "wonderment" as perceived by me - it is not an argument from me nor from the "wonderer".
That's a pretty silly argument.
Having a precompiled intermediate is one of the most recurring requests on the OGL side (even after binary blobs were already added) - with D3D brought up as an example time and time and time again. So, what's the holdup? If it makes sense for OGL, then why has it not been added?
But if I were to speculate anyway, the reasons it has not been added to OGL might be:
* Khronos is slow, and half the time I just want to throw my shoe at them.
* Instead of one specification one has to hope is implemented correctly, there would now be two.
* Consensus lock / competition ... i.e. no MS as arbitrator to break the lock.
* Insufficient demand from those that matter (learning-OGL-complaining-a-lot persons do not matter).
* The question whether it would be worthwhile for OGL specifically has not been confidently settled.
* There are more important matters to attend to - maybe later.
* All the above.
PS. I would like to have a GLSL intermediate option - as you said, in case of shader explosion (as I call it), it gets problematic (uncached runs).
Then I would say that one is doing something wrong. One does not need thousands of shaders to show the splash screen ;)
wait for 10 minutes the first time they load the game.
... again, I am not against an intermediate; I would just like to point out that its absence is not as widespread and grave a problem as it is often portrayed to be.
Yep, that is silly indeed. Cannot quite use what one does not have - D3D, as far as I gather from your responses, does not have the no-intermediate / whole-program-optimization option to begin with. Asking why an option that does not exist is not used more often ... well, good question.
Sure, you can trade runtime performance in order to reduce build times, but to be a bit silly again, if this is such a feasible option, why do large games not do it?
Having an extra specification and implementation is unlikely to be less problematic than not having it.
Another reason why runtime compilation in GL-land is a bad thing, is because the quality of the GLSL implementation varies widely between drivers.
... continuation: it is unnecessary in OGL too - GLSL etc. is well specified. If implementers fail to follow the spec, then changing the spec's content (an intermediate spec etc.) to somehow make them read the darn thing and not fuck up the implementation ... is silly.
To deal with this, Unity has gone as far as to build their own GLSL compiler, which parses their GLSL code and then emits clean, standardized and optimized GLSL code, to make sure that it runs the same on every implementation. Such a process is unnecessary in D3D due to there being a single, standard compiler implementation.
Leaving the bad example aside, what you wanted to say, if I may, is that a third-party (Khronos) compiler would be helpful, as it would leave only the intermediate for the driver.
Perhaps. However, I highly doubt it would be any less buggy. Compilers are not rocket science (having written a few myself) - the inconsistencies stem from a smaller user base, some extremely lazy driver developers, and shader writers not reading the spec either. A "Khronos compiler" would not have fixed any of those.
I hope we are not annoying OP with this somewhat OT tangent (assessing the merits of an intermediate language in the context of OGL and D3D). At least I can say I know more about the D3D side than before - yay, and thanks for that. Got my answer, which turned out to be relevant to OP too, as OGL has the D3D mix-and-match option as well, which, if used, has indeed the same limitations.
edit: yep, definitely bedtime.