For reference, as your OGL information is horrifically out of date, this is how it goes in OGL land:
First, shader stage inputs and outputs, regardless of stage, are called ... input and output: "in" / "out". The, imho pointless, notion of "attributes" and "varyings" in GLSL source has been deprecated - good riddance.
Yeah I was just sticking with the terminology already present in the thread.
The new in/out system in GLSL is much more sensible, especially once you add more stages to the pipeline between the vertex and pixel stages.
Seeing as the OP is using this terminology, though, perhaps they're using GL2 instead of GL3 or GL4, which limits their options.
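For anyone on the newer versions, the change looks like this (a toy pass-through pair; the variable names are just examples):

```glsl
// Vertex shader (GLSL 1.30+): "in"/"out" replace "attribute"/"varying"
#version 130
in vec3 position;        // was: attribute vec3 position;
out vec3 colour;         // was: varying vec3 colour;
void main() {
    colour = position * 0.5 + 0.5;
    gl_Position = vec4(position, 1.0);
}

// Fragment shader: the matching "in" picks up the vertex stage's "out"
#version 130
in vec3 colour;          // was: varying vec3 colour;
out vec4 fragColour;     // user-declared output replaces gl_FragColor
void main() {
    fragColour = vec4(colour, 1.0);
}
```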
BTW, when using interface blocks for uniforms, the default behaviour in GL is similar to D3D in that all uniforms in the block will be "active" regardless of whether they're used or not - no optimisation to remove unused uniforms is done. It's nice that GL gives you a few options here though (assuming that every GL implementation acts the same way with these options...).
D3D's behaviour is similar to the std140 layout option, where the memory layout of the buffer is defined by the order of the variables in your block and some packing rules. No optimisation will be done on the layout of the "cbuffer" (D3D uniform interface block), due to it acting as a layout definition for your buffers.
With the default GL behaviour, the layout of the block isn't guaranteed, which means you can't precompile your buffers either. The choice to allow this optimisation to take place means that you're unable to perform other optimisations.
Again though, assuming your GL implementation is up to date, you've got the option of enabling packing rules or optimisations, or neither.
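For reference, those options are the block layout qualifiers (GL 3.1+ / ARB_uniform_buffer_object; the block contents here are made up):

```glsl
layout(std140) uniform PerFrame {    // fixed packing rules: layout is
    mat4 viewProj;                   // predictable, buffers can be built offline
};
layout(shared) uniform PerMaterial { // implementation-defined packing, but the
    vec4 tint;                       // layout is shared between programs (the default)
};
layout(packed) uniform Debug {       // implementation may optimise unused members
    vec4 rarelyUsed;                 // away; offsets must be queried at runtime
};
```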
Shader stages are compiled in isolation (that is the way it has always been), and linked together into shader program(s) later. If an input is not used then it is thrown away - D3D probably does the same, no?
This is the difference I was trying to point out
D3D does not have an explicit linking step, which is the only place where it's safe to perform optimizations on the layout of the interface structures.
In D3D9 there was an implicit linking step, but it's gone in D3D10. Many GL implementations are also notorious for doing lazy linking, like this implicit step, with many engines issuing a "fake draw call" after binding shaders to ensure that they're actually compiled/linked when you wanted them to be, to avoid CPU performance hiccups during gameplay...
Once you've created the individual D3D shader programs for each stage (vertex, pixel, etc.), it's assumed that you can use them straight away in (what you call) a mix-and-match fashion, as long as you're careful to only mix-and-match shaders whose interfaces match exactly, without the runtime doing any further processing/linking.
In D3D9, there was some leeway in whether the interfaces matched exactly, but this required the runtime to do some checking/linking work as a regular part of draw calls (part of the reason its draw calls are more expensive than in D3D10/GL), so this feature was scrapped. Now it's up to the programmer to make sure that their shaders will correctly link together as they author them, so that at runtime no validation/fix-ups need to be done.
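The same exact-match requirement shows up in GLSL interface blocks when stages are compiled separately - the block declarations on each side have to agree member-for-member (hypothetical names, for illustration):

```glsl
// Vertex shader: output interface block
out VertexData {
    vec3 normal;
    vec2 uv;
} vOut;

// Fragment shader: must declare the identical block (same block name,
// same members in the same order) for the stages to link together
in VertexData {
    vec3 normal;
    vec2 uv;
} vIn;
```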
PS. shader programs can be extracted as binary blobs for caching, to skip all of the compiling/linking altogether - I have never found any reason to use them myself (i.e. I've never suffered shader count explosion).
This feature isn't the same as D3D's compilation -- you can only extract blobs for the current GPU and driver. The developer can't precompile their shaders and just ship the blobs.
Having a precompiled intermediate is one of the most recurring requests on the OGL side (even after binary blobs were already added) - with D3D brought up as an example time and time and time again. So, what's the holdup? If it makes sense for OGL, then why is it not added?
That's a pretty silly argument. To make a similarly silly one from the other side of the fence: in Windows 8 Metro, you can't compile HLSL shaders at runtime at all, but are forced to pre-compile them into bytecode ahead of time and ship these blobs to the customer. If runtime compilation is so important, then why was it removed (by a group of about equal importance/expertise to Khronos/ARB)?
Yeah, the driver's internal compiler could do a better job with the high-level source rather than pre-compiled bytecode, but the fact is that compiling HLSL/GLSL is slow. D3D's option to load pre-compiled bytecode shaders is an order of magnitude faster. You may not have personally run into a problem with it, but plenty of developers have, which is why this feature is so popular. Just as large C++ games can take anywhere from minutes to hours to build, the shader code-bases in large games can take anywhere from a few seconds to half an hour to compile... Even with the caching option, it's very poor form to require your users to wait 10 minutes the first time they load the game.
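For reference, that offline step is just a build-time invocation of the D3D shader compiler (the file and entry-point names here are made up):

```
fxc.exe /T vs_4_0 /E VSMain /Fo shadow_vs.cso shadow.hlsl
```

`/T` picks the target profile, `/E` the entry point, and `/Fo` the output bytecode blob, which the game then ships and loads directly at runtime.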
Sure, you can trade runtime performance in order to reduce build times, but to be a bit silly again, if this is such a feasible option, why do large games not do it?
Another reason why runtime compilation in GL-land is a bad thing, is because the quality of the GLSL implementation varies widely between drivers. To deal with this, Unity has gone as far as to build their own GLSL compiler, which parses their GLSL code and then emits clean, standardized and optimized GLSL code, to make sure that it runs the same on every implementation. Such a process is unnecessary in D3D due to there being a single, standard compiler implementation.