That's the problem. This was an accurate reflection of some pixel shader hardware in 2005, but is now not at all reflective of actual GPU hardware. In order for GL to provide you with the abstraction that uniforms are part of a program's state, the GL driver is forced to emulate this behavior, which is extremely costly. There's no need for GL to emulate the behavior of half-decade-old hardware any more.
I don't see what the problem is. Uniforms are part of the program's state.
In D3D it can be leveraged to reduce the cost of setting uniforms. D3D9's abstractions don't tie uniform values and programs together like GL does (i.e. when setting a uniform in D3D9, the value persists in that 'slot' until it's set to another value, regardless of which program is set), which allows a graphics engine to greatly simplify the process of setting uniform values.
Explicit uniform location has nothing to do with sharing uniforms.
If the previous draw-call has set 50 3x4 bone matrices (150 vec4 registers) in registers #42-191, and the current draw-call (which uses a different program) requires the same array of matrices in the same registers, then the engine can detect that condition with very little effort (e.g. a single comparison) and do zero work, relying on the fact that the GPU will already contain the correct uniform values.
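To make that concrete, here is a rough sketch of the kind of engine-side check I mean. It is a CPU-side model only, not real D3D9 code: `RegisterBank`, `RangeKey` and the `sourceId` token (some ID plus version of the CPU-side array, bumped whenever its contents change) are names I made up for illustration, and it only remembers the last range written, where a real engine would probably track several.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vec4 { float x, y, z, w; };

// Hypothetical key describing what currently lives in a register range.
struct RangeKey {
    std::uint64_t sourceId;  // assumed: ID + version of the CPU-side array
    std::size_t first;       // first register of the range
    std::size_t count;       // number of vec4 registers

    bool operator==(const RangeKey& o) const {
        return sourceId == o.sourceId && first == o.first && count == o.count;
    }
};

// Model of a single global constant register bank whose contents
// persist across program switches.
class RegisterBank {
public:
    static constexpr std::size_t kNumRegisters = 256;  // assumed bank size

    // Returns true if the data was copied, false if it was skipped as redundant.
    bool setRange(std::uint64_t sourceId, std::size_t first,
                  const Vec4* data, std::size_t count) {
        assert(first + count <= kNumRegisters);
        const RangeKey key{sourceId, first, count};
        if (key == last_) {
            return false;  // same array, same registers: nothing to do
        }
        for (std::size_t i = 0; i < count; ++i) {
            regs_[first + i] = data[i];
        }
        last_ = key;  // any other write replaces the key, so stale skips cannot happen
        ++uploads_;
        return true;
    }

    std::size_t uploads() const { return uploads_; }

private:
    std::array<Vec4, kNumRegisters> regs_{};
    RangeKey last_{};
    std::size_t uploads_ = 0;
};
```

With 150 registers of bone data starting at register 42, the first `setRange` call copies the data and a second call with the same `sourceId` (made after switching to a different program) returns `false` without touching a single register value.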
In GL, because you're dealing with an emulation layer instead of a valid abstraction, the program is required to set all 50 matrices on both programs regardless, and then the GL driver is forced to perform 150 vec4 vs vec4 value comparisons in order to determine whether the new uniform values are redundant or not, and whether it should send them to the GPU registers.
The UBO abstraction is exactly the same as the D3D9 abstraction, except that multiple register banks of different sizes can exist and/or be bound at a time, instead of there being a single global register bank. e.g. the D3D9 behavior could be emulated on GL3/D3D10 by creating a single global UBO, but that wouldn't be useful ;)
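A rough way to picture the UBO model is as a small table of binding slots, each pointing at a buffer of whatever size. Again this is only a CPU-side sketch with invented names (`UniformBuffer`, `BindingPoints`, `kNumSlots`), not the GL API itself; in real GL the equivalent operations are `glBindBufferBase(GL_UNIFORM_BUFFER, ...)` for the slot and `glUniformBlockBinding` to tell a program which slot a block reads from.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a uniform buffer object: a register bank of arbitrary size.
struct UniformBuffer {
    std::vector<float> data;
};

// Model of indexed binding points. Programs refer to slots, not values,
// so whatever is bound stays bound when the program changes.
class BindingPoints {
public:
    static constexpr std::size_t kNumSlots = 8;  // assumed slot count

    // Returns true if the binding changed, false if it was already bound.
    bool bind(std::size_t slot, const UniformBuffer* buffer) {
        assert(slot < kNumSlots);
        if (slots_[slot] == buffer) {
            return false;  // one pointer comparison detects the redundant bind
        }
        slots_[slot] = buffer;
        return true;
    }

    const UniformBuffer* bound(std::size_t slot) const {
        assert(slot < kNumSlots);
        return slots_[slot];
    }

private:
    std::array<const UniformBuffer*, kNumSlots> slots_{};
};
```

For example, a 16-float camera buffer can sit in slot 0 and a 600-float bone buffer (50 3x4 matrices) in slot 1; rebinding the same bone buffer for the next program is a no-op.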
For SM3 GPUs, when comparing GL2 and D3D9, D3D9 allows a rendering engine to be several orders of magnitude more efficient when it comes to managing uniform values.
For SM4/5 GPUs, the same is true for GL2 vs GL3 with UBOs.
If you're targeting GL3 or GL4, then GL2's way of managing uniforms should be entirely discarded, and GL3's UBOs should be used in all circumstances, because the UBO abstraction actually maps to the SM4/5 GPU hardware.
If uniforms have to be shared among multiple programs, that's where UBOs should be used.