As soon as I learned about GL_ARB_separate_shader_objects, I expected big performance benefits: with it, OpenGL would look and feel more like DirectX 9+, so my cross-API code could stay similar while keeping performance high.
To my surprise, when I implemented GL_ARB_separate_shader_objects my performance was halved, and my GPU usage dropped from ~95% to ~45%. In other words, a monolithic program is twice as fast as separate ones. This is on an AMD HD7850 under Windows 8 with an OpenGL 4.2 Core context.
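For reference, this is the standard setup path the extension provides, and roughly what I mean by "implemented" (a minimal sketch of the usual API usage, not necessarily my exact code; it assumes a current GL 4.1+ context and shaders with matching in/out interfaces between stages):

```c
/* Sketch: build a program pipeline from separable per-stage programs.
 * All calls below are part of GL_ARB_separate_shader_objects / GL 4.1.
 * Needs a current OpenGL context and a function loader (e.g. glad). */
#include <glad/glad.h>  /* assumed loader header */

GLuint make_pipeline(const char *vs_src, const char *fs_src)
{
    /* glCreateShaderProgramv compiles and links a single-stage,
       separable program in one call. */
    GLuint vs = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vs_src);
    GLuint fs = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_src);

    GLuint pipe;
    glGenProgramPipelines(1, &pipe);
    glUseProgramStages(pipe, GL_VERTEX_SHADER_BIT,   vs);
    glUseProgramStages(pipe, GL_FRAGMENT_SHADER_BIT, fs);

    /* Bind with glBindProgramPipeline(pipe) instead of glUseProgram;
       uniforms are then set per stage via glProgramUniform*. */
    return pipe;
}
```

The mix-and-match happens at glUseProgramStages time, which is exactly where the driver can no longer optimize across stage boundaries the way it can when linking a monolithic program.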
I originally assumed this extension was meant to boost performance by letting you swap shader stages and their constant state independently, but it seems it may have been created for people wanting to port DirectX-style shaders more directly, regardless of any performance hit.
So my question is: if you have implemented this feature in a reasonable scene, what performance difference do you see compared to monolithic programs?