Sign in to follow this  

OpenGL GL_ARB_separate_shader_objects Performance ?

This topic is 1377 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts


As soon as I learned about GL_ARB_separate_shader_objects I imagined that I would have huge performance benefits because with this one OpenGL would look and feel more like DirectX9+ and my cross-API code could remain similar while maintaining high performance.

To my surprise, I implemented GL_ARB_separate_shader_objects only to find out that my performance is halved, and my GPU usage dropped from ~95% to 45%. So basically, having them as a monolithic program is twice as fast as being separate. This is on an AMD HD7850 under Windows 8 and OpenGL 4.2 Core.

I originally imagined that this extension was created to boost performance by separating constant buffers and shader stages, but it seems it might have been created for people wanting to port DirectX shaders more directly, with disregard to any performance hits.

So my question, is if you have implemented this feature in a reasonable scene, what is your performance difference compared to monolitic programs ?

Share this post

Link to post
Share on other sites

Updating my post cause I was wondering for a long time how is Nvidia doing with separate shader objects.

Turns out, not that great. Got a GTX 650 running on latest 335 drivers, got 290 FPS with SSO On, and 308 without them, so still a heavy performance penatly. And I'm doing a pipeline object per shader pair.


And I'm also seeing some weird behavior with objects that don't use textures.

Share this post

Link to post
Share on other sites

That's stunning. I've always abstained from using that extension (without even trying once!) because I had the weird "feeling" that it would probably run much slower, since the shader compiler is necessarily not able to optimize as agressively as it could otherwise.


I remember a statement from a nVidia guy years ago that basically said "Yeah, but not much of that is happening anyway, and the hardware works in a mix-and-match way in the sense of ARB_separate_shader_objects, it has been working like that for many years".


Now you've done back-to-back comparisons on two major vendors and see that while nVidia is at least in the same ballpark, both major IHVs are noticeably slower with separate shader objects. Bummer.


One should not forget that getting 50% of 300fps is still 150fps, which is way good enough (and many people will argue that practically, with VSync, there is absolutely no difference).

But of course only utilizing the GPU 50% means the customer who needs to meet your minimum hardware requirements could as well only have spent half as much money on hardware... or they could run your game with 2x or 4x supersampling at the same frame rate instead, with the hardware they have.

Share this post

Link to post
Share on other sites

When using this, make absolutely sure that your vertex shader output and fragment shader inputs match perfectly. The same variables, same semantics, same order, etc, etc. Otherwise the driver will have to patch the fragment program to fetch the correct interpolators.

Share this post

Link to post
Share on other sites
Sign in to follow this  

  • Similar Content

    • By _OskaR
      I have an OpenGL application but without possibility to wite own shaders.
      I need to perform small VS modification - is possible to do it in an alternative way? Do we have apps or driver modifictions which will catch the shader sent to GPU and override it?
    • By xhcao
      Does sync be needed to read texture content after access texture image in compute shader?
      My simple code is as below,
      glBindImageTexture(0, texture[0], 0, GL_FALSE, 3, GL_READ_ONLY, GL_R32UI);
      glBindImageTexture(1, texture[1], 0, GL_FALSE, 4, GL_WRITE_ONLY, GL_R32UI);
      glDispatchCompute(1, 1, 1);
      // Does sync be needed here?
      glBindFramebuffer(GL_READ_FRAMEBUFFER, framebuffer);
                                     GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, texture[1], 0);
      glReadPixels(0, 0, kWidth, kHeight, GL_RED_INTEGER, GL_UNSIGNED_INT, outputValues);
      Compute shader is very simple, imageLoad content from texture[0], and imageStore content to texture[1]. Does need to sync after dispatchCompute?
    • By Jonathan2006
      My question: is it possible to transform multiple angular velocities so that they can be reinserted as one? My research is below:
      // This works quat quaternion1 = GEQuaternionFromAngleRadians(angleRadiansVector1); quat quaternion2 = GEMultiplyQuaternions(quaternion1, GEQuaternionFromAngleRadians(angleRadiansVector2)); quat quaternion3 = GEMultiplyQuaternions(quaternion2, GEQuaternionFromAngleRadians(angleRadiansVector3)); glMultMatrixf(GEMat4FromQuaternion(quaternion3).array); // The first two work fine but not the third. Why? quat quaternion1 = GEQuaternionFromAngleRadians(angleRadiansVector1); vec3 vector1 = GETransformQuaternionAndVector(quaternion1, angularVelocity1); quat quaternion2 = GEQuaternionFromAngleRadians(angleRadiansVector2); vec3 vector2 = GETransformQuaternionAndVector(quaternion2, angularVelocity2); // This doesn't work //quat quaternion3 = GEQuaternionFromAngleRadians(angleRadiansVector3); //vec3 vector3 = GETransformQuaternionAndVector(quaternion3, angularVelocity3); vec3 angleVelocity = GEAddVectors(vector1, vector2); // Does not work: vec3 angleVelocity = GEAddVectors(vector1, GEAddVectors(vector2, vector3)); static vec3 angleRadiansVector; vec3 angularAcceleration = GESetVector(0.0, 0.0, 0.0); // Sending it through one angular velocity later in my motion engine angleVelocity = GEAddVectors(angleVelocity, GEMultiplyVectorAndScalar(angularAcceleration, timeStep)); angleRadiansVector = GEAddVectors(angleRadiansVector, GEMultiplyVectorAndScalar(angleVelocity, timeStep)); glMultMatrixf(GEMat4FromEulerAngle(angleRadiansVector).array); Also how do I combine multiple angularAcceleration variables? Is there an easier way to transform the angular values?
    • By dpadam450
      I have this code below in both my vertex and fragment shader, however when I request glGetUniformLocation("Lights[0].diffuse") or "Lights[0].attenuation", it returns -1. It will only give me a valid uniform location if I actually use the diffuse/attenuation variables in the VERTEX shader. Because I use position in the vertex shader, it always returns a valid uniform location. I've read that I can share uniforms across both vertex and fragment, but I'm confused what this is even compiling to if this is the case.
      #define NUM_LIGHTS 2
      struct Light
          vec3 position;
          vec3 diffuse;
          float attenuation;
      uniform Light Lights[NUM_LIGHTS];
    • By pr033r
      I have a Bachelor project on topic "Implenet 3D Boid's algorithm in OpenGL". All OpenGL issues works fine for me, all rendering etc. But when I started implement the boid's algorithm it was getting worse and worse. I read article ( inspirate from another code (here: but it still doesn't work like in tutorials and videos. For example the main problem: when I apply Cohesion (one of three main laws of boids) it makes some "cycling knot". Second, when some flock touch to another it scary change the coordination or respawn in origin (x: 0, y:0. z:0). Just some streng things. 
      I followed many tutorials, change a try everything but it isn't so smooth, without lags like in another videos. I really need your help. 
      My code (optimalizing branch):
      Exe file (if you want to look) and models folder (for those who will download the sources):
      Thanks for any help...

  • Popular Now