Relfos

GLSL bad performance?


In my engine I draw cel-shaded models using a GLSL shader. I'm still a beginner in GLSL, so I don't know if my code could be better optimized. My engine has a fallback path for when the GL_ARB_vertex_shader extension is not available. It consists of calculating the shading and texcoords for each vertex on the CPU. Now the problem: I created two small demos with the engine, one using the vertex shader and the other using the software shader. The demos were tested on various PCs by our fans, who recorded the fps of both. The fallback path is almost always faster, by about 2 to 10 times! Only in very few cases is the GLSL version faster, and even then only by 3-4 fps! Below are the GLSL vertex shader code and the Delphi code. Both produce exactly the same output, but the Delphi code averages 250 fps, and the GLSL only 20-30!
uniform vec3 light;

void main(){
    vec3 shade;
    vec3 normal = normalize(gl_NormalMatrix * gl_Normal);
    shade.s = dot(normal,light);
    if (shade.s<0.0) shade.s=0.0;
    gl_TexCoord[0].s = shade.s;
    gl_TexCoord[1] = gl_TextureMatrix[0] * gl_MultiTexCoord0;
    gl_Position = ftransform();
    gl_FrontColor = gl_Color;
    vec4 ecPosition = gl_ModelViewMatrix * gl_Vertex;
    gl_FogFragCoord = abs(ecPosition.z);
}


          If VertexShaderAvaliable Then
          Begin
            glUseProgram(_Program);
            With Renderer.Light Do
              glUniform3f(_ShLight,X,Y,Z);
            Exit;
          End;

          While VertexCount>0 Do
          Begin
            _Normal:=Source.Normal;
            _Vector:=VectorRotate(_Normal,ViewMatrix);
            _Shade:=VectorDot(_Vector, Renderer.Light);	

            If _Shade<0 Then _Shade:=0;

            TexCoord^:=Source.TexCoord;
            Dest.Position:=Source.Position;
            Dest.TexCoord.U:=1.0-_Shade;
            Dest.Normal:=Source.Normal;

            Inc(Source);
            Inc(Dest);
            Inc(TexCoord);
            Dec(VertexCount);
          End;



Extra question: Is it wise to clear depth/stencil many times per frame? My engine uses various stencil effects for shadows, mirrors and more, and normally I clear the stencil 3 or 4 times per frame. I also clear the depth buffer at the start of each frame and again right before drawing the 2D GUI.

Now, I'm definitely not a pro at this, but I would try replacing the line
"if (shade.s<0.0) shade.s=0.0;" in your shader with
"shade.s = max(shade.s, 0.0);"

If I remember correctly, the if instruction is pretty slow on older graphics cards, but I'm not 100% sure. It might be worth a shot...
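To show that the two forms give the same result, here is a rough CPU-side sketch in C (not shader code). The names clamp_branch, clamp_select and max_f are made up for illustration, and max_f is only a stand-in for GLSL's built-in max().

```c
/* Hypothetical stand-in for GLSL's built-in max(a, b). */
float max_f(float a, float b)
{
    return a > b ? a : b;
}

/* Same logic as the original shader line:
   if (shade.s < 0.0) shade.s = 0.0; */
float clamp_branch(float shade)
{
    if (shade < 0.0f)
        shade = 0.0f;
    return shade;
}

/* Same logic as the suggested replacement:
   shade.s = max(shade.s, 0.0); */
float clamp_select(float shade)
{
    return max_f(shade, 0.0f);
}
```

Both functions return 0 for negative input and pass non-negative input through unchanged, so swapping one for the other in the shader should not change the picture, only (possibly) the speed.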

Good luck!

You can optimize the shader code a bit. Try:


uniform vec3 light;

void main(){
    vec3 shade;
    // normalization skipped - only valid if gl_Normal is unit length
    // and the modelview matrix contains no scaling
    vec3 normal = gl_NormalMatrix * gl_Normal;
    // avoid dynamic flow control
    shade.s = max(dot(normal,light),0.0);
    gl_TexCoord[0].s = shade.s;
    gl_TexCoord[1] = gl_TextureMatrix[0] * gl_MultiTexCoord0;
    gl_Position = ftransform();
    gl_FrontColor = gl_Color;
    vec4 ecPosition = gl_ModelViewMatrix * gl_Vertex;
    gl_FogFragCoord = abs(ecPosition.z);
}



That should give some increase in speed, but if you have a slow GFX card (like a Radeon X300 or any of the GeForce FX family) it will always be slow.
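One caveat on dropping normalize(): the normal matrix is the inverse transpose of the upper 3x3 of the modelview matrix, so a uniform scale of s in the modelview scales the normal by 1/s. Below is a small CPU-side sketch in C of that one special case (modelview is only a uniform scale). Vec3 and the function names are hypothetical, not part of any GL API.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Assumption: the modelview is a pure uniform scale s * I.
   Its inverse transpose is (1/s) * I, so the normal shrinks by 1/s. */
Vec3 apply_normal_matrix_uniform_scale(Vec3 n, float s)
{
    assert(s > 0.0f);
    Vec3 r = { n.x / s, n.y / s, n.z / s };
    return r;
}

/* Shade term computed without normalize(), clamped at zero
   like max(dot(normal, light), 0.0) in the shader. */
float shade_without_normalize(Vec3 n, Vec3 light, float s)
{
    float d = dot3(apply_normal_matrix_uniform_scale(n, s), light);
    return d > 0.0f ? d : 0.0f;
}
```

With a unit normal (0,0,1) facing a light at (0,0,1), a scale of 1 gives a shade of 1.0, while a scale of 2 gives 0.5, so the model would come out darker than intended. If your modelview never contains scaling, skipping normalize() should be fine.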

Thank you both. I see, dynamic branching generates inefficient code on most video cards. I can't test it myself, since my video card fried and I'm now forced to use software mode, but I've already recompiled the test demos and released them on our game forum.

If anyone wants to test it, here's the link, and please tell me how many fps you get in both versions.
Demo download

Hmmm, so the software version still runs faster? I don't understand how that can be, since I'm copying the vertices into a buffer and transforming every vertex on the CPU each frame, while with GLSL all the work is done on the GPU.

The only thing I can think of is that I'm only using a GLSL vertex shader and not a fragment shader, leaving the fixed-function pipeline to do all the per-pixel work. Can this have any impact on performance?

Could be... I use GLSL all the time. Here is a screenshot from my engine: 3 lights (red, green, blue) + about 5000 vertices + dynamic soft shadows + bump mapping at 17-20 FPS.

[Screenshot: image hosted at www.ImageShack.us]
