In my engine I draw cel-shaded models using a GLSL shader. I'm still a beginner in GLSL, so I don't know whether my code could be better optimized.
My engine has a fallback path for when the GL_ARB_vertex_shader extension is not available. It calculates the shading and texture coordinates for each vertex on the CPU.
Now the problem: I created two small demos with the engine, one using the vertex shader and the other using the software fallback. Our fans tested both demos on various PCs and recorded the fps of each.
The fallback path is almost always faster, by about 2 to 10 times!
Only in a very few cases is the GLSL path faster, and even then only by 3-4 fps!
Below are the GLSL vertex shader code and the Delphi code.
Both produce exactly the same output, but the Delphi code averages 250 fps and the GLSL only 20-30!
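uniform vec3 light;

void main() {
    vec3 shade;
    // Transform the normal into eye space and renormalize it.
    vec3 normal = normalize(gl_NormalMatrix * gl_Normal);
    // Diffuse term, clamped at zero, used as the shading-ramp coordinate.
    shade.s = dot(normal, light);
    if (shade.s < 0.0) shade.s = 0.0;
    gl_TexCoord[0].s = shade.s;
    // Regular texture coordinates go to the second texture unit.
    gl_TexCoord[1] = gl_TextureMatrix[0] * gl_MultiTexCoord0;
    gl_Position = ftransform();
    gl_FrontColor = gl_Color;
    // Fog distance taken from the eye-space depth.
    vec4 ecPosition = gl_ModelViewMatrix * gl_Vertex;
    gl_FogFragCoord = abs(ecPosition.z);
}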
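One variant of the shader I could try, as a minimal sketch only: it replaces the `if` with the built-in max() and writes all four components of gl_TexCoord[0] instead of only .s. It assumes that light is already normalized and given in eye space, which is my assumption and not something the code above guarantees. I have not measured whether this changes the frame rate at all.

```glsl
uniform vec3 light;   // assumed: normalized light direction in eye space

void main() {
    // Eye-space normal, renormalized in case the modelview matrix scales.
    vec3 normal = normalize(gl_NormalMatrix * gl_Normal);

    // max() clamps the diffuse term at zero without a branch.
    // The remaining components are set explicitly so none is left undefined.
    gl_TexCoord[0] = vec4(max(dot(normal, light), 0.0), 0.0, 0.0, 1.0);
    gl_TexCoord[1] = gl_TextureMatrix[0] * gl_MultiTexCoord0;

    gl_Position   = ftransform();
    gl_FrontColor = gl_Color;

    // Fog distance taken from the eye-space depth, as in the original.
    vec4 ecPosition = gl_ModelViewMatrix * gl_Vertex;
    gl_FogFragCoord = abs(ecPosition.z);
}
```

The shade value it produces should match the original shader; only the way the clamp is written and the extra gl_TexCoord[0] components differ.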
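If VertexShaderAvaliable Then
Begin
  // Hardware path: bind the shader and upload the light direction.
  glUseProgram(_Program);
  With Renderer.Light Do
    glUniform3f(_ShLight, X, Y, Z);
  Exit;
End;

// Software fallback: compute the shade for every vertex on the CPU.
While VertexCount>0 Do
Begin
  _Normal := Source.Normal;
  // Rotate the normal by the view matrix, then take the diffuse term clamped at zero.
  _Vector := VectorRotate(_Normal, ViewMatrix);
  _Shade := VectorDot(_Vector, Renderer.Light);
  If _Shade<0 Then _Shade := 0;
  // Copy the regular texture coordinates to their own stream.
  TexCoord^ := Source.TexCoord;
  Dest.Position := Source.Position;
  // The shade value becomes the U coordinate into the shading ramp.
  Dest.TexCoord.U := 1.0-_Shade;
  Dest.Normal := Source.Normal;
  // Advance to the next vertex.
  Inc(Source);
  Inc(Dest);
  Inc(TexCoord);
  Dec(VertexCount);
End;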
Extra question: is it wise to clear the depth/stencil buffers many times per frame? My engine uses various stencil effects for shadows, mirrors and more, and normally I clear the stencil buffer 3 or 4 times per frame. I also clear the depth buffer at the start of each frame and again right before drawing the 2D GUI.