Optimized Render Pipeline & GLSL

Hello - I'm having a bit of difficulty with a program I'm working on. More specifically, the application drops by about 50fps when my shaders are enabled.

Here's what I know. The slowdown is caused by the fragment shader, because:

1) My vertex shader is very short
2) The framerate drops as resolution increases
3) Turning off shaders altogether increases performance dramatically

NOTE: My fragment shader isn't very complex. The most complex operation is one length() call.

The optimizations I've done so far:

1) All information about the shader (uniform locations / compiling / binding) is obtained at initialization time.
2) The shader program is installed only once - when it needs to be - and then it's uninstalled and replaced with the standard OpenGL pipeline for the remainder of the frame.
3) I've also attempted rendering in two passes, like this:

/**** START OF RENDER ****/

Render(NON-SHADER OBJECTS);

// Pass 1: lay down depth for the shader objects with the
// fixed-function pipeline (no texturing or blending needed)
glDepthFunc(GL_LEQUAL);
glDisable(GL_TEXTURE_2D);
glDisable(GL_BLEND);

Render(ALL-SHADER OBJECTS);

// Pass 2: re-render the shader objects with GL_EQUAL so only
// the fragments that survived the depth pre-pass get shaded
glDepthFunc(GL_EQUAL);
glEnable(GL_TEXTURE_2D);
glEnable(GL_BLEND);

InstallShaderProgram();
Render(ALL-SHADER OBJECTS);
UninstallShaderProgram();

// Restore the default depth test for the next frame
glDepthFunc(GL_LEQUAL);

/**** END OF RENDER ****/


None of these optimizations have much impact on the framerate. The entire scene is roughly 4000 triangles, which isn't very many - which is why I'm surprised to be seeing framerates of about 14fps only when my pixel shaders are enabled.

So, for those more experienced with GLSL: are there any good places to learn about optimizing shaders? And do you have any input on my current situation?

Thanks for all your help
--Andrew
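(For reference, a minimal sketch of what optimizations 1 and 2 above can look like, assuming GL 2.0 entry points; the uniform name and function names here are hypothetical:)

GLint uLightPos; /* cached once - glGetUniformLocation is too slow to call per frame */

void InitShader(GLuint program)
{
    uLightPos = glGetUniformLocation(program, "lightPos"); /* hypothetical uniform */
}

void RenderShaderObjects(GLuint program)
{
    glUseProgram(program);                     /* install the shader once */
    glUniform3f(uLightPos, 0.0f, 10.0f, 0.0f); /* set the cached uniform */
    /* ... draw the shader objects ... */
    glUseProgram(0);                           /* back to the fixed-function pipeline */
}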
Try NVShaderPerf and see how many cycles your shader compiles to. Anything over 30 is bad on anything less than NV4X.
It looks like my shader takes 14 cycles. After performing the test, I went to the extreme of making my shader as simple as possible.

The vertex shader merely computes the vertex position. The fragment shader reads the texture value and writes it to the frag color.

With this simple setup I see just as low frame rates as I did with the complex shader. In fact, there is almost no difference. The only weird thing I've encountered throughout my debugging is that my vertex shader appears to run in software (according to the GLSL link phase).

Could this be the cause of the bottleneck? If so, is there anything I can do about it?

Thanks,
--Andrew
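(A quick way to inspect that link-phase message, assuming GL 2.0 entry points are available; the exact "software" wording varies by driver:)

#include <stdio.h>

void PrintLinkLog(GLuint program)
{
    char    log[1024];
    GLsizei len;

    glGetProgramInfoLog(program, sizeof(log), &len, log);
    printf("link log: %s\n", log); /* drivers of this era typically mention
                                      "software" here when falling back */
}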
Try installing the latest drivers for your video card.
Quote:Original post by Optus
Try installing the latest drivers for your video card.


Just got done trying that. I've downloaded the latest Omega drivers for my Radeon. I'm beginning to think it could just be my system.

I am developing on a Turion 64 / ATI Xpress 200M laptop. The Xpress 200M is full of features, but it could be a tad slow. The graphics card supports Pixel Shader 2.0; however, it emulates the vertex shader in software.

My only grief is: the application I'm running isn't all that complex. Shouldn't the video card be able to handle at least 4K triangles or so with a simple vertex/fragment shader?
Quote:The entire scene is roughly 4000 triangles, which isn't very many - which is why I'm surprised to be seeing framerates of about 14fps only when my pixel shaders are enabled.

the number of tris is not important for frag shaders, as the number of fragments rendered will be the same (if depth test == on) no matter how many tris there are.

post your vertex + fragment shader code, there's probably a few things you can do
Quote:Original post by zedzeek
Quote:The entire scene is roughly 4000 triangles, which isn't very many - which is why I'm surprised to be seeing framerates of about 14fps only when my pixel shaders are enabled.

the number of tris is not important for frag shaders, as the number of fragments rendered will be the same (if depth test == on) no matter how many tris there are.

post your vertex + fragment shader code, there's probably a few things you can do


That's a good point. I went down to the minimum of vertex/fragment shader code, which boiled down to:

Vertex:
void main (void)
{
    gl_Position    = ftransform();
    gl_TexCoord[0] = gl_MultiTexCoord0;
}


Fragment:
uniform sampler2D texture;

void main (void)
{
    vec4 color   = texture2D(texture, gl_TexCoord[0].st);
    gl_FragColor = color;
}


And this ran just as slowly.

There are some other interesting notes, however. Earlier in development, I noticed that if I increased the complexity of the vertex shader (by adding a few varying variables and multiplying by a few matrices), the decrease in framerate was huge. The complexity increase didn't even need to be that large; say, 3 varying vec3's and 3 matrix multiplies (see the sketch below).
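
(For illustration, a hypothetical vertex shader of roughly that complexity - all variable names made up:)

varying vec3 normalV;
varying vec3 lightV;
varying vec3 eyeV;

void main (void)
{
    gl_Position = ftransform();
    normalV     = gl_NormalMatrix * gl_Normal;           // matrix multiply 1
    lightV      = vec3(gl_ModelViewMatrix * gl_Vertex);  // matrix multiply 2
    eyeV        = vec3(gl_TextureMatrix[0] * gl_Vertex); // matrix multiply 3
    gl_TexCoord[0] = gl_MultiTexCoord0;
}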

I understand that a vertex shader running in software will be slower than one in hardware, but is a slowdown this extreme expected? Is there any way it can be optimized?

Also, here are some statistics I've gathered:

No pixel shaders -->
4000 triangles @ 178fps
With pixel shaders -->
4000 triangles @ 17fps
2000 triangles @ 45fps

Thanks for your help,
--Andrew
Are you doing your drawing in immediate mode? Or are you using display lists?

From the looks of it, the bottleneck is happening on the CPU somewhere (though posted shader code might change that assumption). I ask because I discovered first hand just how slow I could make a really fast video card (Quadro 4500) run when I was still using old code that drew each triangle in immediate mode instead of from a display list.

-Edit

Also, after reading the above: if any part of the vertex or fragment processing is being emulated, performance will take a serious hit.
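
(For reference, a minimal sketch of the display-list pattern, assuming legacy immediate-mode GL: build once at init, replay each frame:)

GLuint list;

void Init(void)
{
    list = glGenLists(1);
    glNewList(list, GL_COMPILE);
    /* ... the glBegin()/glVertex()/glEnd() calls that used to
       run every frame go here ... */
    glEndList();
}

void Draw(void)
{
    glCallList(list); /* one call instead of thousands */
}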
Yeah, the fact that the vertex shader is emulated in software might be a big hit.

Instead of drawing 4000 tris, draw one quad that fills up the whole screen. If the performance improves, that means the vertex shader was the bottleneck and not the pixel shader.
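
(One way to set up that test, assuming the legacy fixed-function matrix stack - push identity matrices so the quad covers clip space exactly:)

glMatrixMode(GL_PROJECTION); glPushMatrix(); glLoadIdentity();
glMatrixMode(GL_MODELVIEW);  glPushMatrix(); glLoadIdentity();

glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
glEnd();

glMatrixMode(GL_MODELVIEW);  glPopMatrix();
glMatrixMode(GL_PROJECTION); glPopMatrix();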
Quote:Original post by Gluc0se
Are you doing your drawing in immediate mode? Or are you using display lists?

From the looks of it, the bottleneck is happening on the CPU somewhere (though posted shader code might change that assumption). I ask because I discovered first hand just how slow I could make a really fast video card (Quadro 4500) run when I was still using old code that drew each triangle in immediate mode instead of from a display list.

-Edit

Also, after reading the above: if any part of the vertex or fragment processing is being emulated, performance will take a serious hit.


Display lists are one thing I'm not using - none of the objects in my program are static, so I was under the impression they'd be pretty useless to me.


Quote:
Yeah, the fact that the vertex shader is emulated in software might be a big hit.

Instead of drawing 4000 tris, draw one quad that fills up the whole screen. If the performance improves, that means the vertex shader was the bottleneck and not the pixel shader.


I gave that a shot, and the fps picks up quite a bit. So it sounds like it's the number of vertices that's the problem. In order to speed things up, it seems like I'll have to:

1) Remove surfaces that aren't being seen

2) Optimize the vertex shader (I don't think I can get any simpler though)

3) Try to move things into display lists (how much of a speed increase should I expect?)

4) Get a new video card

Did I miss any options? :-)

Thanks everyone, your input has been quite helpful,
--Andrew
