Gluc0se

GLSL Performance Question


I have a question about the performance of some commands in GLSL fragment programs. Essentially, if I add an "if" statement to the fragment program, I get a drop from about 128 FPS to 108 FPS. So my question is: should an if statement create such a large drop in FPS? The hardware I am using for this particular test is a Radeon9800 Pro, and it was run full screen on a 1600x1200 display.

Here is the sample code from my fragment program that I used to see the difference.

This one does 108 FPS:
 
//The images from the two separate RenderChannels
uniform sampler2D leftside;
uniform sampler2D rightside;

//Passed in uniforms to figure out which render channel to sample from
uniform vec4 user_pos;
uniform vec3 plane_vec;

//The pixel positions in each of the render channel's eye space, and the world space
varying vec4 left_vert_position;
varying vec4 right_vert_position;

//World space pixel position
varying vec4 vert_position;

void main()
{
	vec4 out_color;
	vec3 pos_vec = vert_position.xyz - user_pos.xyz;
	
	//Channel Testing	
	float plane_check = dot(pos_vec, plane_vec);
	if (plane_check <= 0.0) //Sample from the Left Render Channel
	{	
		out_color  = texture2DProj(leftside, left_vert_position);
	}
	else		//Sample from the Right Render Channel
	{
		out_color  = texture2DProj(rightside, right_vert_position);
	}
	
	gl_FragColor = out_color;	
}

This one does 128 FPS. It doesn't make much sense as a fragment program, but I had to make sure all the same multiplies etc. were being performed as in the previous one:

//The images from the two separate RenderChannels
uniform sampler2D leftside;
uniform sampler2D rightside;

//Passed in uniforms to figure out which render channel to sample from
uniform vec4 user_pos;
uniform vec3 plane_vec;

//The pixel positions in each of the render channel's eye space, and the world space
varying vec4 left_vert_position;
varying vec4 right_vert_position;

//World space pixel position
varying vec4 vert_position;

void main()
{
	vec4 out_color;
	vec3 pos_vec = vert_position.xyz - user_pos.xyz;
	
	//Channel Testing	
	float plane_check = dot(pos_vec, plane_vec);
	
	out_color = texture2DProj(leftside, left_vert_position) * texture2DProj(rightside, right_vert_position) * plane_check;
	
	gl_FragColor = out_color;	
}

After typing this all up and making sure the fragment programs were performing essentially the same operations (minus the if statement), I guess I'm not as surprised to see the frame rate hit from a single if statement.

The point of this whole post was to try to discover a way to speed up my fragment program. I recently switched to FrameBufferObjects from PBuffers, and VBO'd some of my geometry. In the end I gained no noticeable increase in FPS, so I'm assuming I am fill rate limited at this point. The current frag program I use is the 108 FPS one.

I am also doing a sanity check here in the forums to see if a Radeon9800 may be limited in some of its functionality, thus causing some of the loss in performance. Thanks for any tips or help :)

AFAIK, branches are just about the worst thing you can do for GLSL programs. GPUs don't really have any kind of fancy branch prediction like CPUs, and all of the pipelines are moving together on the same instruction each clock anyway. Having said that, your FPS doesn't sound terrible - but you'll get more accurate results measuring frame time rather than FPS ('cos it's non-linear).
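
To make the frame-time point concrete, here is the arithmetic for the two figures quoted in the original post. The 30 FPS comparison is a made-up illustration, not a measurement from this thread.

```latex
\begin{align*}
t &= \frac{1000}{\text{FPS}}\ \text{ms} \\
t_{128} &= \frac{1000}{128} \approx 7.81\ \text{ms}, \qquad
t_{108} = \frac{1000}{108} \approx 9.26\ \text{ms} \\
\Delta t &= t_{108} - t_{128} \approx 1.45\ \text{ms} \\
\text{At 30 FPS: } & \frac{1000}{33.33 + 1.45} \approx 28.75\ \text{FPS}
\end{align*}
```

So the same 1.45 ms of extra work shows up as a 20 FPS drop at 128 FPS, but only about a 1.25 FPS drop at 30 FPS, which is why frame time is the easier number to compare.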

Easiest way to check if you're fill limited is to change the resolution and see if you get a corresponding change in FPS.

I did a check: in a 512x512 window, just displaying a skybox, I'm at ~185 FPS. At 1600x1200 I get about 108 FPS. So that's about 7x more pixels for a slowdown of about 1.7x in FPS (if that's a viable metric). So I guess it's 'partially' fill limited.

So let me throw in a little background of what this program is. I'm using this to project on a large curved screen for stereoscopic viewing. The screen itself doesn't fit to any nice mathematical models, so I represent it as a mesh (that we captured with some accurate scanning).

In order to render a scene onto the screen immersively, I have to use a multi-pass system. First I render the scene from two camera angles. Conceptually they are two faces of a 'cube map': the left and the right. I align them so their joining 'edge' is centered on your current point of view. So 45 degrees to your left is one frustum, and 45 degrees to your right is the other. The reason for two views is the concave curvature of the screen; you can potentially be inside the curve, getting a wide horizontal field of view. These two cube-map views are rendered to a framebuffer object, and a third and final pass projectively textures them onto the screen mesh. The fragment program is used to decide which 'side' to texture map from: left or right.

Right now, I am getting very low performance. Even simple scenes give a pretty poor frame rate (around 25 FPS for a simple scene of objects). I realize most of this comes from the fact that it's a 3-pass system, but I still feel it should be performing faster. Maybe I'm expecting too much out of it.



The implementation of if() in the pixel shader on a Radeon is likely to actually execute both branches, and then do a conditional set.

On the latest NVIDIA hardware, you CAN branch in a pixel shader, but if different pixels within your scene have the if() go different ways, coherence in pixel shader execution will suffer, and you may have a significant performance cost.

If each branch of the if() is small, you're better off coding the conditional assign explicitly, i.e. by calculating something that's 0 or 1, and then assigning output = input_a * var + input_b * (1 - var) (where "var" is calculated as 0 or 1 depending on the condition).
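
For reference, a minimal sketch of that explicit conditional assign, applied to the 108 FPS shader from the first post. It uses the standard GLSL built-ins step() and mix(); the local names use_left, left_color and right_color are made up for illustration. Whether this is actually faster than what the driver generates for the if() on a given card is something you would have to measure.

```glsl
// Sketch only: same inputs as the 108 FPS shader, with the if() replaced
// by an explicit arithmetic select.
uniform sampler2D leftside;
uniform sampler2D rightside;

uniform vec4 user_pos;
uniform vec3 plane_vec;

varying vec4 left_vert_position;
varying vec4 right_vert_position;
varying vec4 vert_position;

void main()
{
	vec3 pos_vec = vert_position.xyz - user_pos.xyz;
	float plane_check = dot(pos_vec, plane_vec);

	// step(edge, x) is 0.0 when x < edge and 1.0 otherwise, so this is
	// 1.0 when plane_check <= 0.0 (left channel) and 0.0 otherwise.
	float use_left = step(plane_check, 0.0);

	// Both lookups are always performed; only the select is arithmetic.
	vec4 left_color  = texture2DProj(leftside, left_vert_position);
	vec4 right_color = texture2DProj(rightside, right_vert_position);

	// mix(a, b, t) = a * (1.0 - t) + b * t
	gl_FragColor = mix(right_color, left_color, use_left);
}
```

The select keeps the same "<= 0.0 goes left" rule as the original if(), so the output should match it pixel for pixel.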

Yeah, I figured it was doing both sides of the conditional; that's why, in the second fragment program, I had it combine both texture lookups to determine the color. Does that mean the conditional set is causing this additional overhead?

I will be upgrading to Quadro4500 cards, which are the Geforce7800 generation. I'd imagine the branching would then speed up, not to mention the large increase in pixel pipelines.

Just an idea: what if you put both the left and right render "channels" into the same texture, side by side? For example, instead of having 2 separate 512x512 textures, you put both channels into one big 1024x512 texture. Then the code would be:


...

if (plane_check <= 0.0) sample_coord = left_vert_position; //Sample from the Left Render Channel
else sample_coord = right_vert_position + vec4(0.5, 0.0, 0.0, 0.0); //Sample from the Right Render Channel (right half of the same texture)

out_color = texture2DProj(bothsides, sample_coord);

...



Something like that anyway (you'll need to adjust the texture coordinates, since each channel occupies half the texture along the x-axis). That way, you do only one texture lookup.
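
A rough sketch of what the adjusted coordinates could look like, assuming each channel's projective coordinates land in the 0..1 range after the divide by w, and that the left channel sits in the left half of the combined texture. The sampler name bothsides comes from the snippet above; p and s_offset are made-up names. Because texture2DProj divides by the last component, the x scale and the half-texture offset have to be applied before that divide, which is why the offset is multiplied by w.

```glsl
// Sketch only: one combined texture, left channel in x = 0.0..0.5,
// right channel in x = 0.5..1.0 (assumed layout).
uniform sampler2D bothsides;

uniform vec4 user_pos;
uniform vec3 plane_vec;

varying vec4 left_vert_position;
varying vec4 right_vert_position;
varying vec4 vert_position;

void main()
{
	vec3 pos_vec = vert_position.xyz - user_pos.xyz;
	float plane_check = dot(pos_vec, plane_vec);

	vec4 p;          // projective coordinate of the chosen channel
	float s_offset;  // where that channel starts along x in the texture
	if (plane_check <= 0.0)
	{
		p = left_vert_position;
		s_offset = 0.0;
	}
	else
	{
		p = right_vert_position;
		s_offset = 0.5;
	}

	// After the divide by p.w this gives 0.5 * (p.x / p.w) + s_offset.
	vec4 sample_coord = vec4(0.5 * p.x + s_offset * p.w, p.yzw);

	gl_FragColor = texture2DProj(bothsides, sample_coord);
}
```

One thing to watch for: with linear filtering, pixels right at the seam between the two halves may pick up texels from the other channel, so you may need a small border or a slightly inset coordinate range.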
