Why is emulating fixed functionality using shaders so slow?

7 comments, last by Achilleos 16 years, 10 months ago
Hi, I am trying to implement Microsoft's PixelMotionBlur using OpenGL. To use only one render pass I need to draw to two framebuffer attachments: 1. the original image, 2. the vertex velocity information. For the original image I would normally use the OpenGL fixed functions, but to do it in one pass, the vertex shader has to calculate everything the fixed function would do: lighting, texture application, etc. I have only implemented the fixed-function lighting, but it already seems at this point that the shader-based rendering is much slower than the usual OpenGL pipeline. But why? If it stays this way, I will render in two passes, because of the better performance. But I thought it would be vice versa. Could someone explain to me why the same calculations made in a shader instead of the fixed-function pipeline are so much slower? Greetz, Achilleos.
I'm not convinced the slowdown you're observing is actually a result of the shader, but rather some other aspect of your rendering method.

It could depend on your hardware as well. Also remember that shaders are not about speed, they are about control. Allegedly many cards don't even have fixed function paths anymore, they simply execute the programmable path with a specific program. Custom shaders won't make your code faster, unless perhaps if you write uselessly trivial shaders, which does not appear to be your problem here anyways.

In any case, can you describe your overall rendering techniques in more detail?
I am using an NVIDIA 8800 GTX, therefore I thought shaders should reach the same speed as fixed function.

Below are two images. The first is created using only a post-production shader which uses two FBO-attached images to generate the final image (the velocity buffers are not implemented so far, so you only see an offset of the rendering).

The second image uses a shader pair in addition for the object rendering (the post production shader renders to a window filling quad after that). The source is as follows:

#version 110

void ApplyLighting(in vec3 ecPosition3,
                   in vec3 normal,
                   in bool LocalViewer,
                   in int NumEnabledLights,
                   in bool SeperateSpecular,
                   out vec4 color);

void main(void)
{
  // calculate position
  gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;

  // calculate eye coordinate position
  vec4 ecPosition = gl_ModelViewMatrix * gl_Vertex;
  vec3 ecPosition3 = vec3(ecPosition) / ecPosition.w;

  // calculate normal
  vec3 normal = gl_NormalMatrix * gl_Normal;

  // apply the lighting (fixed function)
  vec4 color;
  ApplyLighting(ecPosition3, normal, true, 2, false, color);

  // write back the color information
  gl_FrontColor = color;
}


// removed images from server

If you pay attention to the FPS displayed at the bottom line, you can figure out the performance loss. The FPS are restricted to a maximum of 60 (vsync). And I have to defend myself: there is a cube-mapped ball lying in the middle. The cube map is dynamically rendered and currently refreshed every frame (6 additional renderings of the scene).
So the application is not slow, but I don't want a performance decrease at this early state of the advanced motion blur I am about to implement.
(there is a complex physics engine and a robot control system running in the background...)

ps: why on earth is the gamedev.net website so slow?

[Edited by - Achilleos on June 6, 2007 1:07:13 PM]
So the original runs at 60FPS (capped by vsync) and the "slower" one at 48FPS? 60FPS means you take 0.0166 seconds per frame. 48FPS means you take 0.0208 seconds per frame. That's a difference of 0.0042 seconds. Four milliseconds. That's not nearly as pathological a slowdown as your original post led me to believe.

In any case, try using ftransform() and spitting out a simple flat color from the PS. Try not binding any of the shader parameters you're currently binding -- are you doing that every frame? Are you reloading the shader every frame? Et cetera. There's nothing necessarily broken with the shader you posted as far as I can tell, except that the bools make me suspicious; they might be the first place I'd look, as conditionals are not necessarily the nicest things to have in shaders.
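A minimal pair for that test might look something like this (just a sketch for profiling purposes, not the poster's actual code; the two shaders go in separate vertex/fragment sources):

```glsl
// Trivial vertex shader: fixed-function-equivalent transform, nothing else.
void main(void)
{
    gl_Position = ftransform();
}
```

```glsl
// Trivial fragment shader: flat color, no lighting, no texture lookups.
void main(void)
{
    gl_FragColor = vec4(1.0, 0.0, 1.0, 1.0); // solid magenta
}
```

If the frame time recovers with these bound, the cost is in the lighting code (likely those conditionals); if it doesn't, look at the CPU-side shader setup instead.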

The idea is to determine where this four ms frame time increase is coming from. It will be hard because it's so small. By making the shader trivial, you might be able to tell if it's the shader code itself (particularly those conditionals) or the CPU-side setup for the shader.

It's tricky to profile GPUs.
Or maybe your performance problems are elsewhere.
I would like to know what you are doing to get a GF8 down to 50 FPS with such a tiny scene. I'm rendering a huge terrain atm that has lots of polys, does soft LOD (no sudden changes in LOD levels visible), and uses a fragment shader that mixes 3 textures out of a 4th, still at 1000 FPS.
http://3d.benjamin-thaut.de
Quote:
I would like to know what you are doing to get a gf8 down to 50 FPS with such a tiny scene?

He's not doing anything, he's just waiting for the vsync:

Quote:
The fps are restricted to a maximum of 60 (vsync).


It's not a performance issue.
A few comments:

1] Disable vsync when doing any profiling

2] The camera positions in the scenes don't look identical, which makes me think that you moved the camera manually to get there. For more accurate comparisons you should have completely deterministic scenes, relying on no user input to set up the scene.

3] Are you averaging your framerate since the beginning of the app or keeping a running average?

4] The 2nd screenshot appears to have point sampling enabled judging by the highly pixelated screenshot whilst no such artefacts are visible in the 1st screenshot. What is the reason for this difference?

5] Try breaking down your scene into multiple parts. Profile each part independently - this will let you isolate the differences more easily.

6] Your vshader should probably be using gl_Position = ftransform()

7] ApplyLighting is not using any texture lookups is it?

8] Are your shaders performing dynamic branching in the shader? are you looping over your lights?

9] What does your pixel shader look like?

do unto others... and then run like hell.
I found the problem. I am currently implementing a collection of motion blur algorithms. The horrible FPS came from a value that was never reset. For the accumulation buffer motion blur method I increased my renderPasses variable to 12 and never decreased it after choosing another method from the menu.

Now I am rendering a frame in less than a ms.

The scene is that slow because there are multiple render contexts which render to different targets. The images are passed through some imaging routines for segmentation and classification. From this information simulated actuators are altered and the physics engine makes another tick.

The FPS counter of the window I am testing in is invalidated on change or on physics update.

Here's another image.

some scene

The application is really huge, and customizable down to the last detail. That makes it hard for me to optimize things. I have been working on the project since January this year, and I found a graphics engine written in 1998 with little tweaks. There were mistakes in the source, like creating a display list per triangle...

Greetz.
Quote:Original post by FReY
A few comments:

1] Disable vsync when doing any profiling

I currently improved the frames per second counter. So profiling should be fine now.

Quote:Original post by FReY
2] The camera positions in the scenes don't look identical, which makes me think that you moved the camera manually to get there. For more accurate comparisons you should have completely deterministic scenes, relying on no user input to set up the scene.

I used a presentation camera mode which forces the display to circle the camera around the selected object and redraw the scene. But you are right, a simple mechanism to constantly force updates should be implemented.

Quote:Original post by FReY
3] Are you averaging your framerate since the beginning of the app or keeping a running average?

changed the fps counter, as mentioned at point 1.

Quote:Original post by FReY
4] The 2nd screenshot appears to have point sampling enabled judging by the highly pixelated screenshot whilst no such artefacts are visible in the 1st screenshot. What is the reason for this difference?

I have antialiasing disabled. Driver-forced multisampling works with fixed function, but not with post-production shaders which use a framebuffer object as input. I haven't read anywhere that it's possible... Besides that, I agree with your sharp eye, but I don't know where the mistake is. I am rendering into an FBO and use the image as input to a fragment shader when rendering a fullscreen quad. Mipmapping settings don't seem to affect the quality. The FBO is the same pixel size (width and height) as the window, so nearest filtering should be okay, but I use the linear filter method; otherwise it looks worse...

Quote:Original post by FReY
7] ApplyLighting is not using any texture lookups is it?

8] Are your shaders performing dynamic branching in the shader? are you looping over your lights?

Yes, it's simply a function which runs through the lights and adds the effects (point, directional and spot light) as described in OpenGL Shading Language, Second Edition by Randi Rost.

Quote:Original post by FReY
9] What does your pixel shader look like?

:) It simply does almost nothing. It writes the color to gl_FragData[0] and vec4(0.0, 0.0, 1.0, 1.0) to gl_FragData[1]. I am still implementing, and wondered about the performance decrease at this early implementation step.
But now everything seems great!
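For reference, a fragment shader doing what is described above (color to the first MRT attachment, a constant placeholder to the second) might look like this (a sketch based on the description, not the poster's actual code):

```glsl
void main(void)
{
    // Pass the interpolated vertex color through to the color attachment.
    gl_FragData[0] = gl_Color;

    // Placeholder until the real per-pixel velocity is computed.
    gl_FragData[1] = vec4(0.0, 0.0, 1.0, 1.0);
}
```

Note that writing to gl_FragData[] requires both FBO color attachments to be enabled via glDrawBuffers on the application side.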
