Shader performance

8 comments, last by SimmerD 19 years, 7 months ago
Hi. Right, I'm new to shaders, so I have made my own little vertex/fragment program using Cg, and running it in my engine has decreased performance tenfold. As standard, my engine without any lighting, shadows, or normal maps (just occlusion culling with standard texturing) averages 150+ fps. Now, with these shaders, it's around 20 fps. I should note there is not a fantastic amount of geometry, maybe 100 triangles tops.

Without clogging up the post with code, I'll explain what's happening in the shaders.

Vertex program: calculate S, T and R texcoords for per-pixel lighting. Generate the binormal, rotate the light vector using the TBN matrix.

Fragment program: two 2D texture fetches and a 1D fetch; the 1D fetch is a lookup for point light attenuation. The original final color is lerped with the normal map's dot'ed light/color. There is also a lerp of the point light color/1D lookup (for attenuation).

As stated, this is running _really_slow_, and I'm new to shaders, so it could be I'm just asking too much. But then, Doom 3 runs faster than this, and it has much more geometry, plus specular and everything else it's doing. So why is this so slow? If my explanation is crud I can paste code or something. Thanks to anyone that takes the time to give me some beneficial input.
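For readers following along, the vertex-program step described in this post (generate the binormal, rotate the light vector with the TBN matrix) can be sketched on the CPU roughly as follows. This is not the poster's Cg code. The function names are hypothetical, and the binormal is assumed to be normal x tangent with no handedness flip, which may differ from the engine's convention.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-component vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def light_to_tangent_space(position, normal, tangent, light_pos):
    """Return the vertex-to-light vector expressed in tangent space.

    Assumption: binormal = normal x tangent (no handedness sign).
    The rows of the TBN matrix are tangent, binormal and normal, so the
    rotation is just three dot products.
    """
    binormal = cross(normal, tangent)
    to_light = tuple(l - p for l, p in zip(light_pos, position))
    return (dot(tangent, to_light),
            dot(binormal, to_light),
            dot(normal, to_light))


if __name__ == "__main__":
    # A vertex at the origin whose tangent frame matches the world axes:
    # the light vector comes out unchanged.
    print(light_to_tangent_space((0, 0, 0), (0, 0, 1), (1, 0, 0), (2, 3, 5)))
```

The result is left unnormalized here; in the setup described above, normalizing it per pixel is the fragment program's job.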
What video card do you have? And what driver?
FX 5200, latest drivers.
Really, I get double this framerate in Doom 3, so something's not right. (Probably my implementation.)
I can confirm that the FX 5200's pixel shader performance is just _really_ bad!
Alexander Stockinger
Programmer
I can also confirm that the FX 5200 is really bad at pixel shading, but I have a program with all those things (using GLSL), getting ~15fps, but that's completely unoptimized, and I'm drawing ~200K triangles using regular vertex arrays. Maybe it's something else; how many state changes are you making? With only 100 tris, you couldn't be making many, but maybe that's the bottleneck...

--Buzzy
It doesn't matter that the 5200 is bad compared to modern cards. Compared to the first shader cards (GF3 for instance) it's damn quick. And the GF3 is able to use shaders. You should be able to draw tens of thousands (if not hundreds of thousands) of polys on screen with a decent lighting model. Just maybe not a really fancy shadowed scene.
OK, without shadows I gain an extra 10 fps, but I still don't think that's enough. I mean, I'm pretty sure it should be faster than this; I don't even have a specular term.
I'll explain what I'm doing a little more.
First the visible parts of the scene are found, then the ~previous frame is drawn. This entails iterating over the lights that affect the scene: laying down the z, then running the geometry that the light affects through the shader. This is done for each light (of which there are just two). Note, the scene is texture sorted.
After the lights have been processed, the final texture is blended with the lit scene. Frame end.
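To make the cost of the frame structure just described easier to see, here is a small sketch that only lists the passes over the visible geometry. It is one reading of the description, not the engine's code. The name `frame_passes` is hypothetical, and it assumes one z pass and one lighting pass per light, followed by a single texture blend pass.

```python
def frame_passes(num_lights):
    """List the passes over visible geometry for one frame.

    Assumption: each light gets its own depth ("z") pass followed by a
    lighting pass through the shader, and the textures are blended on
    top once all lights are done.
    """
    passes = []
    for light in range(num_lights):
        passes.append(("depth", light))
        passes.append(("lighting", light))
    passes.append(("texture_blend", None))
    return passes


if __name__ == "__main__":
    for name, light in frame_passes(2):
        print(name, light)
```

With the two lights mentioned in the post this gives five passes per frame, of which only the two lighting passes run the expensive fragment program.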
Realised earlier I wasn't using the latest version of Cg, so I fixed that; no gain though.
Also, taking away the normal mapping and leaving just a pixel-lit and textured scene gives 60+ fps.
So, is it the way I'm going about it? Is there an alternative way of doing this that I should look into?
How are you doing the normal mapping? If you have an intensive pixel shader doing dependent texture reads (DTRs), that might be severely hurting your performance. I seem to remember the FX cards, while very capable in their specifications, have some fairly invisible (unless you know them) restrictions you have to respect to maintain decent performance. This includes stuff like DTRs and also full-precision shaders: try using halfs to increase performance, and also fewer temporary registers. That's all I can remember off the top of my head.
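On the suggestion to use halfs: the sketch below looks only at precision, not speed. It assumes `half` behaves like a 16-bit IEEE-style float, emulated here with Python's `struct` format `"e"`, and checks how far a lighting term in [0, 1] can drift once it is written to an 8-bit color channel. The helper names are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest 16-bit (half) float value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_8bit(x):
    """Clamp to [0, 1] and quantize to an 8-bit color channel."""
    return int(round(max(0.0, min(1.0, x)) * 255))


def max_8bit_difference(steps=1000):
    """Largest 8-bit output difference between full and half precision."""
    worst = 0
    for i in range(steps + 1):
        x = i / steps
        worst = max(worst, abs(to_8bit(to_half(x)) - to_8bit(x)))
    return worst


if __name__ == "__main__":
    print(max_8bit_difference())
```

For a single value the rounding error of a half is far smaller than one 8-bit step, so the output should differ by at most one step. Long chains of arithmetic can accumulate more error than this one-value test shows.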

-Mezz
Using a normalization cubemap instead of mathematical vector normalization is also important on the 5200.
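A normalization cubemap stores, for every direction, the unit-length version of that direction packed into RGB, so the fragment program can replace a normalize with one texture fetch. Below is a minimal sketch of building one face on the CPU. The function names are hypothetical, the face orientation follows the OpenGL cube map convention, and uploading the texels to the GPU is left out.

```python
import math


def face_direction(face, s, t):
    """Unnormalized direction for face coordinates s, t in [-1, 1].

    Follows the OpenGL cube map face orientation.
    """
    directions = {
        "+x": (1.0, -t, -s),
        "-x": (-1.0, -t, s),
        "+y": (s, 1.0, t),
        "-y": (s, -1.0, -t),
        "+z": (s, -t, 1.0),
        "-z": (-s, -t, -1.0),
    }
    return directions[face]


def build_face(face, size):
    """Return size*size RGB8 texels holding normalized direction vectors."""
    texels = []
    for row in range(size):
        t = 2.0 * (row + 0.5) / size - 1.0      # sample at texel centers
        for col in range(size):
            s = 2.0 * (col + 0.5) / size - 1.0
            x, y, z = face_direction(face, s, t)
            inv_len = 1.0 / math.sqrt(x * x + y * y + z * z)
            # Pack each component from [-1, 1] into [0, 255].
            texels.append(tuple(int(round((c * inv_len * 0.5 + 0.5) * 255))
                                for c in (x, y, z)))
    return texels


def decode(texel):
    """Unpack an RGB8 texel back to a vector in [-1, 1], as a shader would."""
    return tuple(c / 255.0 * 2.0 - 1.0 for c in texel)
```

In the shader, the interpolated (unnormalized) light vector is used as the cubemap texture coordinate and the fetched color is expanded with `2 * rgb - 1`, mirroring `decode` above.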
Floating point performance on the 5200 is not that fast (only about 1/2 pixel per clock on the shader you describe sounds about right).

If using DirectX, stick with ps.1.1 (or 1.4, which can be quite fast on the 5200 for non-obvious architectural reasons).

In OpenGL, use the GF3- or GF4-level texture shader extensions.

This is what doom3 does, sticking with 8-bit math for the most part, so that helps the framerate a ton.

For instance, you can do ~4 instructions per clock in 8-bit, and only ~0.5 floating point instructions per clock in 32-bit or 16-bit float.
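To put those two rates side by side, here is a back-of-the-envelope sketch. Only the ~4 and ~0.5 instructions-per-clock figures come from the post above. The resolution, instruction count and 250 MHz clock are illustrative assumptions. The model also ignores texture fetches, overdraw and everything else that affects a real frame.

```python
def shading_time_ms(pixels, instructions_per_pixel, instructions_per_clock, clock_hz):
    """Rough time to shade `pixels` once, in milliseconds."""
    clocks = pixels * instructions_per_pixel / instructions_per_clock
    return clocks / clock_hz * 1000.0


PIXELS = 640 * 480       # hypothetical: one full-screen pass
INSTRUCTIONS = 10        # hypothetical shader length
CLOCK_HZ = 250e6         # assumed core clock, not stated in the thread

float_ms = shading_time_ms(PIXELS, INSTRUCTIONS, 0.5, CLOCK_HZ)  # float path
fixed_ms = shading_time_ms(PIXELS, INSTRUCTIONS, 4.0, CLOCK_HZ)  # 8-bit path

if __name__ == "__main__":
    print("float: %.3f ms, 8-bit: %.3f ms, ratio: %.1fx"
          % (float_ms, fixed_ms, float_ms / fixed_ms))
```

Whatever the absolute numbers turn out to be, the ratio between the two paths is 4 / 0.5 = 8x for the same instruction count.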

