X-0ut

Shader performance


Hi. Right, I'm new to shaders, and I have made my own little vertex/fragment program using Cg. Running it in my engine has decreased performance tenfold. As a baseline, my engine without any lighting, shadows, or normal maps (just occlusion culling with standard texturing) averages 150+ fps. With these shaders it's around 20 fps. I should note there is not a lot of geometry, maybe 100 triangles tops.

Without clogging up the post with code, I'll explain what's happening in the shaders.

Vertex program: calculate S, T and R texcoords for per-pixel lighting. Generate the binormal, and rotate the light vector into tangent space using the TBN matrix.

Fragment program: two 2D texture fetches and one 1D fetch; the 1D fetch is a lookup for point-light attenuation. The original final color is lerped with the normal map dotted with the light vector/color. There is also a lerp of the point-light color with the 1D lookup (for attenuation).

As stated, this is running really slowly, and I'm new to shaders, so it could be I'm just asking too much. But then, Doom 3 runs faster than this, and it has much more geometry, plus specular and everything else it's doing. So why is this so slow? If my explanation is crud I can paste code. Thanks to anyone who takes the time to give me some useful input.
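To make the description above concrete, here is a CPU-side C++ sketch of the same per-vertex and per-pixel math. It is not the poster's Cg code: the names (tangentSpaceLight, buildAttenuationTable, shadeDiffuse) and the 1 - (d/r)^2 falloff stored in the 1D table are assumptions for illustration. The vertex part builds the binormal as cross(normal, tangent) and rotates the light vector into tangent space; the fragment part does N dot L against a tangent-space normal and scales it by a nearest-neighbour 1D attenuation lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    assert(len > 0.0f && "cannot normalize a zero-length vector");
    return {v.x / len, v.y / len, v.z / len};
}

// "Vertex program": derive the binormal from normal and tangent (ignoring the
// handedness sign some engines store), then rotate the object-space light
// vector into tangent space using T, B, N as the rows of the TBN matrix.
Vec3 tangentSpaceLight(Vec3 pos, Vec3 normal, Vec3 tangent, Vec3 lightPos) {
    Vec3 binormal = cross(normal, tangent);
    Vec3 l = sub(lightPos, pos);
    return {dot(tangent, l), dot(binormal, l), dot(normal, l)};
}

// Hypothetical 1D attenuation texture: entry i holds 1 - (d/r)^2 for d/r = i/(n-1).
std::vector<float> buildAttenuationTable(int n) {
    std::vector<float> t(n);
    for (int i = 0; i < n; ++i) {
        float x = static_cast<float>(i) / (n - 1);
        t[i] = std::max(0.0f, 1.0f - x * x);
    }
    return t;
}

// "Fragment program": N dot L with a tangent-space normal, times the 1D lookup.
float shadeDiffuse(Vec3 tsNormal, Vec3 tsLight, float radius,
                   const std::vector<float>& atten) {
    float dist = std::sqrt(dot(tsLight, tsLight));
    float ndotl = std::max(0.0f, dot(tsNormal, normalize(tsLight)));
    float u = std::min(dist / radius, 1.0f);  // clamp, like clamp-to-edge
    int idx = static_cast<int>(u * (atten.size() - 1) + 0.5f);  // nearest texel
    return ndotl * atten[idx];
}
```

Assuming a unit normal and a unit tangent perpendicular to it, the TBN rows are orthonormal, so the rotation preserves length and the attenuation distance can be read straight off the tangent-space vector.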

I can also confirm that the FX 5200 is really bad at pixel shading. I have a program that does all those things (using GLSL) and gets ~15 fps, but it's completely unoptimized and draws ~200K triangles using regular vertex arrays. Maybe it's something else: how many state changes are you making? With only 100 tris you couldn't be making many, but maybe that's the bottleneck...

--Buzzy

It doesn't matter that the 5200 is bad compared to modern cards. Compared to the first shader cards (the GF3, for instance) it's damn quick, and the GF3 could run shaders fine. You should be able to draw tens of thousands (if not hundreds of thousands) of polys on screen with a decent lighting model, just maybe not a really fancy shadowed scene.

OK, without shadows I gain an extra 10 fps, but I still don't think that's enough. I'm pretty sure it should be faster than this; I don't even have a specular term.
I'll explain what I'm doing a little more.
First, the visible parts of the scene are found; next, the previous frame is drawn. This entails iterating over the lights that affect the scene, laying down the z, then running the geometry that each light affects through the shader. This is done for each light (of which there are just two). Note that the scene is texture-sorted.
After the lights have been processed, the final texture is blended with the lit scene. End of frame.
I realised earlier I wasn't using the latest version of Cg, so I fixed that, but there was no gain.
Also, taking away the normal mapping and leaving just a pixel-lit, textured scene gives 60+ fps.
So, is it the way I'm going about it? Is there an alternative approach I should look into?
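The pass order described in that post can be emulated on the CPU in a few lines, which makes it easy to count how often the expensive light shader actually runs. This is a minimal C++ sketch under assumptions: light passes are additive (like glBlendFunc(GL_ONE, GL_ONE)) with an equal depth test (like glDepthFunc(GL_EQUAL)), the final texture blend is a modulate (like glBlendFunc(GL_DST_COLOR, GL_ZERO)), and Fragment, Frame and renderMultipass are hypothetical names.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// One fragment that some triangle produces at a given pixel.
struct Fragment {
    int pixel;
    float depth;
    float albedo;              // base texture value used by the final pass
    std::vector<float> light;  // diffuse term this fragment gets from each light
};

struct Frame {
    std::vector<float> depth;
    std::vector<float> color;
    int shadedFragments = 0;  // how many times the light shader ran
};

// Depth-only pass, one additive pass per light restricted to the visible
// fragment, then a final pass that multiplies by the base texture.
Frame renderMultipass(const std::vector<Fragment>& frags, int numPixels, int numLights) {
    Frame f;
    f.depth.assign(numPixels, std::numeric_limits<float>::infinity());
    f.color.assign(numPixels, 0.0f);

    // Pass 1: lay down z only (cheap, no shading).
    for (const Fragment& fr : frags)
        if (fr.depth < f.depth[fr.pixel]) f.depth[fr.pixel] = fr.depth;

    // Light passes: only fragments whose depth equals the stored z are shaded,
    // and their contribution is added to the framebuffer.
    for (int l = 0; l < numLights; ++l)
        for (const Fragment& fr : frags)
            if (fr.depth == f.depth[fr.pixel]) {
                f.color[fr.pixel] += fr.light[l];
                ++f.shadedFragments;
            }

    // Final pass: modulate accumulated lighting by the base texture (src * dst).
    for (const Fragment& fr : frags)
        if (fr.depth == f.depth[fr.pixel]) f.color[fr.pixel] *= fr.albedo;

    return f;
}
```

With the z pre-pass, hidden fragments never reach the light shader, so the shaded count is visible fragments times lights rather than all fragments times lights. With two lights, the fragment program still runs twice per visible pixel, which matters on a fill-bound card.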

How are you doing the normal mapping? If you have an intensive pixel shader doing dependent texture reads, that might be severely hurting your performance. I seem to remember that the FX cards, while very capable on paper, have some fairly invisible (unless you know about them) restrictions you have to respect to get decent performance. These include things like dependent texture reads (DTRs) and full-precision shaders: try using half instead of float to increase performance, and use fewer temporary registers. That's all I can remember off the top of my head.

-Mezz

Floating-point performance on the 5200 is not that fast (only about 1/2 pixel per clock on the shader you describe sounds about right).

If using DirectX, stick with ps.1.1 (or 1.4, which can be quite fast on the 5200 for non-obvious architectural reasons).

In OpenGL, use the GF3/GF4-level texture shader extensions.

This is what Doom 3 does, sticking with 8-bit math for the most part, and that helps the framerate a ton.
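To show what "8-bit math" means for bump mapping, here is a C++ emulation of the DOT3 texture-combine formula from ARB_texture_env_dot3 (GL_DOT3_RGB), which computes 4 * sum((a - 0.5) * (b - 0.5)) on [0, 1] inputs and clamps the result. Treat it as an assumption about the kind of fixed-point path meant here, not a claim about Doom 3's exact combiner setup. packSigned and dot3Fixed are hypothetical helpers, and the internal sum is done in float, so only the 8-bit inputs and output are modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack a component in [-1, 1] into an unsigned byte, the way a normal map
// (or a normalization cube map) stores signed vectors.
std::uint8_t packSigned(float v) {
    float unit = std::clamp(v * 0.5f + 0.5f, 0.0f, 1.0f);
    return static_cast<std::uint8_t>(std::lround(unit * 255.0f));
}

// DOT3 on byte inputs: 4 * sum((a - 0.5) * (b - 0.5)), clamped to [0, 1]
// and requantised to a byte, following the GL_DOT3_RGB combine formula.
std::uint8_t dot3Fixed(const std::uint8_t a[3], const std::uint8_t b[3]) {
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i)
        sum += (a[i] / 255.0f - 0.5f) * (b[i] / 255.0f - 0.5f);
    float r = std::clamp(4.0f * sum, 0.0f, 1.0f);
    return static_cast<std::uint8_t>(std::lround(r * 255.0f));
}
```

Zero cannot be stored exactly in a byte-encoded vector (it becomes 128/255), so a light exactly perpendicular to the normal yields a small positive value rather than 0. That is the kind of error the fast path accepts in exchange for its throughput.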

For instance, you can do ~4 instructions per clock in 8-bit, but only ~0.5 floating-point instructions per clock in 32-bit or 16-bit float.
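Those per-clock figures imply a fixed cost ratio between the two paths. Taking them at face value (they are the poster's approximations, not measurements), this small C++ sketch turns an instruction count into clocks per pixel for one pipeline. The function names are hypothetical, and the model ignores texture-fetch latency and register pressure.

```cpp
#include <cassert>

// Clocks one pipeline spends per pixel for a shader of `instructions`
// instructions at a given issue rate (instructions per clock).
double clocksPerPixel(int instructions, double instructionsPerClock) {
    assert(instructionsPerClock > 0.0);
    return instructions / instructionsPerClock;
}

// How many times more the float path costs than the 8-bit path for the same
// instruction count, using the rates quoted in the post.
double floatVsFixedSlowdown(int instructions) {
    const double kFixedRate = 4.0;  // ~4 instructions/clock in 8-bit (quoted)
    const double kFloatRate = 0.5;  // ~0.5 instructions/clock in fp16/fp32 (quoted)
    return clocksPerPixel(instructions, kFloatRate) /
           clocksPerPixel(instructions, kFixedRate);
}
```

For a hypothetical 8-instruction shader, that is 2 clocks per pixel in 8-bit versus 16 clocks in floating point, so the float path costs about 8 times as much regardless of the instruction count.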
