Deferred shading and MRTs optimization

Hi, I have a question about deferred shading performance. In benchmarks, STALKER runs at about 25 FPS on a GeForce 7600 GT at 1280x768 with medium details. I built a mini demo (120,000 polys per frame) that renders to four textures via MRTs (no high-precision formats; I pack everything into R8G8B8A8) and I get about 30 FPS. If I write to R8G8B8A8 textures at 1280x1024 (NPOT), or 2048x2048 when the card has no NPOT support, FPS drops below 20, and that's before shadows, particles, etc. How can I optimize the deferred shading, or the render-to-texture step itself (with or without MRTs)? It's very slow in my test. I assumed it can't really be optimized because it's a standard DirectX operation, so how does STALKER get such good speed in its G-buffer stage using the same MRT render-to-texture functionality? Another title with good performance in the G-buffer stage is Unreal 3...
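(For reference, an MRT G-buffer setup like the one described above is usually done in D3D9 roughly as below. This is only an illustrative sketch with assumed names and sizes, not the poster's code; creation and per-frame binding are compressed into one function for brevity.)

```cpp
#include <d3d9.h>

// Create and bind four R8G8B8A8 G-buffer targets (size and names are assumptions).
void SetupAndBindGBuffer(IDirect3DDevice9* device, IDirect3DTexture9* gbuf[4])
{
    for (int i = 0; i < 4; ++i)
    {
        device->CreateTexture(1280, 1024, 1, D3DUSAGE_RENDERTARGET,
                              D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &gbuf[i], NULL);

        IDirect3DSurface9* surf = NULL;
        gbuf[i]->GetSurfaceLevel(0, &surf);
        device->SetRenderTarget(i, surf);   // bind as MRT output 0..3
        surf->Release();                    // SetRenderTarget keeps its own reference
    }

    // ... render the scene with a pixel shader writing to COLOR0..COLOR3 ...

    // Afterwards, unbind the extra targets (slot 0 must always keep a surface).
    for (int i = 1; i < 4; ++i)
        device->SetRenderTarget(i, NULL);
}
```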
With forward rendering or deferred shading you should get much more than 20-30 FPS for 120k triangles on a GeForce 7600. I think you need to take a step back and look at how you send your triangles to the GPU and how you render them.
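(Submitting geometry efficiently usually means one indexed draw call per batch from static hardware buffers rather than many tiny calls. A minimal D3D9 sketch, with hypothetical names and an assumed pre-filled vertex/index buffer pair:)

```cpp
#include <d3d9.h>

// Hypothetical helper: draw one mesh with a single indexed draw call
// from static buffers that were filled at load time.
void DrawMesh(IDirect3DDevice9* device,
              IDirect3DVertexBuffer9* vb, IDirect3DIndexBuffer9* ib,
              UINT stride, UINT vertexCount, UINT triangleCount)
{
    device->SetStreamSource(0, vb, 0, stride);
    device->SetIndices(ib);
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                 0,              // BaseVertexIndex
                                 0,              // MinVertexIndex
                                 vertexCount,    // vertices referenced
                                 0,              // StartIndex
                                 triangleCount); // primitive count
}
```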
Before you can even think about optimizations you need to know what has to be optimized. In other words, you need to find your bottleneck. Like Promethium, I have a strong feeling that your performance issues have little to do with your GPU and are instead the result of your CPU being bottlenecked by excessive device calls. You mentioned DirectX, which means your first step should be to run your app in PIX. Have it record a sequence of frames and look at the graph it generates: it should be clear whether you're CPU-bound or GPU-bound. From there you can work your way down to specific issues. IHV-specific tools like NVPerfHUD can also be extremely useful for this kind of work.

Also... Unreal Engine 3 doesn't use fully deferred rendering. It uses shadow maps and a depth buffer to create shadows in a deferred pass. That doesn't require building a full G-buffer, just a depth-only pass.
Quote:Original post by Promethium
With forward rendering or deferred shading you should get much more than 20-30 FPS for 120k triangles on a GeForce 7600. I think you need to take a step back and look at how you send your triangles to the GPU and how you render them.


Yes, with forward rendering I get much more than 30 FPS, and I do use hardware vertex buffers ;) FPS drops drastically when I use RTT, but I think that's normal? Here is my RTT performance in a demo with only 1300 polys (in this demo I don't use hardware vertex buffers, so the FPS isn't high, but I only need to compare the cost of RTT):
* Without RTT: 540
* With RTT 1024x1024 R8G8B8A8: 460
* With RTT 1024x1024 Floating R16G16B16A16: 420
* With RTT 1024x1024 Floating R32G32B32A32: 260
Is that normal performance compared to the forward-rendering FPS?

MJP, thanks for the help. I'm now testing my application with a GPU performance tool and trying to optimize my app :) I'm downloading NVIDIA PerfHUD for my GeForce 7600 GT...
I found that SetRenderTarget works faster than RenderToTexture (in my program at least, I can't speak for others), so that might be worth a try if you think that's the problem. I've also had huge performance problems on my 6800 with floating-point textures (32 bits per channel, that is; 16 bits per channel was OK). I don't know whether the 7600 behaves similarly.
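(The "plain SetRenderTarget" path here presumably means rendering straight into a texture's top-level surface instead of going through a helper such as ID3DXRenderToSurface. A hedged sketch with assumed names:)

```cpp
#include <d3d9.h>

// Render into a texture's surface via SetRenderTarget, then restore the back buffer.
void RenderIntoTexture(IDirect3DDevice9* device, IDirect3DTexture9* target)
{
    IDirect3DSurface9* backBuffer = NULL;
    IDirect3DSurface9* targetSurf = NULL;

    device->GetRenderTarget(0, &backBuffer);   // remember the current target
    target->GetSurfaceLevel(0, &targetSurf);

    device->SetRenderTarget(0, targetSurf);
    device->Clear(0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
    // ... draw the scene into the texture here ...

    device->SetRenderTarget(0, backBuffer);    // restore the original target

    targetSurf->Release();
    backBuffer->Release();
}
```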
I believe the ROPs in NV40 and G70 (largely the same architecture) run at 1/2 speed for FP16 and 1/4 speed for FP32. Obviously that's going to hurt your G-buffer pass (which you want to be as quick as possible), especially once you add in the extra bandwidth requirements. In these cases an early-z pass might help save fill rate and bandwidth (NV40 and G70 write Z at double rate when rendering depth only, IIRC).
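(A depth-only pre-pass along those lines might look like this in D3D9; it's a sketch under the assumption that all opaque geometry can simply be drawn twice, not a drop-in implementation:)

```cpp
#include <d3d9.h>

// Early-z style pre-pass: lay down depth with colour writes disabled, then
// shade the expensive G-buffer pass with D3DCMP_EQUAL so hidden pixels are
// rejected before the pixel shader and ROP work.
void RenderWithDepthPrePass(IDirect3DDevice9* device /*, scene */)
{
    // Pass 1: depth only.
    device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
    device->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);
    device->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
    // ... draw all opaque geometry with a trivial shader ...

    // Pass 2: full G-buffer shading, reusing the depth laid down above.
    device->SetRenderState(D3DRS_COLORWRITEENABLE,
                           D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
                           D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
    device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_EQUAL);
    // ... draw opaque geometry again with the real G-buffer shaders ...
}
```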

Also, I remember reading that NV40 works best with 3 render targets or fewer. I don't know whether that also applies to G70, but if I had to guess I'd say it does. So cutting things down to 3 RTs might be an optimization worth pursuing.
Thanks for the help, but I don't need optimizations just for the GF 7600 or any single model; I need universal optimization techniques that also work on Radeon cards. I know that optimizing for one specific card gives more speed, but it isn't better than a universal optimization. I'll try the early-z pass... BTW, I also tested on a Radeon 2600 and got very similar results...

