carew

Deferred shading and MRTs optimization

This topic is 3738 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Recommended Posts

Hi, I have a question about deferred shading performance. For example, in the S.T.A.L.K.E.R. benchmarks, a GeForce 7600 GT at 1280x768 with medium details gets about 25 FPS. I built a mini demo (120,000 polys per frame) that renders to 4 textures via MRTs (no high-precision formats; I pack all the info into R8G8B8A8), and my FPS is about 30. But if I write to R8G8B8A8 textures at 1280x1024 (NPOT), or at 2048x2048 on cards without NPOT support, FPS drops below 20, and that's before shadows, particles, etc. How can I optimize this deferred shading / render-to-texture process (with or without MRTs)? It's very slow in my test. I thought it couldn't be optimized, since it's a standard DirectX function, so how does S.T.A.L.K.E.R. get such good speed in the G-buffer stage using the same RTT-via-MRTs functionality? Another title with good G-buffer performance is Unreal Engine 3...

Using forward rendering or deferred shading, you should get much more than 20-30 FPS for 120k triangles on a GF 7600. I think you need to take a step back and look at how you send your triangles to the GPU and how you render them.

Before you can even think about optimizations you need to know what has to be optimized. In other words, you need to find your bottleneck. Like Promethium I have a strong feeling that your performance issues have little to do with your GPU, and are instead the result of your CPU being bottlenecked by excessive device calls. You mentioned DirectX, which means your first step should be to run your app in PIX. Have it record a sequence of frames for you, and look at the graph that's generated: it should be clear whether you're CPU-bound or GPU-bound. From there you can work your way down to specific issues. IHV-specific tools like NVPerfHUD can also be extremely useful for this kind of work.

Also...Unreal Engine 3 doesn't use fully deferred rendering. It uses shadow maps and a depth buffer to create a shadow in a deferred pass. This doesn't require the creation of a full G-Buffer, just a depth-only pass.

Quote:
Original post by Promethium
Using forward rendering or deferred shading you should get much more than 20-30 fps for 120k triangles on a Gf7600. I think you need to take a step back and look at how you send your triangles to the GPU and how you render them.


Yes, with forward rendering I get much more than 30 FPS; I use hardware vertex buffers ;) FPS drops drastically when I use RTT, but I think that's normal? Here is my RTT performance in a demo with only 1300 polys (in this demo I don't use hardware vertex buffers, so the FPS isn't high, but I only need to compare RTT performance):
* Without RTT: 540
* With RTT 1024x1024 R8G8B8A8: 460
* With RTT 1024x1024 floating-point R16G16B16A16: 420
* With RTT 1024x1024 floating-point R32G32B32A32: 260
Is this a normal performance drop compared to forward rendering?

MJP, thanks for the help. I'm now testing my application with a GPU performance tool and trying to optimize it :) I'm downloading NVIDIA PerfHUD for my GeForce 7600 GT...

I found that SetRenderTarget works faster than render-to-texture (in my program, at least; I can't say about others), so that might be worth a try if you think that's the problem. I've also had huge performance problems on my 6800 with floating-point textures (32 bits per channel; 16 bits per channel is OK). I don't know if the 7600 behaves similarly.
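For reference, the SetRenderTarget path in D3D9 binds texture surfaces directly as MRT slots. A minimal sketch, assuming a valid `IDirect3DDevice9*` and with error handling mostly omitted (not code from any poster's actual project):

```cpp
#include <d3d9.h>

// Sketch: create four 8-bit G-buffer targets and bind them as MRTs.
// Assumes `device` is a valid IDirect3DDevice9*.
HRESULT BindGBuffer(IDirect3DDevice9* device, UINT width, UINT height,
                    IDirect3DTexture9* targets[4])
{
    for (UINT i = 0; i < 4; ++i)
    {
        // Render targets must live in D3DPOOL_DEFAULT
        HRESULT hr = device->CreateTexture(width, height, 1,
                                           D3DUSAGE_RENDERTARGET,
                                           D3DFMT_A8R8G8B8,
                                           D3DPOOL_DEFAULT,
                                           &targets[i], NULL);
        if (FAILED(hr)) return hr;

        // Bind the top-level surface of each texture to MRT slot i
        IDirect3DSurface9* surface = NULL;
        targets[i]->GetSurfaceLevel(0, &surface);
        device->SetRenderTarget(i, surface);
        surface->Release();  // the device holds its own reference
    }
    return D3D_OK;
}
```

After the G-buffer pass, slots 1-3 should be reset to NULL before the lighting pass, and the number of simultaneous targets the card supports is reported in D3DCAPS9::NumSimultaneousRTs.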

I believe the ROPs in NV40 and G70 (largely the same architecture) run at 1/2 speed for FP16 and 1/4 speed for FP32. Obviously that's going to hurt your G-buffer pass (which you want to be as quick as possible), especially when you add in the extra bandwidth requirements. In these cases an early-z pass might help save fill rate and bandwidth (NV40 and G70 do 2x z-only writes, IIRC).
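An early-z (depth pre-pass) setup in D3D9 can be sketched roughly like this; `DrawScene` here is a hypothetical stand-in for whatever submits the scene geometry, not a D3D9 call:

```cpp
#include <d3d9.h>

// DrawScene() stands in for your own geometry submission -- hypothetical.
void DrawScene(IDirect3DDevice9* device);

void RenderWithDepthPrepass(IDirect3DDevice9* device)
{
    // Pass 1: lay down depth only -- no color writes, full z-writes.
    device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
    device->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
    DrawScene(device);

    // Pass 2: G-buffer pass. Depth already matches, so only visible
    // fragments pass the EQUAL test and hit the expensive ROP path.
    device->SetRenderState(D3DRS_COLORWRITEENABLE,
                           D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
                           D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
    device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_EQUAL);
    DrawScene(device);
}
```

The pre-pass costs an extra geometry submission, so it only pays off when the saved fill rate and bandwidth in the G-buffer pass outweigh the added vertex work.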

Also, I remember reading that NV40 worked best with 3 render targets or fewer. I don't know if that also applies to G70, but if I had to guess I'd say it does. So cutting things down to 3 RTs might be an optimization to pursue.

Thanks for the help, but I don't need optimizations only for the GF 7600 or any other single model; I need universal optimization techniques that also work on Radeon cards. I know that optimizing for one card model gives more speed, but it isn't better than universal optimization. I'll try early-z... BTW, I also tested it on a Radeon 2600 and got very similar results...
