Performance when using multiple render targets

This topic is 4206 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Hi there again! I have a problem with multiple render targets, specifically the performance I get when using them. I have a scene that I used to render at 530 fps using a simple N*L lighting shader and a single render target.

Now I gave multiple render targets a try and set up 4 render targets. The shader that fills these 4 RTs is even simpler than N*L lighting: it just fills them with the info already computed by the vertex shader and does a texture read, so it does even less than the per-pixel N*L I did before using MRT. But of course it fills 4 render targets. Now I get just 130 fps, which is less than a fourth of the 530 fps before.

The render targets are created using D3DUSAGE_RENDERTARGET and in D3DPOOL_DEFAULT. Render targets 1, 2 and 3 are D3DFMT_X8R8G8B8; render target 4 is D3DFMT_R32F (contains depth).

Is it normal to get such a performance drop? If it is, why should I even bother using MRTs? I could just render the whole scene 4 times and it should take about the same time. If it is not (which I surely hope), what should I consider when using MRTs to increase performance?

Well, you cannot produce the same results with one RT as with four RTs.

Using multiple RTs uses more fill rate. Roughly, with 4 RTs you'll have only 1/4 of the fill rate, because every pixel is written four times. The issue is mostly memory bandwidth: even if your shaders are simple, reading and writing the data becomes the bottleneck.

Cheers
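
To put a rough number on the fill-rate argument above, here is a minimal C++ sketch that counts the colour bytes written per frame for one render target versus the four described in the thread. The 1024x768 resolution used in the usage note, the assumption that every pixel is written exactly once, and the names `RtFormat`, `bytesPerPixel` and `colorBytesPerFrame` are illustrative only; depth-buffer and texture-read traffic are ignored.

```cpp
#include <cstdint>
#include <vector>

// The two render-target formats mentioned in the thread.
enum class RtFormat { X8R8G8B8, R32F };

// Both formats occupy 32 bits (4 bytes) per pixel.
inline std::uint64_t bytesPerPixel(RtFormat format) {
    switch (format) {
        case RtFormat::X8R8G8B8: return 4;
        case RtFormat::R32F:     return 4;
    }
    return 0;
}

// Colour bytes written per frame, assuming each pixel of each target is
// written exactly once (no overdraw, no blending, no depth traffic).
inline std::uint64_t colorBytesPerFrame(const std::vector<RtFormat>& targets,
                                        std::uint64_t width,
                                        std::uint64_t height) {
    std::uint64_t perPixel = 0;
    for (RtFormat format : targets) {
        perPixel += bytesPerPixel(format);
    }
    return width * height * perPixel;
}
```

Under these assumptions, `colorBytesPerFrame({RtFormat::X8R8G8B8}, 1024, 768)` comes to 3,145,728 bytes, while the four-target setup from the first post comes to 12,582,912 bytes, exactly four times as much.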

Yeah, sure, I know I cannot achieve the same results with 4 render targets that I do with 1, but that is less than 1/4 of the performance, and I'm still not doing 4 times the draw calls, texture changes and so on that I'd do with 4 passes, for example.
Dunno, I just wouldn't have expected it to take more than four times as long with 4 render targets as it did with one... so I thought I might have missed something when creating the render targets?

Considering the nature of your program, I really doubt that using an "even more simple shader" will help.

530 FPS / 4 = 132.5 FPS

You got 130 FPS using a simpler shader. It seems to me that there is very little GPU math going on and a whole lot of fill rate being used. Since a quarter of your original frame rate would be 132.5 FPS even with the more expensive shader, it doesn't appear to me that the GPU is stalling on actual math, but on the rate at which it can write pixels to each RT, that is, on memory bandwidth.

I would have to concur with Demus79 that the bottleneck is memory bandwidth.

What I meant by the "same results" is that you cannot do (for example) deferred shading with only 1 RT.

The performance is another thing. And yes, I agree that your app seems fill-rate limited. Consider that you are writing and reading 4 times as many pixels as with 1 RT. But 130 fps isn't that bad, and you won't know the final performance until you start to throw in some more complex shaders.

Can you tell us what you are trying to accomplish with multiple RTs?

Actually, I'm trying to build a deferred renderer, because I like that lights don't interfere with batching ;)
Currently I'm using one RT for diffuse color (X8R8G8B8), one for normals (X8R8G8B8), one for specular (X8R8G8B8, too) and one for scene depth (R32F).
I will definitely pack the specular into the color RT or the normals RT and get rid of one RT completely. So I'll end up using 3 RTs; I hope I didn't miss anything really important.
Currently lighting or anything fancy isn't implemented; I just started today.
One odd thing currently is that the scene depth seems a bit weird: when I render a quad using the R32F texture, it is completely white. Even writing a float4 with the depth in rgb and an alpha component of 1 was completely white until I divided by w. After that division, at least the closest objects come out at about 80% or 90% brightness.
Guess I'll have to think about that and search the web some more (any good search phrases appreciated).
Until today I thought that the Z component after perspective projection should be in [-1,1], but it appears it isn't.

I guess I'll just stick with that performance for now and hope it doesn't get too bad after the fancy stuff is there. ;)

Acid2 created a deferred lighting sample in C# that might be of use to you. I know too little about the technique to tell you anything conclusive, but from the parts of his sample I've seen, you don't need to calculate and store the scene depth. My guess is that this might also be a bottleneck, since you're not using the same format for all render targets.

Hope this helps :)
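
On the point about mixed formats: in Direct3D 9, the render targets bound together for MRT generally have to share the same bit depth, unless the device reports the `D3DPMISCCAPS_MRTINDEPENDENTBITDEPTHS` cap, but they do not have to share the same format. X8R8G8B8 and R32F are both 32 bits per pixel, so the combination in this thread satisfies that rule. The C++ sketch below checks it for a list of formats; the enum, its members and the function names are stand-ins for illustration, not the real `D3DFORMAT` type.

```cpp
#include <vector>

// Illustrative stand-in for a few D3DFORMAT values (not the real enum).
enum class SurfaceFormat { X8R8G8B8, A8R8G8B8, R32F, G16R16F, A16B16G16R16F };

// Bits per pixel of each listed format.
inline int bitsPerPixel(SurfaceFormat format) {
    switch (format) {
        case SurfaceFormat::X8R8G8B8:      return 32;
        case SurfaceFormat::A8R8G8B8:      return 32;
        case SurfaceFormat::R32F:          return 32;
        case SurfaceFormat::G16R16F:       return 32;
        case SurfaceFormat::A16B16G16R16F: return 64;
    }
    return 0;
}

// True when every target has the same bit depth (trivially true when empty).
inline bool haveSameBitDepth(const std::vector<SurfaceFormat>& targets) {
    for (SurfaceFormat format : targets) {
        if (bitsPerPixel(format) != bitsPerPixel(targets.front())) {
            return false;
        }
    }
    return true;
}
```

Three X8R8G8B8 targets plus one R32F target pass this check, while mixing X8R8G8B8 with A16B16G16R16F would not.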
