CDProp

OpenGL: Is it normal for 4x MSAA (with custom resolve) to cost ~3ms vs no MSAA?


Greetings,

 

I have tone mapping working in my app, and I'd like to add multisampling. So that I may perform tone mapping before the MSAA resolve, I'm doing my own custom MSAA resolve by a) binding the multisampled texture and rendering a full-screen quad, b) using texelFetch to grab the individual samples, tone map them, and then blend them.
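
In simplified form, the resolve shader does something like this (a sketch, not the exact code; the Reinhard curve here is just a stand-in for my actual tone mapping operator, and the GLSL is shown as a C++ raw string):

// Simplified sketch of the custom resolve fragment shader.
// Assumes GLSL 1.50 and a 4x multisampled HDR color texture;
// the Reinhard curve below is only a placeholder tone mapper.
static const char* kResolveFrag = R"GLSL(
#version 150
uniform sampler2DMS uHdrColor;               // 4x multisampled HDR scene color
out vec4 fragColor;

vec3 tonemap(vec3 c) { return c / (c + vec3(1.0)); }   // placeholder Reinhard

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    vec3 sum = vec3(0.0);
    for (int i = 0; i < 4; ++i)              // fetch, tone map, then average
        sum += tonemap(texelFetch(uHdrColor, coord, i).rgb);
    fragColor = vec4(sum * 0.25, 1.0);
}
)GLSL";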

 

When I do this, it looks great, but it takes a big wet bite out of my render times (about 3ms). I don't think it's the tone mapping operator, because even if I only grab one sample instead of all 4, the performance seems about the same. This is for a 1920x1080 buffer on my GTX 680.

 

If 3ms is typical, then I'm okay with it, but since it's a good 20% of my frame budget at 60Hz, I want to make sure it's not excessive; obviously I can't afford to spend 3ms on every effect. I'm using OpenSceneGraph, so it can be difficult to tell exactly what's going on in the underlying OpenGL, but I'm currently running the app through gDEBugger to see what I can find. If you can think of anything I should be looking for, let me know.



My advice would be to do some profiling and measure exactly which calls cost the most time.

Then you might be able to improve bits without compromising the visible end result too much.

Honestly, I have no idea whether 3ms is normal, but it sounds like quite a bit.
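
If you want hard numbers per pass, OpenGL timer queries are a cheap way to measure GPU time directly. A rough sketch, assuming OpenGL 3.3 (ARB_timer_query) and GLEW; drawResolvePass() is just a placeholder for whatever you want to time:

#include <GL/glew.h>
#include <cstdio>

static void drawResolvePass() { /* placeholder: the draw calls being measured */ }

void profileResolvePass()
{
    GLuint query = 0;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);    // start GPU timing
    drawResolvePass();
    glEndQuery(GL_TIME_ELAPSED);

    // Note: reading the result right away stalls until the GPU is done.
    // In a real app you'd double-buffer queries and read last frame's value.
    GLuint64 ns = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
    std::printf("pass took %.3f ms\n", ns / 1.0e6);

    glDeleteQueries(1, &query);
}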


In my experience, a custom resolve can be significantly slower than the driver-provided resolve. On certain hardware I'm familiar with there are various reasons for this, but I don't think I can share those reasons due to NDA. In general, though, it's safe to assume that the driver can use hardware-specific details to accelerate the process, and that it may have to do extra work to provide you with the raw subsample data so that you can read it in a shader.


Thanks, guys.

 

If it legitimately does take ~3ms, is that a price you would pay for doing (say) tone mapping before the resolve?


If your platform offers the ability to configure graphics settings, sure. I think a lot of folks don't mind the *option* to burn GPU horsepower if their hardware affords them the opportunity. For lower-spec systems, post-process antialiasing could offer a reasonable, low-cost(!) alternative.


I have a little more data. Today I set it up to run normal (non-explicit) MSAA that just blits the multisampled buffer to a single-sample texture and then renders that texture to the back buffer, so that I could do a performance comparison. With a scene of perhaps medium-low complexity, running simple shaders (forward-rendered), tone mapped, but with no other post-processing effects, I get the following render times:

 

No MSAA: 1.89ms

4x MSAA (standard): 2.67ms

4x MSAA (explicit): 4.69ms

 

Again, this is at 1080p with a GTX 680.

 

So, that's about a 0.77ms difference for plain MSAA and a 2.81ms difference for explicit MSAA, which means the explicit resolve is costing me an extra ~2ms over the standard one. Oddly, this number goes down slightly (to 1.79ms) if I reduce the scene complexity a bit. This is somewhat alarming, because I don't see how scene complexity can affect the MSAA resolve (I'm blending all four samples regardless of whether the pixel is on an edge or not; maybe the default resolve does something smarter).
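
For reference, the "standard" path is just the driver resolve: a glBlitFramebuffer from the multisampled FBO into a single-sample texture, roughly like this (the FBO handles below are placeholder names):

#include <GL/glew.h>

// Rough sketch of the "standard" (driver) resolve used for the 2.67ms case.
// msaaFbo and resolveFbo are placeholder names for the two framebuffers.
void resolveWithBlit(GLuint msaaFbo, GLuint resolveFbo, int width, int height)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, msaaFbo);     // 4x multisampled source
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFbo);  // single-sample destination
    glBlitFramebuffer(0, 0, width, height,
                      0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);  // driver performs the resolve
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}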

 

So, I don't know. I don't have much of a choice, it seems. If I want to do tone mapping and gamma correction correctly, explicit multisampling seems to be the way to go. The best I can do is profile and make sure that I'm optimizing my app/shader code. *shrug*


You can always just ditch MSAA. Give alternative AA techniques a look; personally, I'm a fan of Crytek's implementation of SMAA: http://www.crytek.com/download/Sousa_Graphics_Gems_CryENGINE3.pdf

 

It shouldn't take much more than your standard MSAA result on a 680, and it looks about as good, I'd say. Plus, if you're using deferred rendering you don't have to worry about fiddling with transparencies, as it's essentially all post-processing.

Just to be sure: depending on what you still plan to add to the scene and what your goal is, you might be optimizing too early. If you turn on v-sync, anything up to 16.67ms will do fine. If that's the case, add more nice stuff instead; that also gives more energy than hunting for optimizations :)


That makes a lot of sense, thank you. It might be worth playing with some edge detection, so that I'm only doing the full blend on edge pixels. I'm not sure that the performance savings will be worth the overhead, but it will be a fun experiment, anyway.
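
If I do try it, I imagine the shader would look something like the following (hypothetical sketch with an arbitrary 0.01 difference threshold and the same placeholder Reinhard curve; note that all four texelFetch calls still happen, so the only savings would be the tone mapping math on non-edge pixels):

// Hypothetical edge-aware version of the custom resolve (GLSL as a C++ string).
static const char* kEdgeAwareResolveFrag = R"GLSL(
#version 150
uniform sampler2DMS uHdrColor;
out vec4 fragColor;

vec3 tonemap(vec3 c) { return c / (c + vec3(1.0)); }   // placeholder Reinhard

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    vec3 samples[4];
    samples[0] = texelFetch(uHdrColor, coord, 0).rgb;

    // Crude edge test: does any other sample differ noticeably from sample 0?
    bool edge = false;
    for (int i = 1; i < 4; ++i) {
        samples[i] = texelFetch(uHdrColor, coord, i).rgb;
        edge = edge || any(greaterThan(abs(samples[i] - samples[0]), vec3(0.01)));
    }

    if (!edge) {
        fragColor = vec4(tonemap(samples[0]), 1.0);      // interior: tone map once
    } else {
        vec3 sum = vec3(0.0);
        for (int i = 0; i < 4; ++i)
            sum += tonemap(samples[i]);
        fragColor = vec4(sum * 0.25, 1.0);               // edge: full blend
    }
}
)GLSL";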
