
OpenGL: Is it normal for 4x MSAA (with custom resolve) to cost ~3ms vs no MSAA?


CDProp

Greetings,

 

I have tone mapping working in my app, and I'd like to add multisampling. So that I can perform tone mapping before the MSAA resolve, I'm doing my own custom resolve by a) binding the multisampled texture and rendering a full-screen quad, and b) using texelFetch to grab the individual samples, tone mapping each one, and then blending them.
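
In GLSL terms, the resolve shader boils down to something like this. A minimal sketch, assuming 4x MSAA; the uniform name and the simple Reinhard operator are placeholders, not my actual tone mapper:

#version 150
// Custom resolve: tone map each sample *before* averaging.
uniform sampler2DMS uHdrTex; // the multisampled HDR color buffer

out vec4 fragColor;

vec3 toneMap(vec3 hdr)
{
    return hdr / (hdr + vec3(1.0)); // stand-in Reinhard operator
}

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    vec3 sum = vec3(0.0);
    for (int i = 0; i < 4; ++i)
        sum += toneMap(texelFetch(uHdrTex, coord, i).rgb);
    fragColor = vec4(sum * 0.25, 1.0);
}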

 

When I do this, it looks great, but it takes a big wet bite out of my render times (about 3ms). I don't think it's the tone mapping operator, because even if I only grab one sample instead of all 4, the performance seems about the same. This is for a 1920x1080 buffer on my GTX 680.

 

If 3ms is typical, then I'm okay with it, but since it's a good 20% of my frame time at 60Hz, I want to make sure it's not excessive. Obviously I can't afford to give everything 3ms. I'm using OpenSceneGraph, and so it can be difficult to tell what exactly is going on in terms of the underlying OpenGL, but I'm currently running it through gDEBugger to see what I can find. If you can think of anything I should be looking for, let me know.

cozzie

My advice would be to do some profiling and measure exactly which calls cost the most time.

Then you might be able to improve bits without compromising the visible end result too much.

Honestly, I have no idea if 3ms is normal; it does sound like quite a bit, though.
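
For GPU-side numbers, you could bracket the suspect pass with an OpenGL timer query. A minimal sketch (drawResolvePass is a placeholder for whatever pass you're measuring; real code would double-buffer the queries so the readback doesn't stall the pipeline):

GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
drawResolvePass(); // placeholder: the pass being measured
glEndQuery(GL_TIME_ELAPSED);

// Blocks until the result is available; fine for profiling, not for shipping.
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
printf("pass took %.3f ms\n", elapsedNs / 1.0e6);

glDeleteQueries(1, &query);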

MJP

In my experience, a custom resolve can be significantly slower than the driver-provided resolve. On certain hardware I'm familiar with there are various reasons for this, but I don't think I can share those reasons due to NDA. In general, though, it's safe to assume that the driver can make use of hardware-specific details to accelerate the process, and that it might have to do extra work in order to provide you with the raw subsample data so that you can read it in a shader.

CDProp

Thanks, guys.

 

If it legitimately does take ~3ms, is that a price you would pay for doing (say) tone mapping before the resolve?

InvalidPointer

If your platform offers the ability to configure graphics settings, sure. I think a lot of folks don't mind the *option* to burn GPU horsepower should their hardware afford them the opportunity. For lower-spec systems, post-process-based antialiasing systems could offer a reasonable, low-cost(!) alternative.

CDProp

I have a little more data. Today, I set things up to run normal (non-explicit) MSAA that just blits the multisampled texture to a single-sample texture and then renders that texture to the back buffer, so that I could do a performance comparison. With a scene of perhaps medium-low complexity, running simple shaders (forward-rendered), tone mapped, but with no other post-processing effects, I get the following render times:

 

No MSAA: 1.89ms

4x MSAA (standard): 2.67ms

4x MSAA (explicit): 4.69ms

 

Again, this is at 1080p with a GTX 680.
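
For clarity, the "standard" path above is just a driver-side resolve via glBlitFramebuffer, roughly like this (FBO creation omitted; msaaFbo, resolveFbo, and drawFullscreenQuad are placeholder names):

// Resolve: blit the multisampled FBO into a single-sample texture.
glBindFramebuffer(GL_READ_FRAMEBUFFER, msaaFbo);    // multisampled source
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFbo); // single-sample target
glBlitFramebuffer(0, 0, 1920, 1080,
                  0, 0, 1920, 1080,
                  GL_COLOR_BUFFER_BIT, GL_NEAREST); // driver performs the resolve

// Then draw the resolved texture to the back buffer; tone mapping happens
// here, i.e. after the resolve, which is exactly the correctness problem.
glBindFramebuffer(GL_FRAMEBUFFER, 0);
drawFullscreenQuad(resolveTex); // placeholder helper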

 

So, that's about a 0.77ms difference for plain MSAA and a 2.81ms difference for explicit MSAA, which means the explicit resolve is costing me an extra ~2ms. Oddly, that extra cost goes down slightly (to 1.79ms) if I reduce the scene complexity a bit. This is somewhat alarming, because I don't see how scene complexity can affect the MSAA resolve (I'm blending all four samples regardless of whether they're on an edge or not; maybe the default resolve does something smarter).

 

So, I don't know. I don't have much of a choice, it seems. If I want to do tone mapping and gamma correction correctly, explicit multisampling seems to be the way to go. The best I can do is profile and make sure that I'm optimizing my app/shader code. *shrug*

FreneticPonE


You can always just ditch MSAA. Give alternative AA techniques a look; personally, I'm a fan of Crytek's implementation of SMAA: http://www.crytek.com/download/Sousa_Graphics_Gems_CryENGINE3.pdf

 

It shouldn't take much more than your standard MSAA results on a 680, and it looks about as good, I'd say. Plus, if you're using deferred rendering, you don't have to worry about fiddling with transparencies, since it's essentially all post-processing.

cozzie
Just to be sure: depending on what you're going to add to the scene and what your goal is, you might be optimizing too early. If you turn on v-sync, anything up to 16.67ms will do fine. If that's the case, add more nice stuff instead; that also gives more energy than hunting for optimizations :)

CDProp

That makes a lot of sense, thank you. It might be worth playing with some edge detection, so that I'm only doing the full blend on edge pixels. I'm not sure that the performance savings will be worth the overhead, but it will be a fun experiment, anyway.
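
Something like this, maybe (a rough, untested sketch; uHdrTex, the Reinhard stand-in, and the 0.001 threshold are all placeholders):

#version 150
uniform sampler2DMS uHdrTex;

out vec4 fragColor;

vec3 toneMap(vec3 hdr) { return hdr / (hdr + vec3(1.0)); } // stand-in operator

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // Fetch all four samples and check whether any of them differ.
    vec3 s[4];
    bool edge = false;
    for (int i = 0; i < 4; ++i)
    {
        s[i] = texelFetch(uHdrTex, coord, i).rgb;
        edge = edge || any(greaterThan(abs(s[i] - s[0]), vec3(0.001)));
    }

    if (!edge)
    {
        // Interior pixel: samples are (nearly) identical, one tone map suffices.
        fragColor = vec4(toneMap(s[0]), 1.0);
    }
    else
    {
        // Edge pixel: tone map per sample, then average.
        vec3 sum = toneMap(s[0]) + toneMap(s[1]) + toneMap(s[2]) + toneMap(s[3]);
        fragColor = vec4(sum * 0.25, 1.0);
    }
}

Of course, the texelFetch cost is still paid for all four samples, so any savings would come purely from skipping the per-sample tone map on interior pixels; given that grabbing one sample vs. all four already performed about the same, the win may be small.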
