MRT and Anti-aliasing

Started by
7 comments, last by Tispe 10 years, 7 months ago

I have two render targets, 0 = color and 1 = depth.

Now I'm trying to get anti-aliasing to work, since it's not working anymore.

I read that I could use IDirect3DDevice9::StretchRect to get anti-aliasing to work, how do I use the function correctly?

I'm getting a black screen if I enable anti-aliasing.


This is just a guess, but if you render to a surface with 4x the area (double the width and height) and then downsample it with StretchRect, you get the effect of AA (supersampling).
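The downsample step is just a 2x2 box filter, which is roughly what StretchRect with D3DTEXF_LINEAR does when the source is exactly twice the destination size. A minimal, self-contained sketch over one 8-bit channel (Downsample2x is a made-up helper name):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Average each 2x2 block of a (2w x 2h) single-channel image down to (w x h).
std::vector<uint8_t> Downsample2x(const std::vector<uint8_t>& src, int w, int h)
{
    std::vector<uint8_t> dst(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int sum = src[(2 * y)     * (2 * w) + 2 * x]
                    + src[(2 * y)     * (2 * w) + 2 * x + 1]
                    + src[(2 * y + 1) * (2 * w) + 2 * x]
                    + src[(2 * y + 1) * (2 * w) + 2 * x + 1];
            dst[y * w + x] = static_cast<uint8_t>(sum / 4); // box-filter average
        }
    return dst;
}
```

An edge pixel that covers half background and half geometry ends up as a blend of the two, which is exactly the smoothing AA is after.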

Unfortunately, D3D9 doesn't allow MSAA targets to be used with MRT:

Multiple Render Targets (Direct3D 9)

  • No antialiasing is supported.

If you want to support this, you can emulate MRT with multiple passes -- first render the scene to your first target, then render the scene again to your second target.

@Hodgman: Hmm... don't you think that will lower performance?

What do modern games do?

@Tispe: Which is better for performance: the technique you mentioned, or rendering the scene twice?

Wasn't the problem with DirectX 9 that, while you could create all your targets with some MSAA technique, the way you did lighting and accumulated all the targets made the results incorrect? And didn't DirectX 10.1 give you access to all the samples in a multisampled target, and to the depth buffer in a similar manner?

Unfortunately, D3D9 doesn't allow MSAA targets to be used with MRT:

Multiple Render Targets (Direct3D 9)

  • No antialiasing is supported.

If you want to support this, you can emulate MRT with multiple passes -- first render the scene to your first target, then render the scene again to your second target.

Would this look correct? Rendering the albedo, normal and specular targets all separately, with MSAA on and combining them in the final shader would give artifact-less and correct rendering? I'm asking just about the correctness. I know this would be slow, because you need to render the whole thing 3 times into separate multi-sampled targets and resolve each to a texture before combination.

@Hodgman: Hmm... don't you think that will lower performance?

What do modern games do?

@Tispe: Which is better for performance: the technique you mentioned, or rendering the scene twice?

I'm not using deferred rendering, so I have full access to MSAA (but not CSAA, damn you XNA), but I also optionally support post-process AA. I have FXAA support, which is incredibly fast but gives, in my opinion, very blurry results. And I have support for SMAA, which gives extremely good results on still images, sometimes even better than MSAA x8, but suffers more from temporal aliasing. It is also a lot slower than FXAA, but faster than MSAA 8x. The good thing is that its cost per frame is constant.

Both are very easy to implement. If you know how to render a full-screen quad and do something simple like bloom, you will easily take a reference sample of FXAA or SMAA and integrate them into your engine within minutes.

There is also a matter of preference. I can't stand the blurriness of FXAA, but others don't notice it. For a test, if you have NVidia, try out 32x CSAA with ultra-setting SMAA. You may really like the look, with both techniques rarely building up temporal aliasing constructively. Just don't expect high framerates :).

@Hodgman: Hmm... don't you think that will lower performance?

Yes, quite possibly. But your current version doesn't even work, so I'd value bad performance higher than no performance ;)

However, in some cases, this may actually increase performance. Drawing the scene twice, where the first time you only draw depth, and the second time you use your real pixel shaders, is known as a "z pre pass", "depth pre pass", "zpp", etc.

Doom 3 chose to do this on purpose, because it lets the GPU take full advantage of the depth buffer, to avoid overdraw.

Say you've got a camera looking through 3 walls:
Cam -> |A| |B| |C|
If you draw C, then B, then A, then you're running 3 different pixel shaders, even though only the last one (A) counts -- it overwrites the previous results. That's "overdraw".

By drawing the whole scene's depth buffer first, then in the second pass, B & C will be skipped, because they fail the depth test.
In scenes that have a lot of overdraw, then a ZPP may actually improve performance.
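The saving is easy to see with a toy model. This self-contained sketch (ShaderRuns is a made-up name; it models a single pixel with a LESSEQUAL depth test) counts how many times the expensive pixel shader runs for the Cam -> |A| |B| |C| case, drawn back to front:

```cpp
#include <cassert>
#include <cfloat>
#include <vector>

// Count expensive pixel-shader executions for one pixel covered by several
// surfaces, with and without a depth pre-pass.
int ShaderRuns(const std::vector<float>& drawOrderDepths, bool zPrePass)
{
    float z = FLT_MAX;
    if (zPrePass)                        // pass 1: depth only (cheap, no shading)
        for (float d : drawOrderDepths)
            if (d <= z) z = d;
    int runs = 0;
    for (float d : drawOrderDepths) {    // pass 2 (or the only pass): full shading
        if (d <= z) {
            ++runs;                      // expensive pixel shader executes
            if (!zPrePass) z = d;        // without a pre-pass, depth fills in as we draw
        }
    }
    return runs;
}
```

Without the pre-pass, all three walls run the expensive shader; with it, only A does. Whether that pays for the extra geometry pass depends on how much overdraw and shader cost the scene has.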

What do modern games do?

Use DX11, where you can use MSAA textures and MRT at the same time ;)

Another solution though would be to use a post-process anti-aliasing solution, like FXAA instead of MSAA, as mentioned above.

Wasn't the problem with DirectX 9 that, while you could create all your targets with some MSAA technique, the way you did lighting and accumulated all the targets made the results incorrect? And didn't DirectX 10.1 give you access to all the samples in a multisampled target, and to the depth buffer in a similar manner?

Yes, D3D9 gives no way to access the sub-samples (besides averaging them all together in a standard resolve step), so even if you do manage to render out an MSAA G-buffer, you can't make use of it.

Would this look correct? Rendering the albedo, normal and specular targets all separately, with MSAA on and....

Nope. Medo3337 isn't rendering a G-buffer. He's just using forward rendering, but also outputting depth to a colour texture.

The averaging of depth will also be wrong, but probably still close enough to correct for most purposes.
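For a concrete picture of why the resolved depth is "wrong": at a 2x MSAA silhouette edge, one sub-sample lands on the near surface and one on the far surface, and the standard resolve simply averages them (ResolveDepth is a made-up name for that average):

```cpp
#include <cassert>

// A standard 2x MSAA resolve averages the two sub-samples. Fine for colour,
// but a depth value written to a colour target gets averaged the same way.
float ResolveDepth(float sample0, float sample1)
{
    return 0.5f * (sample0 + sample1);
}
```

With a near surface at depth 0.25 and a far one at 0.75, the resolved pixel reads 0.5, a depth belonging to neither surface. Since this only affects a one-pixel rim at silhouettes, it's usually "close enough" in practice.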

However, in some cases, this may actually increase performance. Drawing the scene twice, where the first time you only draw depth, and the second time you use your real pixel shaders, is known as a "z pre pass", "depth pre pass", "zpp", etc.


Doom 3 chose to do this on purpose, because it lets the GPU take full advantage of the depth buffer, to avoid overdraw.

Say you've got a camera looking through 3 walls:
Cam -> |A| |B| |C|
If you draw C, then B, then A, then you're running 3 different pixel shaders, even though only the last one (A) counts -- it overwrites the previous results. That's "overdraw".

By drawing the whole scene's depth buffer first, then in the second pass, B & C will be skipped, because they fail the depth test.
In scenes that have a lot of overdraw, then a ZPP may actually improve performance.

I've heard of Z-pass before. I actually read about it. To quote directly:

Double-Speed Z-Only and Stencil Rendering

All GeForce Series GPUs (FX and later) render at double speed when rendering
only depth or stencil values. To enable this special rendering mode, you must
follow the following rules:
  • Color writes are disabled
  • Texkill has not been applied to any fragments (clip, discard)
  • Depth replace (oDepth, texm3x2depth, texdepth) has not been applied to any fragments
  • Alpha test is disabled
  • No color key is used in any of the active textures
See section 6.4.1 for information on NULL render targets with double speed Z.
3.6.2. Z-cull Optimization
Z-cull optimization improves performance by avoiding the rendering of
occluded surfaces. If the occluded surfaces have expensive shaders applied to
them, z-cull can save a large amount of computation time. See section 4.8 for a
discussion on Z-cull and how to best use it.
3.6.3. Lay Down Depth First (“Z-only rendering”)
The best way to take advantage of the two aforementioned performance
features is to “lay down depth first.” By this, we mean that you should use
double-speed depth rendering to draw your scene (without shading) as a first
pass. This then establishes the closest surfaces to the viewer. Now you can
render the scene again, but with full shading. Z-cull will automatically cull out
fragments that aren't visible, meaning that you save on shading computations.
Laying down depth first requires its own render pass, but can be a performance win if many occluded surfaces have expensive shading applied to them. Double-speed rendering is less efficient as triangles get small, and small triangles can reduce z-cull efficiency.

But I'm not sure exactly what I should do to benefit 100% from this technique and the "double speed" Z rendering, or how to satisfy all those points.
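Most of those rules boil down to device state plus shader restrictions during the depth-only pass. A hedged D3D9 sketch of the two passes (Windows-only, so it won't compile without the DirectX SDK; DrawScene is a placeholder for your own scene submission):

```cpp
// --- Pass 1: depth only (aim for double-speed Z) ---
device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);     // rule: colour writes disabled
device->SetRenderState(D3DRS_ALPHATESTENABLE, FALSE);  // rule: alpha test disabled
device->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);
device->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);
device->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
DrawScene(device);                                     // hypothetical helper

// --- Pass 2: full shading, same depth-test direction as pass 1 ---
device->SetRenderState(D3DRS_COLORWRITEENABLE,
    D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
    D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);     // depth is already laid down
DrawScene(device);
```

The shader-side rules (no clip/discard/texkill, no oDepth writes, no colour-keyed textures) can't be expressed as render states; you satisfy them by binding a trivial depth-only shader (or the fixed pipeline) in pass 1.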

Then there is this section that makes things even more confusing:

ZCULL and EarlyZ: Coarse and Fine-grained Z and Stencil Culling
NVIDIA GeForce 6 series and later GPUs can perform a coarse level Z and Stencil culling. Thanks to this optimization, large blocks of pixels will not be scheduled for pixel shading if they are determined to be definitely occluded. In addition, GeForce 8 series and later GPUs can also perform fine-grained Z and Stencil culling, which allows the GPU to skip the shading of occluded pixels. These hardware optimizations are automatically enabled when possible, so they are mostly transparent to developers. However, it is good to know when they cannot be enabled or when they can underperform, to ensure that you are taking advantage of them.

Coarse Z/Stencil culling (also known as ZCULL) will not be able to cull any pixels in the following cases:
1. If you don't use Clears (instead of fullscreen quads that write depth) to clear the depth-stencil buffer.
2. If the pixel shader writes depth.
3. If you change the direction of the depth test while writing depth. ZCULL will not cull any pixels until the next depth buffer Clear.
4. If stencil writes are enabled while doing stencil testing (no stencil culling).
5. On GeForce 8 series, if the DepthStencilView has Texture2D[MS]Array dimension.

Also note that ZCULL will perform less efficiently in the following circumstances:
1. If the depth buffer was written using a different depth test direction than that used for testing.
2. If the depth of the scene contains a lot of high-frequency information (i.e.: the depth varies a lot within a few pixels).
3. If you allocate too many large depth buffers.
4. If using the DXGI_FORMAT_D32_FLOAT format.

Similarly, fine-grained Z/Stencil culling (also known as EarlyZ) is disabled in the following cases:
1. If the pixel shader outputs depth.
2. If the pixel shader uses the .z component of an input attribute with the SV_Position semantic (only on GeForce 8 series in D3D10).
3. If Depth or Stencil writes are enabled, or Occlusion Queries are enabled, and one of the following is true:
  • Alpha-test is enabled
  • Pixel Shader kills pixels (clip(), texkill, discard)
  • Alpha To Coverage is enabled
  • SampleMask is not 0xFFFFFFFF (SampleMask is set in D3D10 using OMSetBlendState, and in D3D9 by setting the D3DRS_MULTISAMPLEMASK render state)

I think for now I could use the approach Tispe suggested: render the scene to a target larger than the backbuffer, then downsample it to the backbuffer using device->StretchRect() to get an anti-aliasing effect.

Now how do I use device->StretchRect() correctly to do that?


device->StretchRect(pSourceTextureSurface, NULL, pBackBufferSurface, NULL, D3DTEXF_LINEAR);

You can render to a texture with double the resolution, then get the texture surface and pass it as pSourceTextureSurface. Remember to Release() the surfaces to decrease the reference count.

Edit: Perhaps just use a plain surface created with device->CreateOffscreenPlainSurface(). D3DPOOL_DEFAULT is the appropriate pool for use with IDirect3DDevice9::StretchRect. This surface must then be released when resetting the device.
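Putting the thread's suggestion together, a hedged sketch of the whole render-big-then-StretchRect flow (Windows/D3D9 only, so it won't compile without the SDK; the 2x size and A8R8G8B8 format are assumptions, not requirements):

```cpp
// One-time setup: a render target twice the backbuffer size (supersampling).
// NB: the depth-stencil surface must be at least as large as this target.
IDirect3DSurface9* pBigRT = nullptr;
device->CreateRenderTarget(2 * width, 2 * height, D3DFMT_A8R8G8B8,
                           D3DMULTISAMPLE_NONE, 0, FALSE, &pBigRT, nullptr);

// Each frame: render the scene into the big target...
IDirect3DSurface9* pBackBuffer = nullptr;
device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &pBackBuffer);
device->SetRenderTarget(0, pBigRT);
// ... draw the scene here ...

// ...then filter it down onto the backbuffer.
device->SetRenderTarget(0, pBackBuffer);
device->StretchRect(pBigRT, NULL, pBackBuffer, NULL, D3DTEXF_LINEAR);
pBackBuffer->Release(); // GetBackBuffer added a reference
```

Since pBigRT lives in D3DPOOL_DEFAULT, Release() it before device->Reset() and recreate it afterwards.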

This topic is closed to new replies.
