Deferred texturing

8 comments, last by FreneticPonE 6 years, 10 months ago

The most expensive pass in my rendering pipeline processes highly tessellated geometry and thus suffers from a high degree of quad overdraw. At the highest quality setting, the tessellation generates sub-pixel triangles and the performance loss is quite drastic.

An obvious optimization is deferred texturing, which consists of running a pre-pass that rasterizes the geometry and saves to intermediate buffers all the data required by the original rendering pass: texture coordinates, derivatives, etc. The original pass is then replaced by either a fullscreen triangle or a compute shader, with optimal quad utilization.

My question is: are gradient-based sampling functions nowadays less efficient than the ones that don't take the gradient as an argument? That's my main worry, and I could not find any information on this. I know that Deus Ex: Mankind Divided uses this technique, but I'd like to have some confirmation before coding a prototype. Also, apart from the need for intermediate buffers, are there other non-obvious disadvantages?

Thanks!

Stefano


A.) Fix your tessellation problem. There are plenty of ways to reduce polycount via any modeling tool: decimate, remesh, etc. This is not just a texturing issue: sub-pixel triangles also mean writing to the depth buffer and other buffers several times, not just redundant texture fetching.

B.) Unless your entire scene fits into a giant texture array or one massive texture, how would you fetch arbitrary textures from a G-buffer index?

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

At the highest quality setting, the tessellation generates sub-pixel triangles and the performance loss is quite drastic

I don't know much about tessellation, but doesn't having sub-pixel triangles mean you're tessellating too much?

My question is: are gradient-based sampling functions nowadays less efficient than the ones not taking the gradient as argument?

I don't know personally but I do know of two deferred texturing papers that might have the answer.

http://jcgt.org/published/0002/02/04/

http://www.gdcvault.com/play/1023792/4K-Rendering-Breakthrough-The-Filtered

https://onedrive.live.com/view.aspx?resid=EBE7DEDA70D06DA0!115&app=PowerPoint&authkey=!AP-pDh4IMUug6vs

-potential energy is easily made kinetic-

This will not help with your performance issues. Sub-pixel triangles (even sub-quad triangles) are going to hurt you, and your pre-pass won't be a fast depth-only pass either.

What you describe, though, sounds very close to Oxide's image-based lighting. They effectively build UV-unwrapped G-buffers for their different meshes (offline), then at runtime do the lighting and shading in image space (UV-unwrap space), and the actual scene rasterization happens at the end with, in theory, the simplest of shaders. This decouples the rendering resolution (rasterization pass) from the lighting/shading resolution. You could also update the shading at a lower frequency, but you might then get issues with view-dependent effects like reflections/specular.

Explicitly passing gradients can be slower in some situations, but it depends on the hardware as well as the specifics of the shader itself. Sending the gradients requires sending more data from the shader unit to the texture unit, and in some cases it can cause texture sampling to run at a lower rate.
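For intuition on what the texture unit does with those gradients: implicit and explicit LOD selection both boil down to the standard footprint formula, lambda = log2 of the largest texel-space derivative. A software model of the isotropic case (ignoring anisotropic filtering and negative-LOD clamping details) might look like this; the function name and signature are my own:

```cpp
#include <cmath>
#include <algorithm>

// Mip level from explicit UV gradients for a texW x texH texture.
// Mirrors the standard isotropic LOD formula: lod = log2(max footprint),
// which is what SampleGrad-style functions hand to the texture unit.
float mipLevelFromGradients(float ddxU, float ddxV,
                            float ddyU, float ddyV,
                            float texW, float texH) {
    float fx  = std::hypot(ddxU * texW, ddxV * texH); // footprint along screen x
    float fy  = std::hypot(ddyU * texW, ddyV * texH); // footprint along screen y
    float rho = std::max(fx, fy);
    return std::max(0.0f, std::log2(rho));            // clamp away negative LOD
}
```

A gradient of exactly one texel per pixel yields LOD 0 (full-resolution mip); two texels per pixel yields LOD 1, and so on.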

There are 3 ways (that I know of) that you can handle sampling your textures in your deferred pass:

  1. Pack all scene textures into atlases or arrays
  2. Use virtual texturing techniques, which effectively lets you achieve atlasing through hardware or software indirections
  3. Use bindless resources

When I experimented with this I went with #3 using D3D12, and it worked just fine. The one thing you need to watch out for with bindless techniques is divergent sampling: if different threads within a warp/wavefront sample from different texture descriptors, the performance will go down by a factor proportional to the number of different textures. Generally your materials tend to be coherent in screen-space so it's not too much of an issue, but if you have areas with many different materials only covering a small number of pixels then your performance may suffer.
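A back-of-the-envelope way to reason about that divergence cost, assuming the worst case where the sampler serializes one pass per distinct descriptor in the wave (a simplified model, not a measured hardware behavior):

```cpp
#include <set>
#include <vector>
#include <cstddef>

// Worst-case cost model for divergent bindless sampling: within one
// warp/wavefront, the texture unit effectively performs one pass per
// distinct descriptor, so the cost multiplier is the number of unique
// texture indices among the lanes.
std::size_t divergentSamplePasses(const std::vector<int>& waveTextureIndices) {
    return std::set<int>(waveTextureIndices.begin(),
                         waveTextureIndices.end()).size();
}
```

A fully coherent 64-lane wave costs one pass; a wave straddling four materials costs roughly four, which is why screen-space material coherence matters.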

I also disagree with ATEFred that it won't help you at all. One of the primary advantages of deferred techniques is that they can decouple your heavy shading from your geometric complexity, which helps you to avoid the reduced efficiency from sub-pixel triangles. Deferred texturing in particular aims to go even further than traditional deferred techniques by only writing a slim G-Buffer and moving all texture sampling to a deferred pass, which makes it even more appealing for your situation. Obviously your G-Buffer pass is still going to be slower than it would be with a lower geometric complexity that has a more ideal triangle-to-pixel ratio, but in general the more you can decouple your shading from rasterization the less you'll be impacted by poor quad utilization. That said, you should always be thorough about gathering performance numbers using apples-to-apples comparisons as much as you can, so that you can make your choices based on concrete data. Some things can vary quite a bit depending on the scene, material complexity, the GPU, drivers, etc.

It sounds to me like you're pushing all of this through the actual tessellation pipeline, in which case deferred texturing / a full "visibility buffer" would indeed help, as you're only shading on-screen triangles. In fact it helps so much that DICE found a full async geometry pass that draws only visible geometry to the G-buffer (traditional deferred) significantly increased tessellation pipeline performance.

So the idea is perfectly sound, though it does sound like over-tessellation is going on, as you're not getting any benefit out of tessellating down to sub-pixel triangles.

Thank you for the replies.

I should have provided more information in my original post. I would use deferred texturing for the rendering of water normals only, not for the whole scene. This pass is particularly expensive due to the pixel shader complexity (it combines several wave layers), the many texture fetches with anisotropic filtering, and the above-mentioned quad overdraw (a term I borrowed from RenderDoc). I use the projective grid technique for the geometry, and in order to minimize temporal aliasing, caused by sampling slightly different points on the water plane as the camera moves, the tessellation must be very high, especially near the horizon. I tried a couple of stabilization approaches but sadly none worked to my satisfaction.

@MJP Luckily, divergent sampling is not a problem in my case, as the shader uses the same set of textures for the whole pass.

I will work on an implementation over the weekend and let you know my findings.

Thanks again!

Oh! In that case you want to do something completely different. What you want to do is actually simplify your ocean rendering as it gets farther away, tessellating less and less and letting normal maps, and eventually a BRDF, take over. Take a look here: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8659.2009.01618.x/abstract

There are other papers on the same thing, including LEAN mapping etc., that tackle similar problems.

Thank you for the link. I am aware of LEAN mapping and the Bruneton paper. In fact, I already use a baked C-LEAN variance texture for the computation of the wave variance and the filtering of all functions that depend on the normal.

The issue I am having is related to the undersampling of the displacement map and the temporal aliasing implicit in the projective grid technique.

Anyway, by biasing the tessellation towards the horizon I managed to reduce these artifacts. I also refactored the pipeline, and now the water geometry and shading phases are entirely decoupled, with a noticeable performance improvement on my 5-year-old laptop :)

I also uploaded a new demo on my website along with new features. I'd really appreciate feedback from anyone reading this thread!
Thanks!
Mandatory screenshot:

That would probably be because CLEAN mapping doesn't take anisotropic filtering into account, so error will increase as the view becomes parallel to the surface, i.e. more towards the horizon for your water. And if you want a non-deferred-texturing solution for better tessellation culling (not tessellating non-visible triangles), run an async compute pass over the geometry and discard non-visible triangles before the tessellation stage.

Regardless, I'm looking forward to the demo. The crap public wifi I'm on definitely won't download it in any reasonable time, but I'll check it out later. Glad your performance is going up!

