
Question about GI and Pipelines


This is probably a stupid question, but does Voxel Cone Tracing require the use of a deferred pipeline? Or can it go either way? And if it can go either way, is it generally more efficient to run it on a deferred pipeline?


I've been seriously contemplating which pipeline I want my engine to use. Seeing as I don't expect to use a lot of dynamic lights, I figured I could probably do just fine with Forward+ rendering with clustered shading, and skip the worries about bandwidth. I've heard about being able to use compute shaders to create a Z-prepass along with the render in one pass... still looking for research on that.

No, it would be possible with either. 
...but the voxel dataset itself can also be deferred- or forward-lit :lol: You can voxelize material properties and then light the voxels, or you can light them during the voxelization process! That gives you forward scene + forward voxels, forward scene + deferred voxels, deferred scene + forward voxels, or deferred scene + deferred voxels...
 
Lighting in general tends to be more efficient in a deferred pipeline, which is why people use it :)

You can write most of your lighting code so that it doesn't care whether you're using forward or deferred, and then have two different sets of shaders that both call this shared lighting code. That lets you fairly easily support both for testing :)

Edited by Hodgman
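As a toy illustration of that idea (plain Python standing in for shader code; the function names and the Lambert-only lighting model are made up for this sketch), the same shared lighting routine can be called from both a forward path and a deferred path:

```python
# Shared lighting code, callable from either pipeline.
# Hypothetical Lambert-only model; real shaders do far more.
def shade(albedo, n_dot_l, light_color):
    return tuple(a * light_color[i] * max(n_dot_l, 0.0)
                 for i, a in enumerate(albedo))

# Forward path: light each fragment as the geometry is rasterized.
def forward_fragment(albedo, n_dot_l, light_color):
    return shade(albedo, n_dot_l, light_color)

# Deferred path: first write material properties to a G-buffer...
def gbuffer_pass(albedo, n_dot_l):
    return {"albedo": albedo, "n_dot_l": n_dot_l}

# ...then light each G-buffer sample in a separate full-screen pass.
def deferred_lighting(sample, light_color):
    return shade(sample["albedo"], sample["n_dot_l"], light_color)
```

Both entry points call the one shared `shade()` and produce identical results, so either pipeline can be swapped in for testing.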


Alternating between forward and deferred is not that difficult. In fact, a lot of deferred renderers still support forward for transparent objects. On the other hand, forward renderers still have some deferred passes, e.g. for SSAO-like techniques.

You should keep it flexible which channels you output into a G-buffer.

 

If you want to go crazy with the "next" big thing, you might investigate texture-space (aka object-space, aka world-space) lighting, where you decouple lighting from rasterization by baking lighting into textures.

So why is deferred more efficient? I thought Forward+ would be nearly on par once trade-offs are considered, and that it would depend on the system's hardware.


With deferred you're only ever shading visible pixels; with forward you're shading triangles even if they don't end up being visible. Forward+ just limits the shading to relevant screen-space tiles, which can apply to both deferred and forward shading. Deferred has a higher set-up cost, but if you're scaling towards enough lights/shading it can end up cheaper in the end.
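A back-of-the-envelope model of that trade-off (the fragment counts, the "G-buffer write costs ~1 unit" assumption, and the light counts are all hypothetical, just to illustrate the bookkeeping):

```python
def forward_cost(fragments_per_triangle, lights):
    # Forward: every rasterized fragment runs the full lighting shader,
    # including fragments that are later overdrawn.
    return sum(fragments_per_triangle) * lights

def deferred_cost(fragments_per_triangle, visible_pixels, lights):
    # Deferred: every fragment pays a cheap G-buffer write (cost ~1 unit),
    # but only visible pixels pay the per-light shading cost.
    return sum(fragments_per_triangle) + visible_pixels * lights

frags = [1000, 1000, 1000]   # three overlapping triangles (lots of overdraw)
visible = 1200               # pixels that actually survive to the screen

# With one light, forward's lack of set-up cost wins...
assert forward_cost(frags, 1) < deferred_cost(frags, visible, 1)
# ...but with eight lights, deferred's "shade visible pixels only" wins.
assert forward_cost(frags, 8) > deferred_cost(frags, visible, 8)
```

Which is the point above: deferred's fixed set-up cost gets amortized once there is enough lighting work per pixel.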

 

The most relevant point here is that you want GI, and right now pretty much all GI solutions need a more detailed pass based on deferred data of some kind, as large-scale GI can't simultaneously have enough resolution to scale down to small details. So SSAO/SSR, or some variation that needs deferred information, is often used. Even mostly-forward titles like The Witcher 3 and The Order: 1886 use deferred information for small-scale/detail GI effects.


Except that a Z-prepass will reduce the cost of overdraw by only shading visible fragments :P. However, it requires the scene's geometry to be submitted twice. I've heard it's possible to do it in just one pass by using compute shaders to do the prepass for you. I can never find information on that though.

Edited by Tangletail


Quote: "Except that a Z-prepass will reduce the cost of overdraw by only shading visible fragments :P. However, it requires the scene's geometry to be submitted twice. I've heard it's possible to do it in just one pass by using compute shaders to do the prepass for you. I can never find information on that though."

 

It should? You're just doing early depth rejection either way. For current consoles it might be interesting to see if you could do an async triangle depth pass using compute while the rasterizer is doing, say, shadow maps. Then you'd have your depth rejection for a forward pass. You'd still be doing two geometry passes, but you'd be doing them more efficiently.

 

Of course, this supposes you're not already filling the relevant compute bubble with something else. There's GI, "fine-pruned" lights for tiled lighting rejection, er... etc. etc. to fill an async compute queue with.


Quote: "Except that a Z-prepass will reduce the cost of overdraw by only shading visible fragments :P. However, it requires the scene's geometry to be submitted twice. I've heard it's possible to do it in just one pass by using compute shaders to do the prepass for you. I can never find information on that though."

There's also "pixel quad efficiency", which will be lower in forward.
GPU's always run pixel shaders on a 2x2 "quad" of pixels, not actually on individual pixels. If your triangle edges cut through this 2x2 sized grid, then there will be some wasted computation -- the GPU will execute the pixel shader on the full quad, and then throw away the results that aren't needed.
So, 1-pixel triangles will run the PS 4x slower than triangles that fully cover a 2x2 quad.
In forward rendering with high-poly meshes, this can be a big inefficiency. In deferred, you only pay this inefficiency during the GBuffer creation step, but not during your lighting step (especially if lighting via compute shader).
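The wasted-lane arithmetic above can be sketched like this (a deliberately simplified model; real coverage depends on exactly where the triangle falls on the 2x2 grid):

```python
def quad_shading_efficiency(pixels_covered, quads_touched):
    # The GPU runs the pixel shader on all 4 lanes of every touched quad;
    # lanes whose pixel falls outside the triangle are "helper" lanes
    # whose results are thrown away.
    lanes_executed = quads_touched * 4
    return pixels_covered / lanes_executed

# A 1-pixel triangle touches one quad but uses only 1 of its 4 lanes:
assert quad_shading_efficiency(1, 1) == 0.25
# A triangle exactly covering one 2x2 quad wastes nothing:
assert quad_shading_efficiency(4, 1) == 1.0
```

So the 25% figure is where the "PS runs 4x slower" claim for pixel-sized triangles comes from.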

IIRC, AMD also runs 16 quads per work group, Nvidia runs 8, and Intel runs 2. GPUs may or may not be able to pack quads from multiple triangles into a single work group.
I can't remember this detail, so... if the GPU is unable to pack quads from multiple triangles into a work group, then on AMD your triangles need to cover 16 quads (16 to 64 pixels) in order to get full work-group efficiency.

Lastly, there's shader complexity. An AMD compute unit can "hyperthread" between 1 and 10 work-groups simultaneously, in theory (up to 640 pixels). However, the actual number of work-groups that it can "hyperthread" like this (which they call the "occupancy" value) depends on how complex your shader is (actually: how many temporary registers it requires). A simple shader can have occupancy of 10, while a complex shader might have occupancy of 1 or 2.
This is an extremely important value to optimize your shaders for, as in very, very basic terms, the latency of memory fetches can be divided by this number - i.e. it's an opportunity to make RAM seem 10x faster.
Forward uses a single, complex shader to do everything in one pass. Deferred breaks that shader in half, and does it in two simpler passes, which makes optimizing for occupancy easier.
The lighting pass of deferred can also be done on the compute hardware, which opens up optimization techniques that are not available to pixel shaders... and on modern AMD hardware, also lets you use "async compute" to run it in parallel with rasterization workloads.
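The occupancy rule of thumb above ("latency divided by occupancy") can be written down directly; this is a deliberately crude model in the same "very, very basic terms" as the post, not a real hardware simulation:

```python
def effective_fetch_latency(raw_latency_cycles, occupancy):
    # While one resident work-group stalls on a memory fetch, the compute
    # unit can switch to another of its resident work-groups, so in the
    # best case the perceived stall shrinks by a factor of the occupancy.
    return raw_latency_cycles / occupancy

# A register-light shader at occupancy 10 makes a 400-cycle fetch feel
# roughly like a 40-cycle one; a register-heavy shader at occupancy 1
# eats the full latency.
assert effective_fetch_latency(400, 10) == 40.0
assert effective_fetch_latency(400, 1) == 400.0
```

This is why splitting one fat forward shader into two leaner deferred passes can pay off: each half needs fewer registers, so each can reach a higher occupancy.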

So basically, there's never a simple winner :lol:


These are a good start:

https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/

http://renderingpipeline.com/graphics-literature/low-level-gpu-documentation/

 

NVidia doesn't like to share information publicly... usually only publishing presentations like this, and their marketing people love to step in and deliberately attempt to blur the line between hardware design and software techniques (which are often cross-vendor, or even applicable to older NVidia GPUs)...

 

But AMD and Intel give out enough info that you could write your own hardware drivers if you wanted to! In fact, Intel started being this open so that the Linux community could/would write their own drivers :)

e.g.

https://01.org/linuxgraphics/documentation/hardware-specification-prms

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol03-gpu_overview.pdf

http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/AMD_GCN3_Instruction_Set_Architecture.pdf

^^ The Instruction Set Architecture documents explain how the hardware actually works -- that is, the language(s) that their driver is translating all your D3D calls into.


 

AMD has also recently started the http://gpuopen.com/ site, which has some gems on there.

Edited by Hodgman
