Tile-based deferred shading questions


I've been reading the following paper on tile-based deferred shading, https://software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf, as well as the referenced paper http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf, and I naturally have some questions :)

 

  1. It doesn't mention anything about shadow mapping. I assume you would only batch lights together like this if they do not cast shadows? Otherwise you'd end up with a lot of shadow maps.
  2. I assume this is for point lights only. For determining the visible light sources in each tile, do you use the depth to recompute the view-space position and then test against the view-space light radius? Is there a better way?
  3. Do you use a structured buffer for passing all the light constants and then a cbuffer with the number of lights?
  4. How does blending work in this case when you output from a compute shader? In my normal point-light pass, where I do one light at a time and output lighting from the pixel shader, the additive blending is performed automatically.

Thanks

 


It doesn't mention anything about shadow mapping. I assume you would only batch lights together like this if they do not cast shadows? Otherwise you'd end up with a lot of shadow maps.


Shadow maps are only needed for shadow-casting objects and lights.

Modern games still do not have a lot of shadow-casting lights, as that would tank performance. (Or you can get shadowing "for free" from something like voxel GI with many light sources... but that is itself expensive, so not really free.)

What usually happens in games is that shadow-casting lights are either hand-optimized (one per scene or per subsection) or automatically selected (the engine picks the closest light source, or the brightest one relative to distance - e.g. the sun is always far away but typically the brightest).

Ideally you profile against your minimum spec (or provide a quality slider so performance scales across a wider range of hardware), which lets you decide how many shadow maps you want (at most) per scene. If you subdivide your scene into subsections, it can also be useful to give lights a hard falloff rather than a physically accurate one (which only reaches zero at infinity).
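To make that last point concrete, here is a minimal HLSL sketch (not from either paper) contrasting a plain inverse-square falloff with a windowed one that reaches exactly zero at a chosen radius; the function and parameter names are placeholders.

```hlsl
// Physically based inverse-square falloff never actually reaches zero,
// so cutting the light off at a culling radius produces a visible edge.
float AttenuationInverseSquare(float dist)
{
    return 1.0 / max(dist * dist, 0.01);
}

// Windowed ("hard") falloff: scale by a factor that smoothly reaches zero
// at lightRadius, so the light genuinely ends at its culling radius.
float AttenuationWindowed(float dist, float lightRadius)
{
    float window = saturate(1.0 - pow(dist / lightRadius, 4.0));
    return (window * window) / max(dist * dist, 0.01);
}
```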

  1. See LeGreg's post.
     
  2. You can handle other light types as well if you like - I mean, why not?
    There are many different ways to perform the tile check. Usually you extract the min and max depth per tile to create a mini-frustum (there is a rough depth-bounds sketch after this list). That part is actually easy; the expensive part is testing the lights against this frustum, so some conservative heuristic is usually used for it. These slides are on the derived "Clustered Deferred Shading" approach (which is not necessarily better than the much simpler Tiled Deferred Shading), but contain some good insights about the culling which might be useful for you.
     
  3. What kind of data type is best may depend on the specific implementation and the hardware. Without having tried it myself, I would guess that your suggested types are a good choice :). There should be a lot of random access within the light list, while there are only a few global constants like the light count.
     
  4. Manual blending in the compute shader should be possible: read + write at the same location - as far as I remember, at least OpenGL specifies when that has to work.
    But I strongly advise against it, since it is rather expensive and may lead to undefined behavior (I'm not sure on this one). Just use the compute-shader lighting as your first pass into the given target. Since all lighting should be performed there, it is the first pass that writes to the "output buffer" anyway, right?
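Regarding item 2, here is a rough HLSL sketch of the per-tile depth-bounds reduction that typically starts the tiled compute shader; the texture name, register assignment and tile size are assumptions for this example, not anything from the papers.

```hlsl
// Depth-bounds reduction for one 16x16 tile of the screen.
#define TILE_SIZE 16

Texture2D<float> DepthTexture : register(t0);

groupshared uint gsMinDepth;
groupshared uint gsMaxDepth;

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void CullLightsCS(uint3 dispatchId : SV_DispatchThreadID,
                  uint  groupIndex : SV_GroupIndex)
{
    if (groupIndex == 0)
    {
        gsMinDepth = 0x7F7FFFFF; // asuint(FLT_MAX)
        gsMaxDepth = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    // Non-negative floats keep their ordering when reinterpreted as uints,
    // so integer atomics can reduce the tile's depth range.
    float depth = DepthTexture[dispatchId.xy];
    InterlockedMin(gsMinDepth, asuint(depth));
    InterlockedMax(gsMaxDepth, asuint(depth));
    GroupMemoryBarrierWithGroupSync();

    float tileMinDepth = asfloat(gsMinDepth);
    float tileMaxDepth = asfloat(gsMaxDepth);
    // ... convert these to view-space z, build the tile's sub-frustum
    //     planes, then cull and shade lights (see the later sketches) ...
}
```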

1. You can certainly handle shadow-casting lights; you just need all of the shadow maps to be in memory at the same time. For the last game I worked on, we kept spotlight shadow maps in a 16-element texture array, and then had a separate 4-element array for the directional light cascades.

2. You can handle other light types, like spotlights. Lights are culled per-tile by computing the planes of a sub-frustum that surrounds the tile, and then testing the light's bounding volume for intersection with that frustum. So for a point light you do a sphere/frustum test, and for a spotlight you can do a cone/frustum test. Just be aware that both sphere/frustum and cone/frustum can have false positives when you're doing the typical "test the volume against each plane" approach.
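As an illustration of that plane-by-plane test (and why it is only conservative), here is one possible HLSL helper. It assumes the tile's four side planes are stored as (normal, d) with inward-facing normals, and that view-space z increases away from the camera; all names are placeholders.

```hlsl
// Conservative sphere-vs-tile-frustum test. A light is rejected only if its
// bounding sphere lies entirely behind a single plane (or outside the tile's
// depth range), which is why false positives can occur near frustum corners.
bool SphereIntersectsTileFrustum(float3 lightPosVS, float lightRadius,
                                 float4 tilePlanes[4],
                                 float tileMinZ, float tileMaxZ)
{
    bool inside = (lightPosVS.z + lightRadius >= tileMinZ) &&
                  (lightPosVS.z - lightRadius <= tileMaxZ);

    [unroll]
    for (uint i = 0; i < 4; ++i)
    {
        float dist = dot(tilePlanes[i].xyz, lightPosVS) + tilePlanes[i].w;
        inside = inside && (dist >= -lightRadius);
    }
    return inside;
}
```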

In case you didn't get this from the paper, the reason they do this is so that each thread group can cull a bunch of lights in parallel. So basically during the culling phase you assign a different light to each of your N threads in your thread group, and then append each intersecting light to a list in thread group shared memory. Then in the second phase each thread loops over the entire list of intersecting lights, and computes the light contribution for a single pixel.
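Putting the two phases together, here is a hedged sketch of how that might look inside the same tiled compute shader as the depth-bounds example above. It assumes a Light struct plus Lights/LightCount resources like the ones in question 3, that tilePlanes/tileMinZ/tileMaxZ are the sub-frustum planes and view-space depth bounds derived from the reduction, and ShadePointLight / gbufferSample stand in for whatever shading and G-buffer code you already have.

```hlsl
#define MAX_LIGHTS_PER_TILE 256

RWTexture2D<float4> OutputTexture : register(u0);

groupshared uint gsTileLightCount;
groupshared uint gsTileLightIndices[MAX_LIGHTS_PER_TILE];

// ... inside CullLightsCS, after the tile depth bounds and tilePlanes
//     have been computed:

    if (groupIndex == 0)
        gsTileLightCount = 0;
    GroupMemoryBarrierWithGroupSync();

    // Phase 1: each thread tests a different slice of the light list and
    // appends intersecting lights to the shared per-tile list.
    for (uint lightIndex = groupIndex; lightIndex < LightCount;
         lightIndex += TILE_SIZE * TILE_SIZE)
    {
        Light light = Lights[lightIndex];
        if (SphereIntersectsTileFrustum(light.positionVS, light.radius,
                                        tilePlanes, tileMinZ, tileMaxZ))
        {
            uint slot;
            InterlockedAdd(gsTileLightCount, 1, slot);
            if (slot < MAX_LIGHTS_PER_TILE)
                gsTileLightIndices[slot] = lightIndex;
        }
    }
    GroupMemoryBarrierWithGroupSync();

    // Phase 2: every thread accumulates all intersecting lights for its own
    // pixel and writes the combined result once (no blending needed).
    float3 totalLighting = 0.0;
    uint numTileLights = min(gsTileLightCount, MAX_LIGHTS_PER_TILE);
    for (uint i = 0; i < numTileLights; ++i)
    {
        Light light = Lights[gsTileLightIndices[i]];
        totalLighting += ShadePointLight(light, gbufferSample);
    }
    OutputTexture[dispatchId.xy] = float4(totalLighting, 1.0);
```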

3. Sure, that works. I'm pretty sure that's how the sample does it as well.
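For reference, one possible declaration of that layout in HLSL; the struct fields and register assignments are just placeholders, not what the sample actually uses.

```hlsl
// All per-light data lives in a structured buffer; the light count (and any
// other global constants) go in a regular constant buffer.
struct Light
{
    float3 positionVS;   // position in view space
    float  radius;
    float3 color;
    float  padding;      // keep the struct a multiple of 16 bytes
};

StructuredBuffer<Light> Lights : register(t1);

cbuffer LightingConstants : register(b0)
{
    uint LightCount;
};
```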

4. One of the main advantages of this approach is that you avoid blending. Basically you combine the light contributions for all lights (or at least, many lights) inside of your compute shader, and then write out the combined result to your texture. This saves a lot of bandwidth, since you don't have to do a read/modify/write for every single light source. If you need to do multiple tiled passes, you can still do that with a compute shader approach. Just be aware that in D3D11 you can't read from a RWTexture2D unless it has R32_FLOAT/R32_UINT/R32_INT format*. This means that you can't do a manual read/modify/write on a R16G16B16A16_FLOAT texture. If you want to use an fp16 format, you'll need to ping-pong between two textures so that you can read from one and write to the other.

*This restriction was relaxed for FEATURE_LEVEL_12_0 hardware, which supports typed UAV loads for additional formats.
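If you do need that ping-pong for an fp16 target, a minimal sketch of a second tiled pass might look like this (resource names and registers are placeholders): read the previous pass's result through an SRV and write the sum to a different texture through a UAV.

```hlsl
Texture2D<float4>   PreviousLighting : register(t2);
RWTexture2D<float4> CurrentLighting  : register(u0);

[numthreads(16, 16, 1)]
void SecondTiledPassCS(uint3 dispatchId : SV_DispatchThreadID)
{
    // ... cull and accumulate this pass's lights here ...
    float3 passLighting = 0.0;

    // Manual "additive blend": previous result + this pass, written once.
    CurrentLighting[dispatchId.xy] =
        PreviousLighting[dispatchId.xy] + float4(passLighting, 0.0);
}
```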

