
# Stippled Deferred Translucency


8 replies to this topic

### #1 Hodgman (Moderators) - Reputation: 19629

Posted 11 September 2013 - 11:36 PM

I've been working on a translucency system for my deferred renderer, inspired by the translucency support present in "inferred rendering", which works by interleaving samples using a screen-door pattern.

In my system, I'm first rendering the opaque objects to the G-buffer as usual, and then rendering translucent objects using a similar stippling pattern scheme to inferred rendering.

The stipple pattern changes depending on the number of layers that are present per pixel, in order to maximize the resolution of the closest layer:

```
#Layers:  0    1    2    3    4+

Pattern:  B|B  B|1  B|2  B|3  4|3
          -+-  -+-  -+-  -+-  -+-
          B|B  1|1  1|1  2|1  2|1
```

B = background. 1 = closest layer to viewer, 2 = 2nd closest layer, etc.
i.e. if there's only 1 layer over the opaque scene, 3/4 of the pixels are overwritten with that translucent data, and 1/4 retain the background.
If there are 4 or more layers, then the first 4 layers each get 1/4 of the pixels, and the 5th layer onwards is discarded.
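As a hedged illustration of the table above, the per-quad cell ownership could be encoded like this; the patterns are taken straight from the diagram, but the CPU-side Python and the function name are just for illustration, not the actual shader:

```python
# Which layer owns each cell of a 2x2 quad, indexed by translucent layer
# count. Cell order is [top-left, top-right, bottom-left, bottom-right],
# matching the diagrams above; 0 = opaque background, 1 = closest layer.
STIPPLE_PATTERNS = {
    0: (0, 0, 0, 0),  # no translucency: all background
    1: (0, 1, 1, 1),  # 1 layer: 3/4 of cells for layer 1
    2: (0, 2, 1, 1),  # 2 layers: closest layer keeps half the cells
    3: (0, 3, 2, 1),  # 3 layers: one cell left for the background
    4: (4, 3, 2, 1),  # 4+ layers: one cell each, background is lost
}

def cell_owner(layer_count: int, cell: int) -> int:
    """Return which layer (0 = background) owns a cell of the quad."""
    return STIPPLE_PATTERNS[min(layer_count, 4)][cell]
```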

The "closest layer" is determined by rendering order -- translucent objects are sorted to be drawn from front-to-back (instead of the typical back-to-front order), and it's assumed in the stippling scheme that the first object to write to a pixel is the first layer, etc.

I then perform lighting on the G-Buffer as usual.

After lighting, the pixels in each 2x2 block are sorted into layers, the 2nd/3rd/4th layers are blurred appropriately (to simulate translucent scattering), and then the layers are composited into the final rendering.
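A minimal CPU-side sketch of that resolve step, assuming each sample carries a depth, an RGB color, and an alpha (the exact blend isn't specified here, so plain back-to-front alpha blending is assumed; the opaque background sample has alpha 1):

```python
# Illustrative sketch (not the actual shader): sort a quad's lit samples by
# depth, then alpha-blend them back-to-front into one output color.
def composite_quad(samples):
    """samples: list of (depth, (r, g, b), alpha); returns blended (r, g, b)."""
    ordered = sorted(samples, key=lambda s: s[0], reverse=True)  # farthest first
    out = (0.0, 0.0, 0.0)
    for depth, rgb, alpha in ordered:
        out = tuple(alpha * c + (1.0 - alpha) * o for c, o in zip(rgb, out))
    return out
```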

The obvious downside is a loss of resolution in areas of the screen containing translucent objects.
If there's only 1 layer, this loss is minimal (1/4 of pixels are missing on the closest layer), but if there are 3+ layers, the loss is more extreme (3/4 of pixels missing). I use FXAA to minimize the resulting aliasing artefacts.

The other obvious downside is that you start discarding whole layers when more than 4 triangles overlap.

The rendering order in the below image is actually unsorted (instead of the ideal front-to-back order) to demonstrate the per-2x2-quad sorting. As long as the first 4 layers are drawn first (with small errors in ordering), the sorting stage actually fixes most artefacts, with the main artefact being non-ideal resolution distribution for the 2-layer case.

The draw order is:
1) Opaque objects (brown rock floor, red/black/silver mechanical components, inner wheel)
2) The orange gel tires over the inner wheel.
3) The black wing.
4) The blue floor.

As you can see, it's working pretty well for me, and the performance beats forward rendering due to fewer pixels being shaded (forward rendering will shade every pixel on every layer). Plus the ability to very easily get blurred transmission is very cool (and was a feature my artists wanted from the beginning).

However, it's all very ad-hoc and quickly thrown together, so I'm trying to research similar solutions that are out in the wild. I feel there's probably improvements to be made in both my stippling / stencil-routing step, and in my reconstruction filter.

So far I've only found:

• Inferred rendering (the original inspiration), but it uses an additional geometry pass instead of a screen-space reconstruction filter.
• Stencil-routed A-buffers and K-buffers: similar, but using MSAA samples instead of whole pixels. Seems like a good idea.
• Basic deferred stippling methods, either with no reconstruction filter at all, or a simple averaging reconstruction filter.

If you know of other techniques like this, please share.

Edited by Hodgman, 11 September 2013 - 11:42 PM.

### #2 Ohforf sake (Members) - Reputation: 797

Posted 12 September 2013 - 01:25 AM

How do you handle SSAO and particle effects in this approach? Do you keep the stippled depth buffer?

We use stippling and limited it to 2 layers. It is good enough for what we need, but SSAO and particles in particular are really bothering me.

### #3 ATEFred (Members) - Reputation: 808

Posted 12 September 2013 - 03:32 AM

For alpha lighting, I generate a volume texture locked to the camera with lighting information (warped to match the frustum). Atm I fill this in a compute shader, similar to the usual CS light-culling pass. I store both a single non-directional approximated lighting value and a separate set of directional values. This allows me to do either a simple texture fetch to get rough lighting info when applying it (for particles, for example), or a higher-quality directional application with 3 texture fetches and a few ALU ops.

It's a pretty simple system atm; downsides are lower lighting resolution in the distance, and it's not exactly free to generate (that might be possible to optimize by at least partially generating it on the CPU, though). Also, no specular atm...

Pros are cheap lighting, even for a huge number of particles, and semi cheap volumetric lighting / light shafts for any shadow casting light in the scene, as I also march through the volume when I apply my directional light volumetric shadows (simple raymarch through shadowmap).
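The volume lookup being described might work roughly like this sketch; the exponential depth slicing is an assumption (the actual warp isn't specified in the post), and the function name is hypothetical:

```python
import math

# Hedged sketch of addressing a camera-locked, frustum-warped lighting
# volume: xy follow NDC, and the depth slice here is distributed
# exponentially between the near and far planes (assumed distribution).
def lighting_volume_uvw(ndc_x, ndc_y, view_depth, near, far):
    """Map an NDC xy position and view-space depth to volume UVW in [0, 1]."""
    u = ndc_x * 0.5 + 0.5
    v = ndc_y * 0.5 + 0.5
    w = math.log(view_depth / near) / math.log(far / near)  # exponential slicing
    return (u, v, min(max(w, 0.0), 1.0))
```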

### #4 Krypt0n (Crossbones+) - Reputation: 1756

Posted 12 September 2013 - 04:37 AM

It might be just as fast, but with better quality, to create a separate texture for alpha-blended objects.

1. As GPUs work on 2x2 quads, you poison the depth buffer and any common deferred optimization (depth-bounds check, stencil culling), as the 2x2 pixel quad is processed even if just one pixel is valid. The results of the other pixels are discarded, of course -> not slower.

2. You still have the full-res backbuffer for solids, but now also a separate target with alpha -> better quality.

3. You could use a lower-res alpha buffer; you're interleaving anyway, which is like less resolution, so why not an alpha buffer with lower res from the beginning? You'd need to render into the high-res first, then do a resolve where you combine the alpha G-buffer, kinda filling empty pixels, then you shade this lower-res buffer -> I estimate this to be faster than interleaved.

### #5 Hodgman (Moderators) - Reputation: 19629

Posted 12 September 2013 - 08:45 AM

How do you handle ssao and particle effects in this approach?

I do SSAO using a half-resolution depth buffer (and bilaterally upsample the results). To get the half-res depth, I take the minimum value in each 2x2 quad, so SSAO is only computed for the closest layer.
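That min-downsample is simple; a CPU-side sketch for illustration (the real version would of course run on the GPU):

```python
# Sketch of the half-res depth downsample described above: each 2x2 quad of
# the full-res depth buffer is reduced to its minimum (closest) value, so
# SSAO only "sees" the nearest stippled layer.
def min_downsample(depth, width, height):
    """depth: row-major list of width*height values; returns the half-res list."""
    out = []
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            quad = [depth[(y + dy) * width + (x + dx)]
                    for dy in (0, 1) for dx in (0, 1)]
            out.append(min(quad))
    return out
```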

I haven't yet decided how best to solve interactions with traditional alpha surfaces, which I guess is what you mean by particle issues.

Ideally, when rendering particles into the stippled lighting buffer (before the reconstruction filter is executed), the particles would perform a depth test against all 4 layers, and then only write/blend over the pixels that correspond with the furthest layer that they are in front of...

For alpha lighting, I generate a volume texture locked to the camera with lighting information (warped to match the frustum).

That's similar to clustered shading, but instead of storing a list of lights per cell/texel in the volume, you're storing the (approximate) radiance at that location. I was thinking of using something similar for things where approximate lighting is ok, like smoke particles. Does it work well for you in these cases?

BTW, if you stored the light in each cell as SH, you could extract the dominant light direction and colour from this representation, and use it for some fake specular highlights ;)

1. as gpus work on 2x2 quads, you poison the depth buffer and any common deferred optimization (depth bound check, stencil culling) as the 2x2 pixel quad is processed even if just one pixel is valid. the results of the other pixels are discarded, of course -> not slower

I'm currently using tiled/clustered deferred shading, so loss of depth/stencil optimizations isn't a worry ;) The varying depths in the layers do still cause data inefficiencies in both tiled (tile frustums with large depth ranges) and clustered (data/branch coherency) shading, though.
I was thinking about addressing these by running a de-stippling pass over the G-buffers, which simply re-arranges them before the lighting step so that the top-left pixels from each 2x2 quad fill the top-left quarter of the buffer, and so on for the other three.
e.g. the transform would rearrange the pixels like below, so it looks like you've got 4 half-res views of the scene packed/atlased next to each other.

```
121212    111222
343434 -> 111222
121212    333444
343434    333444
```

The coherency of each of these sub-buffers would then be improved during lighting. Afterwards, I'd have to run the inverse of this transformation over the lighting result to get the actual image, instead of 4 near-identical sub-images.
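The de-stipple address math and its inverse could look like the following sketch (function names are hypothetical; a round trip through both returns the original coordinate):

```python
# Sketch of the de-stippling remap described above: pixel (x, y) in an
# interleaved W x H buffer moves into one of four half-res sub-images packed
# as quadrants, chosen by its position within its 2x2 quad.
def destipple_coord(x, y, width, height):
    """Map an interleaved pixel to its packed-quadrant position."""
    qx, qy = x // 2, y // 2            # which 2x2 quad
    sx, sy = x % 2, y % 2              # position within the quad
    return (sx * (width // 2) + qx,    # quadrant offset + half-res coord
            sy * (height // 2) + qy)

def restipple_coord(x, y, width, height):
    """Inverse transform, applied after lighting the packed buffer."""
    sx, sy = x // (width // 2), y // (height // 2)
    qx, qy = x % (width // 2), y % (height // 2)
    return (qx * 2 + sx, qy * 2 + sy)
```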

3. you could use a lower res alpha buffer, you're interleaving anyway which is like less resolution, so why not an alpha buffer with lower res from the beginning?

In the case where there's only one layer of translucency, the background is at half resolution (1/4 as many pixels), but the translucent layer is only missing 1/4 of its pixels (still almost full resolution). In this case, the alpha layer does not appear like a half-res rendering -- the quality is very sharp.

Also in the case of two layers, the front-most layer is only half-res in one axis instead of both axes (1/2 as many pixels).

I just realized I had some debug code enabled when taking the above screenshot, where it behaves as if every layer is half res, even if more information about that layer is available.
Here's a close-up (zoomed 4x) of the front layer having 3/4 pixels, and 1/4 pixels: http://i.imgur.com/TLUGQHh.png
As the background is blurred, the lack of resolution there isn't as much of an issue.

Edited by Hodgman, 12 September 2013 - 09:07 AM.

### #6 Frenetic Pony (Members) - Reputation: 873

Posted 12 September 2013 - 03:30 PM

You may be interested in what Bungie is doing for Destiny: http://advances.realtimerendering.com/s2013/Tatarchuk-Destiny-SIGGRAPH2013.pdf

For transparencies they are, essentially, using multiple tiny spherical harmonic light probes to light their transparencies (a bit like what ATEFred is doing). So instead of lower res you get more of a proxy-lighting approach, and something of a more complex pipeline. They also have stuff on using an eighth, 1/8th!, res buffer for particles, and how they manage to avoid aliasing artefacts on the edges.

The other thing I can think of is what Epic apparently does with UE4, and that's just brute force more G-buffer layers altogether. A layer of transparency in front? An entire other filled g-buffer for it. A rather direct way to use all that memory and bandwidth I suppose. Still, you could combine it with stippling, going for up to twice the buffer, keeping more resolution for more important layers or going for up to eight layers if you want.

If you go with multiple G-buffers you could also consider thin G-buffers. Compact normals down to X and Y and reconstruct Z. Use Bungie's "material ID" to compact that down to 1 channel, and compact color down a la what Crytek does for Crysis 3 (there are so many presentations on it, I'm not sure which one it is). Bungie also skips separate specular channels by just hacking spec color based on diffuse color. The point would be to save as much as you can on your extra G-buffers.
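The normal-compaction trick mentioned here can be sketched as follows; this naive version assumes view-space normals with non-negative Z, which breaks at grazing angles (octahedral or sphere-map encodings are the usual robust choices):

```python
import math

# Illustrative sketch of a "thin G-buffer" normal: store only X and Y of a
# unit view-space normal and reconstruct Z on read. Assumes Z >= 0, i.e.
# normals facing the camera; real encodings handle the general case.
def pack_normal(n):
    """Keep only the X and Y components of a unit normal."""
    return (n[0], n[1])

def unpack_normal(xy):
    """Reconstruct Z from the unit-length constraint x^2 + y^2 + z^2 = 1."""
    x, y = xy
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```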

Edited by Frenetic Pony, 12 September 2013 - 04:02 PM.

### #7 ATEFred (Members) - Reputation: 808

Posted 13 September 2013 - 02:15 AM

That's similar to clustered shading, but instead of storing a list of lights per cell/texel in the volume, you're storing the (approximate) radiance at that location. I was thinking of using something similar for things where approximate lighting is ok, like smoke particles. Does it work well for you in these cases?

BTW, if you stored the light in each cell as SH, you could extract the dominant light direction and colour from this representation, and use it for some fake specular highlights ;)

That's pretty much it. It works really well for particles and fog with the single directionless approximated value, and it's lightning fast once it is generated. I'll have to get a video capture done at some point.

Atm I use the HL2 basis rather than SH (simply because it was easier to prototype, and for alpha geo I only really care about camera-facing stuff). Getting the dominant direction from SH sounds like a good idea; not sure how computationally expensive it is? I'll need to look it up.

### #8 MJP (Moderators) - Reputation: 7613

Posted 13 September 2013 - 11:40 AM

Atm I use the HL2 basis rather than SH (simply because it was easier to prototype, and for alpha geo I only really care about camera-facing stuff). Getting the dominant direction from SH sounds like a good idea; not sure how computationally expensive it is? I'll need to look it up.

It's very cheap.

```hlsl
//-------------------------------------------------------------------------------------------------
// Computes the "optimal linear direction" for a set of SH coefficients
//-------------------------------------------------------------------------------------------------
float3 OptimalLinearDirection(in SH4Color sh)
{
    float x = dot(sh.c[3], 1.0f / 3.0f);
    float y = dot(sh.c[1], 1.0f / 3.0f);
    float z = dot(sh.c[2], 1.0f / 3.0f);
    return normalize(float3(x, y, z));
}

//-------------------------------------------------------------------------------------------------
// Computes the direction and color of a directional light that approximates a set of SH
// coefficients. Uses Peter-Pike Sloan's method from "Stupid SH Tricks"
//-------------------------------------------------------------------------------------------------
void ApproximateDirectionalLight(in SH4Color sh, out float3 direction, out float3 color)
{
    direction = OptimalLinearDirection(sh);
    SH4Color dirSH = ProjectOntoSH4(direction, 1.0f);
    dirSH.c[0] = 0.0f;
    sh.c[0] = 0.0f;
    color = SHDotProduct(dirSH, sh) * 867.0f / (316.0f * Pi);
}
```


### #9 ATEFred (Members) - Reputation: 808

Posted 13 September 2013 - 01:00 PM

Atm I use the HL2 basis rather than SH (simply because it was easier to prototype, and for alpha geo I only really care about camera-facing stuff). Getting the dominant direction from SH sounds like a good idea; not sure how computationally expensive it is? I'll need to look it up.

It's very cheap.

Awesome, thanks for the info! I'll give this a whirl this weekend!
