Light Propagation Volumes: flickering when injecting RSM VPLs

11 comments, last by EricPolman 7 years, 10 months ago

Hello, I am running into a problem that I haven't seen anyone else report when implementing Light Propagation Volumes. I have an RSM of, let's say, 512x512. That's 262144 VPLs. Even if I downsample this to 128x128, there are still 16384 left.

When trying to inject these into my volume textures (using compute shaders and UAVs) I, logically, get flickering lights due to race conditions. How can I avoid this problem?

I have seen approaches using Vertex Shader/Geom Shader/Pixel Shader combinations, but I don't understand how that would prevent it.

Are you culling small objects before the RSM?

You can get issues where one frame they are in the RSM texture and the next frame they are not, as they can be sub-pixel. This causes flickering to occur.

Thanks for the reply!

Nope, not culling anything at the moment. And no movement in lighting and geometry.

In one implementation (http://blog.blackhc.net/2010/07/light-propagation-volumes/ not mine) they use a VS/GS/PS combination and that seems stable. Do pixel shader blend operations guarantee thread safety?

Can you explain why you have a race condition? Is the injection shader simply writing results into cells, which possibly stomps other writes into the same cell?

Can you fix that by, e.g. pre-clearing cells to zero and then safely adding overlapping results together instead of having one stomp the other?


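    // Average the RSM samples gathered for this thread.
    result.positionWS /= numSamples;
    result.normalWS /= numSamples;
    result.normalWS = normalize(result.normalWS);
    result.flux /= numSamples;

    // Turn the averaged VPL into SH coefficients (cosine lobe around its normal).
    float surfelWeight = 6;
    float4 coeffs = dirToCosineLobe(result.normalWS) / PI * surfelWeight;
    float4 r4 = coeffs * result.flux.r;
    float4 g4 = coeffs * result.flux.g;
    float4 b4 = coeffs * result.flux.b;

    uint3 gridPos = getGridPos(result.positionWS);

    // "Running average" into the cell. This is an unsynchronized
    // read-modify-write: threads whose VPLs land in the same cell can interleave.
    lpvR[gridPos] = (lpvR[gridPos] + r4) / 2;
    lpvG[gridPos] = (lpvG[gridPos] + g4) / 2;
    lpvB[gridPos] = (lpvB[gridPos] + b4) / 2;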

The compute shader takes a sample from the RSM (result) and converts that to the coefficients which I need to store. Before this shader I clear the lpvR, lpvG, lpvB volume textures and in this step I add the VPLs from the RSM to the LPV volume textures. Simply assigning or trying to do a "running average" both result in flickering. This flickering is worse when simply assigning/stomping.

You should definitely get rid of that race condition.

The running average version is still a race condition too -- the read-modify-write of that resource is not atomic.

You could try using the Interlocked* functions to perform atomic increments into that grid, or implement it using the graphics pipeline instead of compute, so that you can use the fixed-function blending capabilities.
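To see why the running average could not be rescued just by making it atomic, consider one cleared cell (value 0) that receives two VPL contributions, written here as a and b (illustrative symbols, not names from the shader). The two possible thread orderings give:

```latex
\begin{aligned}
\text{order } a, b: &\quad \frac{\frac{0 + a}{2} + b}{2} = \frac{a}{4} + \frac{b}{2} \\
\text{order } b, a: &\quad \frac{\frac{0 + b}{2} + a}{2} = \frac{b}{4} + \frac{a}{2} \\
\text{plain sum, either order}: &\quad 0 + a + b = a + b
\end{aligned}
```

The two averaged results differ whenever a and b differ, so the cell value would still change with thread scheduling from frame to frame. A plain sum does not depend on the order, up to floating-point rounding.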

Shouldn't you inject them using ADD blending to conserve energy (or atomic adds in your case)? It's been a while since I implemented this... memory might be wrong.

Ah, forgot to look at the code. You mean you're getting race conditions because you're just using the basic + operator. In that case, you need InterlockedAdd (or atomicAdd/imageAtomicAdd in OpenGL compute shaders).

Yes, blending makes sure that there are no race conditions, so that's also a good solution. Whether you implement this technique using compute or VS/GS/PS is probably a matter of where you still have performance left. If your app isn't hard on the rasterizer, you can use the VS/GS/PS implementation, because that's where most of the work will be. I remember testing it, and the performance of this implementation mainly depended on how many VPLs fall into a single cell on average, presumably because that number scales directly with the amount of work the blending hardware has to do to serialize writes to the same texel.
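A rough sketch of what such a VS/GS/PS injection pass can look like is below. It is not the code from the linked blog post. Every resource name, register, constant buffer field and the body of dirToCosineLobe are assumptions made for illustration, and the surfel weighting from the original snippet is left out.

```hlsl
// Sketch only: every name, register and constant below is an assumption.
cbuffer InjectCB : register(b0)
{
    float3 lpvMin;      // world-space minimum corner of the LPV grid
    float  lpvCellSize; // world-space edge length of one cell
    uint   lpvDim;      // cells per axis
    uint   rsmSize;     // RSM width and height in texels
};

Texture2D<float4> rsmPosition : register(t0); // world-space position
Texture2D<float4> rsmNormal   : register(t1); // world-space normal
Texture2D<float4> rsmFlux     : register(t2); // reflected flux

static const float PI = 3.14159265f;

// Commonly used 2-band SH clamped cosine lobe. The thread does not show
// its own dirToCosineLobe, so this definition is an assumption.
float4 dirToCosineLobe(float3 dir)
{
    const float c0 = 0.886226925f; // sqrt(pi) / 2
    const float c1 = 1.02332671f;  // sqrt(pi / 3)
    return float4(c0, -c1 * dir.y, c1 * dir.z, -c1 * dir.x);
}

struct VSOut
{
    float3 cell   : CELL;   // continuous grid coordinates
    float3 normal : NORMAL;
    float3 flux   : FLUX;
};

// Drawn as rsmSize * rsmSize points with no vertex buffer bound.
VSOut InjectVS(uint id : SV_VertexID)
{
    int3 texel = int3(id % rsmSize, id / rsmSize, 0);
    VSOut o;
    o.cell   = (rsmPosition.Load(texel).xyz - lpvMin) / lpvCellSize;
    o.normal = rsmNormal.Load(texel).xyz;
    o.flux   = rsmFlux.Load(texel).rgb;
    return o;
}

struct GSOut
{
    float4 pos    : SV_Position;
    float3 normal : NORMAL;
    float3 flux   : FLUX;
    uint   slice  : SV_RenderTargetArrayIndex; // depth slice of the 3D target
};

[maxvertexcount(1)]
void InjectGS(point VSOut input[1], inout PointStream<GSOut> stream)
{
    float3 cell = floor(input[0].cell);
    float3 n = input[0].normal;
    // Skip VPLs outside the grid and empty RSM texels (zero normal).
    if (any(cell < 0.0f) || any(cell >= (float)lpvDim) || dot(n, n) < 1e-6f)
        return;

    // Place the point on the centre of texel (cell.x, cell.y); the viewport
    // is assumed to be lpvDim x lpvDim. NDC y points up, texel rows go down.
    float2 ndc = (cell.xy + 0.5f) / (float)lpvDim * 2.0f - 1.0f;

    GSOut o;
    o.pos    = float4(ndc.x, -ndc.y, 0.0f, 1.0f);
    o.normal = n;
    o.flux   = input[0].flux;
    o.slice  = (uint)cell.z;
    stream.Append(o);
}

struct PSOut
{
    float4 r : SV_Target0;
    float4 g : SV_Target1;
    float4 b : SV_Target2;
};

// With additive blending enabled on all three targets (source ONE,
// destination ONE, op ADD), overlapping points are summed by the output
// merger instead of overwriting each other.
PSOut InjectPS(GSOut i)
{
    float4 coeffs = dirToCosineLobe(normalize(i.normal)) / PI;
    PSOut o;
    o.r = coeffs * i.flux.r;
    o.g = coeffs * i.flux.g;
    o.b = coeffs * i.flux.b;
    return o;
}
```

The blend state itself is set on the API side, not in the shader. The three volume textures are assumed to be bound as render targets covering all depth slices, cleared to zero beforehand, and drawn to with a point-list topology.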

edit: Whoops, replied twice accidentally.

@Hodgman: Interlocked* was one of the things I considered, but InterlockedAdd for floats does not exist. I have tried converting to ints, storing, and converting back, but InterlockedAdd doesn't work on int4, so I would need 12 UAVs (4 per colour channel) to store it. Seems like a hack when other people don't have to do it :P
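For reference, the integer workaround does not strictly need 12 UAVs. One possible layout, sketched here under assumptions, keeps all 12 coefficients of a cell in a single integer buffer and converts back to float in a separate pass. The buffer name lpvAccum, the registers, LPV_DIM, FIXED_POINT_SCALE and the ResolveCS pass are all made up for illustration.

```hlsl
// Sketch only: names, registers, grid size and scale are assumptions.
#define LPV_DIM 32                 // assumed grid resolution per axis
#define FIXED_POINT_SCALE 1024.0f  // assumed; trades precision against overflow

// One signed integer buffer holds all 12 SH coefficients per cell
// (4 each for R, G, B), so a single UAV replaces 12 separate ones.
RWStructuredBuffer<int> lpvAccum : register(u0);

RWTexture3D<float4> lpvR : register(u1);
RWTexture3D<float4> lpvG : register(u2);
RWTexture3D<float4> lpvB : register(u3);

// Index of the first of the 12 integers belonging to a cell.
uint cellBase(uint3 gridPos)
{
    uint cell = gridPos.x + LPV_DIM * (gridPos.y + LPV_DIM * gridPos.z);
    return cell * 12;
}

// Atomically add four coefficients, one component at a time.
void addCoeffs(uint base, float4 c)
{
    // SH coefficients can be negative, hence signed fixed point.
    int4 q = int4(round(c * FIXED_POINT_SCALE));
    InterlockedAdd(lpvAccum[base + 0], q.x);
    InterlockedAdd(lpvAccum[base + 1], q.y);
    InterlockedAdd(lpvAccum[base + 2], q.z);
    InterlockedAdd(lpvAccum[base + 3], q.w);
}

// Would replace the three read-modify-write lines of the injection shader.
void injectVPL(uint3 gridPos, float4 r4, float4 g4, float4 b4)
{
    uint base = cellBase(gridPos);
    addCoeffs(base + 0, r4);
    addCoeffs(base + 4, g4);
    addCoeffs(base + 8, b4);
}

// Convert four accumulated integers back to float coefficients.
float4 loadCoeffs(uint base)
{
    return float4(lpvAccum[base + 0], lpvAccum[base + 1],
                  lpvAccum[base + 2], lpvAccum[base + 3]) / FIXED_POINT_SCALE;
}

// Separate pass, run after injection: one thread per cell writes the
// decoded sums into the float volume textures used by propagation.
[numthreads(4, 4, 4)]
void ResolveCS(uint3 id : SV_DispatchThreadID)
{
    if (any(id >= LPV_DIM))
        return;

    uint base = cellBase(id);
    lpvR[id] = loadCoeffs(base + 0);
    lpvG[id] = loadCoeffs(base + 4);
    lpvB[id] = loadCoeffs(base + 8);
}
```

The integer buffer has to be cleared to zero before each injection. The scale also has to be chosen so that the summed flux in a heavily hit cell stays inside the 32-bit integer range. Because integer addition is order-independent, the result no longer changes with thread scheduling.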

@agleed: The last solution seems like the go-to solution when I look at other people's code (and now I know why they didn't just use a compute shader)

I will try the VS/GS/PS solution and report back! :) thanks for the help guys!

Yay the VS/GS/PS approach fixed it! And it is not significantly slower or anything.

Direct lighting (and small ambient term): http://scrnsht.me/u/yPb/raw

Combined: http://scrnsht.me/u/wPb/raw

Indirect lighting only (and small ambient term): http://scrnsht.me/u/xPb/raw

Still quite some self-illumination and incorrect bleeding, but for now, I am quite happy and satisfied.
