Voxel Cone Tracing Experiment - Part 2 Progress


float L = 0.1;   // current estimate of the reflected ray length
float4 T = 0;    // projected (clip-space) position
float3 NewPos;

for (int i = 0; i < 10; i++)
{
    NewPos = RealPos + R * L;                  // RealPos - current position, R - reflection vector
    T = mul(float4(NewPos, 1), mat_ViewProj);  // project the new position to screen
    T.xy = 0.5 + 0.5 * float2(1, -1) * T.xy / T.w;

    // world position stored in the G-buffer at that screen location
    NewPos = GetWorldPos(GBufferPositions.Load(uint2(gbufferDim.xy * T.xy), 0), T.xy, mat_ViewProjI);

    L = length(RealPos - NewPos);              // new distance estimate
}

T.xy is the texture coordinate of the reflected pixel.

I've managed to get my SSR down to 5.3 ms, at the cost of reduced quality, by using a variable step distance - so now I'm using 20 steps instead of 50.

[attachment=18456:giboxssr10.png]

Even if I get it down to 10 steps and remove the additional backface coverage, it will still be 3.1 ms - is that fast enough, or can it be optimized further?
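For reference, the variable-step idea looks roughly like this - a simplified sketch rather than my exact shader; the growth factor and step count are just illustrative, and CamPos stands in for the camera position:

float L = 0.05;        // initial step length (illustrative)
float4 T = 0;
float3 rayPos = RealPos;
float2 hitUV = 0;

[loop]
for (int i = 0; i < 20; i++)
{
    rayPos += R * L;                            // march along the reflection ray
    T = mul(float4(rayPos, 1), mat_ViewProj);   // project to screen
    T.xy = 0.5 + 0.5 * float2(1, -1) * T.xy / T.w;

    // world position the G-buffer stores at that pixel
    float3 surfacePos = GetWorldPos(GBufferPositions.Load(uint2(gbufferDim.xy * T.xy), 0), T.xy, mat_ViewProjI);

    // if the stored surface is nearer to the camera than the ray sample,
    // the ray has passed behind geometry - treat it as a hit
    if (length(surfacePos - CamPos) < length(rayPos - CamPos))
    {
        hitUV = T.xy;
        break;
    }

    L *= 1.5;                                   // grow the step each iteration
}

The fewer, larger steps are where the quality goes: the further the hit is from the start point, the coarser the intersection estimate becomes.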

So I've managed to remove some of the artifacts from my soft shadows:

Previously, when I used front-face culling, I got the following issue:

[attachment=18552:givoxshadows8-0.jpg]

This was due to back faces not being captured by the shadow-caster camera at overlapping surfaces, leaving a gap of missing information in the depth test. There's also the issue of back-face self-shadowing artifacts.

Using back-face culling (rendering only the front faces) resolves this, but leads to the following problem:

[attachment=18553:givoxshadows8-1.jpg]

These are front-face self-shadowing artifacts - no amount of bias resolves them, because they are caused by the jittering process during depth testing.

I came up with a solution that resolves all of these issues for direct-lighting shadows: I also store an individual object ID for each object in the scene from the shadow caster's point of view. During depth testing I then compare the object ID seen from the player camera with the one seen from the shadow caster, so that an object never casts a shadow onto itself:

[attachment=18554:givoxshadows8-2.jpg]
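Roughly, the depth test becomes something like this - a simplified sketch with placeholder names (ShadowDepth, ShadowObjectId, PointSampler, mat_LightViewProj, shadowMapDim and the bias value are illustrative, not my actual buffers):

float ShadowTest(float3 worldPos, uint pixelObjectId)
{
    float4 lightClip  = mul(float4(worldPos, 1), mat_LightViewProj);
    float2 shadowUV   = 0.5 + 0.5 * float2(1, -1) * lightClip.xy / lightClip.w;
    float  lightDepth = lightClip.z / lightClip.w;

    float storedDepth = ShadowDepth.SampleLevel(PointSampler, shadowUV, 0).r;
    uint  storedId    = ShadowObjectId.Load(int3(shadowUV * shadowMapDim.xy, 0));

    // never let an object shadow itself: if the shadow-map texel belongs to
    // the same object as the pixel being shaded, treat it as unshadowed
    if (storedId == pixelObjectId)
        return 1.0;

    return (lightDepth - 0.002 < storedDepth) ? 1.0 : 0.0;   // 1 = lit, 0 = shadowed
}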

Now this is all fine for direct lighting, because everything that is not directly lit, including shadows, is set to zero, and the indirect light is then added on top - so there's a smooth transition between the shadowed and unlit parts of each object.

[attachment=18557:givoxshadleak2.jpg]
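In shader terms the composition is just this (names illustrative):

float3 direct   = albedo * NdotL * lightColor * shadowFactor;  // shadowFactor is 0 in shadow
float3 indirect = albedo * coneTracedIrradiance;               // from the voxel cone traces
float3 final    = direct + indirect;                           // shadowed pixels fall back to pure indirect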

For indirectly lit scenes with no direct lighting at all (i.e. scenes lit by emissive objects), things are a bit different. I don't separate the secondary bounce from the subsequent bounces; all bounces are tied together - so I can't just treat the secondary bounce as the "direct lighting", set everything else including shadows to zero, and then add the subsequent bounces on top. That would require an additional voxel texture, and I would need to double the number of cone traces.

Instead, I cheat by making the shadowed parts of the scene darker than the non-shadowed parts (a more accurate approach would be to set shadowed areas to zero and add the subsequent bounces to them). This, together with the removal of any self-shadowing, leads to shadow leaking:

[attachment=18555:givoxshadleak1.jpg][attachment=18556:givoxshadleak0.jpg]
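The cheat boils down to something like this (a sketch; the darkening factor is arbitrary):

float3 bounced = coneTracedRadiance;                        // all bounces folded together
float3 final   = bounced * lerp(0.4, 1.0, shadowFactor);   // darken instead of zero-and-relight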

So I think I have two options:

  1. Add another voxel texture for the second bounce and double the number of cone traces (most expensive).
  2. Switch back to back-face rendering (front-face culling) for the shadow mapping, but only for emissive-lighting shadows (lots of ugly artifacts).

I wonder if anyone can come up with any other ideas.

I just tested this with my brand new EVGA GTX 780 and it runs at an average of 95 fps at 1080p with all screen-space effects turned on (SSAO, SSR, all soft shadows). In fact, the screen-space effects seem to make little dent in the framerate.

I discovered something very unusual when testing different voxel volume resolutions. Here are my results:

32x32x32 -> 95 fps (37 MB memory)
64x64x64 -> 64 fps (37 MB memory)
128x128x128 -> 52 fps (37 MB memory)
256x256x256 -> 31 fps (38 MB memory)
512x512x512 -> 7 fps (3.2 GB memory)

How on earth did I jump from 38 MB to 3.2 GB of memory used when going from a 256 to a 512 3D texture depth?!


Obviously your profiler is broken somehow, as I doubt your experiment manages to hold ever-increasing data in exactly the same amount of RAM.


Actually, I'm using Task Manager to get the amount of RAM my application is using.



Sounds like you hit your video card's memory limit and the drivers are now using system memory - which is also why your frame rate tanks. Task Manager only shows system memory usage, not the memory internal to the video card.
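Back of the envelope (assuming something like 16 bytes per voxel, e.g. an RGBA32F volume - the actual format may differ):

256 x 256 x 256 x 16 B ~ 268 MB
512 x 512 x 512 x 16 B ~ 2.1 GB (before mips)

So the 256^3 volume fits comfortably on a 3 GB GTX 780, but the 512^3 one, together with everything else in the frame, can exceed it - at which point the driver starts paging to system RAM, which is what Task Manager suddenly sees.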

Just a general idea regarding light-info accumulation that has been floating around my head for some time now and that I finally want to get out:

Instead of cone tracing per screen pixel (which is how the technique works by default, IIRC), couldn't you divide your view frustum into cells (similar to what you do for clustered shading, but perhaps with cube-shaped cells), accumulate the light information in them as spherical harmonics using cone tracing, and finally use this SH 'volume' to light your scene?

You would of course end up with low-frequency information only suitable for diffuse lighting (as with light propagation volumes, but with less quantization, since you would not necessarily propagate the information iteratively - or at least with fewer steps, if you do so to keep the trace range shorter). On the other hand, you could probably reduce the number of required cone traces considerably (you would also only need to fill cells that intersect geometry, if you choose not to propagate iteratively) and, to some extent, decouple the number of traces from the output pixel count.
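To make that a bit more concrete, the accumulation pass could look roughly like this (a very rough sketch with placeholder names - CellToWorld, ConeTrace, ConeDirections and the constants are all assumed, using 2-band SH with one float4 per colour channel):

RWTexture3D<float4> SHVolumeR;   // SH coefficients per cell, one volume per colour channel
RWTexture3D<float4> SHVolumeG;
RWTexture3D<float4> SHVolumeB;

static const int   NUM_CONES = 9;
static const float CONE_APERTURE = 0.6;   // illustrative

[numthreads(4, 4, 4)]
void AccumulateCellSH(uint3 cellId : SV_DispatchThreadID)
{
    float3 cellCenter = CellToWorld(cellId);          // placeholder: cell index -> world position
    float4 shR = 0, shG = 0, shB = 0;

    for (int i = 0; i < NUM_CONES; i++)
    {
        float3 dir      = ConeDirections[i];                          // fixed direction set
        float3 radiance = ConeTrace(cellCenter, dir, CONE_APERTURE);  // the existing cone trace

        // project onto the first two SH bands (standard basis constants)
        float4 basis = float4(0.282095,
                              0.488603 * dir.y,
                              0.488603 * dir.z,
                              0.488603 * dir.x);
        shR += radiance.r * basis;
        shG += radiance.g * basis;
        shB += radiance.b * basis;
    }

    // crude solid-angle weighting over the cone set
    float w = 4.0 * 3.14159265 / NUM_CONES;
    SHVolumeR[cellId] = shR * w;
    SHVolumeG[cellId] = shG * w;
    SHVolumeB[cellId] = shB * w;
}

Per-pixel lighting would then just evaluate the interpolated SH in the pixel's cell against its normal, instead of tracing diffuse cones.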

Just an idea.

That's similar to what others have already done, which is to downsample before tracing and then upsample the results (with some trickery for fine edges). The main problem with just doing cells is that an always-present (and temporally stable) specular term is part of what really sells GI to begin with. Still, it's an idea if you're really performance bound.

I think I mentioned a similar idea, but just for particles, which are going to be diffuse only for the most part anyway, and it would be really helpful with layers of transparency. Now that I think about it, it would also work well for highly distant objects. While specular doesn't actually fall off, of course, anything but primary specular (say, from the sun) shouldn't be too noticeable really far away.

As for transparency, "inferred" or stippled transparency rendering would be really useful for cone tracing. I'm not sure you could also downsample the tracing simultaneously, but it would still prevent tracing from multiple layers of transparency.

As for using a directed acyclic graph: I've been thinking that you'd need to store albedo/position information separately, mipmap that, and then figure out a way to apply lighting to different portions dynamically and uniquely using the indirection table. In case you're missing what I'm talking about, a directed acyclic graph would merge identical copies of voxel regions into just one copy, and then use a table (an "indirection table") to redirect the tracing to where each copied block sits in world space.
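To sketch what I mean by the indirection (very rough, all names and sizes are placeholders):

Texture3D<uint4>  IndirectionTable;   // coarse grid: region index -> block origin in the voxel atlas
Texture3D<float4> VoxelAtlas;         // de-duplicated voxel blocks packed together
SamplerState      LinearSampler;

static const float REGION_SIZE = 8.0;   // world-space size of one region
static const float BLOCK_DIM   = 32.0;  // voxels per block edge
static const float ATLAS_DIM   = 512.0; // atlas edge length in voxels

float4 SampleVoxelDAG(float3 worldPos)
{
    // which coarse region the position falls in, and where inside it
    uint3  region = (uint3)(worldPos / REGION_SIZE);
    float3 local  = frac(worldPos / REGION_SIZE);

    // identical regions point at the same block, so repeated geometry is stored once
    uint3 blockOrigin = IndirectionTable[region].xyz;

    float3 atlasUVW = (blockOrigin + local * BLOCK_DIM) / ATLAS_DIM;
    return VoxelAtlas.SampleLevel(LinearSampler, atlasUVW, 0);
}

The catch is exactly the lighting part: since blocks are shared, lit radiance can't just be baked into the atlas, which is why you'd keep albedo/position there and apply the lighting per region dynamically.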

The main problem with just doing cells is that an always-present (and temporally stable) specular term is part of what really sells GI to begin with.

As I understand it, the diffuse part is actually the costly one, because of the large number of cones you need to trace per pixel in the default solution. So for rather sharp glossy highlights you could keep tracing per pixel, without the intermediate accumulation step into the SH volume. But that's of course just the theory.

