Voxel Cone Tracing Experiment - Part 2 Progress


54 replies to this topic

#41 Che@ter   Members   -  Reputation: 248

Posted 19 October 2013 - 06:46 AM

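float  L = 0.1;   // initial ray length along the reflection vector
float4 T = 0;
float3 NewPos;
for (int i = 0; i < 10; i++) {
    // RealPos: current world position, R: reflection direction
    NewPos = RealPos + R * L;
    // Project the new position to the screen
    T = mul(float4(NewPos, 1), mat_ViewProj);
    // Perspective divide, then map to [0,1] texture coordinates (y flipped)
    T.xy = 0.5 + 0.5 * float2(1, -1) * T.xy / T.w;
    // Look up the G-buffer at that pixel and find its world position
    NewPos = GetWorldPos(GBufferPositions.Load(uint2(gbufferDim.xy * T.xy), 0), T.xy, mat_ViewProjI);

    // Use the distance to that surface as the new ray length
    L = length(RealPos - NewPos);
}

After the loop, T.xy is the texture coordinate of the reflected pixel.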


#42 gboxentertainment   Members   -  Reputation: 766

Posted 20 October 2013 - 04:16 AM

I've managed to bring my SSR pass down to 5.3ms by using a variable step distance, at the cost of reduced quality. I'm now using 20 steps instead of 50.

 

giboxssr10.png

 

Even if I get it down to 10 steps and remove the additional backface cover, it will still take 3.1ms. Is this fast enough, or can it be optimized further?



#43 gboxentertainment   Members   -  Reputation: 766

Posted 26 October 2013 - 10:58 PM

So I've managed to remove some of the artifacts from my soft shadows:

Previously, when I had used front-face culling I got the following issue:

givoxshadows8-0.jpg

 

This was due to backfaces not being captured by the shadow-caster camera when at overlapping surfaces, thus leading to a gap of missing information in the depth test. There's also the issue of back-face self shadowing artifacts.

 

Using back-face culling (rendering only the front faces) resolves this, but leads to a different problem:

givoxshadows8-1.jpg

These are front-face self-shadowing artifacts. No amount of bias resolves them, because they are caused by the jittering process during depth testing.

 

I came up with a solution that resolves all these issues for direct lighting shadows, which is to also store an individual object id for each object in the scene from the shadow-caster's point of view. During depth testing, I then compare the object id from the player camera's point of view with that from the shadow-caster's point of view and make it so that each object does not cast its own shadow onto itself:
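
The object-id test described above can be sketched roughly as follows. This is only an illustration in Python rather than shader code; the function name, the argument layout and the bias value are assumptions and not the engine's actual implementation.

```python
def in_shadow(receiver_depth, receiver_id, caster_depth, caster_id, bias=0.001):
    """Return True if the receiver sample is shadowed.

    receiver_depth / receiver_id: light-space depth and object id of the
        surface point seen from the player camera.
    caster_depth / caster_id: depth and object id stored in the shadow-map
        texel that the receiver projects onto.
    All names and the default bias are hypothetical.
    """
    # An object never casts a shadow onto itself, so skip the depth test
    # when both samples belong to the same object.
    if receiver_id == caster_id:
        return False
    # Otherwise fall back to the usual biased depth comparison.
    return receiver_depth - bias > caster_depth
```

One consequence of this design is that a concave object can no longer shadow its own surfaces, which is the trade-off for removing the self-shadowing artifacts.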

givoxshadows8-2.jpg

 

This works well for direct lighting, because I set everything that is not directly lit (including shadows) to zero and then add the indirect light on top, so there's a smooth transition between the shadow and the non-lit part of each object.

givoxshadleak2.jpg

 

For indirectly lit scenes with no direct lighting at all (i.e. lit only by emissive objects), things are a bit different. I don't separate the secondary bounce from the subsequent bounces; all bounces are tied together. So I cannot just treat the secondary bounce as the "direct lighting", set everything else (including shadows) to zero, and then add the subsequent bounces. That would require an additional voxel texture and would double the number of cone traces.

I cheat by making the shadowed parts of the scene darker than the non-shadowed parts (a more accurate algorithm would set shadowed areas to zero and add the subsequent bounces to those areas). This, together with the removal of any self-shadowing, leads to shadow leaking:

givoxshadleak1.jpg givoxshadleak0.jpg

 

So I think I have two options:

  1. Add another voxel texture for the second bounce and double the number of cone traces (most expensive).
  2. Switch back to back-face rendering with front-face culling for the shadow mapping only for emissive lighting shadows (lots of ugly artifacts).

I wonder if anyone can come up with any other ideas.



#44 gboxentertainment   Members   -  Reputation: 766

Posted 10 November 2013 - 02:25 AM

I just tested this with my brand new EVGA GTX780 and it runs at an average of 95fps at 1080p with all screen space effects turned on (SSAO, SSR, all soft shadows). In fact, the screen space effects seem to make little dent in the framerate.

 

I discovered something very unusual when testing the voxel depth. Here are my results:

32x32x32 -> 95fps (37MB memory)
64x64x64 -> 64fps (37MB memory)
128x128x128 -> 52fps (37MB memory)
256x256x256 -> 31fps (38MB memory)
512x512x512 -> 7fps (3.2GB memory)

How on earth did I jump from 38MB to 3.2GB of memory used when going from a 256 to a 512 3D texture depth?!



#45 Frenetic Pony   Members   -  Reputation: 1313

Posted 10 November 2013 - 04:17 PM


Obviously your profiler is broken somehow, as I doubt your experiment manages to hold ever increasing data in the same exact amount of ram.


Edited by Frenetic Pony, 10 November 2013 - 04:17 PM.


#46 gboxentertainment   Members   -  Reputation: 766

Posted 10 November 2013 - 04:24 PM


Actually I'm using the task manager to get the amount of ram that my application is using.



#47 Digitalfragment   Members   -  Reputation: 845

Posted 10 November 2013 - 04:42 PM

 



Sounds like you hit your video card's memory limit and the drivers are now using system memory, which is also why your frame rate tanks. Task Manager only shows system memory usage, not the memory internal to the video card.



#48 Tasty Texel   Members   -  Reputation: 1295

Posted 11 November 2013 - 04:43 AM

Just a general idea regarding the light-info accumulation concept, which has been floating around my head for some time now and which I finally want to get rid of:

Instead of cone tracing per screen pixel (which is how the technique works by default, IIRC), couldn't you separate your view frustum into cells (similar to what you do for clustered shading, but perhaps with cube-shaped cells), use cone tracing to accumulate the light information in these cells as spherical harmonics, and finally use this SH 'volume' to light your scene?

You would of course end up with low-frequency information only suitable for diffuse lighting, like with light propagation volumes. There would still be less quantization, though, since you would not necessarily propagate the information iteratively (or you would at least use fewer steps, if you chose to propagate in order to keep the trace range shorter). On the other hand, you could probably reduce the number of required cone traces considerably (without iterative propagation you would also only need to fill cells that intersect geometry) and, to some extent, decouple the number of traces from the output pixel count.
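
To make the accumulation step a bit more concrete, here is a minimal sketch of projecting the cone-trace results of one cell onto the first two SH bands (4 coefficients). It is written in Python for readability; the function names are hypothetical, radiance is a single scalar channel, and the cone directions are assumed to be spread uniformly over the sphere.

```python
import math

# Real spherical-harmonic basis constants for bands 0 and 1.
Y0 = 0.5 * math.sqrt(1.0 / math.pi)      # constant term
Y1 = math.sqrt(3.0 / (4.0 * math.pi))    # scale of the three linear terms

def sh_basis(d):
    """Evaluate the 4 basis functions for a unit direction d = (x, y, z)."""
    x, y, z = d
    return (Y0, Y1 * y, Y1 * z, Y1 * x)

def project_to_sh(samples):
    """samples: list of (unit_direction, radiance) pairs, one per traced cone.
    Assumes uniformly distributed directions (hypothetical setup)."""
    coeffs = [0.0] * 4
    weight = 4.0 * math.pi / len(samples)   # solid angle covered by each sample
    for direction, radiance in samples:
        for i, basis in enumerate(sh_basis(direction)):
            coeffs[i] += radiance * basis * weight
    return coeffs

def evaluate_sh(coeffs, d):
    """Reconstruct the low-frequency radiance arriving from direction d."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(d)))
```

Each cell would store one such coefficient set per colour channel, and pixels would evaluate the nearest cells instead of tracing their own diffuse cones.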

Just an idea.


Edited by Bummel, 11 November 2013 - 04:57 AM.


#49 Frenetic Pony   Members   -  Reputation: 1313

Posted 11 November 2013 - 03:33 PM

That's similar to what others have already done, which is to downsample before tracing and then upsample the results (with some trickery for fine edges). The main problem with just doing cells is that an always present (and temporally stable) specular term is part of what really sells GI to begin with. Still, it's an idea if you're really performance bound.

 

I think I mentioned a similar idea but just for particles, which are going to be diffuse only anyway for the most part and would be really helpful with layers of transparency. And now that I think about it, it would also work well for highly distant objects. While specular doesn't actually fall off of course, anything but primary specular (say from the sun) shouldn't be too noticeable really far away.

 

As for transparency, "inferred" or stippled transparency rendering would be really useful for cone tracing. I'm not sure you could also downsample the tracing simultaneously, but it would still prevent tracing from multiple layers of transparency.

 

As for using a directed acyclic graph, I've been thinking that you'd need to separately store albedo/position information, mipmap that, and then figure out a way to apply lighting to different portions dynamically and uniquely using the indirection table. In case you're missing what I'm talking about: a directed acyclic graph would merge identical copies of voxel areas into just one copy, and then use a table (an "indirection table") to direct the tracing to where each copied block was in world space.
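
The merging of identical blocks can be illustrated with a small sketch. This is a single-level deduplication in Python, not a full DAG over an octree hierarchy, and the function name and data layout are assumptions made purely for illustration.

```python
def build_indirection(blocks):
    """Merge identical voxel blocks into one stored copy.

    blocks: dict mapping a block coordinate (bx, by, bz) to a sequence of
        voxel values for that block (hypothetical layout).
    Returns (unique_blocks, indirection), where indirection maps every
    block coordinate to an index into unique_blocks.
    """
    unique_blocks = []
    index_of = {}       # block contents -> index in unique_blocks
    indirection = {}
    for coord, voxels in blocks.items():
        key = tuple(voxels)
        if key not in index_of:
            index_of[key] = len(unique_blocks)
            unique_blocks.append(key)
        indirection[coord] = index_of[key]
    return unique_blocks, indirection
```

Because several world-space blocks now share one stored copy, any per-block lighting has to live outside the shared data, which is the problem described above.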



#50 Tasty Texel   Members   -  Reputation: 1295

Posted 11 November 2013 - 04:21 PM

Quoting Frenetic Pony: "The main problem with just doing cells is that an always present (and temporally stable) specular term is part of the thing that really sells GI to begin with."

As I understand it, the diffuse part is actually the costly one because of the large number of cones you need to trace per pixel in the default solution. So for rather sharp glossy highlights you could keep tracing them per pixel without the intermediate accumulation step into the SH volume. But that's of course just the theory.



#51 Frenetic Pony   Members   -  Reputation: 1313

Posted 11 November 2013 - 05:05 PM

 


Actually, it's the specular that's the most costly. It requires the most samples, especially for high gloss, as you need to keep tracing through the octree/texture until you hit the right mip level or even the right voxel. Since it's all mipmapped, diffuse may need several traces but relatively few samples per trace, as you only need to trace a few steps into the tree.
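
A rough way to see why narrow cones cost more samples is to count the steps of a single cone march. The sketch below is in Python and assumes a common scheme in which each step advances by the cone's current footprint diameter and samples the mip level whose voxel size matches that footprint; the function name and the stepping rule are assumptions, not taken from any particular implementation.

```python
import math

def cone_trace_samples(aperture_deg, max_distance, voxel_size):
    """List the (distance, mip level) samples taken while marching one cone."""
    half_tan = math.tan(math.radians(aperture_deg) / 2.0)
    distance = voxel_size           # start one voxel away to avoid self-sampling
    samples = []
    while distance < max_distance:
        # The footprint grows with distance but never drops below one voxel.
        diameter = max(voxel_size, 2.0 * distance * half_tan)
        mip = math.log2(diameter / voxel_size)
        samples.append((distance, mip))
        distance += diameter        # step by the footprint (assumed rule)
    return samples
```

Under these assumptions a wide diffuse cone reaches coarse mips after a handful of steps, while a narrow glossy cone stays near mip 0 for a long stretch and takes many more samples over the same distance.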

 

But I hadn't thought of doing diffuse as a cell structure while keeping specular per pixel. It might not be a huge savings but it would be a savings.


Edited by Frenetic Pony, 11 November 2013 - 05:07 PM.


#52 gboxentertainment   Members   -  Reputation: 766

Posted 16 November 2013 - 10:05 PM


Quoting Digitalfragment: "Sounds like you hit your video card's memory limit and the drivers are now using system memory - which is also why your frame rate tanks. Task Manager only shows system memory usage, not the memory internal to the video card."

 

Good point. It turns out that my voxel visualizer was causing the massive increase in system RAM. I've turned that off, and doing so doesn't seem to have any effect on framerate.

Looking at GPU RAM, it makes sense now: a 64 voxel depth (with all other resources) uses about 750MB. This increases to 1.8GB when using a 512 voxel depth.
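
As a rough sanity check on numbers like these, the size of a single cubic 3D texture can be estimated as below. The 4 bytes per voxel (e.g. an RGBA8 format) and the full mip chain are assumptions; the thread does not state the actual formats or how many voxel textures are in use.

```python
def voxel_texture_bytes(resolution, bytes_per_voxel=4, with_mips=True):
    """Bytes used by one cubic 3D texture, optionally with a full mip chain.

    bytes_per_voxel=4 is an assumed format, not a value from the thread.
    """
    total = 0
    size = resolution
    while size >= 1:
        total += size ** 3 * bytes_per_voxel
        if not with_mips:
            break
        size //= 2          # each mip halves the resolution per axis
    return total
```

Each doubling of the resolution multiplies the base level by eight, so under these assumptions a single 512x512x512 texture already takes 512MiB before mips, compared with 64MiB at 256x256x256.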


Edited by gboxentertainment, 17 November 2013 - 06:26 AM.


#53 gboxentertainment   Members   -  Reputation: 766

Posted 21 December 2013 - 09:01 PM

I haven't posted on my progress with this engine for a while, because I've been too busy at work and had put development on hold. However, I'm now ready to start up again during the holidays.

 

My current scene is very small, and I am planning to extend it efficiently to a larger world. I want to stay away from octrees for now, and cascades are out of the question due to the many artifacts they result in.

 

My idea is a substitute for partially resident textures on video cards that do not support them yet. I have found that the optimal resolution is 64x64x64 voxels and that there is little difference in quality between this and 32x32x32. I want to create a grid of voxel textures in which the camera is located in a 64x64x64 voxel texture that is surrounded by 32x32x32 voxel textures in every dimension. When the camera travels outside of that voxel volume, the volume it enters will become 64x64x64 and the previous one will drop to 32x32x32. I'm hoping that I can trace cones into multiple voxel textures by using some sort of offset.
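
A minimal sketch of the bookkeeping this would need, written in Python for clarity. It assumes a regular grid of equally sized cubic volumes; the function names and parameters are hypothetical. The "offset" is simply the grid cell, and the remainder gives the lookup coordinates inside that cell's texture.

```python
import math

def volume_resolution(volume_cell, camera_cell, near_res=64, far_res=32):
    """64^3 for the volume containing the camera, 32^3 for all the others."""
    return near_res if volume_cell == camera_cell else far_res

def locate(world_pos, volume_size):
    """Map a world-space position to (cell, uvw).

    cell: integer grid coordinate of the volume containing the position.
    uvw:  coordinates in [0, 1) inside that volume, usable for the texture
          lookup regardless of the volume's current resolution.
    """
    cell = tuple(math.floor(p / volume_size) for p in world_pos)
    uvw = tuple(p / volume_size - c for p, c in zip(world_pos, cell))
    return cell, uvw
```

A cone march would call something like `locate` at every step and switch textures whenever the cell changes, which is also where resolution mismatches between neighbouring volumes would show up.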

 

Has anyone tried something similar before?



#54 Kaptein   Prime Members   -  Reputation: 2150

Posted 22 December 2013 - 04:58 AM

I also don't use octrees, but rather my own very specific implementation that works the same way, but only for culling.

First I check which direction the camera is pointing towards the most, then I cut off the planes of voxels I don't need to render. After that it's (octree-like) recursive frustum tests.

The memory footprint is kind of high, but I don't mind. My card has 2GB of video memory, which I think should be the minimum :P

If you need to extend the size, the only thing that really helps is more video memory. I don't think there is a way to get high fps while modifying a lot of memory mid-frame.

 

You also have to consider the fact that once things "really get going," you don't have the luxury of mostly homogeneous volumes anymore.

Not sure how you can do what you want to there, since you have to rebuild textures? Do you have any collision with the world data?

 

I don't really do the same thing you do. With 80x32x80 sectors I get 72fps (144/2) when I reflect the world for my ocean shader, where one sector is 16x8x16 voxels.

I also don't have any kind of AA, yet. I had to forego MSAA because it required way too much effort, for too little gain.

 

Do you separate your processes into the smallest possible shaders? If done right it can increase your FPS a lot, overall. If a single shader does too much it may just become painfully slow, especially on hardware you and I aren't testing on.

Do you produce depth textures? If you don't need the alpha component you can just write the linear depth into .a in your color texture, and from that you can skip a texture read in many places.



#55 Frenetic Pony   Members   -  Reputation: 1313

Posted 22 December 2013 - 05:18 PM

Honestly, if I'm understanding what you want to do correctly, it just sounds a lot like cascades. Besides decreased... texel resolution of the voxels, voxel resolution to world space, you get the idea. Anyway, that should be something that's always done: you're going to need less resolution the farther you go from the origin anyway, and even for approximating high frequency reflections this should be true enough. But as you said, there's trouble with cascade transitions.

 

I'm not sure I've heard of anyone doing that, though. And if you don't find anything to voxelize, just clearing the entire voxel texture to a format of "empty/full" should save a lot of memory. You don't want to waste a lot of memory on completely empty space.
 





