Voxel grid traversal algorithm

Started by
15 comments, last by matt77hias 6 years ago

What techniques does one normally use for traversing a voxel grid with a ray (i.e. voxel cone tracing)?

Does one use a traversal algorithm that does not skip voxels along a ray such as the ancient DDA (which requires quite some branching that can maybe be eliminated using some binary arithmetic tricks) or does one use something less precise such as ray marching using a fixed step size which has the potential of stepping over voxels?

🧙

Advertisement

I'm currently using the one that skips voxels on the way.

I've had some attempts on DDA, but the general problem with those is that they ended up being far inferior in performance.

I can even share:


// P - ray origin position
// V - ray direction
// cone_ratio - using 1 large cone lead to poor results, multiple ones with some sane ratio looks a lot better
// max_dist - maximum acceptable distance for cone to travel (usable for AO F.e.)
// step_mult - in case we want "faster" and even less-precise stepping
float4 ConeTraceGI(in float3 P, in float3 V, in float cone_ratio, in float max_dist, in float step_mult)
{
	float min_voxel_diameter = resolution.y;
	float min_voxel_diameter_inv = 1.0 / min_voxel_diameter;

	float4 accum = 0.f;

	// push out the starting point to avoid self-intersection
	float dist = min_voxel_diameter;

	// Marching!
	while (dist <= max_dist && accum.a < 1.0f)
	{
		// Calculate which level to sample
		float sample_diameter = max(min_voxel_diameter, cone_ratio * dist);
		float sample_lod = log2(sample_diameter * min_voxel_diameter_inv);

		// Step
		float3 sample_pos = P + V * dist;
		dist += sample_diameter * step_mult;

		// Sample from 3D texture (in performance superior to Sparse Voxel Octrees, hence use these)
		float4 sample_value = voxelData.SampleLevel(shadowSampler, sample_pos, sample_lod);

		float a = 1.0 - accum.a;
		accum += sample_value * a;
	}

	return accum;
}

This is the result for such GI (this is in-editor):

temp.thumb.png.315569ee9a4a2c4c3602cbc50e066e80.png

The performance is also no that bad (considering this is an editor - and we're running full deferred pipeline at 4x MSAA (including 4x MSAA for calculating GI and reflections) ... along with a lot of other things).

Overall I'm quite satisfied with my VXGI implementation. Even though I also have support for octree-based version (not 3D texture based one), it ends up being few times slower.

To also give you a hint on performance - here is another one with profiler shown:

temp.thumb.png.d1f58f8db66f054da35c2535f1c5f4e2.png

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

11 hours ago, Vilem Otte said:

Even though I also have support for octree-based version (not 3D texture based one), it ends up being few times slower.

Just an octree or an SVO?

The memory savings of the sparsity do not seem to outweigh the performance penalty, especially when the voxelization is computed every frame.

11 hours ago, Vilem Otte said:

float min_voxel_diameter = resolution.y;

This is just the size of the voxel cube, right?

 

11 hours ago, Vilem Otte said:

dist += sample_diameter * step_mult;

Why do you chose the step based on the diameter of the cone (sample_diameter) instead of the distance from the apex (dist)?

🧙

11 hours ago, Vilem Otte said:

To also give you a hint on performance - here is another one with profiler shown:

The "voxelize" entry says "0.00ms", shouldn't this be the most expensive one?

🧙

2 hours ago, matt77hias said:

Just an octree or an SVO?

The memory savings of the sparsity do not seem to outweigh the performance penalty, especially when the voxelization is computed every frame.

SVO - it is only necesasry when you literally couldn't fit your 3D texture into memory. The voxelization, building tree, etc. is not that expensive - the traversal is what is much more expensive. Also using properly hardware trilinear filtering ends up in increased memory usage.

I could eventually switch to it to show the differences (in memory and performance). And give you here more proper figures - if you want.

2 hours ago, matt77hias said:

This is just the size of the voxel cube, right?

Actually it's 1.0f / resolution - it's size of voxel cube within 0.0 - 1.0 boundaries (as we're just looking up texture).

2 hours ago, matt77hias said:

Why do you chose the step based on the diameter of the cone (sample_diameter) instead of the distance from the apex (dist)?

The sample_diameter is actually maximum from minimum voxel size (in texture space coordinates) and 'cone_ratio * dist' - where dist is distance travelled and cone_ratio is representing apex angle.

 

2 hours ago, matt77hias said:

The "voxelize" entry says "0.00ms", shouldn't this be the most expensive one?

Sorry for that, here is updated one (my tone mapping parameters are a bit 'overdone' though):

temp.thumb.png.90831b334fae9d20dccdd498afbcd32d.png

This one has disabled MSAA for GI and Reflection cone tracing buffers, and enabled MSAA for deferred shading (hence different numbers), this doesn't matter for voxelization though - it took ~4ms here. I enabled voxelization of whole scene every frame.

So why was there 0 before? Simply because I voxelize only 'when required'. This is especially useful in editor!

I can post more detailed analysis on that if you are interested in that part (how long actual voxelization takes, how long separate miplevel generation take - for specific resolutions, etc.), as I've written this into profiling back when I worked on SVO implementation - so it's just about switching some defines and writing it out.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Why doesn't one normally increase the distance such that the next iteration will sample another MIP?

🧙

It does, with each ray marching step increase dist amount (Ln. 25 in the code snippet).

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

4 minutes ago, Vilem Otte said:

It does, with each ray marching step increase dist amount (Ln. 25 in the code snippet).

But sample_lod can stay the same between consecutive iterations?

🧙

20 minutes ago, matt77hias said:

But sample_lod can stay the same between consecutive iterations?

It's a float - therefore you will get fractions.

If I remember correctly hardware trilinear filtering will work like this (I wrote it from my head - so I might've made some typos!):


float xf = frac(coord.x * TEXTURE_SIZE_X);
float yf = frac(coord.y * TEXTURE_SIZE_Y);
float yz = frac(coord.y * TEXTURE_SIZE_Z);
uint m0 = (uint)miplevel;
uint m1 = min(m0 + 1, TEXTURE_MIPCOUNT);
float m = frac(miplevel);

float4 c0 = 
  lerp(
    lerp(
        lerp(texture.SampleLevel(sampler, coord + (0, 0, 0), m0), texture.SampleLevel(sampler, coord + (1, 0, 0), m0), xf),
        lerp(texture.SampleLevel(sampler, coord + (0, 1, 0), m0), texture.SampleLevel(sampler, coord + (1, 0, 0), m0), xf),
        yf),
    lerp(
        lerp(texture.SampleLevel(sampler, coord + (0, 0, 1), m0), texture.SampleLevel(sampler, coord + (1, 0, 1), m0), xf),
        lerp(texture.SampleLevel(sampler, coord + (0, 1, 1), m0), texture.SampleLevel(sampler, coord + (1, 0, 1), m0), xf),
        yf),
    zf);

float4 c1 = 
  lerp(
    lerp(
        lerp(texture.SampleLevel(sampler, coord + (0, 0, 0), m1), texture.SampleLevel(sampler, coord + (1, 0, 0), m1), xf),
        lerp(texture.SampleLevel(sampler, coord + (0, 1, 0), m1), texture.SampleLevel(sampler, coord + (1, 0, 0), m1), xf),
        yf),
    lerp(
        lerp(texture.SampleLevel(sampler, coord + (0, 0, 1), m1), texture.SampleLevel(sampler, coord + (1, 0, 1), m1), xf),
        lerp(texture.SampleLevel(sampler, coord + (0, 1, 1), m1), texture.SampleLevel(sampler, coord + (1, 0, 1), m1), xf),
        yf),
    zf);

return lerp(c0, c1, m);

So you actually interpolate between mip levels too - as per description here https://msdn.microsoft.com/en-us/library/windows/desktop/bb509699(v=vs.85).aspx:

LOD

[in] A number that specifies the mipmap level. If the value is ≤ 0, the zero'th (biggest map) is used. The fractional value (if supplied) is used to interpolate between two mipmap levels.

 

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

For VCT, I have written code very similiar to what Villem Otte just described. For general voxel visualization where I care much more about exactness than I do about performance, I've used DDA-like algorithms where I step through each voxel into its neighbor, potentially checking many empty voxels along the way. For debugging purposes this can be fast enough on a beefy desktop GPU. You can accelerate that march by generating a distance field from the voxels, which lets you skip empty space and potentially reduce your iteration counts by quite a bit. I've done this and it definitely helps, but you would probably have to go deeper and try to reduce intra-warp divergence if you really wanted to make things fast.

This topic is closed to new replies.

Advertisement