Voxel Cone Tracing Experiment - Part 2 Progress

Started by
53 comments, last by FreneticPonE 10 years, 3 months ago

Another reason for the transparency is that you are sampling the voxels along the view ray, but none of the samples hits the exact center of an opaque voxel, so they never reach full opacity. The voxel volume texture is configured to use linear filtering (trilinear, for a 3D texture), so whenever you sample an opaque voxel away from its center you get a partially transparent result: an interpolation between the opaque voxel and its neighboring transparent ones.
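To make the effect concrete, here is a minimal 1D sketch of linear filtering in C++. The helper names (`lerp`, `sampleAlphaLinear`) and the clamp-to-edge behaviour are assumptions for illustration only, not the poster's code; real hardware does the same interpolation in three dimensions.

```cpp
#include <cmath>

// Linear interpolation, equivalent to GLSL mix().
float lerp(float a, float b, float t) { return a + (b - a) * t; }

// Alpha seen by a linearly filtered sample in a 1D row of voxels.
// `alpha` holds one opacity value per voxel, `n` is the voxel count and
// `x` is the sample position in voxel units (voxel i has its center at i + 0.5).
float sampleAlphaLinear(const float* alpha, int n, float x)
{
    float u = x - 0.5f;                        // shift so voxel centers sit on integers
    int i0 = static_cast<int>(std::floor(u));  // voxel on the low side of the sample
    float t = u - static_cast<float>(i0);      // blend weight towards the high side
    int i1 = i0 + 1;
    // Clamp to the edge voxels (assumed addressing mode).
    if (i0 < 0) i0 = 0;
    if (i1 < 0) i1 = 0;
    if (i0 > n - 1) i0 = n - 1;
    if (i1 > n - 1) i1 = n - 1;
    return lerp(alpha[i0], alpha[i1], t);
}
```

With a row of alphas {0, 1, 0}, a sample at the center of the opaque voxel (x = 1.5) returns 1.0, while a sample on its boundary (x = 1.0) returns only 0.5.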

Heh, I'll ask again since it seems my question was skipped: how are you downsampling your voxel data? This might play a role in the issue you're having.

I'm pretty certain I've solved my issue. But for your interest:

The same method that this guy uses:

http://www.geeks3d.com/20121214/voxel-cone-tracing-global-illumination-in-opengl-4-3/

I use hardware mip-mapping and then filter with a compute shader, transferring the values of each 3x3x3 block of voxels to the neighbouring voxels in each direction.

Btw, I've also posted a video at the bottom of the start of my thread.


I think Frenetic once mentioned to me something that I guess would be called "Voxel Temporal Anti-aliasing".
I tried to implement it but couldn't get it working. Now that I think about it, though, I only tried it for the lowest-level voxels (highest res).
The majority of the flickering actually occurs at the mid levels.

I think the way to implement it is to fade in/fade out the color of adjacent voxels based on the position of the object within each voxel so that it smoothly interpolates instead of jumping from 0 to 1.

I'm re-attempting this, but I just realised I don't actually know how to track how much of each part of an object lies within a voxel. Originally I had thought of using the geometry shader stage of my voxelization to store the difference between the position of each voxel and the triangle.

Then I wrote 1 - density as the alpha value in the voxel texture. But this doesn't help the flickering; it actually makes it worse, because whenever my alpha value is set to one it reduces the color bleeding of the entire scene. So it flickers between strong and weak color bleeding, which shows up as flashing.

While trying to reduce the flickering I made an unusual discovery:

There is much less mid-level flickering in the z direction than in the x direction, i.e. the flickering caused by voxels at lod = 1 and lod = 2 is much weaker in the z direction. At lod = 0 (the densest level), the amount of flickering is the same in every direction.

Looking at the debug view I noticed that when I switch to lod = 1 debug, there seems to be much more information captured in the voxels in the x direction, than in the z direction:

[attachment=18352:gibox-debug0.jpg]

As you can notice in the above image, the voxels are a lot thicker on the facing surfaces of the left and right walls than on the wall at the very back.

When I first discovered this a while ago, I thought it must be caused by incorrect filtering during the mip-map stage. But I have checked my code many times, and the voxels are filtered in exactly the same manner in each direction.

There must be something that I'm doing wrong during the mip-mapping stage which requires further investigation.

So it turns out I had misinterpreted the original code that my method is based on. I am in fact manually mip-mapping to downsample my voxels.


#version 430

layout(local_size_x = 16, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba8) uniform image3D srcMip;
layout(binding = 1, rgba8) uniform image3D dstMip;

uniform uint direction;

void main()
{
	ivec3 dstSize = imageSize(dstMip);

	if(gl_GlobalInvocationID.x >= dstSize.x || gl_GlobalInvocationID.y >= dstSize.y || gl_GlobalInvocationID.z >= dstSize.z){
		// out of range, ignore
	} else {
		ivec3 dstPos = ivec3(gl_GlobalInvocationID);
		ivec3 srcPos = dstPos*2;
		vec4 outColor;

		vec4 srcCol0 = imageLoad(srcMip, srcPos + ivec3(0,0,0));
		vec4 srcCol1 = imageLoad(srcMip, srcPos + ivec3(1,0,0));
		vec4 srcCol2 = imageLoad(srcMip, srcPos + ivec3(2,0,0));
		vec4 srcCol3 = imageLoad(srcMip, srcPos + ivec3(0,1,0));
		vec4 srcCol4 = imageLoad(srcMip, srcPos + ivec3(1,1,0));
		vec4 srcCol5 = imageLoad(srcMip, srcPos + ivec3(2,1,0));
		vec4 srcCol6 = imageLoad(srcMip, srcPos + ivec3(0,2,0));
		vec4 srcCol7 = imageLoad(srcMip, srcPos + ivec3(1,2,0));
		vec4 srcCol8 = imageLoad(srcMip, srcPos + ivec3(2,2,0));

		vec4 srcCol9 = imageLoad(srcMip, srcPos + ivec3(0,0,1));
		vec4 srcCol10 = imageLoad(srcMip, srcPos + ivec3(1,0,1));
		vec4 srcCol11 = imageLoad(srcMip, srcPos + ivec3(2,0,1));
		vec4 srcCol12 = imageLoad(srcMip, srcPos + ivec3(0,1,1));
		vec4 srcCol13 = imageLoad(srcMip, srcPos + ivec3(1,1,1));
		vec4 srcCol14 = imageLoad(srcMip, srcPos + ivec3(2,1,1));
		vec4 srcCol15 = imageLoad(srcMip, srcPos + ivec3(0,2,1));
		vec4 srcCol16 = imageLoad(srcMip, srcPos + ivec3(1,2,1));
		vec4 srcCol17 = imageLoad(srcMip, srcPos + ivec3(2,2,1));

		vec4 srcCol18 = imageLoad(srcMip, srcPos + ivec3(0,0,2));
		vec4 srcCol19 = imageLoad(srcMip, srcPos + ivec3(1,0,2));
		vec4 srcCol20 = imageLoad(srcMip, srcPos + ivec3(2,0,2));
		vec4 srcCol21 = imageLoad(srcMip, srcPos + ivec3(0,1,2));
		vec4 srcCol22 = imageLoad(srcMip, srcPos + ivec3(1,1,2));
		vec4 srcCol23 = imageLoad(srcMip, srcPos + ivec3(2,1,2));
		vec4 srcCol24 = imageLoad(srcMip, srcPos + ivec3(0,2,2));
		vec4 srcCol25 = imageLoad(srcMip, srcPos + ivec3(1,2,2));
		vec4 srcCol26 = imageLoad(srcMip, srcPos + ivec3(2,2,2));

	if(direction == 0) {
		//+X direction
		outColor.xyz = mix(srcCol0.xyz, srcCol1.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol1.xyz, srcCol2.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol3.xyz, srcCol4.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol4.xyz, srcCol5.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol6.xyz, srcCol7.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol7.xyz, srcCol8.xyz, 1.0 - srcCol7.w)

					+ mix(srcCol9.xyz, srcCol10.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol10.xyz, srcCol11.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol12.xyz, srcCol13.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol13.xyz, srcCol14.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol15.xyz, srcCol16.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol16.xyz, srcCol17.xyz, 1.0 - srcCol16.w)
					
					+ mix(srcCol18.xyz, srcCol19.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol19.xyz, srcCol20.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol21.xyz, srcCol22.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol22.xyz, srcCol23.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol24.xyz, srcCol25.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol25.xyz, srcCol26.xyz, 1.0 - srcCol25.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol20.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol24.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol26.w);

	} else if(direction == 1) {
		//-X direction
		outColor.xyz = mix(srcCol1.xyz, srcCol0.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol2.xyz, srcCol1.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol4.xyz, srcCol3.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol5.xyz, srcCol4.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol7.xyz, srcCol6.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol8.xyz, srcCol7.xyz, 1.0 - srcCol8.w)

					+ mix(srcCol10.xyz, srcCol9.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol11.xyz, srcCol10.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol13.xyz, srcCol12.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol14.xyz, srcCol13.xyz, 1.0 - srcCol14.w)
					+ mix(srcCol16.xyz, srcCol15.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol17.xyz, srcCol16.xyz, 1.0 - srcCol17.w)
					
					+ mix(srcCol19.xyz, srcCol18.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol20.xyz, srcCol19.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol22.xyz, srcCol21.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol23.xyz, srcCol22.xyz, 1.0 - srcCol23.w)
					+ mix(srcCol25.xyz, srcCol24.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol26.xyz, srcCol25.xyz, 1.0 - srcCol26.w);
		outColor.w = 4.0 - (1.0 - srcCol1.w) * (1.0 - srcCol0.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol7.w)

						- (1.0 - srcCol10.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol16.w)
						
						- (1.0 - srcCol19.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol26.w) * (1.0 - srcCol25.w);

	} else if(direction == 2) {
		//+Y direction
		outColor.xyz = mix(srcCol0.xyz, srcCol3.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol3.xyz, srcCol6.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol1.xyz, srcCol4.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol4.xyz, srcCol7.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol2.xyz, srcCol5.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol5.xyz, srcCol8.xyz, 1.0 - srcCol5.w)

					+ mix(srcCol9.xyz, srcCol12.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol12.xyz, srcCol15.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol10.xyz, srcCol13.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol13.xyz, srcCol16.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol11.xyz, srcCol14.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol14.xyz, srcCol17.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol18.xyz, srcCol21.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol21.xyz, srcCol24.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol19.xyz, srcCol22.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol22.xyz, srcCol25.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol20.xyz, srcCol23.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol23.xyz, srcCol26.xyz, 1.0 - srcCol23.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol26.w);
	} 
	else if(direction == 3) {
		//-Y direction
		outColor.xyz = mix(srcCol3.xyz, srcCol0.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol6.xyz, srcCol3.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol4.xyz, srcCol1.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol7.xyz, srcCol4.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol5.xyz, srcCol2.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol8.xyz, srcCol5.xyz, 1.0 - srcCol8.w)
					
					+ mix(srcCol15.xyz, srcCol12.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol12.xyz, srcCol9.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol16.xyz, srcCol13.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol13.xyz, srcCol10.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol17.xyz, srcCol14.xyz, 1.0 - srcCol17.w)
					+ mix(srcCol14.xyz, srcCol11.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol24.xyz, srcCol21.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol21.xyz, srcCol18.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol25.xyz, srcCol22.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol22.xyz, srcCol19.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol26.xyz, srcCol23.xyz, 1.0 - srcCol26.w)
					+ mix(srcCol23.xyz, srcCol20.xyz, 1.0 - srcCol23.w);
		outColor.w = 4.0 - (1.0 - srcCol3.w) * (1.0 - srcCol0.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol5.w)
						
						- (1.0 - srcCol15.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol11.w)
						
						- (1.0 - srcCol24.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol26.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol20.w);
	} else if(direction == 4) {
		//+Z direction
		outColor.xyz = mix(srcCol0.xyz, srcCol9.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol9.xyz, srcCol18.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol1.xyz, srcCol10.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol10.xyz, srcCol19.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol2.xyz, srcCol11.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol11.xyz, srcCol20.xyz, 1.0 - srcCol11.w)

					+ mix(srcCol3.xyz, srcCol12.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol12.xyz, srcCol21.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol4.xyz, srcCol13.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol13.xyz, srcCol22.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol5.xyz, srcCol14.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol14.xyz, srcCol23.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol6.xyz, srcCol15.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol15.xyz, srcCol24.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol7.xyz, srcCol16.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol16.xyz, srcCol25.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol8.xyz, srcCol17.xyz, 1.0 - srcCol8.w)
					+ mix(srcCol17.xyz, srcCol26.xyz, 1.0 - srcCol17.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	} else if(direction == 5) {
		//-Z direction
		outColor.xyz = mix(srcCol9.xyz, srcCol0.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol18.xyz, srcCol9.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol10.xyz, srcCol1.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol19.xyz, srcCol10.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol11.xyz, srcCol2.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol20.xyz, srcCol11.xyz, 1.0 - srcCol20.w)

					+ mix(srcCol12.xyz, srcCol3.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol21.xyz, srcCol12.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol13.xyz, srcCol4.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol22.xyz, srcCol13.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol14.xyz, srcCol5.xyz, 1.0 - srcCol14.w)
					+ mix(srcCol23.xyz, srcCol14.xyz, 1.0 - srcCol23.w)
					
					+ mix(srcCol15.xyz, srcCol6.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol24.xyz, srcCol15.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol16.xyz, srcCol7.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol25.xyz, srcCol16.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol17.xyz, srcCol8.xyz, 1.0 - srcCol17.w)
					+ mix(srcCol26.xyz, srcCol17.xyz, 1.0 - srcCol26.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	}
	
		outColor.xyz *= 0.2;

		imageStore(dstMip, dstPos, outColor);
	}
}

However, I figured out that I have only been transferring my values in one direction along X, because of how my host code calls the shader:


void TextureManager::mipMapPass(GLuint shader, GLuint tex, int dir, int voxDim)
{
	glUseProgram(shader);

	int workGroupSize[3] = {};
	glGetProgramiv(shader, GL_COMPUTE_WORK_GROUP_SIZE, workGroupSize);
	if (workGroupSize[0] * workGroupSize[1] * workGroupSize[2] == 0){
		cout << "failed to load compute shader" << endl;
		return;
	}

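	int mipLevels = GetBitIndex(voxDim) + 1;

	// The direction is the same for every mip level, so set it once.
	glUniform1ui(glGetUniformLocation(shader, "direction"), dir);

	for (int mip = 1; mip < mipLevels; mip++)
	{
		glBindImageTexture(0, tex, mip - 1, GL_TRUE, 0, GL_READ_ONLY, GL_RGBA8);
		glBindImageTexture(1, tex, mip, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA8);

		// Dispatch just enough work groups to cover this mip level.
		int dstDim = voxDim >> mip;
		glDispatchCompute(
			(dstDim + workGroupSize[0] - 1) / workGroupSize[0],
			(dstDim + workGroupSize[1] - 1) / workGroupSize[1],
			(dstDim + workGroupSize[2] - 1) / workGroupSize[2]);

		// Make this level's image stores visible before the next level reads them.
		glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
	}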
}

I only run this function once with dir = 0.
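The group counts passed to glDispatchCompute are a ceiling division of the number of invocations needed by the work group size. Here is a small C++ sketch of that arithmetic per mip level, assuming a cubic power-of-two volume. The helpers `groupCount`, `mipLevelCount` and `mipDim` are made up for illustration; `mipLevelCount` is only my guess at what `GetBitIndex(voxDim) + 1` computes.

```cpp
// Number of work groups needed to cover `items` invocations
// when each group handles `groupSize` of them (ceiling division).
int groupCount(int items, int groupSize)
{
    return (items + groupSize - 1) / groupSize;
}

// Number of mip levels of a cubic volume with power-of-two edge length `voxDim`
// (assumed to match GetBitIndex(voxDim) + 1 in the function above).
int mipLevelCount(int voxDim)
{
    int levels = 1;
    while (voxDim > 1) {
        voxDim >>= 1;
        ++levels;
    }
    return levels;
}

// Edge length of mip level `mip`; never smaller than one voxel.
int mipDim(int voxDim, int mip)
{
    int d = voxDim >> mip;
    return d > 0 ? d : 1;
}
```

For a 64^3 volume with the 16x8x1 work group used here, mip 1 is 32^3 and needs 2 x 4 x 32 groups, while the last level (mip 6) is a single voxel and needs 1 x 1 x 1.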

The issue I'm facing now is that when I run this function 6 times, once for each direction, it doesn't accumulate the values from each direction; instead, each pass overwrites the 3D texture, so only the last direction survives.

If I use hardware generation of mipmaps and then do the transfer of values to neighbouring voxels in each direction for each mip level, the results come out wrong.

Here's what I've done to amend the problem - now all mip levels are filtered equally in every direction:


#version 430

layout(local_size_x = 16, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba8) uniform image3D srcMip;
layout(binding = 1, rgba8) uniform image3D dstMip;

uniform uint direction;

void main()
{
	ivec3 dstSize = imageSize(dstMip);

	if(gl_GlobalInvocationID.x >= dstSize.x || gl_GlobalInvocationID.y >= dstSize.y || gl_GlobalInvocationID.z >= dstSize.z){
		// out of range, ignore
	} else {
		ivec3 dstPos = ivec3(gl_GlobalInvocationID);
		ivec3 srcPos = dstPos*2;
		vec4 outColor;

		vec4 srcCol0 = imageLoad(srcMip, srcPos + ivec3(-1,-1,-1));
		vec4 srcCol1 = imageLoad(srcMip, srcPos + ivec3(0,-1,-1));
		vec4 srcCol2 = imageLoad(srcMip, srcPos + ivec3(1,-1,-1));
		vec4 srcCol3 = imageLoad(srcMip, srcPos + ivec3(-1,0,-1));
		vec4 srcCol4 = imageLoad(srcMip, srcPos + ivec3(0,0,-1));
		vec4 srcCol5 = imageLoad(srcMip, srcPos + ivec3(1,0,-1));
		vec4 srcCol6 = imageLoad(srcMip, srcPos + ivec3(-1,1,-1));
		vec4 srcCol7 = imageLoad(srcMip, srcPos + ivec3(0,1,-1));
		vec4 srcCol8 = imageLoad(srcMip, srcPos + ivec3(1,1,-1));

		vec4 srcCol9 = imageLoad(srcMip, srcPos + ivec3(-1,-1,0));
		vec4 srcCol10 = imageLoad(srcMip, srcPos + ivec3(0,-1,0));
		vec4 srcCol11 = imageLoad(srcMip, srcPos + ivec3(1,-1,0));
		vec4 srcCol12 = imageLoad(srcMip, srcPos + ivec3(-1,0,0));
		vec4 srcCol13 = imageLoad(srcMip, srcPos + ivec3(0,0,0));
		vec4 srcCol14 = imageLoad(srcMip, srcPos + ivec3(1,0,0));
		vec4 srcCol15 = imageLoad(srcMip, srcPos + ivec3(-1,1,0));
		vec4 srcCol16 = imageLoad(srcMip, srcPos + ivec3(0,1,0));
		vec4 srcCol17 = imageLoad(srcMip, srcPos + ivec3(1,1,0));

		vec4 srcCol18 = imageLoad(srcMip, srcPos + ivec3(-1,-1,1));
		vec4 srcCol19 = imageLoad(srcMip, srcPos + ivec3(0,-1,1));
		vec4 srcCol20 = imageLoad(srcMip, srcPos + ivec3(1,-1,1));
		vec4 srcCol21 = imageLoad(srcMip, srcPos + ivec3(-1,0,1));
		vec4 srcCol22 = imageLoad(srcMip, srcPos + ivec3(0,0,1));
		vec4 srcCol23 = imageLoad(srcMip, srcPos + ivec3(1,0,1));
		vec4 srcCol24 = imageLoad(srcMip, srcPos + ivec3(-1,1,1));
		vec4 srcCol25 = imageLoad(srcMip, srcPos + ivec3(0,1,1));
		vec4 srcCol26 = imageLoad(srcMip, srcPos + ivec3(1,1,1));

		//+X direction
		outColor.xyz = mix(srcCol0.xyz, srcCol1.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol1.xyz, srcCol2.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol3.xyz, srcCol4.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol4.xyz, srcCol5.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol6.xyz, srcCol7.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol7.xyz, srcCol8.xyz, 1.0 - srcCol7.w)

					+ mix(srcCol9.xyz, srcCol10.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol10.xyz, srcCol11.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol12.xyz, srcCol13.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol13.xyz, srcCol14.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol15.xyz, srcCol16.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol16.xyz, srcCol17.xyz, 1.0 - srcCol16.w)
					
					+ mix(srcCol18.xyz, srcCol19.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol19.xyz, srcCol20.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol21.xyz, srcCol22.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol22.xyz, srcCol23.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol24.xyz, srcCol25.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol25.xyz, srcCol26.xyz, 1.0 - srcCol25.w)
		//+Y direction			
					+ mix(srcCol0.xyz, srcCol3.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol3.xyz, srcCol6.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol1.xyz, srcCol4.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol4.xyz, srcCol7.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol2.xyz, srcCol5.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol5.xyz, srcCol8.xyz, 1.0 - srcCol5.w)

					+ mix(srcCol9.xyz, srcCol12.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol12.xyz, srcCol15.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol10.xyz, srcCol13.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol13.xyz, srcCol16.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol11.xyz, srcCol14.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol14.xyz, srcCol17.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol18.xyz, srcCol21.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol21.xyz, srcCol24.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol19.xyz, srcCol22.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol22.xyz, srcCol25.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol20.xyz, srcCol23.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol23.xyz, srcCol26.xyz, 1.0 - srcCol23.w)
		//+Z direction			
					+ mix(srcCol0.xyz, srcCol9.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol9.xyz, srcCol18.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol1.xyz, srcCol10.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol10.xyz, srcCol19.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol2.xyz, srcCol11.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol11.xyz, srcCol20.xyz, 1.0 - srcCol11.w)

					+ mix(srcCol3.xyz, srcCol12.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol12.xyz, srcCol21.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol4.xyz, srcCol13.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol13.xyz, srcCol22.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol5.xyz, srcCol14.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol14.xyz, srcCol23.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol6.xyz, srcCol15.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol15.xyz, srcCol24.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol7.xyz, srcCol16.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol16.xyz, srcCol25.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol8.xyz, srcCol17.xyz, 1.0 - srcCol8.w)
					+ mix(srcCol17.xyz, srcCol26.xyz, 1.0 - srcCol17.w);
//+X direction
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol20.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol24.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol26.w)
//+Y direction						
						- (1.0 - srcCol0.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol26.w)
//+Z direction
						- (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	
		outColor.xyz *= 0.05;

		imageStore(dstMip, dstPos, outColor);
	}
}

I've made it so that voxels are filtered in each direction in a single pass. It is a little slower but still runs just under 20fps (probably on avg about 19fps).
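The hand-unrolled terms follow a regular pattern: sample N of the 3x3x3 block sits at offset (x, y, z) with N = x + 3*y + 9*z, and each term pairs a sample with its neighbour one step along an axis. A small C++ sketch that generates those index pairs can help cross-check the unrolled lines for typos. The helper names are made up for illustration, and this is not how the shader itself is written.

```cpp
#include <utility>
#include <vector>

// Index of the sample at offset (x, y, z), each in 0..2,
// matching the srcColN numbering: N = x + 3*y + 9*z.
int sampleIndex(int x, int y, int z)
{
    return x + 3 * y + 9 * z;
}

// All (near, far) pairs of adjacent samples along one axis
// (0 = X, 1 = Y, 2 = Z) inside the 3x3x3 block.
std::vector<std::pair<int, int>> adjacentPairs(int axis)
{
    const int stride[3] = {1, 3, 9};  // index step for one move along X, Y, Z
    std::vector<std::pair<int, int>> pairs;
    for (int z = 0; z < 3; ++z) {
        for (int y = 0; y < 3; ++y) {
            for (int x = 0; x < 3; ++x) {
                const int coord[3] = {x, y, z};
                if (coord[axis] >= 2) continue;  // no neighbour beyond the block
                const int i = sampleIndex(x, y, z);
                pairs.emplace_back(i, i + stride[axis]);
            }
        }
    }
    return pairs;
}
```

Each axis yields 18 pairs, which matches the 18 mix() terms per direction. For Z the list includes (4, 13) and (7, 16), so a term that combines srcCol4 with srcCol14, or srcCol7 with srcCol6, stands out as a slip.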

Also, when I sample the voxels for the cone tracing I apply a 1/4 of a voxel offset for each mip level to balance it better.

[attachment=18371:giboxv4-0.png][attachment=18372:giboxv4-1.png]

[attachment=18373:giboxv4-2.png][attachment=18374:giboxv4-3.png]

So as you can see, things are starting to look a bit more physically accurate (even though theoretically it's not).

Now I've just got to work out how to speed things up by turning this into a 3-pass (trilateral) filtering algorithm.

Looks pretty good, where's your main bottleneck for performance, is it really your new mip mapping?


No, mip-mapping is still pretty cheap in the scale of things (but the cost may accumulate later when I try to implement cascades).

Right now the main performance bottlenecks are soft-shadowing, SSR and SSAO.

I haven't set up any way of querying the actual cost of each feature, so I can't tell you accurately where the costs come from. All I can give you is the framerate I remember from before I implemented soft-shadowing, SSR and SSAO: it ran at 50fps with the same cone tracing features (except that I now have the modified mip-mapping). I think soft-shadowing (for the main point light and 3 emissive objects) pushed it down to ~35fps, SSR pushed it down to ~25fps and SSAO pushed it to ~20fps.

I believe that there is a lot of cost to binding all these textures, so I think the next step for me in improving performance would be to delve into bindless graphics. I also want to try and get partially resident textures working on my NVIDIA card with OpenGL 4.4, but I haven't found any resources to help me with this. Has anyone been able to implement this?

It's well worth spending a day to implement a basic GPU profiling system (I guess using ARB_timer_query in GL? I've only done it in D3D so far.) so you can see how many milliseconds are being burnt by your different shaders/passes. You can then print them to the screen, or the console, or a file, etc, and get decent statistics for all of your different features in one go.

Measuring performance with FPS is quite annoying on the other hand, requiring you to take before/after FPS measurements when turning a feature on and off.
50fps = 20ms/frame for drawing the scene, voxelizing it, cone tracing and direct lighting?
35fps = 28.6ms/frame == 8.6ms increase for adding soft-shadowing
25fps = 40ms/frame == 11.4ms increase for adding SSR
20fps = 50ms/frame == 10ms increase for adding SSAO

I believe that there is a lot of cost to binding all these textures, so I think the next step for me in improving performance would be to delve into bindless graphics.

That will probably only be a CPU-side optimization, and by the sounds of it, your application is probably bottlenecked by the GPU workload.

Awesome results BTW :D

It's well worth spending a day to implement a basic GPU profiling system (I guess using ARB_timer_query in GL? I've only done it in D3D so far.) so you can see how many milliseconds are being burnt by your different shaders/passes. You can then print them to the screen, or the console, or a file, etc, and get decent statistics for all of your different features in one go.

This is an excellent idea, especially since there are plenty of optimizations to pare down SSAO/SSR, those are pretty well established and researched. You'd really be looking at cone tracing for trying novel optimizations, though I can think of several already done.

One is downsampling beforehand, or otherwise binning together pixel blocks for the diffuse trace, which would work well for pixels relatively close to each other but would miss thin objects and edges in some cases. Epic also does this with the specular trace, though they never explained how, other than a hand-wavy "then you upsample and scatter".

To reduce the number of samples for the specular trace, you can check the alpha in a lower mip level of the volume to see if it's empty and whether you should skip it. I'm also interested in realtime signed distance fields, and may eventually get back to trying to figure them out. These should give you a minimum step size you can skip ahead by when tracing, reducing the number of samples you need.
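A minimal 1D C++ sketch of the mip-based skipping idea, under stated assumptions: the coarse level here stores the maximum alpha of its two children (so it is zero only when both are empty), and the march simply counts how many fine samples it has to take. The names `coarsenMax` and `countFineSamples` are made up for illustration; a real tracer would do this in 3D inside the cone-tracing loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Build a coarser level in which each cell holds the maximum alpha of its
// two children, so a coarse alpha of zero guarantees both children are empty.
std::vector<float> coarsenMax(const std::vector<float>& fine)
{
    std::vector<float> coarse(fine.size() / 2);
    for (std::size_t i = 0; i < coarse.size(); ++i) {
        coarse[i] = std::max(fine[2 * i], fine[2 * i + 1]);
    }
    return coarse;
}

// March across the fine level and count the samples actually taken.
// Whenever the march enters a coarse cell that is empty, both of its
// children are skipped without being sampled.
int countFineSamples(const std::vector<float>& fine, const std::vector<float>& coarse)
{
    int samples = 0;
    std::size_t i = 0;
    while (i < fine.size()) {
        const std::size_t c = i / 2;
        if (i % 2 == 0 && c < coarse.size() && coarse[c] == 0.0f) {
            i += 2;  // empty coarse cell: jump over both children
            continue;
        }
        ++samples;   // a real tracer would read fine[i] here
        ++i;
    }
    return samples;
}
```

For a row of eight voxels with a single occupied one, only the two voxels under the non-empty coarse cell are sampled instead of all eight.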

As for a huge area (if you want to go that far), volume LOD and a Directed Acyclic Graph (as I mentioned earlier) should reduce memory consumption a lot, since you're using volume textures and not a sparse octree. The paper is based on doing this for an octree, though, so I'm not sure how a uniform volume texture would play out.
