Voxel Cone Tracing Experiment - Part 2 Progress


54 replies to this topic

#21 jcabeleira   Members   -  Reputation: 686


Posted 02 October 2013 - 08:30 AM

Another reason for the transparency is that you are sampling the voxels along the view ray, but none of the samples hits the exact center of an opaque voxel, so they never reach full opacity. The voxel volume texture is configured to use linear filtering (trilinear for a 3D texture), so whenever you sample an opaque voxel without hitting its center you get a partially transparent result: the interpolation between the opaque voxel and the neighboring transparent ones.
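To make the interpolation effect concrete, here is a minimal 1D sketch in C++ of how linear filtering blends an opaque voxel with its transparent neighbours. The function name `sampleAlphaLinear` and the convention that voxel i has its center at i + 0.5 are assumptions for illustration, not code from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Linearly filtered lookup into a 1D row of voxel opacities.
// x is in voxel units; voxel i covers [i, i + 1) and its center is i + 0.5.
// Lookups past either end clamp to the edge voxel.
float sampleAlphaLinear(const std::vector<float>& alpha, float x)
{
    assert(!alpha.empty());
    const int last = static_cast<int>(alpha.size()) - 1;
    const float u = x - 0.5f;               // shift so centers sit on integers
    const float base = std::floor(u);
    const float f = u - base;               // fractional distance to the next center
    const int i0 = std::min(std::max(static_cast<int>(base), 0), last);
    const int i1 = std::min(std::max(static_cast<int>(base) + 1, 0), last);
    return alpha[i0] * (1.0f - f) + alpha[i1] * f;
}
```

With a row of opacities {0, 1, 0}, a sample at 1.5 (the center of the opaque voxel) returns 1, while a sample at 1.25 returns 0.75, so a ray whose samples never land on centers never sees full opacity.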




#22 gboxentertainment   Members   -  Reputation: 766


Posted 05 October 2013 - 01:31 AM


Heh, I'll ask again since it seems my question was skipped: how are you downsampling your voxel data? This might play a role in the issue you're having.

 

I'm pretty certain I've solved my issue. But for your interest:

The same method that this guy uses:

http://www.geeks3d.com/20121214/voxel-cone-tracing-global-illumination-in-opengl-4-3/

 

I use hardware mip-mapping and then filter with a compute shader, transferring the values of each 3x3x3 block of voxels to the neighbouring voxels in each direction.

 

Btw, I've also posted a video at the bottom of the start of my thread.


Edited by gboxentertainment, 05 October 2013 - 01:32 AM.


#23 gboxentertainment   Members   -  Reputation: 766


Posted 06 October 2013 - 03:11 AM


I think Frenetic once mentioned something to me that I guess would be called "Voxel Temporal Anti-aliasing".
I tried to implement it but couldn't get it working - but now that I think about it, I only tried it for the lowest level voxels (highest res).
The majority of the flickering actually occurs at mid-level.
 
I think the way to implement it is to fade in/fade out the color of adjacent voxels based on the position of the object within each voxel so that it smoothly interpolates instead of jumping from 0 to 1.
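One way to picture the fade-in/fade-out idea is a 1D sketch in C++ that splits a contribution between the two nearest voxels according to where the object sits between their centers. The function name `splatLinear` and the center-at-i+0.5 convention are assumptions for illustration; this is not the thread author's voxelization code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Deposit `value` at continuous position `pos` (voxel units, center of voxel i
// at i + 0.5) by splitting it between the two nearest voxel centers.
// Any part that would land outside the grid is dropped.
void splatLinear(std::vector<float>& grid, float pos, float value)
{
    assert(!grid.empty());
    const float u = pos - 0.5f;
    const float base = std::floor(u);
    const float f = u - base;               // 0 at the lower center, 1 at the upper
    const int i0 = static_cast<int>(base);
    const int i1 = i0 + 1;
    const int n = static_cast<int>(grid.size());
    if (i0 >= 0 && i0 < n) grid[i0] += value * (1.0f - f);
    if (i1 >= 0 && i1 < n) grid[i1] += value * f;
}
```

As the position moves from 1.5 to 2.5, the weight in voxel 1 falls smoothly from 1 to 0 while voxel 2 rises from 0 to 1, instead of the value jumping between voxels in a single frame.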

 

I'm re-attempting this, but I just realised I don't actually know how to track how much of each part of an object lies within a voxel. Originally I had thought of accessing the geometry shader stage of my voxelization and storing the difference between the position of each voxel and triangle.

Then I set my input for alpha into the voxel texture as 1-density. But this doesn't help the flickering - it actually makes it worse because when my alpha value is set to one, it reduces the color bleeding of the entire scene. Thus it flickers between color bleeding and little color bleeding - leading to flashing.


Edited by gboxentertainment, 06 October 2013 - 04:05 AM.


#24 gboxentertainment   Members   -  Reputation: 766


Posted 12 October 2013 - 04:00 AM

In my attempts to reduce flickering I have come across an unusual discovery:

 

There is much less mid-level flickering in the z direction than there is in the x direction. i.e. Flickering caused by voxels at lod = 1 and lod = 2 is much less in the z direction (as opposed to lod = 0 (densest level), where there is the same amount of flickering no matter what direction).

 

Looking at the debug view I noticed that when I switch to lod = 1 debug, there seems to be much more information captured in the voxels in the x direction, than in the z direction:

 

gibox-debug0.jpg

 

As you can notice in the above image, the voxels are a lot thicker on the facing surfaces of the left and right walls than on the wall at the very back.

 

Originally, a while ago when I had first discovered this, I had thought that this must be caused by some incorrect filtering during the mip-map stage. But I have checked my code many many times and voxels are filtered in exactly the same manner in each direction.

 

There must be something that I'm doing wrong during the mip-mapping stage which requires further investigation.



#25 gboxentertainment   Members   -  Reputation: 766


Posted 12 October 2013 - 09:56 PM

So it turns out I had misinterpreted the original code that my method is based on. I am in fact manually mip-mapping to downsample my voxels.

#version 430

layout(local_size_x = 16, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba8) uniform image3D srcMip;
layout(binding = 1, rgba8) uniform image3D dstMip;

uniform uint direction;

void main()
{
	ivec3 dstSize = imageSize(dstMip);

	if(gl_GlobalInvocationID.x >= dstSize.x || gl_GlobalInvocationID.y >= dstSize.y || gl_GlobalInvocationID.z >= dstSize.z){
		// out of range, ignore
	} else {
		ivec3 dstPos = ivec3(gl_GlobalInvocationID);
		ivec3 srcPos = dstPos*2;
		vec4 outColor;

		vec4 srcCol0 = imageLoad(srcMip, srcPos + ivec3(0,0,0));
		vec4 srcCol1 = imageLoad(srcMip, srcPos + ivec3(1,0,0));
		vec4 srcCol2 = imageLoad(srcMip, srcPos + ivec3(2,0,0));
		vec4 srcCol3 = imageLoad(srcMip, srcPos + ivec3(0,1,0));
		vec4 srcCol4 = imageLoad(srcMip, srcPos + ivec3(1,1,0));
		vec4 srcCol5 = imageLoad(srcMip, srcPos + ivec3(2,1,0));
		vec4 srcCol6 = imageLoad(srcMip, srcPos + ivec3(0,2,0));
		vec4 srcCol7 = imageLoad(srcMip, srcPos + ivec3(1,2,0));
		vec4 srcCol8 = imageLoad(srcMip, srcPos + ivec3(2,2,0));

		vec4 srcCol9 = imageLoad(srcMip, srcPos + ivec3(0,0,1));
		vec4 srcCol10 = imageLoad(srcMip, srcPos + ivec3(1,0,1));
		vec4 srcCol11 = imageLoad(srcMip, srcPos + ivec3(2,0,1));
		vec4 srcCol12 = imageLoad(srcMip, srcPos + ivec3(0,1,1));
		vec4 srcCol13 = imageLoad(srcMip, srcPos + ivec3(1,1,1));
		vec4 srcCol14 = imageLoad(srcMip, srcPos + ivec3(2,1,1));
		vec4 srcCol15 = imageLoad(srcMip, srcPos + ivec3(0,2,1));
		vec4 srcCol16 = imageLoad(srcMip, srcPos + ivec3(1,2,1));
		vec4 srcCol17 = imageLoad(srcMip, srcPos + ivec3(2,2,1));

		vec4 srcCol18 = imageLoad(srcMip, srcPos + ivec3(0,0,2));
		vec4 srcCol19 = imageLoad(srcMip, srcPos + ivec3(1,0,2));
		vec4 srcCol20 = imageLoad(srcMip, srcPos + ivec3(2,0,2));
		vec4 srcCol21 = imageLoad(srcMip, srcPos + ivec3(0,1,2));
		vec4 srcCol22 = imageLoad(srcMip, srcPos + ivec3(1,1,2));
		vec4 srcCol23 = imageLoad(srcMip, srcPos + ivec3(2,1,2));
		vec4 srcCol24 = imageLoad(srcMip, srcPos + ivec3(0,2,2));
		vec4 srcCol25 = imageLoad(srcMip, srcPos + ivec3(1,2,2));
		vec4 srcCol26 = imageLoad(srcMip, srcPos + ivec3(2,2,2));

	if(direction == 0) {
		//+X direction
		outColor.xyz = mix(srcCol0.xyz, srcCol1.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol1.xyz, srcCol2.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol3.xyz, srcCol4.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol4.xyz, srcCol5.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol6.xyz, srcCol7.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol7.xyz, srcCol8.xyz, 1.0 - srcCol7.w)

					+ mix(srcCol9.xyz, srcCol10.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol10.xyz, srcCol11.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol12.xyz, srcCol13.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol13.xyz, srcCol14.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol15.xyz, srcCol16.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol16.xyz, srcCol17.xyz, 1.0 - srcCol16.w)
					
					+ mix(srcCol18.xyz, srcCol19.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol19.xyz, srcCol20.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol21.xyz, srcCol22.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol22.xyz, srcCol23.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol24.xyz, srcCol25.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol25.xyz, srcCol26.xyz, 1.0 - srcCol25.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol20.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol24.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol26.w);

	} else if(direction == 1) {
		//-X direction
		outColor.xyz = mix(srcCol1.xyz, srcCol0.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol2.xyz, srcCol1.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol4.xyz, srcCol3.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol5.xyz, srcCol4.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol7.xyz, srcCol6.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol8.xyz, srcCol7.xyz, 1.0 - srcCol8.w)

					+ mix(srcCol10.xyz, srcCol9.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol11.xyz, srcCol10.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol13.xyz, srcCol12.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol14.xyz, srcCol13.xyz, 1.0 - srcCol14.w)
					+ mix(srcCol16.xyz, srcCol15.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol17.xyz, srcCol16.xyz, 1.0 - srcCol17.w)
					
					+ mix(srcCol19.xyz, srcCol18.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol20.xyz, srcCol19.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol22.xyz, srcCol21.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol23.xyz, srcCol22.xyz, 1.0 - srcCol23.w)
					+ mix(srcCol25.xyz, srcCol24.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol26.xyz, srcCol25.xyz, 1.0 - srcCol26.w);
		outColor.w = 4.0 - (1.0 - srcCol1.w) * (1.0 - srcCol0.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol7.w)

						- (1.0 - srcCol10.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol16.w)
						
						- (1.0 - srcCol19.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol26.w) * (1.0 - srcCol25.w);

	} else if(direction == 2) {
		//+Y direction
		outColor.xyz = mix(srcCol0.xyz, srcCol3.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol3.xyz, srcCol6.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol1.xyz, srcCol4.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol4.xyz, srcCol7.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol2.xyz, srcCol5.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol5.xyz, srcCol8.xyz, 1.0 - srcCol5.w)

					+ mix(srcCol9.xyz, srcCol12.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol12.xyz, srcCol15.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol10.xyz, srcCol13.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol13.xyz, srcCol16.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol11.xyz, srcCol14.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol14.xyz, srcCol17.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol18.xyz, srcCol21.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol21.xyz, srcCol24.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol19.xyz, srcCol22.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol22.xyz, srcCol25.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol20.xyz, srcCol23.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol23.xyz, srcCol26.xyz, 1.0 - srcCol23.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol26.w);
	} 
	else if(direction == 3) {
		//-Y direction
		outColor.xyz = mix(srcCol3.xyz, srcCol0.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol6.xyz, srcCol3.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol4.xyz, srcCol1.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol7.xyz, srcCol4.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol5.xyz, srcCol2.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol8.xyz, srcCol5.xyz, 1.0 - srcCol8.w)
					
					+ mix(srcCol15.xyz, srcCol12.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol12.xyz, srcCol9.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol16.xyz, srcCol13.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol13.xyz, srcCol10.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol17.xyz, srcCol14.xyz, 1.0 - srcCol17.w)
					+ mix(srcCol14.xyz, srcCol11.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol24.xyz, srcCol21.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol21.xyz, srcCol18.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol25.xyz, srcCol22.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol22.xyz, srcCol19.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol26.xyz, srcCol23.xyz, 1.0 - srcCol26.w)
					+ mix(srcCol23.xyz, srcCol20.xyz, 1.0 - srcCol23.w);
		outColor.w = 4.0 - (1.0 - srcCol3.w) * (1.0 - srcCol0.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol5.w)
						
						- (1.0 - srcCol15.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol11.w)
						
						- (1.0 - srcCol24.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol26.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol20.w);
	} else if(direction == 4) {
		//+Z direction
		outColor.xyz = mix(srcCol0.xyz, srcCol9.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol9.xyz, srcCol18.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol1.xyz, srcCol10.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol10.xyz, srcCol19.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol2.xyz, srcCol11.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol11.xyz, srcCol20.xyz, 1.0 - srcCol11.w)

					+ mix(srcCol3.xyz, srcCol12.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol12.xyz, srcCol21.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol4.xyz, srcCol13.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol13.xyz, srcCol22.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol5.xyz, srcCol14.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol14.xyz, srcCol23.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol6.xyz, srcCol15.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol15.xyz, srcCol24.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol7.xyz, srcCol16.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol16.xyz, srcCol25.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol8.xyz, srcCol17.xyz, 1.0 - srcCol8.w)
					+ mix(srcCol17.xyz, srcCol26.xyz, 1.0 - srcCol17.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	} else if(direction == 5) {
		//-Z direction
		outColor.xyz = mix(srcCol9.xyz, srcCol0.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol18.xyz, srcCol9.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol10.xyz, srcCol1.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol19.xyz, srcCol10.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol11.xyz, srcCol2.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol20.xyz, srcCol11.xyz, 1.0 - srcCol20.w)

					+ mix(srcCol12.xyz, srcCol3.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol21.xyz, srcCol12.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol13.xyz, srcCol4.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol22.xyz, srcCol13.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol14.xyz, srcCol5.xyz, 1.0 - srcCol14.w)
					+ mix(srcCol23.xyz, srcCol14.xyz, 1.0 - srcCol23.w)
					
					+ mix(srcCol15.xyz, srcCol6.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol24.xyz, srcCol15.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol16.xyz, srcCol7.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol25.xyz, srcCol16.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol17.xyz, srcCol8.xyz, 1.0 - srcCol17.w)
					+ mix(srcCol26.xyz, srcCol17.xyz, 1.0 - srcCol26.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	}
	
		outColor.xyz *= 0.2;

		imageStore(dstMip, dstPos, outColor);
	}
}
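Each term in the shader above has the same shape: a "front" voxel and the voxel behind it along the filter direction, blended with `mix(front, back, 1.0 - front.w)`. The C++ sketch below isolates that single pair. The names `Voxel`, `mixf` and `blendPair` are hypothetical, and the combined opacity uses the usual two-layer formula, which is an assumption here; the shader sums and normalises its alpha terms differently.

```cpp
#include <cassert>

struct Voxel { float r, g, b, a; };

// GLSL-style mix(x, y, t) = x * (1 - t) + y * t.
static float mixf(float x, float y, float t) { return x * (1.0f - t) + y * t; }

// One directional pair: `front` is the voxel nearer the viewing side,
// `back` is the voxel behind it along the filter direction.
Voxel blendPair(const Voxel& front, const Voxel& back)
{
    assert(front.a >= 0.0f && front.a <= 1.0f);
    assert(back.a >= 0.0f && back.a <= 1.0f);
    const float t = 1.0f - front.a;         // how much of `back` shows through
    Voxel out;
    out.r = mixf(front.r, back.r, t);
    out.g = mixf(front.g, back.g, t);
    out.b = mixf(front.b, back.b, t);
    // Standard combined opacity of two layers (assumption for this sketch).
    out.a = 1.0f - (1.0f - front.a) * (1.0f - back.a);
    return out;
}
```

An opaque front voxel hides the one behind it completely, while a fully transparent front voxel passes the back voxel's colour through unchanged, which is why the result depends on the direction the pair is read in.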

However, I figured out that I have only been transferring my values in the -X direction, because in my host code:

void TextureManager::mipMapPass(GLuint shader, GLuint tex, int dir, int voxDim)
{
	glUseProgram(shader);

	int workGroupSize[3] = {};
	glGetProgramiv(shader, GL_COMPUTE_WORK_GROUP_SIZE, workGroupSize);
	if (workGroupSize[0] * workGroupSize[1] * workGroupSize[2] == 0){
		cout << "failed to load compute shader" << endl;
		return;
	}

	int mipLevels = GetBitIndex(voxDim) + 1;
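	for (int mip = 1; mip < mipLevels; mip++)
	{
		glUniform1ui(glGetUniformLocation(shader, "direction"), dir);

		glBindImageTexture(0, tex, mip - 1, GL_TRUE, 0, GL_READ_ONLY, GL_RGBA8);
		glBindImageTexture(1, tex, mip, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA8);

		// The destination level is voxDim >> mip texels per side; round up to whole work groups.
		int dstDim = voxDim >> mip;
		glDispatchCompute(
			(dstDim + workGroupSize[0] - 1) / workGroupSize[0],
			(dstDim + workGroupSize[1] - 1) / workGroupSize[1],
			(dstDim + workGroupSize[2] - 1) / workGroupSize[2]);

		// Make this level's image writes visible before the next pass reads them.
		glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
	}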
}

I only run this function once with dir = 0.
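As a side note on the dispatch sizes in mipMapPass, the arithmetic can be checked on the CPU. The sketch below assumes that `GetBitIndex` returns log2 of a power-of-two edge length and that each level is dispatched over its own destination size; `groupCount` and `mipEdges` are hypothetical helper names, not part of the code above.

```cpp
#include <cassert>
#include <vector>

// Number of work groups needed to cover `items` invocations when each group
// handles `localSize` of them (ceiling division).
int groupCount(int items, int localSize)
{
    assert(items >= 0 && localSize > 0);
    return (items + localSize - 1) / localSize;
}

// Edge lengths of mip levels 1..N for a cubic volume whose base edge is a
// power of two. Level m has edge baseDim >> m; the chain ends at 1.
std::vector<int> mipEdges(int baseDim)
{
    assert(baseDim > 0 && (baseDim & (baseDim - 1)) == 0);
    std::vector<int> edges;
    for (int dim = baseDim >> 1; dim >= 1; dim >>= 1)
        edges.push_back(dim);
    return edges;
}
```

For a 64-voxel edge this gives levels of 32, 16, 8, 4, 2 and 1, and with the shader's local size of 16 x 8 x 1 the first level needs 2 x 4 x 32 work groups. Dispatching more groups than that is harmless because the shader skips out-of-range invocations, but it wastes work on the smaller levels.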

 

The issue that I'm facing now is that when I run this function six times, once for each direction, it doesn't accumulate the values from each direction; instead it overwrites the 3D texture with the result of the last direction.

If I use hardware generation of mipmaps and then do the transfer of values to neighbouring voxels in each direction for each mip level, the results come out wrong.
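The overwrite behaviour follows from how a plain image store works: it replaces the texel rather than adding to it. The CPU-side C++ sketch below contrasts the two outcomes for a single destination voxel. The function names and the `weight` parameter are hypothetical, chosen only to illustrate the difference.

```cpp
#include <array>
#include <cassert>

// Mimics six dispatches that each store their directional value into the same
// destination texel: only the last stored value survives.
float storeOverwrite(const std::array<float, 6>& perDirection)
{
    float texel = 0.0f;
    for (float v : perDirection)
        texel = v;                          // a plain store replaces the contents
    return texel;
}

// Mimics summing all six directions in a local variable and storing once,
// scaled by `weight` (for example 1/6 for an average).
float storeAccumulated(const std::array<float, 6>& perDirection, float weight)
{
    assert(weight >= 0.0f);
    float sum = 0.0f;
    for (float v : perDirection)
        sum += v;
    return sum * weight;
}
```

Summing inside one invocation avoids any read-modify-write on the destination image, at the price of evaluating every direction in a single, heavier pass.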


Edited by gboxentertainment, 12 October 2013 - 09:56 PM.


#26 gboxentertainment   Members   -  Reputation: 766


Posted 13 October 2013 - 05:54 AM

Here's what I've done to amend the problem - now all mip levels are filtered equally in every direction:

#version 430

layout(local_size_x = 16, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba8) uniform image3D srcMip;
layout(binding = 1, rgba8) uniform image3D dstMip;

uniform uint direction;

void main()
{
	ivec3 dstSize = imageSize(dstMip);

	if(gl_GlobalInvocationID.x >= dstSize.x || gl_GlobalInvocationID.y >= dstSize.y || gl_GlobalInvocationID.z >= dstSize.z){
		// out of range, ignore
	} else {
		ivec3 dstPos = ivec3(gl_GlobalInvocationID);
		ivec3 srcPos = dstPos*2;
		vec4 outColor;

		vec4 srcCol0 = imageLoad(srcMip, srcPos + ivec3(-1,-1,-1));
		vec4 srcCol1 = imageLoad(srcMip, srcPos + ivec3(0,-1,-1));
		vec4 srcCol2 = imageLoad(srcMip, srcPos + ivec3(1,-1,-1));
		vec4 srcCol3 = imageLoad(srcMip, srcPos + ivec3(-1,0,-1));
		vec4 srcCol4 = imageLoad(srcMip, srcPos + ivec3(0,0,-1));
		vec4 srcCol5 = imageLoad(srcMip, srcPos + ivec3(1,0,-1));
		vec4 srcCol6 = imageLoad(srcMip, srcPos + ivec3(-1,1,-1));
		vec4 srcCol7 = imageLoad(srcMip, srcPos + ivec3(0,1,-1));
		vec4 srcCol8 = imageLoad(srcMip, srcPos + ivec3(1,1,-1));

		vec4 srcCol9 = imageLoad(srcMip, srcPos + ivec3(-1,-1,0));
		vec4 srcCol10 = imageLoad(srcMip, srcPos + ivec3(0,-1,0));
		vec4 srcCol11 = imageLoad(srcMip, srcPos + ivec3(1,-1,0));
		vec4 srcCol12 = imageLoad(srcMip, srcPos + ivec3(-1,0,0));
		vec4 srcCol13 = imageLoad(srcMip, srcPos + ivec3(0,0,0));
		vec4 srcCol14 = imageLoad(srcMip, srcPos + ivec3(1,0,0));
		vec4 srcCol15 = imageLoad(srcMip, srcPos + ivec3(-1,1,0));
		vec4 srcCol16 = imageLoad(srcMip, srcPos + ivec3(0,1,0));
		vec4 srcCol17 = imageLoad(srcMip, srcPos + ivec3(1,1,0));

		vec4 srcCol18 = imageLoad(srcMip, srcPos + ivec3(-1,-1,1));
		vec4 srcCol19 = imageLoad(srcMip, srcPos + ivec3(0,-1,1));
		vec4 srcCol20 = imageLoad(srcMip, srcPos + ivec3(1,-1,1));
		vec4 srcCol21 = imageLoad(srcMip, srcPos + ivec3(-1,0,1));
		vec4 srcCol22 = imageLoad(srcMip, srcPos + ivec3(0,0,1));
		vec4 srcCol23 = imageLoad(srcMip, srcPos + ivec3(1,0,1));
		vec4 srcCol24 = imageLoad(srcMip, srcPos + ivec3(-1,1,1));
		vec4 srcCol25 = imageLoad(srcMip, srcPos + ivec3(0,1,1));
		vec4 srcCol26 = imageLoad(srcMip, srcPos + ivec3(1,1,1));

		//+X direction
		outColor.xyz = mix(srcCol0.xyz, srcCol1.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol1.xyz, srcCol2.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol3.xyz, srcCol4.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol4.xyz, srcCol5.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol6.xyz, srcCol7.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol7.xyz, srcCol8.xyz, 1.0 - srcCol7.w)

					+ mix(srcCol9.xyz, srcCol10.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol10.xyz, srcCol11.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol12.xyz, srcCol13.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol13.xyz, srcCol14.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol15.xyz, srcCol16.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol16.xyz, srcCol17.xyz, 1.0 - srcCol16.w)
					
					+ mix(srcCol18.xyz, srcCol19.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol19.xyz, srcCol20.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol21.xyz, srcCol22.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol22.xyz, srcCol23.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol24.xyz, srcCol25.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol25.xyz, srcCol26.xyz, 1.0 - srcCol25.w)
		//+Y direction			
					+ mix(srcCol0.xyz, srcCol3.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol3.xyz, srcCol6.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol1.xyz, srcCol4.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol4.xyz, srcCol7.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol2.xyz, srcCol5.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol5.xyz, srcCol8.xyz, 1.0 - srcCol5.w)

					+ mix(srcCol9.xyz, srcCol12.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol12.xyz, srcCol15.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol10.xyz, srcCol13.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol13.xyz, srcCol16.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol11.xyz, srcCol14.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol14.xyz, srcCol17.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol18.xyz, srcCol21.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol21.xyz, srcCol24.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol19.xyz, srcCol22.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol22.xyz, srcCol25.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol20.xyz, srcCol23.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol23.xyz, srcCol26.xyz, 1.0 - srcCol23.w)
		//+Z direction			
					+ mix(srcCol0.xyz, srcCol9.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol9.xyz, srcCol18.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol1.xyz, srcCol10.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol10.xyz, srcCol19.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol2.xyz, srcCol11.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol11.xyz, srcCol20.xyz, 1.0 - srcCol11.w)

					+ mix(srcCol3.xyz, srcCol12.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol12.xyz, srcCol21.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol4.xyz, srcCol13.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol13.xyz, srcCol22.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol5.xyz, srcCol14.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol14.xyz, srcCol23.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol6.xyz, srcCol15.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol15.xyz, srcCol24.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol7.xyz, srcCol16.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol16.xyz, srcCol25.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol8.xyz, srcCol17.xyz, 1.0 - srcCol8.w)
					+ mix(srcCol17.xyz, srcCol26.xyz, 1.0 - srcCol17.w);
//+X direction
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol20.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol24.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol26.w)
//+Y direction						
						- (1.0 - srcCol0.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol26.w)
//+Z direction
						- (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	
		outColor.xyz *= 0.05;

		imageStore(dstMip, dstPos, outColor);
	}
}

I've made it so that voxels are filtered in each direction in a single pass. It is a little slower but still runs just under 20fps (probably on avg about 19fps).

 

Also, when I sample the voxels for the cone tracing I apply an offset of 1/4 of a voxel for each mip level to balance it better.

 

giboxv4-0.png giboxv4-1.png

giboxv4-2.png giboxv4-3.png

 

So as you can see, things are starting to look a bit more physically accurate (even though theoretically it's not).

 

Now I've just got to work out how to speed things up by turning this into a 3-pass (trilateral) filtering algorithm.
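For a plain 3x3x3 box average, splitting the filter into three 1D passes gives exactly the same result as the direct 27-tap version while reading 9 values per voxel instead of 27. The C++ sketch below demonstrates that equivalence on a small CPU grid. Whether the alpha-weighted directional filter in the shader above separates the same way is not established here; `Grid`, `boxDirect`, `boxAxis` and `boxSeparable` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense cubic grid of floats; reads outside the bounds return zero.
struct Grid {
    int n;
    std::vector<float> v;
    explicit Grid(int size)
        : n(size), v(static_cast<std::size_t>(size) * size * size, 0.0f) {}
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * n + y) * n + x;
    }
    float at(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= n || y >= n || z >= n) return 0.0f;
        return v[index(x, y, z)];
    }
};

// Direct 27-tap box average.
Grid boxDirect(const Grid& src)
{
    Grid dst(src.n);
    for (int z = 0; z < src.n; ++z)
    for (int y = 0; y < src.n; ++y)
    for (int x = 0; x < src.n; ++x) {
        float sum = 0.0f;
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += src.at(x + dx, y + dy, z + dz);
        dst.v[dst.index(x, y, z)] = sum / 27.0f;
    }
    return dst;
}

// One 3-tap box average along a single axis (0 = x, 1 = y, 2 = z).
Grid boxAxis(const Grid& src, int axis)
{
    assert(axis >= 0 && axis <= 2);
    const int ax = (axis == 0) ? 1 : 0;
    const int ay = (axis == 1) ? 1 : 0;
    const int az = (axis == 2) ? 1 : 0;
    Grid dst(src.n);
    for (int z = 0; z < src.n; ++z)
    for (int y = 0; y < src.n; ++y)
    for (int x = 0; x < src.n; ++x) {
        const float sum = src.at(x - ax, y - ay, z - az)
                        + src.at(x, y, z)
                        + src.at(x + ax, y + ay, z + az);
        dst.v[dst.index(x, y, z)] = sum / 3.0f;
    }
    return dst;
}

// Three 1D passes, one per axis.
Grid boxSeparable(const Grid& src)
{
    return boxAxis(boxAxis(boxAxis(src, 0), 1), 2);
}
```

On the GPU the same split would mean three dispatches with an intermediate texture between them, so the saving in texture reads has to outweigh the extra passes and memory traffic.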


Edited by gboxentertainment, 13 October 2013 - 05:56 AM.


#27 Frenetic Pony   Members   -  Reputation: 1312


Posted 13 October 2013 - 03:00 PM

Looks pretty good, where's your main bottleneck for performance, is it really your new mip mapping?



#28 gboxentertainment   Members   -  Reputation: 766


Posted 13 October 2013 - 03:34 PM


Looks pretty good, where's your main bottleneck for performance, is it really your new mip mapping?

 

No, mip-mapping is still pretty cheap in the scale of things (but the cost may accumulate later when I try to implement cascades).

Right now the main performance bottlenecks are soft-shadowing, SSR and SSAO.

I haven't bothered to set up any way of querying the actual cost of each feature, so I can't really tell you accurately where the costs come from. All I can tell you is the framerate I remember from before I implemented soft-shadowing, SSR and SSAO - it was running at 50fps with the same cone tracing features (except now I have the modified mip-mapping). I think soft-shadowing (for the main point light and 3 emissive objects) pushed it down to ~35fps, SSR pushed it down to ~25fps and SSAO pushed it to ~20fps.

 

I believe that there is a lot of cost to binding all these textures, so I think the next step for me in improving performance would be to delve into bindless graphics. I also want to try and get partially resident textures working on my nvidia card with OpenGL 4.4, but I haven't found any resources to help me with this - has anyone been able to implement this?



#29 Hodgman   Moderators   -  Reputation: 30360


Posted 13 October 2013 - 10:06 PM

It's well worth spending a day to implement a basic GPU profiling system (I guess using ARB_timer_query in GL? I've only done it in D3D so far.) so you can see how many milliseconds are being burnt by your different shaders/passes. You can then print them to the screen, or the console, or a file, etc, and get decent statistics for all of your different features in one go.
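A small bookkeeping class is enough to turn raw query results into per-pass statistics. In OpenGL, a `GL_TIME_ELAPSED` query (started with `glBeginQuery` and ended with `glEndQuery`) reports its result in nanoseconds through `glGetQueryObjectui64v`. The C++ sketch below covers only the CPU-side accumulation of such samples; the class name `PassTimings` and its methods are hypothetical, and the GL calls themselves are left out so the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Accumulates GPU timings per named pass. Samples are expected in nanoseconds,
// the unit OpenGL reports for GL_TIME_ELAPSED queries.
class PassTimings {
public:
    void addSample(const std::string& pass, std::uint64_t nanoseconds)
    {
        assert(!pass.empty());
        Entry& e = entries_[pass];
        e.totalNs += nanoseconds;
        e.count += 1;
    }

    // Average cost of a pass in milliseconds; 0 if the pass was never sampled.
    double averageMs(const std::string& pass) const
    {
        const auto it = entries_.find(pass);
        if (it == entries_.end() || it->second.count == 0) return 0.0;
        return static_cast<double>(it->second.totalNs)
             / static_cast<double>(it->second.count) / 1.0e6;
    }

private:
    struct Entry { std::uint64_t totalNs = 0; std::uint64_t count = 0; };
    std::map<std::string, Entry> entries_;
};
```

Feeding one sample per pass per frame and printing `averageMs` every few seconds gives stable numbers for every feature at once, without toggling features on and off.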
 
Measuring performance with FPS is quite annoying on the other hand, requiring you to take before/after FPS measurements when turning a feature on and off.
50fps = 20ms/frame for drawing the scene, voxelizing it, cone tracing and direct lighting?
35fps = 28.6ms/frame == 8.6ms increase for adding soft-shadowing
25fps = 40ms/frame == 11.4ms increase for adding SSR
20fps = 50ms/frame == 10ms increase for adding SSAO
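The conversion used in the list above is just 1000 / fps, which can be sketched in C++ as follows. The function names `frameTimeMs` and `addedCostMs` are hypothetical.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the frame rate drops from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

Because frame time is the reciprocal of frame rate, equal drops in fps are not equal costs: going from 50 to 35 fps costs roughly 8.6 ms, while the smaller-looking drop from 25 to 20 fps costs a full 10 ms.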
 

I believe that there is a lot of cost to binding all these textures, so I think the next step for me in improving performance would be to delve into bindless graphics.

That will probably only be a CPU-side optimization, and by the sounds of it, your application is probably bottlenecked by the GPU workload.
 
Awesome results BTW :D


Edited by Hodgman, 13 October 2013 - 10:15 PM.


#30 Frenetic Pony   Members   -  Reputation: 1312


Posted 13 October 2013 - 11:26 PM

It's well worth spending a day to implement a basic GPU profiling system (I guess using ARB_timer_query in GL? I've only done it in D3D so far.) so you can see how many milliseconds are being burnt by your different shaders/passes. You can then print them to the screen, or the console, or a file, etc, and get decent statistics for all of your different features in one go.

 

 

This is an excellent idea, especially since there are plenty of optimizations to pare down SSAO/SSR; those are pretty well established and researched. You'd really be looking at cone tracing for trying novel optimizations, though I can think of several already done.

 

One is downsampling beforehand, or otherwise binning together pixel blocks, for the diffuse trace, which would work well with pixels relatively close to each other but miss thin and edge objects in certain cases. Epic also does this with the specular trace, though they never explained how other than a hand-wavy "then you upsample and scatter".

 

To reduce the number of samples for the specular trace you can check a lower mip level of the volume for the alpha to see if it's empty and whether you should skip it. I'm also interested in, and may eventually get back to trying to figure out, realtime signed distance fields. These should give you a minimum step size you can skip to for tracing, reducing the number of samples you need.
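
To make the distance-field idea concrete, here is a minimal CPU-side sketch in C++ of sphere tracing, where each step advances by the distance value at the current point. Everything in it is an assumption for illustration: the scene is a single analytic sphere standing in for a real distance field lookup, and the names (sphereSdf, sphereTrace, TraceResult) are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Signed distance from p to a sphere; a stand-in for a real distance field lookup.
double sphereSdf(Vec3 p, Vec3 center, double radius) {
    const double dx = p.x - center.x, dy = p.y - center.y, dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

struct TraceResult { bool hit; double t; int samples; };

// March from origin along a unit-length dir. Each step advances by the
// distance value, the largest step that cannot skip over the surface.
TraceResult sphereTrace(Vec3 origin, Vec3 dir, Vec3 center, double radius,
                        double maxT, double hitEps, int maxSamples) {
    assert(maxT > 0.0 && hitEps > 0.0 && maxSamples > 0);
    double t = 0.0;
    int samples = 0;
    while (samples < maxSamples && t <= maxT) {
        const Vec3 p{origin.x + dir.x * t, origin.y + dir.y * t, origin.z + dir.z * t};
        const double d = sphereSdf(p, center, radius);
        ++samples;
        if (d < hitEps) return {true, t, samples};  // close enough to count as a hit
        t += d;                                     // nothing is closer than d, so skip ahead
    }
    return {false, t, samples};
}
```

For a ray starting 4 units in front of the sphere surface this finds the hit in 2 samples, where a fixed step of 0.01 would need about 400.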

 

As for a huge area (if you want to go that far), volume LOD and a Directed Acyclic Graph (as I mentioned earlier) should reduce memory consumption a lot since you're using volume textures and not a sparse octree, though the paper is based on doing this for an octree so I'm not sure how a uniform volume texture would play out.


Edited by Frenetic Pony, 14 October 2013 - 04:14 PM.


#31 gboxentertainment   Members   -  Reputation: 766

3 Likes

Posted 15 October 2013 - 05:00 PM


It's well worth spending a day to implement a basic GPU profiling system

 

Timings in ms. First value is with a 32x32x32 texture and second is with a 64x64x64 texture:

Pass                                                   32x32x32   64x64x64
Direct light shadow                                         6.3        6.3
Emissive light shadows (all three)                         13.1       13.1
First voxelization                                         0.01       0.01
Second bounce voxelization (5 diffuse cones)                1.0        1.6
Mip-mapping and filtering (3x3x3 filtering)                 0.6        1.3
Final rendering (5 diffuse cones + 1 specular cone)        14.7       16.8
Post (SSR + SSAO)                                          17.2       17.2
Total                                                     52.91      56.31


Edited by gboxentertainment, 15 October 2013 - 05:02 PM.


#32 kalle_h   Members   -  Reputation: 1387

0 Likes

Posted 16 October 2013 - 02:41 AM

Your post processing is fairly expensive. With just those you couldn't run a game at 60fps.



#33 Styves   Members   -  Reputation: 1023

0 Likes

Posted 16 October 2013 - 05:56 AM

I'm curious as to why your SSAO + SSR are so expensive.



#34 gboxentertainment   Members   -  Reputation: 766

1 Like

Posted 16 October 2013 - 07:34 AM


I'm curious as to why your SSAO + SSR are so expensive.

 

Did some debugging and found out that (because I'm using forward rendering) I had accidentally used the hi-res version of the Buddha model for my ssao and ssr (over a million tris).

So instead of 1.0ms from the vertex shader with the low-poly model, I was getting 10ms.

My ssao is about 8.5ms now.

However, when I previously reported my results, I didn't actually have any ssr turned on, so my ssr, when turned on for the entire scene (all surfaces) is an additional 8.8ms.

I guess there's still a lot of room to optimize my ssr - when I implemented it, I was looking more for getting the best quality I could get than performance.

I've managed to reduce my ssao to 4.7ms without too much quality loss.

 

I'm trying to calculate whether deferred shading has an advantage over my current forward shading. With deferred shading, I have to render the Buddha at full res for position, normal and albedo textures so this will be a fixed vertex shader cost of 30ms. At the moment with forward shading, I render the model at full res once and at low-res 7 times, so that makes 17ms altogether for vertex shader costs.


Edited by gboxentertainment, 16 October 2013 - 07:41 AM.


#35 Che@ter   Members   -  Reputation: 248

0 Likes

Posted 17 October 2013 - 03:59 AM

Hi! Try implementing SSR with an iterative step rather than a fixed step. You will find the reflected pixel in 3-8 steps. SSR should be faster than any other post-process effect.

SSAO is better implemented at multiple resolutions with upsampling: faster, better looking, no noise, and no need for a post-blur.
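
One possible reading of the SSR suggestion is a coarse march followed by a bisection refinement once the ray passes behind the depth buffer. The C++ sketch below is a CPU-side illustration under that assumption only; it is not the poster's implementation, and the names (marchAndRefine, Hit, rayDepth, sceneDepth) are hypothetical. The ray and the depth buffer are reduced to two functions of a parameter t in [0,1].

```cpp
#include <cassert>
#include <functional>

struct Hit { bool found; double t; int samples; };

// rayDepth(t):   depth of the reflected ray at parameter t in [0,1].
// sceneDepth(t): depth buffer value under the ray at the same t.
Hit marchAndRefine(const std::function<double(double)>& rayDepth,
                   const std::function<double(double)>& sceneDepth,
                   int coarseSteps, int refineSteps) {
    assert(coarseSteps > 0 && refineSteps >= 0);
    int samples = 0;
    double prevT = 0.0;
    for (int i = 1; i <= coarseSteps; ++i) {
        const double t = static_cast<double>(i) / coarseSteps;
        ++samples;
        if (rayDepth(t) >= sceneDepth(t)) {   // ray has gone behind the surface
            double lo = prevT, hi = t;        // the crossing lies in (lo, hi]
            for (int j = 0; j < refineSteps; ++j) {
                const double mid = 0.5 * (lo + hi);
                ++samples;
                if (rayDepth(mid) >= sceneDepth(mid)) hi = mid; else lo = mid;
            }
            return {true, hi, samples};
        }
        prevT = t;
    }
    return {false, 1.0, samples};             // no crossing found along the ray
}
```

With 8 coarse steps and 5 refinement steps, a crossing at t = 0.37 is located to within 0.004 using 8 depth samples in total. The trade-off is that a coarse step can jump straight over thin geometry.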



#36 kalle_h   Members   -  Reputation: 1387

0 Likes

Posted 17 October 2013 - 04:11 AM

I would ditch all screen-space hacks if I already had a voxel data structure somewhere...



#37 Styves   Members   -  Reputation: 1023

0 Likes

Posted 17 October 2013 - 04:36 AM

Yeah, your SSAO and SSR implementations seem very bloated. There's definitely a lot of room for optimizations here.

 

As for deferred shading: why would your cost for geometry go up 3x? You just need to use MRT to output some g-buffer data.
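
To put rough numbers on that, here is a small C++ sketch of the geometry cost arithmetic. It assumes the figures quoted earlier in the thread (about 10ms of vertex work per pass for the hi-res Buddha, about 1ms per pass for the low-poly one) and nothing else; geometryCostMs is a made-up helper, not a measurement tool.

```cpp
#include <cassert>

// Total vertex-processing time for a frame that draws the hi-res model
// hiResPasses times and the low-res model lowResPasses times.
double geometryCostMs(int hiResPasses, int lowResPasses,
                      double hiResMs, double lowResMs) {
    assert(hiResPasses >= 0 && lowResPasses >= 0);
    return hiResPasses * hiResMs + lowResPasses * lowResMs;
}
```

Under those assumptions, forward shading as described is geometryCostMs(1, 7, 10.0, 1.0) = 17ms, a deferred setup that draws the model once per G-buffer texture is geometryCostMs(3, 0, 10.0, 1.0) = 30ms, and a deferred setup that writes position, normal and albedo in a single MRT pass is geometryCostMs(1, 0, 10.0, 1.0) = 10ms.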



#38 Frenetic Pony   Members   -  Reputation: 1312

0 Likes

Posted 17 October 2013 - 03:03 PM

I would ditch all screen-space hacks if I already had a voxel data structure somewhere...

 

The problem is computation and memory cost go up quite quickly with increased voxel resolution, and he's only getting that with a small room. Ideally you'd have say, a 16 square kilometer grid centered around the player. Which is going to cost more than enough by itself, without getting into the same voxel resolution as screen resolution.
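
As a rough illustration of how quickly a dense volume grows, here is a C++ sketch that computes the size of a cubic voxel texture. The 4 bytes per voxel (an RGBA8 texel) is an assumption for the example, since the thread does not state the texture format, and voxelGridBytes is a hypothetical helper. Mip levels are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a dense resolution^3 volume texture, ignoring mip levels.
std::uint64_t voxelGridBytes(std::uint64_t resolution, std::uint64_t bytesPerVoxel) {
    assert(resolution > 0 && bytesPerVoxel > 0);
    return resolution * resolution * resolution * bytesPerVoxel;
}
```

Under that assumption a 32x32x32 volume is 128 KiB, 64x64x64 is 1 MiB, and 512x512x512 is already 512 MiB: every doubling of resolution multiplies the memory by 8.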



#39 jcabeleira   Members   -  Reputation: 686

0 Likes

Posted 18 October 2013 - 10:29 AM

 

I would ditch all screen-space hacks if I already had a voxel data structure somewhere...

 

The problem is computation and memory cost go up quite quickly with increased voxel resolution, and he's only getting that with a small room. Ideally you'd have say, a 16 square kilometer grid centered around the player. Which is going to cost more than enough by itself, without getting into the same voxel resolution as screen resolution.

 

 

Screen space effects are still useful for voxel cone tracing because the voxels often don't have enough resolution to provide finer details. For instance, ambient occlusion generated naturally by the cone tracing tends to look a bit washed out due to the lack of geometric detail and thus can benefit from SSAO to give finer details.

Same thing goes for reflections, if you want sharp reflections you'd need very small voxels which is impractical and expensive, in this case screen space reflections can help a lot. However, for blurred reflections voxel cone tracing is very good.



#40 gboxentertainment   Members   -  Reputation: 766

1 Like

Posted 18 October 2013 - 09:46 PM

Here's my SSR code for anyone that can help me optimize whilst still keeping some plausible quality:

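	// Accumulated reflection colour.
	vec4 bColor = vec4(0.0);

	// Shading normal: the bump-mapped normal when bump mapping is on, else the interpolated normal.
	vec4 N = normalize(fNorm);
	mat3 tbn = mat3(tanMat*N.xyz, bitanMat*N.xyz, N.xyz);
	vec4 bumpMap = texture(bumpTex, texRes*fTexCoord);
	vec3 texN = (bumpMap.xyz*2.0 - 1.0);
	vec3 bumpN = bumpOn ? normalize(tbn*texN) : N.xyz;

	// Camera-space normal, position, view direction and reflection direction.
	vec3 camSpaceNorm = vec3(view*(vec4(bumpN,N.w)));
	vec3 camSpacePos = vec3(view*worldPos);

	vec3 camSpaceViewDir = normalize(camSpacePos);
	vec3 camSpaceVec = normalize(reflect(camSpaceViewDir,camSpaceNorm));

	// Project the shaded point into [0,1] screen space (xy = texture coordinates, z = depth).
	vec4 clipSpace = proj*vec4(camSpacePos,1);
	vec3 NDCSpace = clipSpace.xyz/clipSpace.w;
	vec3 screenSpacePos = 0.5*NDCSpace+0.5;

	// Project a second point one unit along the reflection to get the
	// screen-space march direction; 0.01 is the initial step length.
	vec3 camSpaceVecPos = camSpacePos+camSpaceVec;
	clipSpace = proj*vec4(camSpaceVecPos,1);
	NDCSpace = clipSpace.xyz/clipSpace.w;
	vec3 screenSpaceVecPos = 0.5*NDCSpace+0.5;
	vec3 screenSpaceVec = 0.01*normalize(screenSpaceVecPos - screenSpacePos);

	// oldPos starts one step along the ray, currPos two steps.
	vec3 oldPos = screenSpacePos + screenSpaceVec;
	vec3 currPos = oldPos + screenSpaceVec;
	int count = 0;		// march iterations so far
	int nRefine = 0;	// step refinements so far
	float fade = 1.0;
	float fadeScreen = 0.0;
	float farPlane = 2.0;
	float nearPlane = 0.1;

	// Angle-dependent weight: larger at grazing angles, clamped to [0.3, 1.0].
	float cosAngInc = -dot(camSpaceViewDir,camSpaceNorm);
	cosAngInc = clamp(1-cosAngInc,0.3,1.0);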
	
	// Only march when the specular cone is narrow and SSR is switched on.
	if(specConeRatio <= 0.1 && ssrOn)
	{
		while(count < 50)
		{
			// Stop once the ray leaves the screen or the [0,1] depth range.
			if(currPos.x < 0 || currPos.x > 1 || currPos.y < 0 || currPos.y > 1 || currPos.z < 0 || currPos.z > 1)
				break;

			vec2 ssPos = currPos.xy;

			// Remap the ray depth and the front-face depth sample the same way before comparing.
			float currDepth = 2.0*nearPlane/(farPlane+nearPlane-currPos.z*(farPlane-nearPlane));
			float sampleDepth = 2.0*nearPlane/(farPlane+nearPlane-texture(depthTex, ssPos).x*(farPlane-nearPlane));
			float diff = currDepth - sampleDepth;
			float error = length(screenSpaceVec);
			if(diff >= 0 && diff < error)
			{
				// Ray is just behind a front face: step back, shrink the step and retry.
				screenSpaceVec *= 0.7;
				currPos = oldPos;
				nRefine++;
				if(nRefine >= 3)
				{
					// Accept the hit after three refinements; fade with march length and towards the screen edges.
					fade = float(count);
					fade = clamp(fade*fade/100,1.0,40.0);
					fadeScreen = distance(ssPos,vec2(0.5,0.5))*2;
					bColor.xyz += texture(reflTex, ssPos).xyz/2/fade*cosAngInc*(1-clamp(fadeScreen,0.0,1.0));
					break;
				}
			} else if(diff > error){
				// Ray is well behind the front face: test against the back-face depth instead.
				bColor.xyz = vec3(0);
				sampleDepth = 2.0*nearPlane/(farPlane+nearPlane-texture(depthBTex, ssPos).x*(farPlane-nearPlane));
				diff = currDepth - sampleDepth;
				if(diff >= 0 && diff < error)
				{
					screenSpaceVec *= 0.7;
					currPos = oldPos;
					nRefine++;
					if(nRefine >= 3)
					{
						fade = float(count);
						fade = clamp(fade*fade/100,2.0,20.0);
						bColor.xyz += texture(reflTex, ssPos).xyz/2/fade*cosAngInc;
						break;
					}
				}
			}

			// Advance one step along the ray.
			oldPos = currPos;
			currPos = oldPos + screenSpaceVec;
			count++;
		}
	}

Note that the second half of the code (after the else if(diff > error)) is where I cover the back faces of models (depthBTex is a depth texture rendered with front-face culling) so that the backs of models are reflected.
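
The depth remapping that appears three times in the loop above can be pulled out and checked on the CPU. This C++ sketch copies the expression from the shader as posted; the function name linearizeDepth is an assumption for illustration, and the sketch makes no claim about whether this remapping is the ideal one for a [0,1] depth buffer, only about what it computes.

```cpp
#include <cassert>

// Same expression as in the shader: remaps a depth value z using the
// near and far plane distances. The result increases with z.
double linearizeDepth(double z, double nearPlane, double farPlane) {
    assert(nearPlane > 0.0 && farPlane > nearPlane);
    return 2.0 * nearPlane / (farPlane + nearPlane - z * (farPlane - nearPlane));
}
```

With the shader's nearPlane = 0.1 and farPlane = 2.0, a raw depth of 0.0 maps to about 0.095 and a raw depth of 1.0 maps to 1.0. Because the mapping is monotonic and is applied to both the ray depth and the sampled depth, the sign of diff is the same as it would be when comparing the raw depths; only the size of the tolerance window changes.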





