gboxentertainment

Member Since 18 Jan 2012
Offline Last Active Jun 08 2014 03:36 PM

#5109843 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 16 November 2013 - 10:05 PM


Sounds like you hit your video card's memory limit and the drivers are now using system memory - which is also why your frame rate tanks. Task Manager only shows system memory usage, not the memory internal to the video card.

 

Good point. It turns out that my voxel visualizer was causing the massive increase in system RAM. I've turned that off and it doesn't seem to have any effect on framerate.

Looking at GPU RAM, it makes sense now - 64 voxel depth (with all other resources) uses up about 750MB. This increases to 1.8GB when using 512 voxel depth.




#5108922 Arauna2 path tracer announcement

Posted by gboxentertainment on 13 November 2013 - 04:00 AM

Have you implemented, or do you plan to implement, any type of noise filtering? e.g. random parameter filtering (RPF) looks interesting: http://www.youtube.com/watch?v=Ee51bkOlbMw

However, I'm pretty sure Sam Lapere, who's working on the Brigade engine, said that RPF doesn't really provide good results, but I would like to see some proof of that.

I think you worked on the Brigade engine as well, didn't you?




#5108234 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 10 November 2013 - 02:25 AM

I just tested this with my brand new EVGA GTX780 and it runs at average 95fps at 1080p with all screen space effects turned on (ssao, ssr, all soft shadows). In fact, screen space effects seem to make little dent in the framerate.

 

I discovered something very unusual when testing the voxel depth. Here are my results:

32x32x32 -> 95fps (37MB memory)

64x64x64 -> 64fps (37MB memory)

128x128x128 -> 52fps (37MB memory)

256x256x256 -> 31fps (38MB memory)

512x512x512 -> 7fps (3.2GB memory)

 

How on earth did I jump from 38MB of memory to 3.2GB of memory used when going from 256 to 512 3D texture depths?!
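As a rough sanity check on how quickly voxel memory grows, the raw size of a single cubic 3D texture can be estimated from its resolution. This is only a sketch: it assumes 4 bytes per voxel (an RGBA8-style format) and a full mip chain, and the function name is hypothetical. The real total depends on how many such textures and other resources are allocated.

```python
def voxel_texture_bytes(size, bytes_per_voxel=4, with_mips=True):
    """Estimate the memory of one cubic 3D texture of size^3 voxels.

    Assumption: 4 bytes per voxel (RGBA8-like) unless overridden.
    """
    total = 0
    while size >= 1:
        total += size ** 3 * bytes_per_voxel
        if not with_mips:
            break
        size //= 2  # each mip level halves the resolution per axis
    return total


if __name__ == "__main__":
    for n in (32, 64, 128, 256, 512):
        mib = voxel_texture_bytes(n) / (1024 ** 2)
        print(f"{n}^3 with mips: {mib:.1f} MiB")
```

Each doubling of the resolution multiplies the base level by eight, so under these assumptions one 512x512x512 texture is already 512 MiB before mips are counted.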




#5104713 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 26 October 2013 - 10:58 PM

So I've managed to remove some of the artifacts from my soft shadows:

Previously, when I had used front-face culling I got the following issue:

givoxshadows8-0.jpg

 

This was due to backfaces not being captured by the shadow-caster camera when at overlapping surfaces, thus leading to a gap of missing information in the depth test. There's also the issue of back-face self shadowing artifacts.

 

Using back-face culling (only rendering the front faces) resolves this problem but leads to the following one:

givoxshadows8-1.jpg

This is front-face self-shadowing. No amount of bias resolves it, because it is caused by the jittering process during depth testing.

 

I came up with a solution that resolves all these issues for direct lighting shadows, which is to also store an individual object id for each object in the scene from the shadow-caster's point of view. During depth testing, I then compare the object id from the player camera's point of view with that from the shadow-caster's point of view and make it so that each object does not cast its own shadow onto itself:
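A minimal sketch of the comparison described above, written as plain Python rather than shader code. The function and parameter names are hypothetical; it assumes both depths are already expressed in the shadow-caster's space and that each camera has written an integer object id per pixel.

```python
def in_shadow(receiver_depth, receiver_id, caster_depth, caster_id, bias=0.0):
    """Return True if the receiver fragment is shadowed by a *different* object.

    receiver_*: values for the fragment seen by the player camera.
    caster_*:   values stored in the shadow map at the matching texel.
    """
    if receiver_id == caster_id:
        # Same object from both points of view: never self-shadow.
        return False
    # Ordinary depth test against the nearest occluder seen by the light.
    return receiver_depth - bias > caster_depth


if __name__ == "__main__":
    print(in_shadow(0.6, 1, 0.5, 1))  # same object id, so not shadowed
    print(in_shadow(0.6, 1, 0.5, 2))  # a different object is closer to the light
```

The trade-off of this approach is that an object can no longer shadow itself at all, including concave parts where self-shadowing would be physically correct.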

givoxshadows8-2.jpg

 

Now this is all good for direct lighting, because everything that is not directly lit I set to zero, including shadows, and then I add the indirect light to that zero - so there's a smooth transition between the shadow and the non-lit part of each object.

givoxshadleak2.jpg

 

For indirectly lit scenes with no direct lighting at all (i.e. emissively lit by objects), things are a bit different. I don't separate a secondary bounce from the subsequent bounces, all bounces are tied together - thus I cannot just set a secondary bounce as the "direct lighting" and everything else including shadows to zero, then add the subsequent bounces. This would require an additional voxel texture and I would need to double the number of cone traces.

I cheat by making the shadowed parts of the scene darker than the non-shadowed parts (when a more accurate algorithm would be to make shadowed areas zero and add subsequent bounces to those areas). This, together with the removal of any self-shadowing leads to shadow leaking:

givoxshadleak1.jpg givoxshadleak0.jpg

 

So I think I have two options:

  1. Add another voxel texture for the second bounce and double the number of cone traces (most expensive).
  2. Switch back to back-face rendering with front-face culling for the shadow mapping only for emissive lighting shadows (lots of ugly artifacts).

I wonder if anyone can come up with any other ideas.




#5103098 Rain droplets technique

Posted by gboxentertainment on 21 October 2013 - 05:56 AM

Here's something:

 

http://www.cescg.org/CESCG-2007/papers/Hagenberg-Stuppacher-Ines/cescg_StuppacherInes.pdf

 

It seems they use particles and store these in a height map texture.

 

[edit] Styves pretty much describes what is written here.




#5103091 Rain droplets technique

Posted by gboxentertainment on 21 October 2013 - 05:26 AM


Also if you look at the first video, see how the drops are affected by the movement of the camera

 

That's a good point. Originally I was thinking you store movement of the camera in variables for each direction and multiply this by the s-direction texture coordinate, but then that doesn't account for the bending of the drops.




#5103073 Rain droplets technique

Posted by gboxentertainment on 21 October 2013 - 03:59 AM

All it involves is just texture masking - not expensive at all. You create several textures with semi-transparent water droplets using the alpha channel (this can probably be done in Photoshop), then have their vertical texture coordinates change as a function of time in the shader code.
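The idea can be sketched outside a shader as two small functions: one that scrolls the vertical texture coordinate over time, and one that alpha-blends a droplet sample over the scene. All names, the wrap-around behaviour, and the sign of the scroll direction are assumptions for illustration only.

```python
def scrolled_v(v, time, speed):
    """Vertical texture coordinate shifted over time, wrapped into [0, 1)."""
    return (v + speed * time) % 1.0


def blend(scene_rgb, droplet_rgb, alpha):
    """Composite a droplet texel over the scene using its alpha as a mask."""
    return tuple(s * (1.0 - alpha) + d * alpha
                 for s, d in zip(scene_rgb, droplet_rgb))


if __name__ == "__main__":
    # The same texel is looked up at a different height as time advances.
    print(scrolled_v(0.25, time=2.0, speed=0.25))
    print(blend((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 0.25))
```

Using several droplet textures with different speeds would give each layer its own scroll rate, at the cost of one extra texture lookup per layer.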




#5102806 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 20 October 2013 - 04:16 AM

I've managed to speed up my SSR to 5.3ms, at the cost of reduced quality, by using a variable step distance - so now I'm using 20 steps instead of 50.

 

giboxssr10.png

 

Even if I get it down to 10 steps and remove the additional backface cover, it will still be 3.1ms. Is this fast enough, or can it be optimized further?




#5102550 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 18 October 2013 - 09:46 PM

Here's my SSR code for anyone who can help me optimize it whilst still keeping some plausible quality:

	vec4 bColor = vec4(0.0);

	vec4 N = normalize(fNorm);
	mat3 tbn = mat3(tanMat*N.xyz, bitanMat*N.xyz, N.xyz);
	vec4 bumpMap = texture(bumpTex, texRes*fTexCoord);
	vec3 texN = (bumpMap.xyz*2.0 - 1.0);
	vec3 bumpN = bumpOn == true ? normalize(tbn*texN) : N.xyz;

	vec3 camSpaceNorm = vec3(view*(vec4(bumpN,N.w)));
	vec3 camSpacePos = vec3(view*worldPos);

	vec3 camSpaceViewDir = normalize(camSpacePos);
	vec3 camSpaceVec = normalize(reflect(camSpaceViewDir,camSpaceNorm));

	vec4 clipSpace = proj*vec4(camSpacePos,1);
	vec3 NDCSpace = clipSpace.xyz/clipSpace.w;
	vec3 screenSpacePos = 0.5*NDCSpace+0.5;

	vec3 camSpaceVecPos = camSpacePos+camSpaceVec;
	clipSpace = proj*vec4(camSpaceVecPos,1);
	NDCSpace = clipSpace.xyz/clipSpace.w;
	vec3 screenSpaceVecPos = 0.5*NDCSpace+0.5;
	vec3 screenSpaceVec = 0.01*normalize(screenSpaceVecPos - screenSpacePos);

	vec3 oldPos = screenSpacePos + screenSpaceVec;
	vec3 currPos = oldPos + screenSpaceVec;
	int count = 0;
	int nRefine = 0;
	float fade = 1.0;
	float fadeScreen = 0.0;
	float farPlane = 2.0;
	float nearPlane = 0.1;

	float cosAngInc = -dot(camSpaceViewDir,camSpaceNorm);
	cosAngInc = clamp(1-cosAngInc,0.3,1.0);
	
	if(specConeRatio <= 0.1 && ssrOn == true)
	{
	while(count < 50)
	{
		if(currPos.x < 0 || currPos.x > 1 || currPos.y < 0 || currPos.y > 1 || currPos.z < 0 || currPos.z > 1)
			break;

		vec2 ssPos = currPos.xy;

		float currDepth = 2.0*nearPlane/(farPlane+nearPlane-currPos.z*(farPlane-nearPlane));
		float sampleDepth = 2.0*nearPlane/(farPlane+nearPlane-texture(depthTex, ssPos).x*(farPlane-nearPlane));
		float diff = currDepth - sampleDepth;
		float error = length(screenSpaceVec);
		if(diff >= 0 && diff < error)
		{
			screenSpaceVec *= 0.7;
			currPos = oldPos;
			nRefine++;
			if(nRefine >= 3)
			{
				fade = float(count);
				fade = clamp(fade*fade/100,1.0,40.0);
				fadeScreen = distance(ssPos,vec2(0.5,0.5))*2;
				bColor.xyz += texture(reflTex, ssPos).xyz/2/fade*cosAngInc*(1-clamp(fadeScreen,0.0,1.0));
				break;
			}
		} else if(diff > error){
			bColor.xyz = vec3(0);
			sampleDepth = 2.0*nearPlane/(farPlane+nearPlane-texture(depthBTex, ssPos).x*(farPlane-nearPlane));
			diff = currDepth - sampleDepth;
			if(diff >= 0 && diff < error)
			{
				screenSpaceVec *= 0.7;
				currPos = oldPos;
				nRefine++;
				if(nRefine >= 3)
				{
					fade = float(count);
					fade = clamp(fade*fade/100,2.0,20.0);
					bColor.xyz += texture(reflTex, ssPos).xyz/2/fade*cosAngInc;
					break;
				}	
			}
		}

		oldPos = currPos;
		currPos = oldPos + screenSpaceVec;
		count++;

	}
	}

Note that the second half of the code (after the else if(diff > error)) is where I cover the back faces of models (depthBTex is a depth texture rendered with front-face culling) so that the backs of models are reflected.
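The depth conversion that the loop above evaluates for both currDepth and sampleDepth can be checked in isolation. This sketch copies the expression as written, with the same nearPlane and farPlane constants as defaults; the function name is hypothetical.

```python
def linearize_depth(z, near=0.1, far=2.0):
    """Same expression as in the shader: 2n / (f + n - z * (f - n))."""
    return 2.0 * near / (far + near - z * (far - near))


if __name__ == "__main__":
    for z in (0.0, 0.5, 0.9, 1.0):
        print(z, linearize_depth(z))
```

With these constants the expression maps z = 1 to 1.0 and z = 0 to 2n/(f+n), and it increases monotonically in between, so comparing two converted depths preserves their ordering.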




#5101809 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 16 October 2013 - 07:34 AM


I'm curious as to why your SSAO + SSR are so expensive.

 

Did some debugging and found out that (because I'm using forward rendering) I had accidentally used the hi-res version of the Buddha model for my ssao and ssr (over a million tris).

So instead of 1.0ms from the vertex shader with the low-poly model, I was getting 10ms.

My ssao is about 8.5ms now.

However, when I previously reported my results, I didn't actually have any ssr turned on, so my ssr, when turned on for the entire scene (all surfaces) is an additional 8.8ms.

I guess there's still a lot of room to optimize my ssr - when I implemented it, I was looking more for getting the best quality I could get than performance.

I've managed to reduce my ssao to 4.7ms without too much quality loss.

 

I'm trying to calculate whether deferred shading has an advantage over my current forward shading. With deferred shading, I have to render the Buddha at full res for position, normal and albedo textures so this will be a fixed vertex shader cost of 30ms. At the moment with forward shading, I render the model at full res once and at low-res 7 times, so that makes 17ms altogether for vertex shader costs.
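The comparison above is simple arithmetic on the two vertex-shader costs reported earlier in this post (about 10ms for the hi-res model and 1.0ms for the low-poly one). A small sketch, with hypothetical names, that reproduces both totals:

```python
HI_RES_MS = 10.0   # vertex cost of one pass over the hi-res model (from the post)
LOW_RES_MS = 1.0   # vertex cost of one pass over the low-poly model (from the post)


def vertex_cost_ms(hi_res_passes, low_res_passes):
    """Total vertex-shader time for a given number of passes of each model."""
    return hi_res_passes * HI_RES_MS + low_res_passes * LOW_RES_MS


if __name__ == "__main__":
    deferred = vertex_cost_ms(3, 0)  # position, normal and albedo passes
    forward = vertex_cost_ms(1, 7)   # one full-res pass plus seven low-res passes
    print(deferred, forward)
```

This only counts vertex work. It assumes the three deferred targets are rendered in separate passes, as the post describes, and ignores any fragment-shader savings.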




#5101670 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 15 October 2013 - 05:00 PM


It's well worth spending a day to implement a basic GPU profiling system

 

First value is for a 32x32x32 voxel texture and the second is for a 64x64x64 texture (times in ms):

Direct light Shadow 6.3; 6.3

Emissive light Shadows (all three) 13.1; 13.1

First Voxelization 0.01; 0.01

Second Bounce Voxelization (5 diffuse cones) 1; 1.6

Mip-mapping and filtering (3x3x3 filtering) 0.6; 1.3

Final Rendering (5 diffuse cones + 1 specular cone) 14.7; 16.8

Post (SSR + SSAO) 17.2; 17.2

Total 52.91; 56.31
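To relate these timings to framerate, the per-pass numbers can be summed and converted. The sketch below assumes the values are milliseconds per frame; the dictionary labels are shortened from the list above and the helper names are hypothetical.

```python
# (32^3 voxel texture, 64^3 voxel texture), assumed to be milliseconds
timings_ms = {
    "Direct light shadow": (6.3, 6.3),
    "Emissive light shadows": (13.1, 13.1),
    "First voxelization": (0.01, 0.01),
    "Second bounce voxelization": (1.0, 1.6),
    "Mip-mapping and filtering": (0.6, 1.3),
    "Final rendering": (14.7, 16.8),
    "Post (SSR + SSAO)": (17.2, 17.2),
}


def totals(timings):
    """Sum each column of the table, rounded to two decimals."""
    return tuple(round(sum(column), 2) for column in zip(*timings.values()))


def fps(frame_ms):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    for total in totals(timings_ms):
        print(f"{total} ms -> {fps(total):.1f} fps")
```

Under that assumption the two totals correspond to roughly 19 and 18 frames per second, and shadows plus post-processing account for well over half of each frame.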




#5101111 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 13 October 2013 - 03:34 PM


Looks pretty good, where's your main bottleneck for performance, is it really your new mip mapping?

 

No, mip-mapping is still pretty cheap in the scale of things (but the cost may accumulate later when I try to implement cascades).

Right now the main performance bottlenecks are soft-shadowing, SSR and SSAO.

I haven't bothered to set up any sort of way of querying actual cost of each feature so I can't really tell you accurately where the costs would come from. All I can tell you is the framerate of what I remember from before I implemented soft-shadowing, ssR and ssao - where it was running at 50fps with the same cone tracing features (except now I have the modified mip-mapping). I think soft-shadowing (for main pointlight and 3 emissive objects) pushed it down to ~35fps, ssR pushed it down to about ~25fps and ssao pushed it to ~20fps.

 

I believe that there is a lot of cost to binding all these textures, so I think the next step for me in improving performance would be to delve into bindless graphics. I also want to try and get partially resident textures working on my nvidia card with OpenGL 4.4, but I haven't found any resources to help me with this - has anyone been able to implement this?




#5100986 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 13 October 2013 - 05:54 AM

Here's what I've done to amend the problem - now all mip levels are filtered equally in every direction:

#version 430

layout(local_size_x = 16, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba8) uniform image3D srcMip;
layout(binding = 1, rgba8) uniform image3D dstMip;

uniform uint direction;

void main()
{
	ivec3 dstSize = imageSize(dstMip);

	if(gl_GlobalInvocationID.x >= dstSize.x || gl_GlobalInvocationID.y >= dstSize.y || gl_GlobalInvocationID.z >= dstSize.z){
		// out of range, ignore
	} else {
		ivec3 dstPos = ivec3(gl_GlobalInvocationID);
		ivec3 srcPos = dstPos*2;
		vec4 outColor;

		vec4 srcCol0 = imageLoad(srcMip, srcPos + ivec3(-1,-1,-1));
		vec4 srcCol1 = imageLoad(srcMip, srcPos + ivec3(0,-1,-1));
		vec4 srcCol2 = imageLoad(srcMip, srcPos + ivec3(1,-1,-1));
		vec4 srcCol3 = imageLoad(srcMip, srcPos + ivec3(-1,0,-1));
		vec4 srcCol4 = imageLoad(srcMip, srcPos + ivec3(0,0,-1));
		vec4 srcCol5 = imageLoad(srcMip, srcPos + ivec3(1,0,-1));
		vec4 srcCol6 = imageLoad(srcMip, srcPos + ivec3(-1,1,-1));
		vec4 srcCol7 = imageLoad(srcMip, srcPos + ivec3(0,1,-1));
		vec4 srcCol8 = imageLoad(srcMip, srcPos + ivec3(1,1,-1));

		vec4 srcCol9 = imageLoad(srcMip, srcPos + ivec3(-1,-1,0));
		vec4 srcCol10 = imageLoad(srcMip, srcPos + ivec3(0,-1,0));
		vec4 srcCol11 = imageLoad(srcMip, srcPos + ivec3(1,-1,0));
		vec4 srcCol12 = imageLoad(srcMip, srcPos + ivec3(-1,0,0));
		vec4 srcCol13 = imageLoad(srcMip, srcPos + ivec3(0,0,0));
		vec4 srcCol14 = imageLoad(srcMip, srcPos + ivec3(1,0,0));
		vec4 srcCol15 = imageLoad(srcMip, srcPos + ivec3(-1,1,0));
		vec4 srcCol16 = imageLoad(srcMip, srcPos + ivec3(0,1,0));
		vec4 srcCol17 = imageLoad(srcMip, srcPos + ivec3(1,1,0));

		vec4 srcCol18 = imageLoad(srcMip, srcPos + ivec3(-1,-1,1));
		vec4 srcCol19 = imageLoad(srcMip, srcPos + ivec3(0,-1,1));
		vec4 srcCol20 = imageLoad(srcMip, srcPos + ivec3(1,-1,1));
		vec4 srcCol21 = imageLoad(srcMip, srcPos + ivec3(-1,0,1));
		vec4 srcCol22 = imageLoad(srcMip, srcPos + ivec3(0,0,1));
		vec4 srcCol23 = imageLoad(srcMip, srcPos + ivec3(1,0,1));
		vec4 srcCol24 = imageLoad(srcMip, srcPos + ivec3(-1,1,1));
		vec4 srcCol25 = imageLoad(srcMip, srcPos + ivec3(0,1,1));
		vec4 srcCol26 = imageLoad(srcMip, srcPos + ivec3(1,1,1));

		//+X direction
		outColor.xyz = mix(srcCol0.xyz, srcCol1.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol1.xyz, srcCol2.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol3.xyz, srcCol4.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol4.xyz, srcCol5.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol6.xyz, srcCol7.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol7.xyz, srcCol8.xyz, 1.0 - srcCol7.w)

					+ mix(srcCol9.xyz, srcCol10.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol10.xyz, srcCol11.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol12.xyz, srcCol13.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol13.xyz, srcCol14.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol15.xyz, srcCol16.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol16.xyz, srcCol17.xyz, 1.0 - srcCol16.w)
					
					+ mix(srcCol18.xyz, srcCol19.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol19.xyz, srcCol20.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol21.xyz, srcCol22.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol22.xyz, srcCol23.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol24.xyz, srcCol25.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol25.xyz, srcCol26.xyz, 1.0 - srcCol25.w)
		//+Y direction			
					+ mix(srcCol0.xyz, srcCol3.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol3.xyz, srcCol6.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol1.xyz, srcCol4.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol4.xyz, srcCol7.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol2.xyz, srcCol5.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol5.xyz, srcCol8.xyz, 1.0 - srcCol5.w)

					+ mix(srcCol9.xyz, srcCol12.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol12.xyz, srcCol15.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol10.xyz, srcCol13.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol13.xyz, srcCol16.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol11.xyz, srcCol14.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol14.xyz, srcCol17.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol18.xyz, srcCol21.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol21.xyz, srcCol24.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol19.xyz, srcCol22.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol22.xyz, srcCol25.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol20.xyz, srcCol23.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol23.xyz, srcCol26.xyz, 1.0 - srcCol23.w)
		//+Z direction			
					+ mix(srcCol0.xyz, srcCol9.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol9.xyz, srcCol18.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol1.xyz, srcCol10.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol10.xyz, srcCol19.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol2.xyz, srcCol11.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol11.xyz, srcCol20.xyz, 1.0 - srcCol11.w)

					+ mix(srcCol3.xyz, srcCol12.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol12.xyz, srcCol21.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol4.xyz, srcCol13.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol13.xyz, srcCol22.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol5.xyz, srcCol14.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol14.xyz, srcCol23.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol6.xyz, srcCol15.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol15.xyz, srcCol24.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol7.xyz, srcCol16.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol16.xyz, srcCol25.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol8.xyz, srcCol17.xyz, 1.0 - srcCol8.w)
					+ mix(srcCol17.xyz, srcCol26.xyz, 1.0 - srcCol17.w);
//+X direction
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol20.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol24.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol26.w)
//+Y direction						
						- (1.0 - srcCol0.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol26.w)
//+Z direction
						- (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	
		outColor.xyz *= 0.05;

		imageStore(dstMip, dstPos, outColor);
	}
}

I've made it so that voxels are filtered in each direction in a single pass. It is a little slower but still runs just under 20fps (probably on avg about 19fps).
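Since the shader above spells out every neighbour pair by hand, a small generator can help check the index pattern. This sketch assumes the numbering used by the imageLoad calls, where srcColN has N = x + 3*y + 9*z and x, y, z run over the three offsets along each axis. The function names are hypothetical.

```python
def index(x, y, z):
    """Index N of srcColN for grid position (x, y, z), each in 0..2."""
    return x + 3 * y + 9 * z


def pairs_along(axis):
    """Adjacent sample pairs (A, B) along one axis: 0 = X, 1 = Y, 2 = Z."""
    step = (1, 3, 9)[axis]
    result = []
    for z in range(3):
        for y in range(3):
            for x in range(3):
                if (x, y, z)[axis] < 2:  # a neighbour exists in the + direction
                    a = index(x, y, z)
                    result.append((a, a + step))
    return result


if __name__ == "__main__":
    for axis, name in enumerate("XYZ"):
        print(name, pairs_along(axis))
```

Each generated pair (A, B) corresponds to one term of the form mix(srcColA.xyz, srcColB.xyz, 1.0 - srcColA.w) and one alpha term using srcColA.w and srcColB.w, giving 18 pairs per axis and 54 in total.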

 

Also, when I sample the voxels for the cone tracing, I apply an offset of 1/4 of a voxel for each mip level to balance it better.

 

giboxv4-0.png giboxv4-1.png

giboxv4-2.png giboxv4-3.png

 

So as you can see, things are starting to look a bit more physically accurate (even though theoretically it's not).

 

Now I've just got to work out how to speed things up by turning this into a 3-pass (trilateral) filtering algorithm.




#5100936 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 12 October 2013 - 09:56 PM

So it turns out I had misinterpreted the original code that my method is based on. I am in fact manually mip-mapping to downsample my voxels.

#version 430

layout(local_size_x = 16, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba8) uniform image3D srcMip;
layout(binding = 1, rgba8) uniform image3D dstMip;

uniform uint direction;

void main()
{
	ivec3 dstSize = imageSize(dstMip);

	if(gl_GlobalInvocationID.x >= dstSize.x || gl_GlobalInvocationID.y >= dstSize.y || gl_GlobalInvocationID.z >= dstSize.z){
		// out of range, ignore
	} else {
		ivec3 dstPos = ivec3(gl_GlobalInvocationID);
		ivec3 srcPos = dstPos*2;
		vec4 outColor;

		vec4 srcCol0 = imageLoad(srcMip, srcPos + ivec3(0,0,0));
		vec4 srcCol1 = imageLoad(srcMip, srcPos + ivec3(1,0,0));
		vec4 srcCol2 = imageLoad(srcMip, srcPos + ivec3(2,0,0));
		vec4 srcCol3 = imageLoad(srcMip, srcPos + ivec3(0,1,0));
		vec4 srcCol4 = imageLoad(srcMip, srcPos + ivec3(1,1,0));
		vec4 srcCol5 = imageLoad(srcMip, srcPos + ivec3(2,1,0));
		vec4 srcCol6 = imageLoad(srcMip, srcPos + ivec3(0,2,0));
		vec4 srcCol7 = imageLoad(srcMip, srcPos + ivec3(1,2,0));
		vec4 srcCol8 = imageLoad(srcMip, srcPos + ivec3(2,2,0));

		vec4 srcCol9 = imageLoad(srcMip, srcPos + ivec3(0,0,1));
		vec4 srcCol10 = imageLoad(srcMip, srcPos + ivec3(1,0,1));
		vec4 srcCol11 = imageLoad(srcMip, srcPos + ivec3(2,0,1));
		vec4 srcCol12 = imageLoad(srcMip, srcPos + ivec3(0,1,1));
		vec4 srcCol13 = imageLoad(srcMip, srcPos + ivec3(1,1,1));
		vec4 srcCol14 = imageLoad(srcMip, srcPos + ivec3(2,1,1));
		vec4 srcCol15 = imageLoad(srcMip, srcPos + ivec3(0,2,1));
		vec4 srcCol16 = imageLoad(srcMip, srcPos + ivec3(1,2,1));
		vec4 srcCol17 = imageLoad(srcMip, srcPos + ivec3(2,2,1));

		vec4 srcCol18 = imageLoad(srcMip, srcPos + ivec3(0,0,2));
		vec4 srcCol19 = imageLoad(srcMip, srcPos + ivec3(1,0,2));
		vec4 srcCol20 = imageLoad(srcMip, srcPos + ivec3(2,0,2));
		vec4 srcCol21 = imageLoad(srcMip, srcPos + ivec3(0,1,2));
		vec4 srcCol22 = imageLoad(srcMip, srcPos + ivec3(1,1,2));
		vec4 srcCol23 = imageLoad(srcMip, srcPos + ivec3(2,1,2));
		vec4 srcCol24 = imageLoad(srcMip, srcPos + ivec3(0,2,2));
		vec4 srcCol25 = imageLoad(srcMip, srcPos + ivec3(1,2,2));
		vec4 srcCol26 = imageLoad(srcMip, srcPos + ivec3(2,2,2));

	if(direction == 0) {
		//+X direction
		outColor.xyz = mix(srcCol0.xyz, srcCol1.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol1.xyz, srcCol2.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol3.xyz, srcCol4.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol4.xyz, srcCol5.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol6.xyz, srcCol7.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol7.xyz, srcCol8.xyz, 1.0 - srcCol7.w)

					+ mix(srcCol9.xyz, srcCol10.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol10.xyz, srcCol11.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol12.xyz, srcCol13.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol13.xyz, srcCol14.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol15.xyz, srcCol16.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol16.xyz, srcCol17.xyz, 1.0 - srcCol16.w)
					
					+ mix(srcCol18.xyz, srcCol19.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol19.xyz, srcCol20.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol21.xyz, srcCol22.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol22.xyz, srcCol23.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol24.xyz, srcCol25.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol25.xyz, srcCol26.xyz, 1.0 - srcCol25.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol20.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol24.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol26.w);

	} else if(direction == 1) {
		//-X direction
		outColor.xyz = mix(srcCol1.xyz, srcCol0.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol2.xyz, srcCol1.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol4.xyz, srcCol3.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol5.xyz, srcCol4.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol7.xyz, srcCol6.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol8.xyz, srcCol7.xyz, 1.0 - srcCol8.w)

					+ mix(srcCol10.xyz, srcCol9.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol11.xyz, srcCol10.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol13.xyz, srcCol12.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol14.xyz, srcCol13.xyz, 1.0 - srcCol14.w)
					+ mix(srcCol16.xyz, srcCol15.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol17.xyz, srcCol16.xyz, 1.0 - srcCol17.w)
					
					+ mix(srcCol19.xyz, srcCol18.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol20.xyz, srcCol19.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol22.xyz, srcCol21.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol23.xyz, srcCol22.xyz, 1.0 - srcCol23.w)
					+ mix(srcCol25.xyz, srcCol24.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol26.xyz, srcCol25.xyz, 1.0 - srcCol26.w);
		outColor.w = 4.0 - (1.0 - srcCol1.w) * (1.0 - srcCol0.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol7.w)

						- (1.0 - srcCol10.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol16.w)
						
						- (1.0 - srcCol19.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol26.w) * (1.0 - srcCol25.w);

	} else if(direction == 2) {
		//+Y direction
		outColor.xyz = mix(srcCol0.xyz, srcCol3.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol3.xyz, srcCol6.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol1.xyz, srcCol4.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol4.xyz, srcCol7.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol2.xyz, srcCol5.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol5.xyz, srcCol8.xyz, 1.0 - srcCol5.w)

					+ mix(srcCol9.xyz, srcCol12.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol12.xyz, srcCol15.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol10.xyz, srcCol13.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol13.xyz, srcCol16.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol11.xyz, srcCol14.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol14.xyz, srcCol17.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol18.xyz, srcCol21.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol21.xyz, srcCol24.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol19.xyz, srcCol22.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol22.xyz, srcCol25.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol20.xyz, srcCol23.xyz, 1.0 - srcCol20.w)
					+ mix(srcCol23.xyz, srcCol26.xyz, 1.0 - srcCol23.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol3.w) * (1.0 - srcCol6.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol7.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol5.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol8.w)
						
						- (1.0 - srcCol9.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol17.w)
						
						- (1.0 - srcCol18.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol19.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol20.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol26.w);
	} 
	else if(direction == 3) {
		//-Y direction
		outColor.xyz = mix(srcCol3.xyz, srcCol0.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol6.xyz, srcCol3.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol4.xyz, srcCol1.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol7.xyz, srcCol4.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol5.xyz, srcCol2.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol8.xyz, srcCol5.xyz, 1.0 - srcCol8.w)
					
					+ mix(srcCol15.xyz, srcCol12.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol12.xyz, srcCol9.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol16.xyz, srcCol13.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol13.xyz, srcCol10.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol17.xyz, srcCol14.xyz, 1.0 - srcCol17.w)
					+ mix(srcCol14.xyz, srcCol11.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol24.xyz, srcCol21.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol21.xyz, srcCol18.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol25.xyz, srcCol22.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol22.xyz, srcCol19.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol26.xyz, srcCol23.xyz, 1.0 - srcCol26.w)
					+ mix(srcCol23.xyz, srcCol20.xyz, 1.0 - srcCol23.w);
		outColor.w = 4.0 - (1.0 - srcCol3.w) * (1.0 - srcCol0.w)
						- (1.0 - srcCol6.w) * (1.0 - srcCol3.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol1.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol4.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol2.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol5.w)
						
						- (1.0 - srcCol15.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol11.w)
						
						- (1.0 - srcCol24.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol21.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol25.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol22.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol26.w) * (1.0 - srcCol23.w)
						- (1.0 - srcCol23.w) * (1.0 - srcCol20.w);
	} else if(direction == 4) {
		//+Z direction
		outColor.xyz = mix(srcCol0.xyz, srcCol9.xyz, 1.0 - srcCol0.w)
					+ mix(srcCol9.xyz, srcCol18.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol1.xyz, srcCol10.xyz, 1.0 - srcCol1.w)
					+ mix(srcCol10.xyz, srcCol19.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol2.xyz, srcCol11.xyz, 1.0 - srcCol2.w)
					+ mix(srcCol11.xyz, srcCol20.xyz, 1.0 - srcCol11.w)

					+ mix(srcCol3.xyz, srcCol12.xyz, 1.0 - srcCol3.w)
					+ mix(srcCol12.xyz, srcCol21.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol4.xyz, srcCol13.xyz, 1.0 - srcCol4.w)
					+ mix(srcCol13.xyz, srcCol22.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol5.xyz, srcCol14.xyz, 1.0 - srcCol5.w)
					+ mix(srcCol14.xyz, srcCol23.xyz, 1.0 - srcCol14.w)
					
					+ mix(srcCol6.xyz, srcCol15.xyz, 1.0 - srcCol6.w)
					+ mix(srcCol15.xyz, srcCol24.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol7.xyz, srcCol16.xyz, 1.0 - srcCol7.w)
					+ mix(srcCol16.xyz, srcCol25.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol8.xyz, srcCol17.xyz, 1.0 - srcCol8.w)
					+ mix(srcCol17.xyz, srcCol26.xyz, 1.0 - srcCol17.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	} else if(direction == 5) {
		//-Z direction
		outColor.xyz = mix(srcCol9.xyz, srcCol0.xyz, 1.0 - srcCol9.w)
					+ mix(srcCol18.xyz, srcCol9.xyz, 1.0 - srcCol18.w)
					+ mix(srcCol10.xyz, srcCol1.xyz, 1.0 - srcCol10.w)
					+ mix(srcCol19.xyz, srcCol10.xyz, 1.0 - srcCol19.w)
					+ mix(srcCol11.xyz, srcCol2.xyz, 1.0 - srcCol11.w)
					+ mix(srcCol20.xyz, srcCol11.xyz, 1.0 - srcCol20.w)

					+ mix(srcCol12.xyz, srcCol3.xyz, 1.0 - srcCol12.w)
					+ mix(srcCol21.xyz, srcCol12.xyz, 1.0 - srcCol21.w)
					+ mix(srcCol13.xyz, srcCol4.xyz, 1.0 - srcCol13.w)
					+ mix(srcCol22.xyz, srcCol13.xyz, 1.0 - srcCol22.w)
					+ mix(srcCol14.xyz, srcCol5.xyz, 1.0 - srcCol14.w)
					+ mix(srcCol23.xyz, srcCol14.xyz, 1.0 - srcCol23.w)
					
					+ mix(srcCol15.xyz, srcCol6.xyz, 1.0 - srcCol15.w)
					+ mix(srcCol24.xyz, srcCol15.xyz, 1.0 - srcCol24.w)
					+ mix(srcCol16.xyz, srcCol7.xyz, 1.0 - srcCol16.w)
					+ mix(srcCol25.xyz, srcCol16.xyz, 1.0 - srcCol25.w)
					+ mix(srcCol17.xyz, srcCol8.xyz, 1.0 - srcCol17.w)
					+ mix(srcCol26.xyz, srcCol17.xyz, 1.0 - srcCol26.w);
		outColor.w = 4.0 - (1.0 - srcCol0.w) * (1.0 - srcCol9.w)
						- (1.0 - srcCol9.w) * (1.0 - srcCol18.w)
						- (1.0 - srcCol1.w) * (1.0 - srcCol10.w)
						- (1.0 - srcCol10.w) * (1.0 - srcCol19.w)
						- (1.0 - srcCol2.w) * (1.0 - srcCol11.w)
						- (1.0 - srcCol11.w) * (1.0 - srcCol20.w)
						
						- (1.0 - srcCol3.w) * (1.0 - srcCol12.w)
						- (1.0 - srcCol12.w) * (1.0 - srcCol21.w)
						- (1.0 - srcCol4.w) * (1.0 - srcCol13.w)
						- (1.0 - srcCol13.w) * (1.0 - srcCol22.w)
						- (1.0 - srcCol5.w) * (1.0 - srcCol14.w)
						- (1.0 - srcCol14.w) * (1.0 - srcCol23.w)
						
						- (1.0 - srcCol6.w) * (1.0 - srcCol15.w)
						- (1.0 - srcCol15.w) * (1.0 - srcCol24.w)
						- (1.0 - srcCol7.w) * (1.0 - srcCol16.w)
						- (1.0 - srcCol16.w) * (1.0 - srcCol25.w)
						- (1.0 - srcCol8.w) * (1.0 - srcCol17.w)
						- (1.0 - srcCol17.w) * (1.0 - srcCol26.w);
	}
	
		outColor.xyz *= 0.2;

		imageStore(dstMip, dstPos, outColor);
	}
}

However, I figured out that I have only been transferring my values in the -X direction, because in the host code that dispatches the shader:

void TextureManager::mipMapPass(GLuint shader, GLuint tex, int dir, int voxDim)
{
	glUseProgram(shader);

	int workGroupSize[3] = {};
	glGetProgramiv(shader, GL_COMPUTE_WORK_GROUP_SIZE, workGroupSize);
	if (workGroupSize[0] * workGroupSize[1] * workGroupSize[2] == 0){
		cout << "failed to load compute shader" << endl;
		return;
	}

	int mipLevels = GetBitIndex(voxDim) + 1;
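	for (int mip = 1; mip < mipLevels; mip++)
	{
		glUniform1ui(glGetUniformLocation(shader, "direction"), dir);

		// Read from the previous (finer) level, write to the current level.
		glBindImageTexture(0, tex, mip - 1, GL_TRUE, 0, GL_READ_ONLY, GL_RGBA8);
		glBindImageTexture(1, tex, mip, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA8);

		// The destination level is (voxDim >> mip) texels per side;
		// round up to whole work groups.
		int dstDim = voxDim >> mip;
		glDispatchCompute(
			(dstDim + workGroupSize[0] - 1) / workGroupSize[0],
			(dstDim + workGroupSize[1] - 1) / workGroupSize[1],
			(dstDim + workGroupSize[2] - 1) / workGroupSize[2]);

		// Make this level's image writes visible before the next pass reads them.
		glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
	}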
}

I only run this function once with dir = 0.
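
As a side note on the dispatch sizes in the function above: each destination level has a side of voxDim >> mip, so the number of work groups needed shrinks with every level. Below is a minimal CPU-side sketch of that arithmetic. It assumes a power-of-two voxDim and assumes GetBitIndex returns log2 of the dimension; mipLevelCount, groupCount and dispatchPlan are hypothetical helper names, not part of the code above.

```cpp
#include <cassert>
#include <vector>

// Number of mip levels for a cubic texture whose side is a power of two,
// e.g. 64 -> 7 levels (64, 32, 16, 8, 4, 2, 1).
int mipLevelCount(int voxDim)
{
	assert(voxDim > 0 && (voxDim & (voxDim - 1)) == 0);
	int levels = 1;
	while (voxDim > 1) { voxDim >>= 1; ++levels; }
	return levels;
}

// Work groups needed to cover `texels` invocations with groups of
// `localSize`, rounding up so a partial group still covers the remainder.
int groupCount(int texels, int localSize)
{
	assert(texels > 0 && localSize > 0);
	return (texels + localSize - 1) / localSize;
}

// Per-axis group count for every destination level (mip 1 .. last).
std::vector<int> dispatchPlan(int voxDim, int localSize)
{
	std::vector<int> plan;
	const int levels = mipLevelCount(voxDim);
	for (int mip = 1; mip < levels; ++mip)
		plan.push_back(groupCount(voxDim >> mip, localSize));
	return plan;
}
```

For voxDim = 64 and a local size of 8 per axis, dispatchPlan gives 4, 2, 1, 1, 1, 1 groups per axis for mips 1 to 6.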

 

The issue I'm facing now is that when I run this function six times, once per direction, it doesn't accumulate the values from each direction; instead, the 3D texture is overwritten with the result of the last direction.

If I use hardware generation of mipmaps and then do the transfer of values to neighbouring voxels in each direction for each mip level, the results come out wrong.
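
The overwriting is what one would expect when the destination is bound as a write-only image: imageStore replaces the texel, it does not blend with what is already there. One possible way around it is to give each direction its own destination (for example six 3D textures) rather than sharing one. The sketch below is only a CPU stand-in for that idea, not the GPU code. Voxel, filterDirection, sharedTarget and perDirectionTargets are made-up names, and filterDirection just scales by a per-direction weight so the outputs can be told apart.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Voxel { float r, g, b, a; };

// Hypothetical stand-in for one directional mip pass. It only scales the
// colour by a direction-dependent weight so results are distinguishable.
Voxel filterDirection(const Voxel& src, int dir)
{
	assert(dir >= 0 && dir < 6);
	const float w = 0.1f * float(dir + 1);
	return { src.r * w, src.g * w, src.b * w, src.a };
}

// Shared destination: every pass stores over the previous one, so only
// the last direction (dir = 5) survives.
std::vector<Voxel> sharedTarget(const std::vector<Voxel>& src)
{
	std::vector<Voxel> dst(src.size());
	for (int dir = 0; dir < 6; ++dir)
		for (std::size_t i = 0; i < src.size(); ++i)
			dst[i] = filterDirection(src[i], dir);
	return dst;
}

// One destination per direction: each pass keeps its own result.
std::array<std::vector<Voxel>, 6> perDirectionTargets(const std::vector<Voxel>& src)
{
	std::array<std::vector<Voxel>, 6> dst;
	for (int dir = 0; dir < 6; ++dir) {
		dst[dir].resize(src.size());
		for (std::size_t i = 0; i < src.size(); ++i)
			dst[dir][i] = filterDirection(src[i], dir);
	}
	return dst;
}
```

With per-direction targets the cone trace would then have to pick or weight the directional textures itself, which costs more memory but keeps all six results.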




#5100760 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by gboxentertainment on 12 October 2013 - 04:00 AM

In my attempts to reduce flickering, I have made an unusual discovery:

 

There is much less mid-level flickering in the z direction than in the x direction, i.e. flickering caused by voxels at lod = 1 and lod = 2 is much weaker in the z direction. At lod = 0 (the densest level), by contrast, there is the same amount of flickering in every direction.

 

Looking at the debug view, I noticed that when I switch to lod = 1, there seems to be much more information captured in the voxels in the x direction than in the z direction:

 

gibox-debug0.jpg

 

As you can see in the above image, the voxels are a lot thicker on the facing surfaces of the left and right walls than on the wall at the very back.

 

When I first discovered this a while ago, I thought it must be caused by incorrect filtering during the mip-map stage. But I have checked my code many times, and voxels are filtered in exactly the same manner in each direction.

 

There must be something that I'm doing wrong during the mip-mapping stage which requires further investigation.
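
One way to test the "filtered in exactly the same manner in each direction" assumption, rather than checking it by eye, is to model the directional blend on the CPU and compare axes. The sketch below is not the shader itself. It assumes the 27 source samples are indexed as x + 3*y + 9*z (which is what the index pairs in the +Z and -Z branches suggest), generates the neighbour pairs in a loop instead of writing them out by hand, and reuses the 4.0 and 0.2 constants from the shader. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Voxel { float r, g, b, a; };
using Block = std::array<Voxel, 27>;   // 3x3x3, index = x + 3*y + 9*z

int indexOf(int x, int y, int z) { return x + 3 * y + 9 * z; }

// GLSL-style mix for a single channel.
float mixf(float a, float b, float t) { return a * (1.0f - t) + b * t; }

// Blend every (front, behind) neighbour pair along one axis.
// axis: 0 = X, 1 = Y, 2 = Z; sign: +1 or -1.
Voxel directionalFilter(const Block& src, int axis, int sign)
{
	assert(axis >= 0 && axis < 3 && (sign == 1 || sign == -1));
	Voxel out = { 0.0f, 0.0f, 0.0f, 4.0f };
	for (int u = 0; u < 3; ++u)
	for (int v = 0; v < 3; ++v)
	for (int k = 0; k < 2; ++k) {
		int p[3], q[3];
		p[axis] = k;              q[axis] = k + 1;
		p[(axis + 1) % 3] = u;    q[(axis + 1) % 3] = u;
		p[(axis + 2) % 3] = v;    q[(axis + 2) % 3] = v;
		const Voxel& lo = src[indexOf(p[0], p[1], p[2])];
		const Voxel& hi = src[indexOf(q[0], q[1], q[2])];
		const Voxel& front  = (sign > 0) ? lo : hi;
		const Voxel& behind = (sign > 0) ? hi : lo;
		const float t = 1.0f - front.a;
		out.r += mixf(front.r, behind.r, t);
		out.g += mixf(front.g, behind.g, t);
		out.b += mixf(front.b, behind.b, t);
		out.a -= (1.0f - front.a) * (1.0f - behind.a);
	}
	out.r *= 0.2f; out.g *= 0.2f; out.b *= 0.2f;
	return out;
}

// Swap the X and Z axes of a block.
Block swapXZ(const Block& src)
{
	Block dst{};
	for (int z = 0; z < 3; ++z)
	for (int y = 0; y < 3; ++y)
	for (int x = 0; x < 3; ++x)
		dst[indexOf(x, y, z)] = src[indexOf(z, y, x)];
	return dst;
}

bool nearlyEqual(const Voxel& a, const Voxel& b, float eps = 1e-4f)
{
	return std::fabs(a.r - b.r) < eps && std::fabs(a.g - b.g) < eps
	    && std::fabs(a.b - b.b) < eps && std::fabs(a.a - b.a) < eps;
}
```

If the filter treats the axes alike, directionalFilter(b, 0, +1) should match directionalFilter(swapXZ(b), 2, +1) for any block b. Feeding the same blocks through the hand-written pairs and comparing against this generated version would show whether a single mistyped index is skewing one direction.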





