
gboxentertainment

Member Since 18 Jan 2012
Offline Last Active Today, 05:29 AM
-----

Topics I've Started

Voxel Cone Tracing Experiment - Progress

17 February 2013 - 06:45 AM

Hi all,

 

I thought I would share the latest progress on my voxel cone tracing engine. It is implemented in OpenGL 4.3 and based on Alex Nankervis' voxel cone tracing method (http://www.geeks3d.com/20121214/voxel-cone-tracing-global-illumination-in-opengl-4-3/), but I have written my own engine from scratch and implemented a number of things differently.

 

Here are a number of screenshots showing my results so far. There is still a lot to improve, and you will notice a number of artifacts in my test engine:

 

Basic scene with cone-traced soft-shadows, specular reflections, as well as light specular:

Attached files: gibox0.jpg, gibox1.jpg

 

Single-bounce GI vs "unlimited" multi-bounce GI ("unlimited" is in quotes because the intensity of each subsequent bounce converges towards zero):

Attached files: gibox2-1.jpg, gibox2.jpg

 

First voxelization of scene (done only in first frame) and then the revoxelization loop of the scene after cone tracing for "unlimited" bounces:

Attached files: gibox2-2.jpg, gibox2-3.jpg

 

Just showing off the quality of the specular reflections:

Attached files: gibox3.jpg, gibox4.jpg

 

and finally...

Single-bounce emissive vs multi-bounce emissive ([edit] I made a mistake here: the green wall was accidentally outside the scene volume, so it was not lit correctly, which is why there is no color bleeding). The last image shows a major problem with the revoxelization of the emissive scene, which leads to flickering artifacts. I guess this may not be a problem in a real game, because it could pass as a flickering-light effect:

Attached files: giboxemissive1.jpg, giboxemissive0.jpg, giboxemissive2.jpg

 

To give you guys an idea of the scale of this scene, here are the specs:

  • 64x64x64 voxels for the entire scene
  • Runs at around 30-35 fps at 1024x768 on my GTX 485M / i7-2720QM with 8 GB RAM, Windows 7 64-bit. Drops to about 23 fps if I get close to the Buddha model.
  • 64-bit OpenGL 4.3
  • 1 3D texture using dominant axis voxelization (plus a second 3D texture for image atomic average operations which reduces flickering artifacts significantly - but doesn't eliminate them).
  • (4 diffuse cones traced + 1 specular cone traced + 1 shadow cone traced in direction of light) x 2 (2nd time for revoxelization to achieve "unlimited" multiple bounces).
  • Buddha model is the most costly object, with over 500,000 vertices.
  • I also apply the lighting and shadows prior to voxelization (and it is applied in each pass). Until someone comes up with a convincing explanation, I don't see any advantage to splatting light into the 3D texture (which is not as accurate) after voxelization from a lightmap texture that has world position information in it.

Things that I need to improve (hopefully I can get some advice from this community):

 

  • The specular reflection under the red box is partly missing, which I think is caused by an opacity problem during the cone trace of the base mip level:

Attached files: gibox0-1.jpg, gibox0-2.jpg

 

  • The soft shadows are very voxelized and have holes in them - again an opacity issue that might be related to the red box specular reflection problem. Originally, the shadow cone tracing produced the self-shadowing artifacts shown in the second image, so I increased the step size of just the initial cone trace step, even though the smaller step size produced more accurate shadows without holes in them:

Attached files: giboxshadows3.jpg, giboxshadows4.jpg

 

  • In the emissive test images above, emissive objects really bring out the incorrect filtering, which I believe is because I am using 2x2x2 bricks during the mip-mapping process. Another explanation could be that I am not distributing my four diffuse cones evenly enough.
  • In some cases, activating multiple bounces actually makes the lighting look worse than a single-bounce due to the scene progressively reducing in intensity for each bounce. However, I think I can address this by turning up the intensity of the direct lighting in each bounce pass.
  • I need to implement an octree structure using 3x3x3 bricks.
  • I will probably implement some sort of performance debugging which shows the cost in ms of each action.
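
The trade-off between self-shadowing and holes described in the list above can be illustrated on the CPU. This is only a minimal sketch, not the shader used in the engine: the function name, the parameters and the opacity callback are all hypothetical, and it assumes the usual front-to-back accumulation along the cone with a sample diameter that grows with distance.

```cpp
#include <algorithm>
#include <functional>

// Hypothetical helper: accumulates occlusion along a shadow cone.
// sampleOpacity(dist, diameter) stands in for the 3D texture lookup and
// returns an opacity in [0,1] for a sample of the given diameter.
float traceShadowCone(const std::function<float(float, float)>& sampleOpacity,
                      float startOffset, float maxDist,
                      float coneRatio, float voxelSize)
{
    float occlusion = 0.0f;
    float dist = startOffset;   // skipping the first voxel avoids self-shadowing
    while (dist < maxDist && occlusion < 1.0f) {
        // The sample footprint grows linearly with distance, never below one voxel.
        float diameter = std::max(voxelSize, coneRatio * dist);
        float a = sampleOpacity(dist, diameter);
        occlusion += (1.0f - occlusion) * a;   // front-to-back compositing
        dist += 0.5f * diameter;               // step by half the footprint
    }
    return std::min(occlusion, 1.0f);
}
```

With a callback that is opaque only near the surface the cone starts on, a start offset of zero reports full occlusion while an offset of one voxel reports none. That is the self-shadowing effect; the price of a large offset is that thin occluders close to the surface are skipped, which may be where the holes come from.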

Voxelization using Dominant Axis (Voxel Cone Tracing)

12 February 2013 - 05:37 AM

I am trying to voxelize my scene by rasterization using the dominant axis approach: the vertex normals are used to find the dominant axis of each triangle, which then determines the direction of the orthographic projection applied to it. This is done in the geometry shader and the result is passed to the fragment shader.

 

I managed to voxelize the scene, except for one major problem, shown below next to the non-voxelized scene:

Attached files: givox0.jpg, givox1.jpg

 

Notice how there are artifacts above the Buddha and the Sphere?

 

Does anyone know what could be the cause of this and how this can be fixed?

 

Here is my geometry shader code:

#version 430

layout(triangles) in;
layout(triangle_strip, max_vertices=3) out;

in VSOutput
{
	vec4 ndcPos;
	vec4 fNorm;
	vec2 fTexCoord;
	vec4 worldPos;
} vsout[];

out GSOutput
{
	vec4 ndcPos;
	vec4 fNorm;
	vec2 fTexCoord;
	vec4 worldPos;
	vec4 axisCol;
	mat4 oProj;
} gsout;

uniform mat4 voxSpace;

void main()
{
	gsout.oProj = mat4(0.0);

	for(int i = 0; i<gl_in.length(); i++)
	{	
		
		vec4 n = vsout[i].fNorm;

		gsout.fNorm = n;
		gsout.fTexCoord = vsout[i].fTexCoord;
		gsout.worldPos = vsout[i].worldPos;
		
		float maxC = max(abs(n.x), max(abs(n.y), abs(n.z)));
		float x,y,z;
		x = abs(n.x) < maxC ? 0.0 : 1.0;
		y = abs(n.y) < maxC ? 0.0 : 1.0;
		z = abs(n.z) < maxC ? 0.0 : 1.0;

		vec4 axis = vec4(x,y,z,1);
		
		if(axis == vec4(1.0,0.0,0.0,1))
		{
			gsout.oProj[0] = vec4(0,0,-1,0);
			gsout.oProj[1] = vec4(0,-1,0,0);
			gsout.oProj[2] = vec4(-1,0,0,0);
			gsout.oProj[3] = vec4(0,0,0,1);
		}	
		else if(axis == vec4(0,1,0,1))
		{
			gsout.oProj[0] = vec4(1,0,0,0);
			gsout.oProj[1] = vec4(0,0,1,0);
			gsout.oProj[2] = vec4(0,-1,0,0);
			gsout.oProj[3] = vec4(0,0,0,1);
		}
		else if(axis == vec4(0,0,1,1))
		{
			gsout.oProj[0] = vec4(1,0,0,0);
			gsout.oProj[1] = vec4(0,-1,0,0);
			gsout.oProj[2] = vec4(0,0,-1,0);
			gsout.oProj[3] = vec4(0,0,0,1);
		}
		
		gsout.axisCol = axis;

		gl_Position = gsout.oProj*voxSpace*gl_in[i].gl_Position;
		gsout.ndcPos = gsout.oProj*voxSpace*vsout[i].ndcPos;

		EmitVertex();
	}
	
}
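
One possible cause, judging only from the shader above: the axis is chosen per vertex from the vertex normal. On smooth-shaded meshes such as the Buddha and the sphere, the three vertices of one triangle can have different normals and so pick different projections. In addition, when two normal components tie for the maximum, axis matches none of the three branches and oProj keeps its previous value (initially all zeros). A common alternative is to pick one axis per triangle from the face normal. The sketch below shows that selection on the CPU; Vec3 and dominantAxis are hypothetical names, and whether this removes the artifacts in this scene is an assumption.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Returns 0, 1 or 2 (X, Y or Z): the axis along which the triangle has the
// largest projected area. The face normal is shared by all three vertices,
// and ties are resolved deterministically, so exactly one axis is chosen.
int dominantAxis(const std::array<Vec3, 3>& tri)
{
    Vec3 n = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]));
    float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
    if (ax >= ay && ax >= az) return 0;
    if (ay >= az) return 1;
    return 2;
}
```

In the geometry shader the same idea would mean computing the cross product of two triangle edges once, before the loop, and applying the resulting projection to all three vertices.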

Reading binary file to vbo in OpenGL

21 January 2013 - 07:18 AM

I am currently having trouble trying to create and load binary files of model vertex data into the vbo in OpenGL.

 

Here is the code for a simple test that I have done by writing 2D triangle data into a *.bin file:

 

	glm::vec2 points[6] =
	{
		glm::vec2( -0.5, -0.5 ), glm::vec2( 0.5, -0.5 ),
		glm::vec2( 0.5, 0.5 ), glm::vec2( 0.5, 0.5 ),
		glm::vec2( -0.5, 0.5 ), glm::vec2( -0.5, -0.5 )
	};
	
	std::ofstream fs("model.bin", std::ios::out | std::ios::binary | std::ios::app);
	for(int i = 0; i<6; i++)
	{
		fs.write((const char*)&points[i].x, sizeof(points[i].x));
		fs.write((const char*)&points[i].y, sizeof(points[i].y));
	}
	fs.close();

 

and here is the code for copying the data into the vbo:

 

	int length;
	char* data;
	ifstream is;
	is.open("model.bin", ios::binary);
	is.seekg(0, ios::end);
	length = is.tellg();
	is.seekg(0, ios::beg);
	data = new char[length];
	is.read(data, length);
	is.close();
 
	glGenBuffers(1, &vbo);
	glBindBuffer(GL_ARRAY_BUFFER, vbo);
	glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STATIC_DRAW);

 

 

All I am getting is a blank screen, instead of a large colored square at the centre of the window. What am I doing wrong?
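
Two things stand out in the code above. First, data is a char*, so sizeof(data) is the size of a pointer (8 bytes on a 64-bit build), not the size of the file, which means only a single vec2 is uploaded; length holds the byte count that was read. Second, std::ios::app appends to model.bin on every run, so the file keeps growing. The sketch below shows the file handling without any OpenGL calls; writeFloats and readAll are hypothetical helper names.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes the floats to a fresh file. std::ios::trunc replaces any old
// contents instead of appending to them.
void writeFloats(const std::string& path, const std::vector<float>& values)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(float)));
}

// Reads the whole file into a byte vector; bytes.size() is the real length.
std::vector<char> readAll(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamsize length = in.tellg();   // opened at the end, so this is the size
    in.seekg(0, std::ios::beg);
    std::vector<char> bytes(static_cast<std::size_t>(length));
    in.read(bytes.data(), length);
    return bytes;
}
```

The upload would then pass the actual byte count, for example glBufferData(GL_ARRAY_BUFFER, bytes.size(), bytes.data(), GL_STATIC_DRAW). In the original code the equivalent change is to pass length instead of sizeof(data).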


Parallel Octree Construction on GPU

29 September 2012 - 09:53 PM

I am currently trying to construct a voxel octree on the GPU using a DirectX 11 compute shader. The only way I can think of to do this in parallel is to take the entire data set, create 8 new nodes, check (in parallel) on which side of each axis midpoint every voxel lies, and flag each new node that contains voxels. For instance:

struct Voxel
{
float3 pos;
float3 color;
};

struct OctreeNode
{
int voxBufIdx;
int parentIdx;
int childOffset;
int level;
int subdivided;
int terminal;
};

RWStructuredBuffer<OctreeNode> octIn : register(u0);
StructuredBuffer<Voxel> voxBuf : register(t0);



void createNodes(const uint voxIdx, float maxL, int level, uint baseOffset)
{
// Midpoint of the volume along each axis at this level
float L = maxL/pow(2, level);
float3 p = voxBuf[voxIdx].pos;

// Flag subdivision requests.
// HLSL does not chain comparisons, so 0<z<L is written as two tests.
if(p.x<L && p.y<L && p.z>0 && p.z<L)
   octIn[baseOffset+0].subdivided = 1;
if(p.x<L && p.y<L && p.z>L)
   octIn[baseOffset+1].subdivided = 1;
if(p.x>L && p.y<L && p.z>L)
   octIn[baseOffset+2].subdivided = 1;
if(p.x>L && p.y<L && p.z>0 && p.z<L)
   octIn[baseOffset+3].subdivided = 1;
if(p.x<L && p.y>L && p.z>0 && p.z<L)
   octIn[baseOffset+4].subdivided = 1;
if(p.x<L && p.y>L && p.z>L)
   octIn[baseOffset+5].subdivided = 1;
if(p.x>L && p.y>L && p.z>L)
   octIn[baseOffset+6].subdivided = 1;
if(p.x>L && p.y>L && p.z>0 && p.z<L)
   octIn[baseOffset+7].subdivided = 1;
}


[numthreads(1024, 1, 1)]
void CS_Octree(uint gtidx : SV_GroupIndex, uint3 blockIdx : SV_GroupID)
{

const uint threads = gtidx + blockIdx.x*1024 +blockIdx.y*1024*1024;
const uint nodes = threads;
const uint voxIdx = threads;

float maxL = 512;
int level = 0;
int nodeOff = 0;

OctreeNode root;
  root.voxBufIdx = 0;
  root.parentIdx = 0;
  root.childOffset = 1;
  root.level = 0;
  root.subdivided = 1;
  root.terminal = 0;
octIn[0] = root;
level += 1;


if(octIn[0].subdivided == 1)
{
  nodeOff +=1;
  createNodes(voxIdx, maxL, level, nodeOff);
}
...
}


But now I've run into a roadblock with a couple of problems:

First Problem

The compute shader does not allow recursive functions - so how would I subdivide each of the 8 nodes in level 1 in parallel?

So far, my idea is to count the number of nodes created in the current level (which is 8 for level 1) and add that to the node offset so that in my node buffer, I can add the new nodes to the end of the buffer.
i.e. after creating the 8 new nodes from the root, and checking that the first four nodes require subdivision, my buffer looks like this:

octIn.subdivided = {1,1,1,1,1,0,0,0,0,...}

where index 0 is the root and index 1-8 are the 8 new nodes.

If the first four child nodes (at level 2) of each of the four flagged nodes (at level 1) also require subdivision, then I would want my buffer to look like this:

octIn.subdivided = {1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,..}

where index 0 is the root, index 1-8 are the nodes at the first level, and index 9-40 are the child nodes of the first level that have been subdivided (4 nodes subdivided x 8 children = 32 nodes).

The challenge is understanding how to run the createNodes function 4 times in parallel. I am thinking that I can assign a thread per node to check if it has been flagged for subdivision:

if(nodes >= 8)
return;


if(octIn[1+nodes].subdivided == 1)
{
// run createNodes function
}

but will this run the createNodes function 4 times in parallel? Would there be a way to use something like InterlockedAdd to do this?
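
One way to picture the InterlockedAdd idea is a level-by-level build: one pass per level, one thread per node of that level, and each flagged node reserves 8 consecutive slots at the end of the buffer through an atomic counter. The CPU sketch below only mirrors that bookkeeping. Node and allocateChildren are hypothetical names, the loop body stands in for a single GPU thread, and std::atomic plays the role that InterlockedAdd on a counter buffer would play in the compute shader.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct Node {
    int childOffset = -1;  // index of the first of 8 children, -1 for a leaf
    int subdivided = 0;    // set to 1 when the node must be split
};

// Processes the nodes [begin, end) of one level and returns the end index of
// the next level. Children of one parent are contiguous, but the order in
// which parents receive their blocks would depend on thread scheduling on a GPU.
int allocateChildren(std::vector<Node>& nodes, int begin, int end)
{
    std::atomic<int> next{end};   // first free slot after the current level
    for (int i = begin; i < end; ++i) {        // one iteration = one thread
        if (nodes[i].subdivided == 1) {
            nodes[i].childOffset = next.fetch_add(8);  // reserve 8 slots
        }
    }
    nodes.resize(static_cast<std::size_t>(next.load()));
    return next.load();
}
```

Starting from a flagged root, the first call yields indices 1-8. Flagging nodes 1-4 and calling it again yields indices 9-40, which matches the buffer layout described above. Since recursion is not available, the level loop would live on the CPU side as one dispatch per level.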

Second Problem

Once I do find a solution to the first problem, there is the issue of reducing the set of voxels in each node so that only the voxels within each node are checked for the subdivision requests.

if(voxBuf[voxIdx].pos.x<L && voxBuf[voxIdx].pos.y<L && voxBuf[voxIdx].pos.z>0 && voxBuf[voxIdx].pos.z<L)

otherwise, if at level 2, four nodes are checked for subdivision in parallel, then there would need to be 4 times the threads (where 1 thread is assigned per voxel) and the number of threads required increases by 2^Level for each Level.

The only way I can think of doing this is by sorting the voxBuf data after each subdivision, so that it resides in a section of memory that corresponds with the node index at each level. For example:

take 500,000 voxels in the voxel buffer.
After the first subdivision, the data would be sorted so that:
voxBuf[0~50,000] contains the voxels that lie in the first node.
voxBuf[50,001~100,000] contains the voxels that lie in the second node.
..
voxBuf[450,000~500,000] contains the voxels that lie in the eighth node.

The entire set would not be evenly divided by eight, so I would have to atomically count the number of voxels in each section of the buffer to determine the offset positions. Then I could use these offset positions to reduce the set of voxels at each level so that within each node, only those voxels will be checked for subdivision, instead of the entire set.

This would require some sort of sorting algorithm that can be done in parallel. I have heard that something like Bitonic Sorting would be good, which is essentially also a 'divide-and-conquer' methodology, similar to quadtrees. If I were to use Bitonic Sorting, I would have to run three passes for each x,y,z axis. This would essentially mean that this entire Octree Construction method requires several 'divide-and-conquer' methods within another 'divide-and-conquer' method at multiple layers.

Surely there is a more efficient and simpler way of constructing an octree in parallel on the gpu?
Can anybody please give me some advice or insights on what I am doing?
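
A commonly used way to avoid three per-axis sorting passes is to sort once by a Morton (Z-order) code, which interleaves the bits of the integer x, y and z coordinates. After that single sort, the voxels of every node at every level occupy one contiguous range of the buffer. The sketch below is a CPU illustration under assumptions: coordinates are integers below 1024, the names are hypothetical, and the octant numbering (x*4 + y*2 + z) differs from the numbering used in createNodes. On the GPU the std::sort call would be replaced by a parallel sort on the same key.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Voxel { std::uint32_t x, y, z; };

// Spreads the low 10 bits of v so that bit i moves to bit 3*i.
std::uint32_t expandBits(std::uint32_t v)
{
    v &= 0x3FFu;
    v = (v | (v << 16)) & 0x030000FFu;
    v = (v | (v << 8))  & 0x0300F00Fu;
    v = (v | (v << 4))  & 0x030C30C3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

// Interleaves the bits as ...x1 y1 z1 x0 y0 z0.
std::uint32_t morton3D(std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
    return (expandBits(x) << 2) | (expandBits(y) << 1) | expandBits(z);
}

// Child index (0-7) of the node containing the voxel at the given level,
// for a tree with `depth` levels (depth = 9 for a 512^3 volume).
int octantAtLevel(std::uint32_t code, int level, int depth)
{
    return static_cast<int>((code >> (3 * (depth - level))) & 7u);
}

void sortByMorton(std::vector<Voxel>& voxels)
{
    std::sort(voxels.begin(), voxels.end(), [](const Voxel& a, const Voxel& b) {
        return morton3D(a.x, a.y, a.z) < morton3D(b.x, b.y, b.z);
    });
}
```

Because each group of three bits in the code selects one child, the start offset of a node's voxel range can be found by counting, or by a binary search on the code prefix, instead of re-sorting at every level.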

[Solved] Deferred Rendered Spotlight Problem

18 August 2012 - 10:29 AM

Hello,

I've been trying to implement a simple deferred shader with a spotlight.

The original forward rendered image looks like this:

Posted Image

But when I move the lighting code into my deferred shader, it looks like this:

Posted Image

Here are the normal and position buffers I use - both normalized:

Posted Image
Posted Image

The code for GBuffer.hlsl:

#pragma pack_matrix(row_major)
Texture2D Map : register(t0);

cbuffer Matrix : register(b0)
{
matrix world;
matrix view;
matrix proj;
};

SamplerState TextureSampler
{
Filter = MIN_MAG_MIP_LINEAR;
AddressU = Wrap;
AddressV = Wrap;
};

struct VS_IN
{
	float4 pos  : POSITION;
float3 normal : NORMAL;
	float2 tex0 : TEXCOORD;
};

struct PS_IN
{
float4 pos : SV_POSITION;
float2 tex0 : TEXCOORD0;
float3 normal : TEXCOORD1;
float4 worldPos : TEXCOORD2;
};

PS_IN VS_Effect(VS_IN vertex)
{
	PS_IN vsOut = ( PS_IN )0;
	float4 posWorld = mul(vertex.pos, world);
	float4 viewPos = mul(posWorld, view );
	vsOut.pos = mul(viewPos, proj );
	vsOut.tex0 = vertex.tex0;
vsOut.normal = normalize(mul(vertex.normal, (float3x3)world));
vsOut.worldPos = posWorld;
	return vsOut;
}

float4 PS_Color(PS_IN input) : SV_TARGET
{
float4 color = float4(0,0,0,0);
float4 texMap = Map.Sample(TextureSampler, input.tex0);
color = texMap;
return color;
}

float4 PS_Normal(PS_IN input) : SV_TARGET
{
float4 normal = float4(input.normal, 1);
return normal;
}

float4 PS_WorldPos(PS_IN input) : SV_TARGET
{
float4 color = normalize(input.worldPos);
return color;
}

The code for Combined.hlsl:

#pragma pack_matrix(row_major)

Texture2D ColorMap : register(t0);
Texture2D NormalMap : register(t1);
Texture2D WorldPosMap : register(t2);

cbuffer Light : register(b0)
{
float4 lightPos;
float4 lightDir;
float4 lightColor;
float lightRad;
float lightInt;
};

cbuffer Param : register(b1)
{
float4 camPos;
float4 diffColor;
};

SamplerState TextureSampler
{
Filter = MIN_MAG_MIP_POINT;
AddressU = Wrap;
AddressV = Wrap;
};

struct VS_IN
{
	float4 pos  : POSITION;
	float2 tex0 : TEXCOORD;
};

struct PS_IN
{
float4 pos : SV_POSITION;
float2 tex0 : TEXCOORD;
};

PS_IN VS_Effect(uint id: SV_VertexID)
{
	// form a full-screen triangle
	float2 pos = float2(id == 1 ? 2 : 0, id == 2 ? 2 : 0);
	PS_IN output;
	output.pos = float4(pos.x * 2 - 1, 1 - pos.y * 2, 0, 1);
	output.tex0 = pos.xy;
	return output;
}

float4 PS_Effect(PS_IN input) : SV_TARGET
{
float4 color = {0,0,0,0};
float4 colorMap = ColorMap.Sample(TextureSampler, input.tex0);
float4 normal = normalize(NormalMap.Sample(TextureSampler, input.tex0));
float4 worldPos = WorldPosMap.Sample(TextureSampler, input.tex0);

float4 L = normalize(lightPos - worldPos);
float4 D = normalize(lightDir);
if(dot(-L, D) > 0.9f)
{
  float4 N = normal;
  float lambertTerm = max(dot(N,L), 0);

  if(lambertTerm > 0)
  {
   color += colorMap*lightColor*diffColor*lambertTerm;
  }
}
return color;
}

What am I doing wrong?
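
Judging only from the code shown, one likely culprit is PS_WorldPos: normalize(input.worldPos) throws away the distance from the origin, so lightPos - worldPos in the lighting pass no longer points from the surface to the light. The sketch below reproduces the spotlight cone test on the CPU to show the effect. Vec3, spotCos and the sample values are hypothetical, and the render target formats are not shown in the post, so this is an assumption rather than a confirmed diagnosis.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 v)
{
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Cosine of the angle between the spotlight axis and the direction from the
// light to the surface point; equivalent to dot(-L, D) in PS_Effect.
float spotCos(Vec3 lightPos, Vec3 lightDir, Vec3 surfacePos)
{
    Vec3 toSurface = normalize(sub(surfacePos, lightPos));
    return dot(toSurface, normalize(lightDir));
}
```

For a light at (0, 10, 0) pointing straight down, a surface point at (20, 0, 0) gives a cosine of about 0.45 and falls outside the 0.9 cone. After normalizing the position to (1, 0, 0) the cosine is about 0.995, so the point is wrongly lit. Storing the unmodified world position would need a floating-point render target such as DXGI_FORMAT_R32G32B32A32_FLOAT.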
