Jump to content

  • Log In with Google      Sign In   
  • Create Account

#Actualspek

Posted 14 November 2012 - 08:36 AM

A bit like "GigaVoxels", I'm trying to generate an octree on the fly by using a (OpenCL) compute shader. The code (incomplete) code below should do that, BUT... obviously it's not thread-safe yet. So, I'm asking what I should do to make it safe, yet still somewhat fast.

What the code does is reading 64x64x64 pixels from a 3D texture. If those pixels contain geometry data, it will insert a leaf in the octree. Since the octree isn't filled initially (other than I reserved plenty of space in a VBO), it happens that the octree gets constructed along the way:
1- fetch a pixel, test if it should be inserted
2- Dive into the octree, starting with the rootnode. Get the leaf node at level 8
3- If a node hasn't been divided in 8 subnodes yet, get an unused pointer from the VBO and reserve 8 nodes there. Let the parent node refer to the first childnode.

The main problem is that multiple pixels may try to split octree nodes at the same time. I've seen barriers and such in the OpenCL reference, I guess I should create semaphores around the "if ( node->child1Ptr == 0 ) { ... }" section in the code below, but wouldn't make that the code terribly slow? Asides, but this is something that can't be fixed maybe, creating a semaphor requires support for the atom_xchg operation afaik. Seems my videocard is too old for that.
__kernel void voxelOctree_Build(  __global struct TOctreeNode* octree,
	  __global int* octreeNodeCounter,
	  __read_only image3d_t dataTex
    )
{
    // We loop through all the pixels of a (64^3) 3D texture
    // We inject each pixel into the octree, eventually creating new subnodes along
    // the way
    int3  locID  = (int3)( get_local_id(0), get_local_id(1), get_local_id(2) );
    int4 texCoord = (int4)( locID.x, locID.y, locID.z , 0 );
    sampler_t sampler  = CLK_FILTER_NEAREST | CLK_ADDRESS_CLAMP;
    float4 data   = read_imagef( dataTex, sampler, texCoord );

    if ( data.w > 0.f ) // Only insert if there is some occlusion at this point
    {
		  // Get root node
		  __global struct TOctreeNode *node = &octree[0];
		  // Dive into the octree, eventually split nodes if not done before
		  int level = 0;
		  while ( level++ < 8 )
		  {
			   // Create 8 sub nodes, if not done before
			   if ( node->child1Ptr == 0 )
			   {
				    // Get an unused index from the octree VBO
				    int  indx1 = *octreeNodeCounter;
				    node->child1Ptr = (int)(&amp;octree[ indx1 ]); // Store pointer to 1st child node
				    indx1+= 8; // Reserve 8 nodes
				    *octreeNodeCounter = indx1; // write back in global var
			   }
			  // Check in which of the 8 subnodes we need to dive
			  int localOffset = 0;
			  ...generate a number between 0..7
			  // Get node from deeper level, then repeat the loop
			  node = (__global struct TOctreeNode*)(node->child1Ptr + localOffset);
		  } // while level > 0

		  // Now insert the actual data
		  ...
    } // occlSum
} // voxelOctree_Build
Dunno how they exactly do it in GigaVoxels, but I could also pre-generate the octree once, entirely "unfolded". That means I don't have to split nodes, making the code faster. However, that will take a lot more memory of course.

Greets,
Rick

#2spek

Posted 14 November 2012 - 08:33 AM

A bit like "GigaVoxels", I'm trying to generate an octree on the fly by using a (OpenCL) compute shader. The code (incomplete) code below should do that, BUT... obviously it's not thread-safe yet. So, I'm asking what I should do to make it safe, yet still somewhat fast.

What the code does is reading 64x64x64 pixels from a 3D texture. If those pixels contain geometry data, it will insert a leaf in the octree. Since the octree isn't filled initially (other than I reserved plenty of space in a VBO), it happens that the octree gets constructed along the way:
1- fetch a pixel, test if it should be inserted
2- Dive into the octree, starting with the rootnode. Get the leaf node at level 8
3- If a node hasn't been divided in 8 subnodes yet, get an unused pointer from the VBO and reserve 8 nodes there. Let the parent node refer to the first childnode.

The main problem is that multiple pixels may try to split octree nodes at the same time. I've seen barriers and such in the OpenCL reference, I guess I should create semaphores around the "if ( node->child1Ptr == 0 ) { ... }" section in the code below, but wouldn't make that the code terribly slow? Asides, but this is something that can't be fixed maybe, creating a semaphor requires support for the atom_xchg operation afaik. Seems my videocard is too old for that.
__kernel void voxelOctree_Build(  __global struct TOctreeNode* octree,
	  __global int* octreeNodeCounter,
	  __read_only image3d_t dataTex
    )
{
    // We loop through all the pixels of a (64^3) 3D texture
    // We inject each pixel into the octree, eventually creating new subnodes along
    // the way
    int3  locID  = (int3)( get_local_id(0), get_local_id(1), get_local_id(2) );
    int4 texCoord = (int4)( locID.x, locID.y, locID.z , 0 );
    sampler_t sampler  = CLK_FILTER_NEAREST | CLK_ADDRESS_CLAMP;
    float4 data   = read_imagef( dataTex, sampler, texCoord );

    if ( data.w > 0.f ) // Only insert if there is some occlusion at this point
    {
		  // Get root node
		  __global struct TOctreeNode *node = &amp;octree[0];
		  // Dive into the octree, eventually split nodes if not done before
		  int level = 0;
		  while ( level++ < 8 )
		  {
			   // Create 8 sub nodes, if not done before
			   if ( node->child1Ptr == 0 )
			   {
				    // Get an unused index from the octree VBO
				    int  indx1 = *octreeNodeCounter;
				    node->child1Ptr = (int)(&amp;octree[ indx1 ]); // Store pointer to 1st child node
				    indx1+= 8; // Reserve 8 nodes
				    *octreeNodeCounter = indx1; // write back in global var
			   }
			  // Check in which of the 8 subnodes we need to dive
			  int localOffset = 0;
			  ...generate a number between 0..7
			  // Get node from deeper level, then repeat the loop
			  node = (TOctreeNode*)(node->child1Ptr + localOffset);
		  } // while level > 0

		  // Now insert the actual data
		  ...
    } // occlSum
} // voxelOctree_Build
Dunno how they exactly do it in GigaVoxels, but I could also pre-generate the octree once, entirely "unfolded". That means I don't have to split nodes, making the code faster. However, that will take a lot more memory of course.

Greets,
Rick

#1spek

Posted 14 November 2012 - 08:26 AM

A bit like "GigaVoxels", I'm trying to generate an octree on the fly by using a (OpenCL) compute shader. The code (incomplete) code below should do that, BUT... obviously it's not thread-safe yet. So, I'm asking what I should do to make it safe, yet still somewhat fast.

What the code does is reading 64x64x64 pixels from a 3D texture. If those pixels contain geometry data, it will insert a leaf in the octree. Since the octree isn't filled initially (other than I reserved plenty of space in a VBO), it happens that the octree gets constructed along the way:
1- fetch a pixel, test if it should be inserted
2- Dive into the octree, starting with the rootnode. Get the leaf node at level 8
3- If a node hasn't been divided in 8 subnodes yet, get an unused pointer from the VBO and reserve 8 nodes there. Let the parent node refer to the first childnode.

The main problem is that multiple pixels may try to split octree nodes at the same time. I've seen barriers and such in the OpenCL reference, I guess I should create semaphores around the "if ( node->child1Ptr == 0 ) { ... }" section in the code below, but wouldn't make that the code terribly slow? Asides, but this is something that can't be fixed maybe, creating a semaphor requires support for the atom_xchg operation afaik. Seems my videocard is too old for that.
__kernel void voxelOctree_Build(  __global struct TOctreeNode* octree,
	  __global int* octreeNodeCounter,
	  __read_only image3d_t dataTex
    )
{
    // We loop through all the pixels of a (64^3) 3D texture
    // We inject each pixel into the octree, eventually creating new subnodes along
    // the way
    int3  locID  = (int3)( get_local_id(0), get_local_id(1), get_local_id(2) );
    int4 texCoord = (int4)( locID.x, locID.y, locID.z , 0 );
    sampler_t sampler  = CLK_FILTER_NEAREST | CLK_ADDRESS_CLAMP;
    float4 data   = read_imagef( dataTex, sampler, texCoord );

    if ( data.w > 0.f ) // Only insert if there is some occlusion at this point
    {
		  // Get root node
		  __global struct TOctreeNode *node = &octree[0];
		  // Dive into the octree, eventually split nodes if not done before
		  int level = 0;
		  while ( level++ < 8 )
		  {
			   // Create 8 sub nodes, if not done before
			   if ( node->child1Ptr == 0 )
			   {
				    // Get an unused index from the octree VBO
				    int  indx1 = octreeNodeCounter*;
				    node->child1Ptr = (int)(&octree[ indx1 ]); // Store pointer to 1st child node
				    indx1+= 8; // Reserve 8 nodes
				    octreeNodeCounter* = indx1; // write back in global var
			   }
			  // Check in which of the 8 subnodes we need to dive
			  int localOffset = 0;
			  ...generate a number between 0..7
			  // Get node from deeper level, then repeat the loop
			  node = (TOctreeNode*)(node->child1Ptr + localOffset);
		  } // while level > 0

		  // Now insert the actual data
		  ...
    } // occlSum
} // voxelOctree_Build
Dunno how they exactly do it in GigaVoxels, but I could also pre-generate the octree once, entirely "unfolded". That means I don't have to split nodes, making the code faster. However, that will take a lot more memory of course.

Greets,
Rick

PARTNERS