What the code does is reading 64x64x64 pixels from a 3D texture. If those pixels contain geometry data, it will insert a leaf in the octree. Since the octree isn't filled initially (other than I reserved plenty of space in a VBO), it happens that the octree gets constructed along the way:
1- fetch a pixel, test if it should be inserted
2- Dive into the octree, starting with the rootnode. Get the leaf node at level 8
3- If a node hasn't been divided in 8 subnodes yet, get an unused pointer from the VBO and reserve 8 nodes there. Let the parent node refer to the first childnode.
The main problem is that multiple pixels may try to split octree nodes at the same time. I've seen barriers and such in the OpenCL reference, I guess I should create semaphores around the "if ( node->child1Ptr == 0 ) { ... }" section in the code below, but wouldn't make that the code terribly slow? Asides, but this is something that can't be fixed maybe, creating a semaphor requires support for the atom_xchg operation afaik. Seems my videocard is too old for that.
__kernel void voxelOctree_Build( __global struct TOctreeNode* octree,
__global int* octreeNodeCounter,
__read_only image3d_t dataTex
)
{
// We loop through all the pixels of a (64^3) 3D texture
// We inject each pixel into the octree, eventually creating new subnodes along
// the way
int3 locID = (int3)( get_local_id(0), get_local_id(1), get_local_id(2) );
int4 texCoord = (int4)( locID.x, locID.y, locID.z , 0 );
sampler_t sampler = CLK_FILTER_NEAREST | CLK_ADDRESS_CLAMP;
float4 data = read_imagef( dataTex, sampler, texCoord );
if ( data.w > 0.f ) // Only insert if there is some occlusion at this point
{
// Get root node
__global struct TOctreeNode *node = &octree[0];
// Dive into the octree, eventually split nodes if not done before
int level = 0;
while ( level++ < 8 )
{
// Create 8 sub nodes, if not done before
if ( node->child1Ptr == 0 )
{
// Get an unused index from the octree VBO
int indx1 = *octreeNodeCounter;
node->child1Ptr = (int)(&octree[ indx1 ]); // Store pointer to 1st child node
indx1+= 8; // Reserve 8 nodes
*octreeNodeCounter = indx1; // write back in global var
}
// Check in which of the 8 subnodes we need to dive
int localOffset = 0;
...generate a number between 0..7
// Get node from deeper level, then repeat the loop
node = (__global struct TOctreeNode*)(node->child1Ptr + localOffset);
} // while level > 0
// Now insert the actual data
...
} // occlSum
} // voxelOctree_Build
Dunno how they exactly do it in GigaVoxels, but I could also pre-generate the octree once, entirely "unfolded". That means I don't have to split nodes, making the code faster. However, that will take a lot more memory of course.
Greets,
Rick