• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here. The Curated Books The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell Presents 100+ sets of questions, or different lenses, for viewing a game’s design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here. A Practical Guide to Indie Game Marketing, by Joel Dreskin Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here. An Architectural Approach to Level Design This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here. Learn more and download the ebook by clicking here. Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.
Sign in to follow this  
Followers 0

Parallel Octree Construction on GPU

3 posts in this topic

I am currently trying to construct a voxel octree on the gpu using directx11 compute shader. The only way I can think of how to do this in parallel is to take the entire data set, create 8 new nodes, check (in parallel) if every voxel in the set is less than/greater than each axis, and flag each new node if voxels exist there. So for instance:

struct Voxel
float3 pos;
float3 color;

struct OctreeNode

int voxBufIdx;
int parentIdx;
int childOffset;
int level;
int subdivided;
int terminal;

RWStructuredBuffer<OctreeNode> octIn : register(u0);
StructuredBuffer<Voxel> voxBuf : register(t0);

void createNodes(const uint voxIdx, float maxL, int level, uint baseOffset)

float L = maxL/pow(2, level);

//Flag subdivision requests

if(voxBuf[voxIdx].pos.x<L && voxBuf[voxIdx].pos.y<L && 0<voxBuf[voxIdx].pos.z<L)
octIn[baseOffset+0].subdivided = 1;
if(voxBuf[voxIdx].pos.x<L && voxBuf[voxIdx].pos.y<L && voxBuf[voxIdx].pos.z>L)
octIn[baseOffset+1].subdivided = 1;
if(0<L<voxBuf[voxIdx].pos.x && voxBuf[voxIdx].pos.y<L && voxBuf[voxIdx].pos.z>L)
octIn[baseOffset+2].subdivided = 1;
if(0<L<voxBuf[voxIdx].pos.x && voxBuf[voxIdx].pos.y<L && 0<voxBuf[voxIdx].pos.z<L)
octIn[baseOffset+3].subdivided = 1;
if(voxBuf[voxIdx].pos.x<L && voxBuf[voxIdx].pos.y>L && 0<voxBuf[voxIdx].pos.z<L)
octIn[baseOffset+4].subdivided = 1;
if(voxBuf[voxIdx].pos.x<L && voxBuf[voxIdx].pos.y>L && voxBuf[voxIdx].pos.z>L)
octIn[baseOffset+5].subdivided = 1;
if(0<L<voxBuf[voxIdx].pos.x && voxBuf[voxIdx].pos.y>L && voxBuf[voxIdx].pos.z>L)
octIn[baseOffset+6].subdivided = 1;
if(0<L<voxBuf[voxIdx].pos.x && voxBuf[voxIdx].pos.y>L && 0<voxBuf[voxIdx].pos.z<L)
octIn[baseOffset+7].subdivided = 1;

[numthreads(1024, 1, 1)]
void CS_Octree(uint gtidx : SV_GroupIndex, uint3 blockIdx : SV_GroupID)

const uint threads = gtidx + blockIdx.x*1024 +blockIdx.y*1024*1024;
const uint nodes = threads;
const uint voxIdx = threads;

float maxL = 512;
int level = 0;
int nodeOff = 0;

OctreeNode root;
root.voxBufIdx = 0;
root.parentIdx = 0;
root.childOffset = 1;
root.level = 0;
root.subdivided = 1;
root.terminal = 0;
octIn[0] = root;
level += 1;[/indent]

if(octIn[0].subdivided == 1)
nodeOff +=1;
createNodes(voxIdx, maxL, level, nodeOff);

But now I've ran into a roadblock with a couple of problems:

[b][u]First Problem[/u][/b]

The compute shader does not allow recursive functions - so how would I subdivide each of the 8 nodes in level 1 in parallel?

So far, my idea is to count the number of nodes created in the current level (which is 8 for level 1) and add that to the node offset so that in my node buffer, I can add the new nodes to the end of the buffer.
i.e. after creating the 8 new nodes from the root, and checking that the first four nodes require subdivision, my buffer looks like this:

octIn.subdivision = {1,1,1,1,1,0,0,0,0,...}

where index 0 is the root and index 1-8 are the 8 new nodes.

If every first four nodes (at level 2) within each of the four flagged nodes (at level 1) also require subdivision, then I would want my buffer to look like this:

octIn.subdivision = {1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,..}

where index 0 is the root, index 1-8 are the nodes at the first level, and index 9-40 are the child nodes of the first level that have been subdivided (4 nodes subdivided x 8 children = 32 nodes).

The challenge is understanding how to run the createNodes function 4 times in parallel. I am thinking that I can assign a thread per node to check if it has been flagged for subdivision:


if(octIn[1+nodes].subdivided == 1)
..//run createNodes function

but will this run the createNodes function 4 times in parallel? Would there be a way to use something like InterlockedAdd to do this?

[u][b]Second Problem[/b][/u]

Once I do find a solution to the first problem, there is the issue of reducing the set of voxels in each node so that only the voxels within each node are checked for the subdivision requests.

if(voxBuf[voxIdx].pos.x<L && voxBuf[voxIdx].pos.y<L && 0<voxBuf[voxIdx].pos.z<L)

otherwise, if at level 2, four nodes are checked for subdivision in parallel, then there would need to be 4 times the threads (where 1 thread is assigned per voxel) and the number of threads required increases by 2^Level for each Level.

The only way I can think of doing this is by sorting the voxBuf data after each subdivision, so that it resides in a section of memory that corresponds with the node index at each level. For example:

take 500,000 voxels in the voxel buffer.
After the first subdivision, the data would be sorted so that:
voxBuf[0~50,000] contains the voxels that lie in the first node.
voxBuf[50,001~100,000] contains the voxels that lie in the second node.
voxBuf[450,000~500,000] contains the voxels that lie in the eighth node.

The entire set would not be evenly divided by eight, so I would have to atomically count the number of voxels in each section of the buffer to determine the offset positions. Then I could use these offset positions to reduce the set of voxels at each level so that within each node, only those voxels will be checked for subdivision, instead of the entire set.

This would require some sort of sorting algorithm that can be done in parallel. I have heard that something like Bitonic Sorting would be good, which is essentially also a 'divide-and-conquer' methodology, similar to quadtrees. If I were to use Bitonic Sorting, I would have to run three passes for each x,y,z axis. This would essentially mean that this entire Octree Construction method requires several 'divide-and-conquer' methods within another 'divide-and-conquer' method at multiple layers.

Surely there is a more efficient and simpler way of constructing an octree in parallel on the gpu?
Can anybody please give me some advice or insights on what I am doing? Edited by gboxentertainment

Share this post

Link to post
Share on other sites
You usually convert the BVH construction problem to a sorting problem. This can be done with the use of morton codes, see for example this paper [url="http://mgarland.org/files/papers/gpubvh.pdf"]http://mgarland.org/...pers/gpubvh.pdf[/url] Bitonic sort and Radix sorts can be implemented very efficiently on modern GPUs,
see here for example for an OpenCL implementation: [url="http://www.bealto.com/gpu-sorting_intro.html"]http://www.bealto.co...ting_intro.html[/url]

Share this post

Link to post
Share on other sites
Thanks for the sorting algorithm links.
I've run into another roadblock:

My octree construction requires that I check each parent node in parallel if it is flagged for subdivision. Then the createNodes function needs to be run in parallel based on these flags.
For instance, after my data is sorted, and the first four nodes are flagged for subdivision, I want createNodes to run for each one simultaneously so that:

Node 1 checks voxBuf[0~50,000] for which side of each axis it lies on.
Node 2 checks voxBuf[50,001~100,000] for which side of each axis it lies on.

In order to do this, I will have to allocate the first 50,000 threads to node 1 and the next 50,000 to node 2 and so on...
Is it possible to allocate threads in this way? Does anyone know how to do this? Edited by gboxentertainment

Share this post

Link to post
Share on other sites
Okay, so now I've decided to go with the Morton-Code sorting octree method.
I have interlaced all of my xyz coordinates into a single-dimensional list of morton codes. Then I have used the Compute Shader Bitonic Sort example to sort this list of morton codes in ascending order.
I can see that the way the Morton-Code is sorted from ascending order essentially represents an Octree. Does anyone know of any examples that implement the octree structure using Morton-Codes in parallel on the gpu?

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  
Followers 0