Voxel Cone Tracing - Octrees

Started by
13 comments, last by JoeJ 6 years, 10 months ago

So, work in progress.

[sharedmedia=gallery:images:8640]

I believe I've got filtering and brick building correct as of now (I've even checked the 3D texture and it seems correct to me. Anyways I'm using the sampling traversal as of now (which is incorrect, as it doesn't give me any advantage of using octree), but I had to switch back to it (for testing the filtering) due to having something wrong in the ray-octree one (which I'll need to update to cone-octree).

Whole SVO is built on GPU as of now, and the steps are following:


// Build a hierarchy (similar to how it is done in papers)
for (i = 0; i < levels; i++)
{
    FlagNodes();
    AllocNodes();
    InitNodes();
}

// Fill base level with data (as per your description, just done in 3D)
FillLeaves();
// Filter bricks (as we need neighbor information!)
FilterLeaves();

// Build interior nodes (interpolate from lower levels)
for (i = 0; i < levels; i++)
{
    BuildInterior(levels - i - 1);
}

Works like a charm, it is also very fast (I will do actual measurement shortly), and I am over-allocating memory a bit now (as I don't need log2(levels), but log2(levels)-1 depth of tree - as leaves (bricks) actually store one whole level in the 3D texture. I'm going to give figures for performance and memory out once I solve the traversal part (which I'm digging in now).

And I haven't even started optimization yet!

Thanks for advises in this thread. If everything is successful, then my next post will go into blog here on GameDev, with the figures and maybe some demo/code to show.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Advertisement

And no, I still don't have filtering right. I noticed that 2 items will cause problems, so the first one is filtering inside leaves:

Here is a pseudo-example:

[sharedmedia=gallery:images:8641]

This is our starting point - semi transparent nodes are visualization of part of the tree that is empty (therefore those nodes are not really there, and or course we can detect that there are empty nodes). Top left 4 nodes are full. Bottom left 4 nodes are also full (some with data that have alpha = 0).

I will demonstrate filtering along X-axis (as described in paper). So, the first step is:

[sharedmedia=gallery:images:8642]

Writing data to the right neighbor (of course as tree is sparse we can't really access the 'transparent' - non existing nodes). This will be the result:

[sharedmedia=gallery:images:8643]

Now, as the data in right-most voxels need to be the same as in the left-most voxels, we need to copy from right back to the left. Like:

[sharedmedia=gallery:images:8644]

And the problem is:

[sharedmedia=gallery:images:8646]

Obviously as we can't write into non-existent nodes (due to sparse nature of the tree), the values won't match (even when we assume that value was with alpha = 0 in the previous steps). All the data neighboring non-existent nodes will be invalid ending up in a border.

Of course the same problem is hit when computing interior nodes (mip mapped ones) - they will not match properly. Resulting in non-smooth ambient occlusion like this:

[sharedmedia=gallery:images:8648]
The paper sadly doesn't address this problem at all (it describes scenario when the tree is dense, in which case it is working properly, but as soon as tree is sparse, the problems arise).
Any idea how to properly solve this? Obviously the leaf nodes can be solved by detecting non-existent nodes around currently processed node and set values to match (in all 6 directions). But how to perform 'mip-mapping' after (e.g. how to compute interior node values in a way that would make sense)?
My apologize for double-post!

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

I think you do need to interolate first, build the triples second, and only after this is done you can build the tree. This way it can not happen you would need non existing nodes.

Take a look at my 1D example above, it only results in empty cells if at least two adjecent cells from original data are empty.

But i'm still unsure if my example is the best way to solve filtering - it needs 6 times the original memory just for the 1D case.

I'll try another option:


0   4     0   0     8   4     0   0   // rasterized input (with an additional space to show the data we want to group in a node)

// interpolation not necessary, instead we add the right neighour to allow filtering.
// So we get those triples for seamless filtering using range [0.33,0.66]:

  04 0,  00 8,  84 0,  00 0

... ah, much better - only 1.5 times the data, stupid me :) Requires 3 adjacent empty cells to generate an empty voxel.

So if i'm right here, something like mixing colors like you show in your image should not be necessary.

But when you bulid the tree you need to look at neighbours as well to be sure a node will in deed be empty.

The problem with interpolation is, I can't do that beforehand. I have a set of elements from which I'm building a tree (those are voxels, generated for the scene). Now as I know the position of each voxel, I have no idea about its sorroundings (they can be in random order inserted into the set), all I know is their position and data (color, normal, whatever I wish to store). Adding voxels with interpolations would mean that I need to find neighbors in this set (which is actually easily done when octree building is finished).

This will be quite hard, as the tree can't be easily modified. EDIT: Thinking of this, yes technically it is possible to add nodes, removing them might be more tricky - but yes, also possible ... as actually each node represents sparse octree itself. I believe this might be a decent way to handle inserting dynamic objects too ... or maybe different levels of quality (whenever it is necessary). Making this easily usable even for case of 'ball in stadium'.

Each of the elements is inserted into the tree (the tree I generate is sparse), and doing any form of refinement will be hard, if not impossible once tree is generated (it uses parallel algorithm for that):

  1. You start with root node (levels = 1)
  2. Loop (for i = 0 to levels):
  3. For each element - traverse into the 'levels', and flag which nodes are to be split
  4. For each node in current level - if node is flagged, generate 8 child nodes
  5. Allocate generated child nodes
  6. Repeat the loop

Now, I can't really change a tree once it is finished. The point is, I realized that bricks are something different than what I thought they are. The value in the centre is the most important one, and 26 values sorrounding the center are duplicated (yes, it is "memory overkill" in this sense)! The correct version of previous images is (and should be):

[sharedmedia=gallery:images:8649]

The mipmapping part seemed tricky at first (but my algorithm is wrong), what I should actually do is, for child nodes, use their center values (and the values between the centres - which will be 3x3x3 total values! Not 4x4x4!) and average this into the parent node's center. Then put the boundaries to fill boundaries in the parent node.

Now, this will need additional step to perform the interpolations like I did for leaves, to guarantee that the border values in bricks are correct.

[sharedmedia=gallery:images:8650]
Sounds quite complicated though (I didn't have time to measure time figures, as the code is still work in progress, so hopefully it won't be that big problem to compute).
EDIT2: Yes, I can confirm that now it works quite properly (even AO is smooth). The last part I'd like to investigate is traversal.
The sampling one is of course extremely fast if sample count is low, but as stated - unreliable. Increasing sample count can solve problems (basically oversampling everything), but tends to be quite heavy. But what's the point of using SVO when I'm just brute-forcing through right? Which has O(n) complexity.
I've tried ray-AABB intersections and stepping down the tree (using stack-based approach ... basically something similar to what one would use for BVH ray tracing). While reliable, it is extremely heavy if the nodes aren't pushed into stack in correct order (which doesn't seem as straight forward as I thought will be). This should theoretically have O(log(n)) complexity, assuming I put the children to stack in order to process them "along the ray".
I'd also like to try DDA-like approach (as each time I can determine on which level the empty intersected node is, it should be straight forward to step accordingly to next node), as stated previously - if implemented correctly, it should have O(log(n)) complexity of finding next node along "ray", and

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com


The problem with interpolation is, I can't do that beforehand. I have a set of elements from which I'm building a tree (those are voxels, generated for the scene). Now as I know the position of each voxel, I have no idea about its sorroundings (they can be in random order inserted into the set), all I know is their position and data (color, normal, whatever I wish to store). Adding voxels with interpolations would mean that I need to find neighbors in this set (which is actually easily done when octree building is finished).

You could use a grid of indices and populate it while generating the voxels. Assuming this grid fits into memory (256^3 * 32bit = 64MB), neighbourfinding is just a lookup.

Not sure about your tree generation algorithm, personally i do it this way for a similar usecase:

for each level top down:

for each leaf determinate the subdivided octant of its actual parent and increase atomic counters.

Do prefix sum over the counters to determinate new child nodes and link leaf parent pointers to them.

Advantage: Each thread maps to one leaf and is busy even at the top levels, no inner loops. Disadvantage: Needs sync between levels, cumbersome to keep atomic counters in LDS

Another way: https://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/

There is a link to a paper which describes how to convert binary tree results to octree.


I've tried ray-AABB intersections and stepping down the tree (using stack-based approach ... basically something similar to what one would use for BVH ray tracing). While reliable, it is extremely heavy if the nodes aren't pushed into stack in correct order (which doesn't seem as straight forward as I thought will be). This should theoretically have O(log(n)) complexity, assuming I put the children to stack in order to process them "along the ray".

For ray traversal you could use a skip pointer per node instead of a stack. Should be a big win.

EDIT: see here: https://www.cs.utah.edu/~bes/papers/fastRT/paper-node12.html

This topic is closed to new replies.

Advertisement