Voxel Cone Tracing - Octrees

Started by
13 comments, last by JoeJ 6 years, 10 months ago

And it's again me - I'd like to first say thanks for all the hints on the 3D texture mip mapping, which allowed me to do some great experiments with voxel cone tracing.

Now, I already had some time to move further and implement octree construction on GPU which is quite fast, along with populating octree with the data. So far so good, and I'm even able to search for voxel on position in octree using a function now. So why am I writing here? It is simple, as I'm going to cast rays (and later cones) in the octree:

I'd like to know whether it has sense to implement full ray-octree traversal (e.g. stack-based traversal that would actually intersect the nodes), or whether I should just do the 'sampling method' (e.g. do N steps along ray and accumulate results)?

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Advertisement

i'd say do full/real/whateveryoucallit-traversal so you won't miss any nodes alone the ray. i played around with gpu/octree/tracing a while ago and found that the fixed-step-size approach ain't very good (imo). you either have a too large stepsize, which may miss corners or thin objects in the tree, or a too tight stepping, which is just slow due to oversampling where it is not needed.

as said it has been a while but if i remember correctly i settled for something like:

- traverse tree to find node that contains ray's origin or ray's entry-point into the tree (if ray starts outside the bbox of tree)

- check found node for opacity and so on if fully opaque, finish traversal and draw pixel.

- find the side of the current node on which the ray would exit.

- find the smallest non-empty node touching that side of the current node.

- repeat

but performance of that depends on the cache-friendliness of your tree towards that method, and among other things depends on if all your leafes are at the same/deepest level.

that kind of ends up being some sort of adaptive-step-size-thingy, where the step size is always the distance to the next good node, however deep that node might be.

no idea how well that turns into cone-tracing. and when doing cone-tracing fixed step-size based on the size of the cone might work just as well. imagine the cone being filled with ever-larger balls, each overlapping to a certain amount. that would give you a step-size dependend on distance to ray's origin, and it still shouldn't miss many corners/thins depending on the amount of overlap of the "query-spheres" you use within the cone.

edit: however for conetracing you might need a different structure for the octree to use it efficiently - so i wouldn't plan to just slap that on later. might be better of trying to do that right from the start and plan you tree accordingly.

Thanks for reply!

My octree actually point to a 'brick' per each node, incl. leaves (which is 3x3x3 voxels stored in 3D texture, to allow for hardware trilinear filtering). I implemented this based on Voxel Cone Tracing paper.

The problem is, that all papers that went into octree voxel cone tracing didn't pay too much attention into traversal, which brings me to either:

  • They use just sampling (which is extremely simple, yet as you said has major flaws)
  • Octree traversal isn't really hard one, so they don't even consider mentioning it (yet, once you work with cones - you basically have some max. level you want to traverse into, which changes based on your cone angle ... which brings in some complexity)
  • They don't want to publish it (which I don't really believe, anyone who works a bit with ray tracing can put it together in few days)

I'll put some effort into full traversal then, and share some results here.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

The octree structure, if I'm not mistaking is there to let you not do full traversal yet have no missed nodes, so you don't really have to go through all the n voxels in the ray path, but only lg(n) of them, which makes the process way faster. Otherwise I wouldn't see a need for actual octree structure.

Thanks for the replies.

#IYP - you're correct, yes the point is to make the process faster than going literally 'through all cells'

I believe I've got it correct to some extent. Let me share a screenshot here:

[sharedmedia=gallery:images:8634]

As you can see, the data are not really filtered - I've followed the paper - https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf

Before I go to my questions and notes - let me outline the algorithm here, so that you might have some ideas where I did something wrong or right. So, my octree building algorithm goes as following:


Graphics queue:
VoxelOctree::Generate()
{
    // Simply render the scene, storing fragments (position + color) in UAV
    // Use counter buffer to count number of voxels
    RenderScene();
}

Compute queue:
VoxelOctree::BuildSVO()
{
    levels = log2(dimensions)
    for (i = 0; i < levels; i++)
    {
        // Flag nodes that are going to be subdivided
        OctreeNodeFlag();

        // For each node on current level, that has been flagged, allocate 8 children
        OctreeNodeAlloc();

        // Initialize children that were generated
        OctreeNodeInit();
    }

    // At this point I'm absolutely sure that octree structure is working properly!
    // But how to fill the data???
    // Next to octree I have a data buffer which currently has N*rgba8 field (where N is 
    // the total number of nodes in octree). Octree node index basically matches the color
    // in this array.
    //
    // Following function just clear octree color and fill single color per node (using 
    // atomics and approximate average allows us to fill EACH node to which voxel maps)
    OctreeNodeClear();
    OctreeNodeFill();
}

My understanding of bricks (e.g. how the data should be stored originally) is to allocate large structured buffer which would have 3x3x3 voxel space per each node in the octree (that means 27 times the size of single color per node), which would allow me to the filtering.

So what I had? I simply ran another kernel for each leaf node, that would search what is in 26 voxel surrounding (searching the whole tree from top) the center and filling those values. Which tends to be quite slow. Plus I'm not entirely sure how should I filter it (I stored it in the structured buffer and attempted to do trilinear filtering, without too much success though).

So my real question is, what should I store in those 'bricks' and how to store & obtain data for them (without taking too long time)? Once stored, I still need to perform filtering (which can be done by using 3D texture and pointing to voxels in center of each 3x3x3 voxels in that ~ I can figure out indexing, but I still have problem with obtaining data for them and potentially also storing them in interior nodes of the tree). The paper doesn't really mention how to fill data into the 'bricks', and we don't really have any information about ALL neighbors (just 7 others, that are in the same parent of the current nodes, but what about the rest?).

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Looking at the original paper they do not need to filter across bricks because they store 2x2x2 data in 3x3x3 bricks, so they avoid the problem but use more memory.

https://research.nvidia.com/sites/default/files/publications/GIVoxels-pg2011-authors.pdf

I assume sparse virtual 3D texturing tiled resource features (not sure which name to pick) would do the filtering in hardware. Did you consider this? (AMD does not have it yet)

You could read up Remedys paper about lighting in Quantum Break. They talked about filtering across full/empty or cells with different LODs IIRC - may be related.

Thanks for the response. I've read through Quantum Break paper but it doesn't really help much.

I'm not really against increasing memory footprint, I just don't have any idea how to store 2x2x2 data in 3x3x3 brick (e.g. basically a brick would hold all 8 voxels in a leaf). Based on my thinking it has to be 4x4x4 brick (you have 2x2x2 interior composed of those 8 values + the border), which doesn't sound that bad.

EDIT: They do have this paragraph:

To solve this problem, we propose to use 3×3×3 voxel bricks, but assume that the voxel centers are located at the node corners instead of the node centers (Fig. left). This ensures that interpolated values can always be computed inside a brick covering a set of 2×2×2 nodes. Redundancy is not eliminated, but we use less than half the amount of memory compared to adding a boundary, for the same sampling precision

This doesn't really make any sense to me. As in this sense I need to have a function like:


// This is pseudo-code! But will do as example.
// I'm looping through all voxels and storing them in tree, I can find out respective brick without any problem
// But I have no idea how to store the data!
//
// brick here points to one brick in which I'm storing current voxel that is in processing
// color is the value I want to store
// location is index into X, Y and Z in voxel array
void StoreColorInBrick(uint brick[3][3][3], uint color, uint location[3])
{
    // ... Now assuming my location is:
    // location[0] % 2 determines whether current voxel color is to the left or right (e.g. on x-axis)
    // location[1] % 2 determines whether current voxel color is to the top or bottom (e.g. on y-axis)
    // location[2] % 2 determines whether current voxel color is to the front or back (e.g. on z-axis)
    //
    // For simplicity let's assume I'm doing lower left corner of some node, where do I store color?
    // should it be brick[0][0][0]?
    // or added that belong to lower left (4 nodes), e.g. to brick[0..1][0..1][0..1] .. in which case 
    // how to handle the parts that are not in corner (there will be more values written most likely).
    // Average them? Sum them?
}

I believe I can make the interpolation work after (for each brick, I can find neighboring brick (if there isn't any, that means there are zeroes ~ that node is empty)). I assume the interpolation would then be only for boundary voxels standard sum/2 for 2 that are directly neighboring between 2 bricks.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

I did not try to follow your code, tried to make a 1D example instead:


  0   4   0   0   8   4   0   0  // rasterized input
    2   2   0   4   6   2   0  // interpolated

  // resulteing triples for seamless filtering using range [0.33,0.66]:
  024,242,420,200,000,004,048,486,864,642,420,200,000

This should work, but it really needs a lot of memory :(

Not sure if i miss something here, you would need a lot of empty space to make the octree a win with this.

Also i did not think how this extends to building a tree.

EDIT: Stupid way to solve this, better example below (!)

No worries, I got the idea. It makes sense but you're right, it will have quite large memory impact (yet doing trilinear filtering in shader seems to have more impact).

The octree tends to be quite sparse (For Sponza, the occupancy is about 3-4% (and it tends to be groupped) so there tends to be quite a lot of empty nodes even higher in the hierarchy. Although this might not be general rule for any scene.

Anyways, time to do some coding. Thanks!

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Probably it makes sense to use octree only for the highest detail levels, and use reguler 3D texture at low detail.

This topic is closed to new replies.

Advertisement