Sign in to follow this  
hick18

Octree Permforrnance

Recommended Posts

hick18    102
Im using an Octree in my current ray tracing project, so that i can use triangle meshes. Below is a shot of my Octree app. As you can see :

Total Triangles = 192,200
Brute Force Time = 7.1ms
Octree Time = 0.20ms


With an average of 97% performance improvement...But its still really slow for ray tracing, when you consider that each pixel is going to be performing the ray/octree test maybe 20-30 times.

The tree has 8 levels, and consists of 2 intersection tests, A ray/aabb and ray/triangle. One thing I have noticed is that when performing the ray/aabb test for the octants, the order in which they are processed is the same each time, ie, I dont prefer to test closer octants first. How could you go about doing that?


My question is, does that improvment look about right for an Octree, given the above parameters? and would I get better performance using another acceleration structure?

Share this post


Link to post
Share on other sites
SimonForsman    7642
[quote name='hick18' timestamp='1306008843' post='4813955']
Im using an Octree in my current ray tracing project, so that i can use triangle meshes. Below is a shot of my Octree app. As you can see :

Total Triangles = 192,200
Brute Force Time = 7.1ms
Octree Time = 0.20ms


With an average of 97% performance improvement...But its still really slow for ray tracing, when you consider that each pixel is going to be performing the ray/octree test maybe 20-30 times.

The tree has 8 levels, and consists of 2 intersection tests, A ray/aabb and ray/triangle. One thing I have noticed is that when performing the ray/aabb test for the octants, the order in which they are processed is the same each time, ie, I dont prefer to test closer octants first. How could you go about doing that?


My question is, does that improvment look about right for an Octree, given the above parameters? and would I get better performance using another acceleration structure?
[/quote]

it looks fairly good, some other structure might give better results than an octree (kd-trees should be faster on average allthough they take slightly longer to build) , is this an optimized release build ? do you use multiple threads ?

Share this post


Link to post
Share on other sites
SimonForsman    7642
Hidden
[quote name='hick18' timestamp='1306008843' post='4813955']
Im using an Octree in my current ray tracing project, so that i can use triangle meshes. Below is a shot of my Octree app. As you can see :

Total Triangles = 192,200
Brute Force Time = 7.1ms
Octree Time = 0.20ms


With an average of 97% performance improvement...But its still really slow for ray tracing, when you consider that each pixel is going to be performing the ray/octree test maybe 20-30 times.

The tree has 8 levels, and consists of 2 intersection tests, A ray/aabb and ray/triangle. One thing I have noticed is that when performing the ray/aabb test for the octants, the order in which they are processed is the same each time, ie, I dont prefer to test closer octants first. How could you go about doing that?


My question is, does that improvment look about right for an Octree, given the above parameters? and would I get better performance using another acceleration structure?
[/quote]

it looks fairly good, some other structure might give better results than an octree (kd-trees should be faster on average allthough they take slightly longer to build) , is this an optimized release build ? do you use multiple threads ?

Share this post


Link to post
Ashaman73    13715
[quote name='hick18' timestamp='1306008843' post='4813955']
My question is, does that improvment look about right for an Octree, given the above parameters? and would I get better performance using another acceleration structure?
[/quote]
Your octree version is roughly 35 times faster than your brute force version. If your bruteforce version consists of testing all tris vs your ray, then 35x seems not enough.

[quote name='hick18' timestamp='1306008843' post='4813955']
The tree has 8 levels, and consists of 2 intersection tests, A ray/aabb and ray/triangle. One thing I have noticed is that when performing the ray/aabb test for the octants, the order in which they are processed is the same each time, ie, I dont prefer to test closer octants first. How could you go about doing that?
[/quote]
Compare the node position with the ray start position. A overlapping free octree is a special case of a binary space partition tree (BSP-tree). You can use the same traversal strategy
as in a BSP tree (with virtual axis aligned planes which runs through the node position). In this case you can traverse the tree in-order and should abort once you hit a solid surface.
Btw. you can encode the index of a node tree child as 3 bit value (bit0: is 0 if ray.start.x<=node.x, else 1, bit1 is 0 if ray.start.y<=node.y ....).

For a proper preformance test, try to separate ray/tris checks and oct-tree traversal time. If you still check ~5500 tris(1/35 of scene) per ray you could still have a lighting fast octree traversal.

Share this post


Link to post
Share on other sites
NEXUSKill    475
I think you are missing one important optimization between the Ray vs OctTree and the Ray vs Poly test, add one more abstraction layer, a tight fit simple shape, like sphere or AABB or OBB or even a low poly count convex mesh arround each particular mesh for small loose objects would greatly reduce your test times. If it doesn't collide with the simple shape, no need to do a per poly test.

Also the OctTree you describe seems to be a regular one, meaning all branches have the same depth, however if on the second level (one step down from the root) you find that one branch has no objects inside (or barely a couple) there is no point on splitting and traversing that branch two steps deeper. Split a node only if its population is high (some testing and balancing will indicate how much is too much population).

Share this post


Link to post
Share on other sites
maya18222    194
Hidden
I am only partitioning nodes further if they need to, for example, if an octant at a level less than maxlevel has no triangles crossing it, then its child nodes are null.

One thing I have noticed though is that other people only seem to store triangles at the very bottom of the tree, ie only nodes of maxlevel actually store triangles, and that many copies of the triangles are made. In my implementation, when adding a triangle, I stop recursion when the whole triangle wont fit into child octants. So in my implementation the tree has a 1 to 1 ratio of triangles with the original mesh, but triangles are stored at all levels.

I adapted my implementation to instead only store triangles at the bottom level of the tree, sharing them across the octants they intersect. This gives me much better performance as im not testing triangles at each level, only the bottom level, and its around 300-400 times faster than simply testing every triangle. The downside is that it can store a huge amount more times as many triangles as the original mesh, as a triangle that is several times bigger than the octant at the lowest level gets inserted into many many nodes. If however the triangles are on average smaller than the octant at its lowest level it only stores around double the amount of triangles.

I made it more memory efficent by only storing indices, but for meshes with large triangles its still making too many copies. For example, if I have a scene with a triangle that spans with width of the scene (say for example the floor plane), then it gets copied into thousands of nodes.

So my plan is to now store triangles at each level again, but only sudivide nodes if the triangles area is less than that nodes "max triangle area size"

Share this post


Link to post
maya18222    194
Hidden
I am only partitioning nodes further if they need to, for example, if an octant at a level less than maxlevel has no triangles crossing it, then its child nodes are null.

One thing I have noticed though is that other people only seem to store triangles at the very bottom of the tree, ie only nodes of maxlevel actually store triangles, and that many copies of the triangles are made. In my implementation, when adding a triangle, I stop recursion when the whole triangle wont fit into child octants. So in my implementation the tree has a 1 to 1 ratio of triangles with the original mesh, but triangles are stored at all levels.

I adapted my implementation to instead only store triangles at the bottom level of the tree, sharing them across the octants they intersect. This gives me much better performance as im not testing triangles at each level, only the bottom level, and its around 300-400 times faster than simply testing every triangle. The downside is that it can store a huge amount more times as many triangles as the original mesh, as a triangle that is several times bigger than the octant at the lowest level gets inserted into many many nodes. If however the triangles are on average smaller than the octant at its lowest level it only stores around double the amount of triangles.

I made it more memory efficent by only storing indices, but for meshes with large triangles its still making too many copies. For example, if I have a scene with a triangle that spans with width of the scene (say for example the floor plane), then it gets copied into thousands of nodes.

So my plan is to now store triangles at each level again, but only sudivide nodes if the triangles area is less than that nodes "max triangle area size"

Share this post


Link to post
hick18    102
I am only partitioning nodes further if they need to, for example, if an octant at a level less than maxlevel has no triangles crossing it, then its child nodes are null.

One thing I have noticed though is that other people only seem to store triangles at the very bottom of the tree, ie only nodes of maxlevel actually store triangles, and that many copies of the triangles are made. In my implementation, when adding a triangle, I stop recursion when the whole triangle wont fit into child octants. So in my implementation the tree has a 1 to 1 ratio of triangles with the original mesh, but triangles are stored at all levels.

I adapted my implementation to instead only store triangles at the bottom level of the tree, sharing them across the octants they intersect. This gives me much better performance as im not testing triangles at each level, only the bottom level, and its around 300-400 times faster than simply testing every triangle. The downside is that it can store a huge amount more times as many triangles as the original mesh, as a triangle that is several times bigger than the octant at the lowest level gets inserted into many many nodes. If however the triangles are on average smaller than the octant at its lowest level it only stores around double the amount of triangles.

I made it more memory efficent by only storing indices, but for meshes with large triangles its still making too many copies. For example, if I have a scene with a triangle that spans with width of the scene (say for example the floor plane), then it gets copied into thousands of nodes.

So my plan is to now store triangles at each level again, but only sudivide nodes if the triangles area is less than that nodes "max triangle area size"

Share this post


Link to post
Share on other sites
Zoner    232
Depending on where your bottleneck is, the branches it takes to do an 8 way test for an octree can be brutal. Each jump or test into the leaves also typically consists of L2 cache misses on top of that. A lot of data sets are relatively flat, and a quadtree could probably do the job just fine. It is worth trying at least.

As for prefering near nodes first, its should just be a matter of recursing into the nodes in an order based on the sign of the ray's X Y an Z values, so compute which nodes you need to recurse into all at once, then take that list and process them in the order dictated by the sign of of XY&Z.

Building up a full list of nodes to process into an array before processing any of them can also be faster, since prefetching the data is a lot easier when working with arrays than it is with trees and lists. Provided this list is sorted you can then early out just as easily, or turn around and take the array and feed it to a threaded work queue to solve.

Share this post


Link to post
Share on other sites
hick18    102
Thanks for the reply. When you say test the rays X,Y and Z to figure out which node to test first do you mean the origin of the ray or its direction?

I thought that by using the origin of the ray to test each Octant in order I could exit early, by exiting with the first octant containing triangles. That is, finding the closest octant to the ray using the ray origin, and if it is a leaf node then it contains the closest triangle, exit with the closest triangle in this leaf. Unfortunatly, this doesnt work in the following case -

[attachment=3116:www.jpg]

Is there any way around this than to either

a) Not perform the early exit
b) spilt triangles that straddle planes

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this