Octree Performance

Started by
7 comments, last by hick18 12 years, 10 months ago
I'm using an Octree in my current ray tracing project so that I can use triangle meshes. Below is a shot of my Octree app. As you can see:

Total Triangles = 192,200
Brute Force Time = 7.1ms
Octree Time = 0.20ms


That's an average 97% performance improvement... but it's still really slow for ray tracing, when you consider that each pixel is going to be performing the ray/octree test maybe 20-30 times.

The tree has 8 levels and uses 2 intersection tests: ray/AABB and ray/triangle. One thing I have noticed is that when performing the ray/AABB test for the octants, the order in which they are processed is the same each time, i.e. I don't prefer to test closer octants first. How could you go about doing that?


My question is: does that improvement look about right for an Octree, given the above parameters? And would I get better performance using another acceleration structure?

It looks fairly good. Some other structure might give better results than an octree (kd-trees should be faster on average, although they take slightly longer to build). Is this an optimized release build? Do you use multiple threads?
[size="1"]I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

My question is: does that improvement look about right for an Octree, given the above parameters? And would I get better performance using another acceleration structure?

Your octree version is roughly 35 times faster than your brute-force version. If your brute-force version consists of testing all tris against your ray, then 35x doesn't seem like enough.


The tree has 8 levels and uses 2 intersection tests: ray/AABB and ray/triangle. One thing I have noticed is that when performing the ray/AABB test for the octants, the order in which they are processed is the same each time, i.e. I don't prefer to test closer octants first. How could you go about doing that?

Compare the node position with the ray start position. An overlap-free octree is a special case of a binary space partitioning tree (BSP tree). You can use the same traversal strategy
as in a BSP tree (with virtual axis-aligned planes that run through the node position). In this case you can traverse the tree in-order and abort once you hit a solid surface.
Btw. you can encode the index of a node's child as a 3-bit value (bit 0 is 0 if ray.start.x <= node.x, else 1; bit 1 is 0 if ray.start.y <= node.y, else 1; and so on for z).
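
A minimal sketch of that encoding, assuming each node stores its centre point (the Vec3/Node names here are placeholders, not code from this thread):

[code]
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical node layout: an internal node stores its centre point
// and pointers to its eight children.
struct Node
{
    Vec3  center;
    Node* children[8];
};

// 3-bit child index for the octant containing point p:
// bit 0 is set if p.x > node.x, bit 1 if p.y > node.y, bit 2 if p.z > node.z.
inline uint32_t ChildIndexForPoint(const Vec3& p, const Node& node)
{
    uint32_t index = 0;
    if (p.x > node.center.x) index |= 1u;
    if (p.y > node.center.y) index |= 2u;
    if (p.z > node.center.z) index |= 4u;
    return index;
}
[/code]

Traversal then starts in the child returned for ray.start and proceeds to the neighbouring octants in the order the ray crosses the three splitting planes.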

For a proper performance test, try to separate the ray/tris checks from the octree traversal time. If you still check ~5,500 tris (1/35 of the scene) per ray, you could still have a lightning-fast octree traversal.
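
One cheap way to get that breakdown is to count the two kinds of tests separately while tracing, along these lines (the names are just illustrative):

[code]
// Statistics gathered during traversal, so the cost of walking the tree
// can be judged separately from the cost of the triangle tests.
struct TraversalStats
{
    long nodesVisited    = 0;   // ray/AABB tests performed
    long trianglesTested = 0;   // ray/triangle tests performed
};

// In the traversal:  ++stats.nodesVisited;    before each ray/AABB test.
// In the leaf loop:  ++stats.trianglesTested; before each ray/triangle test.
// Dividing both by the number of rays gives per-ray averages to compare.
[/code]
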
Show the octree code.
I think you are missing one important optimization between the ray vs. octree and the ray vs. poly test. Add one more abstraction layer: a tight-fitting simple shape, like a sphere, AABB, OBB, or even a low-poly-count convex mesh, around each particular mesh for small loose objects would greatly reduce your test times. If the ray doesn't collide with the simple shape, there is no need to do a per-poly test.
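
A minimal sketch of that coarse early-out, assuming each mesh carries a precomputed bounding sphere (all names here are hypothetical):

[code]
#include <cmath>

struct Vec3 { float x, y, z; };
struct BoundingSphere { Vec3 center; float radius; };

// Cheap ray/sphere rejection test. If this returns false, the ray cannot hit
// anything inside the sphere, so the per-triangle tests can be skipped.
bool RayHitsSphere(const Vec3& origin, const Vec3& dir /* normalised */,
                   const BoundingSphere& s)
{
    Vec3  oc  = { s.center.x - origin.x, s.center.y - origin.y, s.center.z - origin.z };
    float oc2 = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z;     // squared distance to centre
    float t   = oc.x * dir.x + oc.y * dir.y + oc.z * dir.z;  // closest approach along the ray
    float r2  = s.radius * s.radius;

    if (t < 0.0f && oc2 > r2)
        return false;              // sphere is entirely behind the ray origin
    return oc2 - t * t <= r2;      // squared distance from centre to the ray
}
[/code]

The same idea works with an AABB or OBB; the point is simply to reject a whole mesh with one test before touching any of its triangles.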

Also, the octree you describe seems to be a regular one, meaning all branches have the same depth. However, if on the second level (one step down from the root) you find that one branch has no objects inside (or barely a couple), there is no point in splitting and traversing that branch two steps deeper. Split a node only if its population is high (some testing and balancing will indicate how much is too much population).
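
A sketch of that split criterion; kMaxTrianglesPerNode is a made-up tuning constant, not a value from this thread:

[code]
#include <cstddef>
#include <vector>

const int         kMaxLevel            = 8;
const std::size_t kMaxTrianglesPerNode = 32;   // tuning constant, found by experiment

struct Node
{
    std::vector<int> triangleIndices;   // indices into the mesh's triangle list
    Node*            children[8] = {};  // null until (and unless) the node is split
};

// Split a node only when it is overpopulated and not already at maximum depth.
bool ShouldSplit(const Node& node, int level)
{
    return level < kMaxLevel && node.triangleIndices.size() > kMaxTrianglesPerNode;
}
[/code]
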

Game making is godlike

LinkedIn profile: http://ar.linkedin.com/pub/andres-ricardo-chamarra/2a/28a/272


I am only partitioning nodes further if they need to be; for example, if an octant at a level less than maxlevel has no triangles crossing it, then its child nodes are null.

One thing I have noticed, though, is that other people only seem to store triangles at the very bottom of the tree, i.e. only nodes at maxlevel actually store triangles, and many copies of the triangles are made. In my implementation, when adding a triangle, I stop recursion when the whole triangle won't fit into the child octants. So in my implementation the tree has a 1-to-1 ratio of triangles with the original mesh, but triangles are stored at all levels.

I adapted my implementation to instead only store triangles at the bottom level of the tree, sharing them across the octants they intersect. This gives me much better performance, as I'm not testing triangles at each level, only the bottom level, and it's around 300-400 times faster than simply testing every triangle. The downside is that it can store far more triangles than the original mesh, as a triangle that is several times bigger than the lowest-level octant gets inserted into many, many nodes. If, however, the triangles are on average smaller than the lowest-level octant, it only stores around double the number of triangles.

I made it more memory efficient by only storing indices, but for meshes with large triangles it's still making too many copies. For example, if I have a scene with a triangle that spans the width of the scene (say, for example, the floor plane), then it gets copied into thousands of nodes.

So my plan is now to store triangles at each level again, but only subdivide nodes if the triangle's area is less than that node's "max triangle area size".
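
A rough, self-contained sketch of that kind of area-based insertion. All names are illustrative, and the per-level area threshold (quartering maxTriangleArea at each level) is just one possible heuristic, not something from this thread:

[code]
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3     { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };
struct AABB     { Vec3 min, max; };

static Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

static float TriangleArea(const Triangle& t)
{
    Vec3 c = Cross(Sub(t.v1, t.v0), Sub(t.v2, t.v0));
    return 0.5f * std::sqrt(c.x * c.x + c.y * c.y + c.z * c.z);
}

// Conservative overlap test: the triangle's bounding box against an octant box.
static bool TriangleTouchesBox(const Triangle& t, const AABB& b)
{
    float lox = std::min({ t.v0.x, t.v1.x, t.v2.x }), hix = std::max({ t.v0.x, t.v1.x, t.v2.x });
    float loy = std::min({ t.v0.y, t.v1.y, t.v2.y }), hiy = std::max({ t.v0.y, t.v1.y, t.v2.y });
    float loz = std::min({ t.v0.z, t.v1.z, t.v2.z }), hiz = std::max({ t.v0.z, t.v1.z, t.v2.z });
    return lox <= b.max.x && hix >= b.min.x &&
           loy <= b.max.y && hiy >= b.min.y &&
           loz <= b.max.z && hiz >= b.min.z;
}

// Box of child octant i (bit 0 = upper x half, bit 1 = upper y, bit 2 = upper z).
static AABB ChildBounds(const AABB& p, int i)
{
    Vec3 c = { (p.min.x + p.max.x) * 0.5f, (p.min.y + p.max.y) * 0.5f, (p.min.z + p.max.z) * 0.5f };
    AABB box = p;
    ((i & 1) ? box.min.x : box.max.x) = c.x;
    ((i & 2) ? box.min.y : box.max.y) = c.y;
    ((i & 4) ? box.min.z : box.max.z) = c.z;
    return box;
}

const int kMaxLevel = 8;

struct Node
{
    AABB             bounds;
    float            maxTriangleArea;   // triangles at least this big stop at this level
    std::vector<int> triangleIndices;   // indices into the mesh, not copies of vertex data
    Node*            children[8] = {};
};

void Insert(Node& node, int index, const Triangle& tri, int level)
{
    // Big triangles stay high in the tree instead of being duplicated into many leaves.
    if (level == kMaxLevel || TriangleArea(tri) >= node.maxTriangleArea)
    {
        node.triangleIndices.push_back(index);
        return;
    }

    // Push smaller triangles down into every child octant they touch.
    for (int i = 0; i < 8; ++i)
    {
        AABB childBox = ChildBounds(node.bounds, i);
        if (!TriangleTouchesBox(tri, childBox))
            continue;

        if (node.children[i] == nullptr)
        {
            node.children[i] = new Node();
            node.children[i]->bounds          = childBox;
            // Heuristic: a child's cross-section is 1/4 of its parent's.
            node.children[i]->maxTriangleArea = node.maxTriangleArea * 0.25f;
        }
        Insert(*node.children[i], index, tri, level + 1);
    }
}
[/code]
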
Depending on where your bottleneck is, the branches it takes to do an 8-way test for an octree can be brutal. Each jump or test into the leaves also typically means L2 cache misses on top of that. A lot of data sets are relatively flat, and a quadtree could probably do the job just fine. It is worth trying, at least.

As for preferring near nodes first, it should just be a matter of recursing into the nodes in an order based on the signs of the ray's X, Y and Z components: compute which nodes you need to recurse into all at once, then take that list and process it in the order dictated by the signs of X, Y and Z.
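
A small sketch of that idea, assuming the child-index convention from earlier in the thread (a set bit means the upper half of that axis); the names are hypothetical:

[code]
#include <cstdint>

struct Vec3 { float x, y, z; };

// 3-bit mask of the axes along which the ray points in the negative direction.
inline uint32_t RaySignMask(const Vec3& dir)
{
    return (dir.x < 0.0f ? 1u : 0u) |
           (dir.y < 0.0f ? 2u : 0u) |
           (dir.z < 0.0f ? 4u : 0u);
}

// Visit the eight children as (i ^ signMask) for i = 0..7. The children the ray
// actually passes through then come up in front-to-back order, so combined with
// the usual ray/AABB rejection this gives a near-octants-first traversal.
template <typename Visit>
void ForEachChildNearToFar(uint32_t signMask, Visit&& visit)
{
    for (uint32_t i = 0; i < 8; ++i)
        visit(i ^ signMask);
}
[/code]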

Building up a full list of nodes to process into an array before processing any of them can also be faster, since prefetching the data is a lot easier when working with arrays than with trees and lists. Provided this list is sorted, you can early-out just as easily, or turn around and feed the array to a threaded work queue to solve.
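
A sketch of that pattern, collecting the hit leaves into a flat array first; the types are hypothetical and tEnter stands in for whatever the existing ray/AABB test already returns:

[code]
#include <algorithm>
#include <vector>

struct Node;                   // whatever octree node type is already in use

struct LeafHit
{
    const Node* leaf;
    float       tEnter;        // ray/AABB entry distance from the existing test
};

// Gather every leaf the ray touches into a flat array, sort it near-to-far,
// then walk it linearly: easy to prefetch, easy to early-out, and easy to hand
// off to a threaded work queue.
void ProcessLeaves(std::vector<LeafHit>& hits /* filled during traversal */)
{
    std::sort(hits.begin(), hits.end(),
              [](const LeafHit& a, const LeafHit& b) { return a.tEnter < b.tEnter; });

    for (const LeafHit& hit : hits)
    {
        // ... test this leaf's triangles here; stop as soon as a hit closer than
        // the next leaf's tEnter is found (the early-out mentioned above).
        (void)hit;
    }
}
[/code]
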
http://www.gearboxsoftware.com/

This topic is closed to new replies.
