Sign in to follow this  

Kd compile times

This topic is 4728 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

What kind of compile times are you guys getting for kd-trees (per 1000 triangles)? I'm using N^2 SAH, and my trees are very good quality, but it's taking about 90 seconds just to build 10,000 triangles! An obvious attempt at improvement would be if there are more than (for example) 1000 triangles in a node, to just choose 1000 at random and select the best among those. I assume people have tried this approach? What results have you had? Theoretically, it improves your performance to O(N)... But if you have a million triangles to divide, your first few cuts may be poor (though not too bad, i think).

Share this post


Link to post
Share on other sites
Boy, I had a stupid bug. I was testing ALL the triangles as possible split positions, not just the ones inside the node. I also sped up my SAH code... extremely fast. I doubt it could be faster.

And now I limit split plane test to a random 600 per node. This gives me 4 secs for 300,000 triangles... more what I was hoping for :)

Share this post


Link to post
Share on other sites
Another one working on ray tracing? ;)

My timings: 10 minutes for the 1.1M buddha, with full prim-to-node clipping and consideration of all interesting split plane positions. SAH, also. The Stanford Bunny is subdivided in about 15 seconds (70k tris).

Share this post


Link to post
Share on other sites
What is the order of your "interesting split plane positions" algorithm? Right now, I've settled on capping # split positions to 1200 per node (and really it's just 6 planes of 200 random prims).

It seems to work ok but I can't make a good qualitative comparison test because I haven't implemented RTA yet. With that big, fat hack I run at about 4.2 secs / 180k tris (ad hoc: 16, 8) and 6.3 secs at (20, 4) criteria.

I've sped up SAH as much as I can, but clearly an "interesting split plane" selection algorithm would be best. However, the random sample split algorithm does give O(N) characteristic: 50 sec for 1.2M (20,4) It would be hard to see that go :)

"Prim to Node Clipping" ... is that repositioning the prim AABB to just the portion that is in the node? Do you use that for intersection or just tree compile? I'm using indexed primitives, so I can't have different primitive characteristics per Node :(

Share this post


Link to post
Share on other sites
I use the following algorithm:

When I wish to subdivide a voxel, I loop over the primitives (triangles, in my case) and clip them against the original voxel. This way I can also generate split plane candidates where the triangle leaves the voxel. Next, I store all these positions in a linked list, thereby sorting them from 'min' to 'max' (e.g., left-to-right for the x-axis) and removing duplicates. Then, I calculate for each candidate position how many triangles there are on the left side and on the right side of this candidate plane. Once that is done, I calculate the SAH score for that split plane position.

The above process is done for all three axii, and then the best score is used to determine the best split plane position and orientation.

So, no tricks, all positions are evaluated. If I subdivide up to a depth of 20, I am done in less than 5 minutes for the Happy Buddha (1089k triangles), but that gives me lots of primitives in the leafs, so I subdivide up to a depth of 45. This takes longer, but ray tracing is much faster after that (I can render the buddha at 400x600 in about 600ms, with one light source (point)).

- Jacco.

Share this post


Link to post
Share on other sites
45!!! Holy crap. Mine starts to hit the hard drive at around 32, and then it's 'Game Over' :)

I intend to use it for collision detection, so a max_depth of 20 is not unreasonable. Which SAH equation do you use? I've just put in a late cutting-off-empties, and it improved my empty-space use dramatically with a very small compile-time cost... even with empty-culling several passes.

I'm curious: what's the difference in rendering times for Budda@20 vs Budda@45? And how expensive is your ray-prim test?

Share this post


Link to post
Share on other sites
I have to admit that the Buddha @ tree depth = 45 does require a substantial amount of memory (it was 299Mb last week, but I brought it down slightly over the weekend, so it's 250 or so now I guess), but I had to do this in order to limit the number of primitives per leaf. I'm not sure if it's using the full allowed depth though; gotta check that.

I use the basic SA/SV formula for the SAH, no special things to favour empty space or whatever. I wasn't aware of a lot of different options here (although one guy suggested doubling the score for non-empy voxels with 20 prims or more to favour empty space). I still must try this cutting of stuff that Havran described. He claims it helps.

The difference in rendering time between Buddha@20 and Buddha@45 is 20-30%. My ray-prim test is fast, I guess: I'm tracing ray packets (a la Wald) and so I test four rays against a single triangle simultaneously using sse. As this requires that all rays hit the same primitive (otherwise the mono ray code kicks in) it's probably not helping very much for the Buddha, but it does help to cull empty space around the Buddha real quickly.

Share this post


Link to post
Share on other sites
Just a small idea that I came up with today:

Did anyone notice that a kd-tree doesn't neccessarily have to be axis aligned? Suppose you wanted to tilt the entire thing for some reason, this could be done by pre-transforming the input data. Obviously, rays cast at this tree would also need to be transformed (in my code, that's as good as free).

Suppose you pick the first split plane using an algorithm that finds the best plane for a regular BSP. The orientation of all subsequent planes could then be based on this plane. Could work pretty well. I think. Any thoughts? I don't believe Havran mentioned this technique btw. :)

- Jacco.

Share this post


Link to post
Share on other sites
That sounds like a very good idea. I think the best algorithm would probably be one that selects an orientation based on maximizing the empty volume that can be culled early. But thinking about it, I feel like that would be a very difficult problem (NP complete to solve optimally, I'm sure).

A quick and dirty try would be to choose a variety of orientations and minimize the sum of surface area of all prim AABBS. That would be very fast compared to tree compile time, I think, even to try 100 orientations. For just triangles, you could sum the area of the tri projected onto the three candidate plane (instead of its AABB), and minimize that.

It's funny to have a truly original idea...
I had thought I was clever by coming up with the idea to pack the axis identifier for a node split into the 2 lsb of the split value... Then I found the Wald paper on doing ray-tri with multiple rays at once, and he does the same thing! I was pissed :)

Share this post


Link to post
Share on other sites
If you have code to find a tight fitting OOBB, you could probably use that for the axes of the kd-tree, intuitively it seems like it would work well, and fitting an OOBB tightly is more or less a solved problem.

Share this post


Link to post
Share on other sites
I realize that it is going to be extremely hard to find an optimal tree if you add this extra variable, but that's no reason for not adding this extra variable. :)

Apparently one of the fixed properties of the kd-tree is not that fixed after all. And that's food for thought. :)

Share this post


Link to post
Share on other sites
One extra thought: The matrix that you would use to transform the rays and the input data set doesn't even have to be orthagonal... Suppose you could build a better tree by shearing the object. :) Now that would result in odd trees.

Share this post


Link to post
Share on other sites
Quote:
Original post by phantomus
One extra thought: The matrix that you would use to transform the rays and the input data set doesn't even have to be orthagonal... Suppose you could build a better tree by shearing the object. :) Now that would result in odd trees.

its certainly a nice thought, but if you assume unordered primitives, which isnt too weird an assumption for complex real-world meshes, its not going to give any benefit.

also, transforming the ray into treespace in order to archieve rotated trees is not a new idea, but a good one though.

Share this post


Link to post
Share on other sites

This topic is 4728 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this