Kd compile times

Started by
13 comments, last by Eelco 19 years, 3 months ago
What kind of compile times are you guys getting for kd-trees (per 1000 triangles)? I'm using N^2 SAH, and my trees are very good quality, but it's taking about 90 seconds just to build 10,000 triangles! An obvious attempt at improvement would be if there are more than (for example) 1000 triangles in a node, to just choose 1000 at random and select the best among those. I assume people have tried this approach? What results have you had? Theoretically, it improves your performance to O(N)... But if you have a million triangles to divide, your first few cuts may be poor (though not too bad, i think).
Advertisement
Not done anything with triangles, but I regularly build left-balanced kd-trees of single point nodes with over 200,000 points in under 2 seconds (for photon mapping).

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Boy, I had a stupid bug. I was testing ALL the triangles as possible split positions, not just the ones inside the node. I also sped up my SAH code... extremely fast. I doubt it could be faster.

And now I limit split plane test to a random 600 per node. This gives me 4 secs for 300,000 triangles... more what I was hoping for :)
Sounds like a much more reasonable number [smile]

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Another one working on ray tracing? ;)

My timings: 10 minutes for the 1.1M buddha, with full prim-to-node clipping and consideration of all interesting split plane positions. SAH, also. The Stanford Bunny is subdivided in about 15 seconds (70k tris).
What is the order of your "interesting split plane positions" algorithm? Right now, I've settled on capping # split positions to 1200 per node (and really it's just 6 planes of 200 random prims).

It seems to work ok but I can't make a good qualitative comparison test because I haven't implemented RTA yet. With that big, fat hack I run at about 4.2 secs / 180k tris (ad hoc: 16, 8) and 6.3 secs at (20, 4) criteria.

I've sped up SAH as much as I can, but clearly an "interesting split plane" selection algorithm would be best. However, the random sample split algorithm does give O(N) characteristic: 50 sec for 1.2M (20,4) It would be hard to see that go :)

"Prim to Node Clipping" ... is that repositioning the prim AABB to just the portion that is in the node? Do you use that for intersection or just tree compile? I'm using indexed primitives, so I can't have different primitive characteristics per Node :(
I use the following algorithm:

When I wish to subdivide a voxel, I loop over the primitives (triangles, in my case) and clip them against the original voxel. This way I can also generate split plane candidates where the triangle leaves the voxel. Next, I store all these positions in a linked list, thereby sorting them from 'min' to 'max' (e.g., left-to-right for the x-axis) and removing duplicates. Then, I calculate for each candidate position how many triangles there are on the left side and on the right side of this candidate plane. Once that is done, I calculate the SAH score for that split plane position.

The above process is done for all three axii, and then the best score is used to determine the best split plane position and orientation.

So, no tricks, all positions are evaluated. If I subdivide up to a depth of 20, I am done in less than 5 minutes for the Happy Buddha (1089k triangles), but that gives me lots of primitives in the leafs, so I subdivide up to a depth of 45. This takes longer, but ray tracing is much faster after that (I can render the buddha at 400x600 in about 600ms, with one light source (point)).

- Jacco.
45!!! Holy crap. Mine starts to hit the hard drive at around 32, and then it's 'Game Over' :)

I intend to use it for collision detection, so a max_depth of 20 is not unreasonable. Which SAH equation do you use? I've just put in a late cutting-off-empties, and it improved my empty-space use dramatically with a very small compile-time cost... even with empty-culling several passes.

I'm curious: what's the difference in rendering times for Budda@20 vs Budda@45? And how expensive is your ray-prim test?
I have to admit that the Buddha @ tree depth = 45 does require a substantial amount of memory (it was 299Mb last week, but I brought it down slightly over the weekend, so it's 250 or so now I guess), but I had to do this in order to limit the number of primitives per leaf. I'm not sure if it's using the full allowed depth though; gotta check that.

I use the basic SA/SV formula for the SAH, no special things to favour empty space or whatever. I wasn't aware of a lot of different options here (although one guy suggested doubling the score for non-empy voxels with 20 prims or more to favour empty space). I still must try this cutting of stuff that Havran described. He claims it helps.

The difference in rendering time between Buddha@20 and Buddha@45 is 20-30%. My ray-prim test is fast, I guess: I'm tracing ray packets (a la Wald) and so I test four rays against a single triangle simultaneously using sse. As this requires that all rays hit the same primitive (otherwise the mono ray code kicks in) it's probably not helping very much for the Buddha, but it does help to cull empty space around the Buddha real quickly.
Just a small idea that I came up with today:

Did anyone notice that a kd-tree doesn't neccessarily have to be axis aligned? Suppose you wanted to tilt the entire thing for some reason, this could be done by pre-transforming the input data. Obviously, rays cast at this tree would also need to be transformed (in my code, that's as good as free).

Suppose you pick the first split plane using an algorithm that finds the best plane for a regular BSP. The orientation of all subsequent planes could then be based on this plane. Could work pretty well. I think. Any thoughts? I don't believe Havran mentioned this technique btw. :)

- Jacco.

This topic is closed to new replies.

Advertisement