• Create Account

### #Actualcr88192

Posted 18 February 2013 - 11:02 PM

Polarist, on 18 Feb 2013 - 19:22, said:

cr88192, on 15 Feb 2013 - 19:07, said:
the main difference is that an oct-tree divides the space into 8 regions in each step, so requires less recursion, but has to do a few more checks at each step (since there are 8 possible regions to step into). dividing the space in a binary-fashion allows simpler logic, but a larger number of internal nodes.

That's not necessarily true, you can implement an "oct-tree" with a b-tree under the hood; your recursion step should cycle over each axis. Thus the number of recursive steps would be similar in complexity to a BSP-tree.

But I think my question was poorly worded. Simply put, I'm curious about if it's worth it to "choose" a splitting plane versus naively cutting it down some midpoint in the dynamic real-time situation you were describing. The obvious difference is that choosing the latter would be O(1), and the former would be (presumably) at least O(N). That of course would pay off as searching would be faster. If the tree were generated before-hand and kept static, then of course it'd be better to load the cost up front, but in the case of a real-time dynamically generated tree, I'm wondering if, generally, such an approach is still worth it.

It's a bit of a subjective question and it begs more for intuition/experience than a purely academic response, I suppose.

ok.

well, it depends on whether or not the oct-tree has a calculated mid-point (as opposed to simply dividing into 8 equal-sized regions).

if it does, then both methods will involve a similar cost (a loop over all the items to find the midpoint), though with a potential difference that an oct-tree doesn't have to (also) calculate a distribution vector.

interestingly, regardless of how exactly it is done, the total complexity would remain the same: O(n log n).

this is because (unlike a traditional BSP), my approach doesn't "choose" the plane, it calculates it.
basically, all you really need is an averaged center-point, and a vector describing how the "mass" is distributed relative to the point.

so, pseudocode:

point=vec3(0,0,0); count=0; cur=list;
while(cur)
{
point=point+cur->origin;
count++;
cur=cur->next;
}
point=point/count;

cur=list;
dirx=vec3(0,0,0);
diry=vec3(0,0,0);
dirz=vec3(0,0,0);
while(cur)
{
dir=cur->origin-point;
dirx=dirx+dir*dir.x;
diry=diry+dir*dir.y;
dirz=dirz+dir*dir.z;
cur=cur->next;
}
dir=v3norm(v3max(dirx, v3max(diry, dirz)));  //normalized greatest-vector
plane=vec4(dir, v3dot(point, dir));

left=NULL; right=NULL; mid=NULL; cur=list;
while(cur)
{
f=v3ndot(cur->origin, plane);
{cur->chain=mid; mid=cur; }
if(f<0)
{cur->chain=left; left=cur; }
else
{cur->chain=right; right=cur; }
cur=cur->next;
}
...


Quote

Satharis, on 16 Feb 2013 - 06:07, said:
It's not about memory it's about how much CPU time doing a lot of ray casting takes really.

I'm wondering if there's a subtle point here that you're also describing. If you're just talking about memory allocations, then of course it's almost never an issue, but isn't memory bandwidth is a large bottleneck for speed these days? I don't work in the games industry, so I'm not aware of the current state, but isn't it a challenge to maintain cache coherency in collision detection systems, too? Or is cache coherency kind of a solved issue at this point?

personally, I haven't usually found cache to be a huge issue on recent PC hardware.

even then, it isn't usually as much of an issue at present, as memory bandwidth has increased considerably over the past several years (relative to CPU speed increases), making the limit harder to run into (currently typically only really happens during bulk memory-copies and similar AFAICT, rather than in general-purpose code).

it was much worse of a problem 10 years ago though.

ATM, branch-prediction-failures seem to have become a much bigger issue (manking conditionals very costly in some algorithms where the ability of the CPU to accurately predict branches is fairly low).

a particular example in my case was devising a branch-free version of the Paeth filter, mostly as the conditionals inside the filter were eating lots of clock-cycles.

### #1cr88192

Posted 18 February 2013 - 10:59 PM

Polarist, on 18 Feb 2013 - 19:22, said:

cr88192, on 15 Feb 2013 - 19:07, said:
the main difference is that an oct-tree divides the space into 8 regions in each step, so requires less recursion, but has to do a few more checks at each step (since there are 8 possible regions to step into). dividing the space in a binary-fashion allows simpler logic, but a larger number of internal nodes.

That's not necessarily true, you can implement an "oct-tree" with a b-tree under the hood; your recursion step should cycle over each axis. Thus the number of recursive steps would be similar in complexity to a BSP-tree.

But I think my question was poorly worded. Simply put, I'm curious about if it's worth it to "choose" a splitting plane versus naively cutting it down some midpoint in the dynamic real-time situation you were describing. The obvious difference is that choosing the latter would be O(1), and the former would be (presumably) at least O(N). That of course would pay off as searching would be faster. If the tree were generated before-hand and kept static, then of course it'd be better to load the cost up front, but in the case of a real-time dynamically generated tree, I'm wondering if, generally, such an approach is still worth it.

It's a bit of a subjective question and it begs more for intuition/experience than a purely academic response, I suppose.

ok.

well, it depends on whether or not the oct-tree has a calculated mid-point (as opposed to simply dividing into 8 equal-sized regions).

if it does, then both methods will involve a similar cost (a loop over all the items to find the midpoint), though with a potential difference that an oct-tree doesn't have to (also) calculate a distribution vector.

interestingly, regardless of how exactly it is done, the total complexity would remain the same: O(n log n).

this is because (unlike a traditional BSP), my approach doesn't "choose" the plane, it calculates it.
basically, all you really need is an averaged center-point, and a vector describing how the "mass" is distributed relative to the point.

so, pseudocode:
point=vec3(0,0,0); count=0; cur=list;
while(cur)
{
point=point+cur-&gt;origin;
count++;
cur=cur-&gt;next;
}
point=point/count;

cur=list;
dirx=vec3(0,0,0);
diry=vec3(0,0,0);
dirz=vec3(0,0,0);
while(cur)
{
dir=cur-&gt;origin-point;
dirx=dirx+dir*dir.x;
diry=diry+dir*dir.y;
dirz=dirz+dir*dir.z;
cur=cur-&gt;next;
}
dir=v3norm(v3max(dirx, v3max(diry, dirz)));  //normalized greatest-vector
plane=vec4(dir, v3dot(point, dir));

left=NULL; right=NULL; mid=NULL; cur=list;
while(cur)
{
f=v3ndot(cur-&gt;origin, plane);
{cur-&gt;chain=mid; mid=cur; }
if(f&lt;0)
{cur-&gt;chain=left; left=cur; }
else
{cur-&gt;chain=right; right=cur; }
cur=cur-&gt;next;
}
...


Quote

Satharis, on 16 Feb 2013 - 06:07, said:
It's not about memory it's about how much CPU time doing a lot of ray casting takes really.

I'm wondering if there's a subtle point here that you're also describing. If you're just talking about memory allocations, then of course it's almost never an issue, but isn't memory bandwidth is a large bottleneck for speed these days? I don't work in the games industry, so I'm not aware of the current state, but isn't it a challenge to maintain cache coherency in collision detection systems, too? Or is cache coherency kind of a solved issue at this point?

personally, I haven't usually found cache to be a huge issue on recent PC hardware.

even then, it isn't usually as much of an issue at present, as memory bandwidth has increased considerably over the past several years (relative to CPU speed increases), making the limit harder to run into (currently typically only really happens during bulk memory-copies and similar AFAICT, rather than in general-purpose code).

it was much worse of a problem 10 years ago though.

ATM, branch-prediction-failures seem to have become a much bigger issue (manking conditionals very costly in some algorithms where the ability of the CPU to accurately predict branches is fairly low).

a particular example in my case was devising a branch-free version of the Paeth filter, mostly as the conditionals inside the filter were eating lots of clock-cycles.

PARTNERS