Back to General and Gameplay Programming

Spatial Sorting in 3d

General and Gameplay Programming Programming Algorithm

Started by Angelic Ice July 23, 2017 01:18 AM

19 comments, last by Pronova159 6 years, 9 months ago

46,222

July 24, 2017 08:22 PM

Agreed about the 200 objects being small. If they're cache friendly and arranged linearly you could iterate over them all faster than you could jump around a tree with the corresponding cache misses.

For timing approximation, iterating over a cache-friendly array takes around 0.5ns per item since array traversals are prefetched to L1 cache. Jumping through a tree takes about 10ns (or more) per lookup as they're usually in L2 cache (or worse, L3 or main memory). A tree may have five, ten or more of those jumps depending on how it is implemented.

My search skills is failing right now so I'm not finding the links, but I believe it was the DICE Battlefield 2/3 team who basically abandoned the traditional spatial trees in favor of cache friendly objects in flat arrays because they were faster for their work.

JoeJ

4,183

July 25, 2017 07:39 AM

You can balance this:

Use larger branching factor (binary tree is bad, tree with 8, 27 or 64 children is good - if they are in linear memory order so cache friendly).

Build less levels and store more objects per leaf, making a linear copy of those objects.

My personal experience is that trees can always become a win for performance, but first you need to make sure you actually have a performance problem (win for 200 vs. 200 - but still not worth the effort, loss for 2 vs. 200)

Angelic Ice

929

Author

July 25, 2017 09:05 PM

Oh, this slightly confuses me:

On 24.7.2017 at 10:22 PM, frob said:

For timing approximation, iterating over a cache-friendly array takes around 0.5ns per item since array traversals are prefetched to L1 cache. Jumping through a tree takes about 10ns (or more) per lookup as they're usually in L2 cache (or worse, L3 or main memory).

Because I did some research about BVHs and people refer to them as trees, but you talk about array-based systems. Are BVHs internally functioning via arrays? Probably an implementation-detail differing between libraries?

On another note, are there any publicly popular C++ BVHs? I found some, but they either lack documentation or are quite young and untested.

JoeJ

4,183

July 25, 2017 09:45 PM

What we mean with "cache friendly linear array" is simply brute force e.g.:


	BoundingBox myBoxes[200];
	for (int i=0; i< objectCount; i++)
	for (int j=0; i+1< objectCount; j++)
	if (myBoxes.Overlaps(myBoxes[j]) ) potentialPairs[pairCount++] = MakePair (i,j);

The inner loop reads memory perfectly cache efficient. A tree can do tis only for the small number of its children (and the final objects in the leafs). After that you need to read memory from a random location to follow the children -> potential cache miss.

So you should try brute force first and see if it's too slow. Did you?

Angelic Ice

929

Author

July 25, 2017 10:17 PM

33 minutes ago, JoeJ said:

What we mean with "cache friendly linear array" is simply brute force e.g.:

Ah, assumed something into that direction, thanks!

33 minutes ago, JoeJ said:

So you should try brute force first and see if it's too slow. Did you?

Yes and it is too slow on weak devices (e.g. smartphones), especially since I need a rather scaling implementation, as the object-count is not predictable (200 objects was just the bare minimum).

Pronova159

July 26, 2017 04:27 AM

I think you're looking for loose-leaf octrees/quadtrees. They're commonly used for dynamic objects and in collision systems. The idea is that as objects move, reinserting the object is costly, so you try to reduce the amount of reinsertions. Standard octree implementations subdivide a grid into equal but fixed-sized cubes. A loose-leaf tree would expand the AABB by a certain amount when doing the intersection tests, relaxing it to reduce the amount of reinsertions that take place during updates. Objects that travel shorter distances shouldn't need to reinsert.

Bear in mind that this is not the same thing as just increasing the scale of the octree's overall bounds. This loosened tree is at the same scale but insertions are determined mostly by the position of the object and it's depth is determined by the size of the object. This does cause the object's bounds to overlap multiple cells resulting in more queries but realistically this is offset by the higher quality tree that it produces. Loose-leaf trees have a tendency to pack objects in more efficiently separating clusters not only by position but by shape. Larger objects appearing closer to root and smaller objects towards the leaves. In contrast, normal octrees can have smaller objects near the root if their bounds intersect multiple cells.

The amount you need to relax the AABB has a significant affect to the quality of the tree an ultimately the performance of updates and queries. A larger amount would result in less reinsertions but have a less optimal tree fill which would slow down queries.

Hope this helps. I am by no means an expert but this is something I did before and it really improved the performance of my scene which had 5,000+ dynamic objects.

JoeJ

4,183

July 26, 2017 06:31 AM

2 hours ago, Pronova159 said:

I think you're looking for loose-leaf octrees/quadtrees. They're commonly used for dynamic objects and in collision systems. The idea is that as objects move, reinserting the object is costly, so you try to reduce the amount of reinsertions. Standard octree implementations subdivide a grid into equal but fixed-sized cubes. A loose-leaf tree would expand the AABB by a certain amount when doing the intersection tests, relaxing it to reduce the amount of reinsertions that take place during updates. Objects that travel shorter distances shouldn't need to reinsert.

Bear in mind that this is not the same thing as just increasing the scale of the octree's overall bounds. This loosened tree is at the same scale but insertions are determined mostly by the position of the object and it's depth is determined by the size of the object. This does cause the object's bounds to overlap multiple cells resulting in more queries but realistically this is offset by the higher quality tree that it produces. Loose-leaf trees have a tendency to pack objects in more efficiently separating clusters not only by position but by shape. Larger objects appearing closer to root and smaller objects towards the leaves. In contrast, normal octrees can have smaller objects near the root if their bounds intersect multiple cells.

The amount you need to relax the AABB has a significant affect to the quality of the tree an ultimately the performance of updates and queries. A larger amount would result in less reinsertions but have a less optimal tree fill which would slow down queries.

Hope this helps. I am by no means an expert but this is something I did before and it really improved the performance of my scene which had 5,000+ dynamic objects.

If you are right then my understanding of the term 'loose octree' is wrong (and i've never been sure of that).

I'll point this out, because i think i have a better solution for this (also i don't think i'm not the first one using it, but no tutorial covers this, so...)

There are multiple ways to handle objects intersecting the grid:

1. Put them in the lowest (internal or leaf) node that does not intersect (e.g. https://www.gamedev.net/articles/programming/general-and-gameplay-programming/introduction-to-octrees-r3529/ ) Problem: Intersecting objects cause much more tests than necessary - an object at world center collides with every object in the scene - unsteady performance between frames - very bad.

2. Put intersecting objects in multiple nodes. Requires to store a list of pointers to objects per node instead a single pointer, can't sort objects to tree without duplication, memory requirements unpredictable - still bad.

3. Put objects into internal and leaf nodes depending on their size. Objects are allowed to extend the grid by half cell size in every direction. This solves all above problems at the cost of a small increase of tests. (This is what i called 'loose octree', but i came up with it myself so let me know the proper name for it). It allows more movement as well but defines the rules exactly and more importantly: It allows to get the bounds from the grid, so no need to store a bounding box per node.

How does this relate to BVH?

BVH avoids those problems and offers the flexibility from ground up. It's only worth to mention that it makes sense to store large objects in interior nodes as well. If we store all objects just in the leafs, a large objects prevent small bounds, so small objects nearby collide with every other small object near the large one. If all your objects have similar size, it's better to put them all in the leafs.

(Decisions like that make it hard to give a good genaral example, so you find lots of research about details, but hardly a good introduction that fits your specific need. But a BVH node has just a bounding volume and some children - it's simple. You should not need a tutorial or a library to make it)

Personally i've implemented octree with update and neighbour pointers for collision detection. It worked well at that time but it's the hardest route to go and not very cache friendly. I've implemented BVH for rendering purposes only but i'd pick it for collision detection as well nowadays.

Pronova159

July 26, 2017 07:04 AM

12 minutes ago, JoeJ said:

3. Put objects into internal and leaf nodes depending on their size. Objects are allowed to extend the grid by half cell size in every direction. This solves all above problems at the cost of a small increase of tests. (This is what i called 'loose octree', but i came up with it myself so let me know the proper name for it). It allows more movement as well but defines the rules exactly and more importantly: It allows to get the bounds from the grid, so no need to store a bounding box per node.

You're not wrong. That is what a loose tree is and an optimal implementation would just use a grid to calculate the position in the tree while avoiding the bounding box. The phenomenon I discovered when using this was a distribution of objects in the tree that was better than a normal octree. There was an article about it as well but I can't seem to find it.

As far as it compares to a BVH, I really don't know the answer to that. I always found octrees simpler to setup while a BVH requires checking for nearby objects to cluster together. I don't disagree with you at all, I just don't have the kind of data to back up my use of loose octrees over a BVH.

Awesome feedback btw. I really appreciate your passion about this. I'm new to this site and your response is already making me love this community.

JoeJ

4,183

July 26, 2017 07:29 AM

27 minutes ago, Pronova159 said:

I always found octrees simpler to setup while a BVH requires checking for nearby objects to cluster together.

How do you mean this? Clustering is usually used for bottom up building processes - this gives better trees but is too slow for games, we always use top down build. The building process is basically the same as for octree, you just (can) use the bounding volume center instead grid cells to get the split point. Also, if you use octree with unconstrained bounding boxes as you describe, then you already use BVH with maximum children of 8 (Nothing wrong with that - that's exactly what i suggested.)

But to make clear: I don't wanna sound persuading with something. I can neither say BVH is better than octree nor rebuild is better than update.

Pronova159

July 26, 2017 07:33 AM

Ah yes, I've been building BVH trees the agglomerative way all this time. I had not adapted that approach for real-time but it sounds like I could very easily do it considering I've been doing something similar as you mentioned. Thanks for clarifying.

Spatial Sorting in 3d

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Spatial Sorting in 3d

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines