Simple Data Structure for quick look-up of "closeness" on 2D grid?

I'm familiar with BSP trees, quad trees, octrees and other variations of space partitioning data structures typically used in games. Currently trying to wrap my head around R-trees.

 

But something keeps bugging me: I think these data structures are way overkill for what I'm trying to achieve. I really think there's a much simpler way, but I can't quite figure out the math.

 

I have a big grid of objects, and for any one of the objects, I want a quick way to find the other objects that are "close" to it, without having to loop through every object and check its coordinates. "Close" will be some radius that I'll choose based on the viewport of the game or something.

 

It seems to me there should be some math magic that can be done on the coordinates to arrange things properly in a data structure. Like maybe a doubly linked list, where the items to the left and right of any item are the ones close to it.

 

I thought about simply adding the X and Y coordinates together and using that as some sort of hash for the location of items. So, for example, an item at (40, 290) would hash to 330. But then I realized that (290, 40) would also hash to 330 even though they're nowhere near each other. So I also tried thinking about multiplying the coordinates together, and other math operations, but nothing seems to work.

 

Then I started thinking about storing each item in 2 lists: one list sorted by the X coordinate, and the other sorted by the Y coordinate. Then, given any item, you can find the close items by searching the first list for X coordinates that are close and the second list for Y coordinates that are close; the intersection of those 2 results is the close items. You would use a binary search on the lists, of course, instead of looping through each item. This method seems promising in my head, but I don't know if it is slower or faster than just using a standard quad-tree type data structure.

 

I hope this makes sense. Basically I'm hoping the people who have already thought about all of this can tell me whether I'm wasting my time and maybe I should just use a tried-and-true quad-tree.

 

 

 

 

Then I started thinking about storing each item in 2 lists: one list sorted by the X coordinate, and the other sorted by the Y coordinate. Then, given any item, you can find the close items by searching the first list for X coordinates that are close and the second list for Y coordinates that are close; the intersection of those 2 results is the close items. You would use a binary search on the lists, of course, instead of looping through each item. This method seems promising in my head, but I don't know if it is slower or faster than just using a standard quad-tree type data structure.


This is basically how "sweep and prune" works. I don't know how it compares to a quad-tree, but I'm guessing someone has done performance comparisons on them and has benchmarks/suggestions. It seems like the kind of thing a research paper would love to cover in depth.
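
To make that concrete, here's a rough Python sketch of the two-sorted-lists idea. The function names and the example coordinates are just made up for illustration, and the lists would need re-sorting (or incremental maintenance, as real sweep and prune does) whenever objects move:

```python
import bisect
import math

def build_indices(objects):
    """objects: list of (obj_id, x, y). Build one list sorted by x, one by y."""
    by_x = sorted((x, obj_id) for obj_id, x, y in objects)
    by_y = sorted((y, obj_id) for obj_id, x, y in objects)
    return by_x, by_y

def ids_in_window(axis_list, lo, hi):
    """Binary-search a sorted list of (coord, id) pairs for coord in [lo, hi]."""
    start = bisect.bisect_left(axis_list, (lo, -math.inf))
    end = bisect.bisect_right(axis_list, (hi, math.inf))
    return {obj_id for _, obj_id in axis_list[start:end]}

def query_radius(objects, by_x, by_y, cx, cy, r):
    """Intersect the x-window and y-window candidates, then do the exact check."""
    candidates = (ids_in_window(by_x, cx - r, cx + r) &
                  ids_in_window(by_y, cy - r, cy + r))
    pos = {obj_id: (x, y) for obj_id, x, y in objects}
    r2 = r * r
    return [i for i in candidates
            if (pos[i][0] - cx) ** 2 + (pos[i][1] - cy) ** 2 <= r2]

objs = [(0, 40, 290), (1, 290, 40), (2, 45, 300)]
bx, by = build_indices(objs)
print(query_radius(objs, bx, by, 40, 290, 20))   # ids 0 and 2; id 1 is nowhere near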

R-trees are great for some situations, but games don't usually have the situations they are best at.

 

 

An R-tree is great for arbitrarily-sized rectangles found in clusters. So for a city map you might have a bunch of rectangles for the city blocks, then nested inside each city block are rectangles indicating buildings, and inside those rectangles are businesses. This makes it easy to find nearby items based on their rectangles, but the data sizes are irregular. One search may go very deep in the tree, another may stay shallow. The individual rectangles need to be clustered and built into a hierarchy, which can take processing time. The irregular shape can be good or bad depending on the density of objects: being irregular, you might get better clustering and more efficient encoding of dead space, but you can also do worse in other situations. And because it is irregular, the processing cost can vary wildly.

 

 

Games are more likely to use a loose quadtree. The data is regular (with all the associated pros and cons) and it is automatically built into a full hierarchy based on position (rather than needing to sweep to find nearby nodes).

I thought about simply adding the X and Y coordinates together and using that as some sort of hash for the location of items. So, for example, an item at (40, 290) would hash to 330. But then I realized that (290, 40) would also hash to 330 even though they're nowhere near each other. So I also tried thinking about multiplying the coordinates together, and other math operations, but nothing seems to work.

I don't think there is a math trick here.

You have a 2D problem, and are trying to mangle it into a 1D problem. In the latter space you would need to search in 4 directions due to the symmetry between X and Y, while you only have 2 directions available.

 

The simplest solution here is probably a 2D grid with cells to store the objects; finding neighbours is easy then. A quadtree is a generalization of that which avoids having loads of empty cells, or very full ones.

 

Another solution I have come across is 2 sorted lists, one on the X coordinate and one on the Y coordinate. From an object, find its neighbours in the X direction (by walking both directions in the X list) and its neighbours in the Y direction (by using the Y list), then discard those which are not neighbours in both directions. Done.

 

Perhaps even simpler is to sort on X only (or Y only), then walk the list as before in both directions from the object, but check the Y (or X) of each neighbour object itself. You will find some false positives, but depending on how the objects are spread along the sorted direction, this is usually far better than checking all objects (the worst case).

 

 

Edit: After pressing "Post" I came up with another one, which works only if you have integer coordinates (at least in one direction). You can sort on x*height + y, the standard trick for mapping a 2D coordinate onto a 1D array (with the roles of x and y swapped). Searching in Y increments or decrements the key by 1; searching in X increments or decrements it by 'height'.
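
A minimal sketch of that last idea, assuming integer coordinates, with GRID_H as an invented placeholder for the grid height; each column of cells around the query point is then a contiguous key range you can binary-search for:

```python
import bisect

GRID_H = 1000                     # assumed grid height: the number of valid y values

def key(x, y):
    """Map an integer (x, y) onto a single sortable key; y varies fastest."""
    return x * GRID_H + y

def build(objects):
    """objects: list of (obj_id, x, y) with integer coordinates."""
    return sorted((key(x, y), obj_id) for obj_id, x, y in objects)

def neighbours(sorted_keys, x0, y0, r):
    """Everything within r cells of (x0, y0): one binary search per column."""
    y_lo, y_hi = max(0, y0 - r), min(GRID_H - 1, y0 + r)
    found = []
    for dx in range(-r, r + 1):
        lo, hi = key(x0 + dx, y_lo), key(x0 + dx, y_hi)
        start = bisect.bisect_left(sorted_keys, (lo, float("-inf")))
        end = bisect.bisect_right(sorted_keys, (hi, float("inf")))
        found.extend(obj_id for _, obj_id in sorted_keys[start:end])
    return found

index = build([(0, 40, 290), (1, 290, 40), (2, 45, 300)])
print(neighbours(index, 40, 290, 10))             # ids 0 and 2
```

Note this returns a square neighbourhood of cells; add an exact distance check on top if you need a true circle.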

Edited by Alberth


Make a coarse grid, with squares of width equal to the radius of interest. For each object X, you only ever need to examine the grid square that includes the object or the 8 that border on it. Done.
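
For example, a bare-bones version of that in Python (RADIUS and the world bounds are placeholder values, and coordinates are assumed to lie in [0, WORLD_W) x [0, WORLD_H)):

```python
import math

RADIUS = 64.0                         # query radius; also used as the cell size
WORLD_W, WORLD_H = 1024.0, 1024.0     # hypothetical world bounds
COLS = int(math.ceil(WORLD_W / RADIUS))
ROWS = int(math.ceil(WORLD_H / RADIUS))

# One bucket (list of objects) per cell.
grid = [[[] for _ in range(COLS)] for _ in range(ROWS)]

def cell_of(x, y):
    return int(y // RADIUS), int(x // RADIUS)

def insert(obj_id, x, y):
    row, col = cell_of(x, y)
    grid[row][col].append((obj_id, x, y))

def query(cx, cy):
    """Objects within RADIUS of (cx, cy): only the 3x3 block of cells can hold them."""
    row, col = cell_of(cx, cy)
    r2 = RADIUS * RADIUS
    hits = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = row + dr, col + dc
            if 0 <= rr < ROWS and 0 <= cc < COLS:
                for obj_id, x, y in grid[rr][cc]:
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                        hits.append(obj_id)
    return hits
```

Because the cell size equals the radius, nothing outside the 3x3 block can possibly be within range, so the per-object distance check only runs on a handful of buckets.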

 

Theoretically there are weird and wonderful algorithms that would allow you to change this 2D problem into a 1D problem, involving space-filling curves, but I can almost guarantee you that the cost of creating and maintaining the structure will far exceed any benefit you get from it.


There is no math magic; what you have is a problem of sets, which is discrete math. In other words, a data structure. Google hierarchical spatial hashing. It can limit how many objects you have to search based on distance.
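
A plain (non-hierarchical) spatial hash is just a dictionary keyed by cell coordinates, so empty space costs nothing; a rough sketch, with the cell size as an assumed tuning constant:

```python
from collections import defaultdict

CELL = 64.0                          # cell size; pick it close to your query radius

buckets = defaultdict(list)          # (cell_x, cell_y) -> list of (obj_id, x, y)

def hash_cell(x, y):
    return int(x // CELL), int(y // CELL)

def hash_insert(obj_id, x, y):
    buckets[hash_cell(x, y)].append((obj_id, x, y))

def hash_query(cx, cy, radius):
    """Check only the cells the query circle's bounding box can touch."""
    r2 = radius * radius
    x0, y0 = hash_cell(cx - radius, cy - radius)
    x1, y1 = hash_cell(cx + radius, cy + radius)
    hits = []
    for gx in range(x0, x1 + 1):
        for gy in range(y0, y1 + 1):
            for obj_id, x, y in buckets.get((gx, gy), []):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                    hits.append(obj_id)
    return hits
```

The hierarchical variant keeps several such tables with different cell sizes and inserts each object at the level matching its size, which starts to matter once your objects aren't just points.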


While I agree with Alberth's ideas, I also want to ask: have you profiled? How many objects are we talking about? How often are you doing the search? Can you brute force it and forget about it? This could be a case of overengineering. (It might not be, but I think it's important to consider before going hog wild on a system when you have maybe 100 objects and/or are only doing this search once every sixty frames.)


Lots of different ways to go about this --

 

If for each space on the grid there's a list of objects that are inside/overlap that space, then you have a problem that's akin to rasterizing a circle -- for grid spaces that are entirely inside the radius (that is, the distance to their furthest corner is <= radius), all objects inside that space are also inside the radius. Likewise, for grid spaces that are entirely outside the radius (that is, the distance to their nearest corner is > radius), all objects inside that space are also outside the radius. This leaves the grid spaces which the outline of the circle overlaps (that is, the radius falls between the nearest and furthest corners) -- only for those spaces do you need to check the distance per object.
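
That classification boils down to comparing the radius against the nearest and furthest corners of each cell; a small sketch (plain geometry, no particular engine assumed):

```python
def classify_cell(min_x, min_y, max_x, max_y, cx, cy, radius):
    """Return 'inside', 'outside', or 'boundary' for one axis-aligned cell vs a circle.

    Nearest point on the cell to the centre: clamp the centre into the cell.
    Farthest corner: pick, per axis, whichever edge is farther from the centre.
    """
    nx = min(max(cx, min_x), max_x)
    ny = min(max(cy, min_y), max_y)
    nearest2 = (nx - cx) ** 2 + (ny - cy) ** 2

    fx = min_x if abs(min_x - cx) > abs(max_x - cx) else max_x
    fy = min_y if abs(min_y - cy) > abs(max_y - cy) else max_y
    farthest2 = (fx - cx) ** 2 + (fy - cy) ** 2

    r2 = radius * radius
    if farthest2 <= r2:
        return "inside"      # every object in this cell is within the radius
    if nearest2 > r2:
        return "outside"     # no object in this cell can be within the radius
    return "boundary"        # need a per-object distance check
```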

 

If there is no such list, then the simplest thing to do is a coarse pass based on the X and Y coordinates alone -- conceptually, you draw a square that exactly encloses your radius, and anything outside the box can't be inside the radius. You loop over all objects looking at their X coordinate, then over only the Y coordinates of those that passed the X test, and finally do a full distance test on those that passed both. This can be further optimized in lots of ways -- there's a smaller square inside the radius for which contained objects are trivially inside the radius (it's the largest square that fits entirely inside the circle). If your object layout is predominantly spread along one of the axes, test that axis first to eliminate the most candidates from the subsequent test. If you can, sort your object list by the coordinate of the first check.

 

Finally, keep in mind that you don't need a true distance for a simple radius check, so you can avoid the per-object square root in any case. 
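
In other words, something like this (hypothetical names), with the cheap box test from above sitting naturally in front of the squared-distance compare:

```python
def within_radius(ox, oy, cx, cy, radius):
    """Radius check without a square root: compare squared distances.

    The box test rejects most far-away objects before doing the multiplies.
    """
    dx = ox - cx
    dy = oy - cy
    if abs(dx) > radius or abs(dy) > radius:   # outside the enclosing square
        return False
    return dx * dx + dy * dy <= radius * radius
```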

Edited by Ravyne


Oh, I suppose I should also throw out there that if you have a smallish number of items a plain old linear search may be fastest thanks to cache effects.  

 

 

Jumping around randomly through data structures is quite cache-unfriendly.  There are some ways to minimize it if you repeatedly jump through the same patterns, but a direct linear search can have amazing cache friendliness.  Access to a single cache line (64 bytes) is effectively free after the first use.  That's 16 32-bit values for effectively the cost of 1.   If you're reading along sequentially the prefetcher can see the addresses increment, and the cost for the next element is also nearly free.

 

Assuming you're on a modern processor, your data is packed well, and you can fit at least 2 elements to a cache line, then thanks to sequential cache prefetching you can probably scan at a rate of about 4 elements per nanosecond. If your data is integers rather than floats you can probably scan around 12 per nanosecond thanks to the higher-speed integer processing ports. In contrast, jumping around in memory with a cache miss incurs a penalty of roughly 10 nanoseconds per jump if the data is already in L2 cache, and about 100 nanoseconds per jump if it has to come from another layer out in the memory hierarchy, in addition to the time spent running through the instruction pipeline and doing the processing.

 

Since the relative cost of cache misses seems to increase at a faster rate every year, and processors are able to slurp up and execute multiple instructions (currently 4 decoded per cycle per thread) and microinstructions every cycle (currently 7 microinstruction execution ports, some operating at 3 microinstructions per cycle), the data-set size at which binary searches and jump tables overtake a simple linear search keeps getting larger and larger.

 

 

 

 

If you've got under 1000 items the simple linear scan will be faster.  Somewhere between 1,000 and 10,000 items the tradeoff starts to occur where hopping around becomes slightly faster, with the exact number depending on your machine and the data. It isn't until you reach larger collections that you know hopping around in memory will be more efficient.
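
If you want to sanity-check this on your own data before building anything, a brute-force scan is trivial to time; a throwaway sketch (the numbers you get will of course depend on your machine and language):

```python
import random
import timeit

def brute_force(points, cx, cy, radius):
    """Linear scan: exact radius query over a flat list of (x, y) tuples."""
    r2 = radius * radius
    return [i for i, (x, y) in enumerate(points)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r2]

for n in (100, 1_000, 10_000):
    pts = [(random.uniform(0, 1000), random.uniform(0, 1000)) for _ in range(n)]
    total = timeit.timeit(lambda: brute_force(pts, 500.0, 500.0, 64.0), number=100)
    print(f"{n:>6} objects: {total * 10:.3f} ms per query")   # total/100 runs, in ms
```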


It seems to me there should be some math magic that can be done on the coordinates to arrange things properly in a data structure. Like maybe a doubly linked list, where the items to the left and right of any item are the ones close to it.

 

I thought about simply adding the X and Y coordinates together and using that as some sort of hash for the location of items. So, for example, an item at (40, 290) would hash to 330. But then I realized that (290, 40) would also hash to 330 even though they're nowhere near each other. So I also tried thinking about multiplying the coordinates together, and other math operations, but nothing seems to work.

 

The closest thing I'm aware of like this is to use a space-filling curve to order your objects in memory, like a Morton or Hilbert curve (https://en.wikipedia.org/wiki/Space-filling_curve).  If the list of objects is really large and is a lot bigger than the cache, then iterating over the objects using such a curve is much more cache-efficient, since neighbouring objects are more likely to be nearby in memory.  I say more likely because it's impossible to have all objects nearby in memory (unless they were on a straight 1D line!).  Occasionally an object will be close in space but far away in memory, but the average number of cache misses will definitely be lower than with a random ordering.  However, if the number of objects is actually quite small then the space-filling curve perhaps gives no real benefit (although ordering your data sensibly is not a bad idea).  And you also have to factor in the cost of ordering the objects in memory (especially if you want to manage a dynamic list).
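
For reference, a Morton (Z-order) key for 2D integer coordinates is just the bits of x and y interleaved, and sorting objects by that key lays them out along the curve. A sketch, assuming coordinates already fit in 16 bits:

```python
def _spread_bits(v):
    """Spread the low 16 bits of v so there is a zero bit between each of them."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton_key(x, y):
    """Interleave the bits of x and y: nearby (x, y) tend to get nearby keys."""
    return (_spread_bits(y) << 1) | _spread_bits(x)

# Sort objects (id, x, y) along the Z-order curve for better cache locality:
objects = [(0, 40, 290), (1, 290, 40), (2, 45, 300)]
objects.sort(key=lambda o: morton_key(o[1], o[2]))
```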

 

You can also combine the space-filling curve with a simple uniform 2D/3D grid search, or with a tree.  The maths of performing the search is the same; you are just visiting the objects in memory in a more cache-friendly order.  I know you said you want something simpler than a tree, but if you want to avoid iterating directly through all objects (and you have a large number of them), I think you'll have to use something like this.

 

I didn't know much about R-trees until I read your message and looked them up, and there is also apparently a Hilbert-R tree (https://en.wikipedia.org/wiki/Hilbert_R-tree) which I guess already merges the two concepts.  But there are so many different types of trees, and also different forms of space-filling curves, that I'm sure you won't be beholden to any one type of data structure!

 

Anyway, hope that helps even if it makes you realise what you want doesn't exist, lol  :-)

