Simple Data Structure for quick look-up of "closeness" on 2D grid?


Outward expanding search on a 2D grid is another option. For a radius r, you search (2r+1)^2 - 1 cells (every cell of the (2r+1) x (2r+1) square around the start, excluding the start cell itself).
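In case a sketch helps, here is a minimal, hedged version of that expanding search in C++ (the cellOccupied lookup and the function name are hypothetical stand-ins for whatever your grid actually provides):

```cpp
#include <cstdlib>
#include <optional>
#include <utility>

// Hypothetical cell occupancy test -- replace with your own grid lookup.
bool cellOccupied(int cx, int cy);

// Expanding ring search around (cx, cy): check the center cell, then the
// ring at Chebyshev distance 1, then distance 2, and so on up to maxRadius.
// Searching out to radius r touches (2r+1)^2 - 1 cells besides the center.
std::optional<std::pair<int, int>> ringSearch(int cx, int cy, int maxRadius) {
    if (cellOccupied(cx, cy))
        return std::make_pair(cx, cy);
    for (int r = 1; r <= maxRadius; ++r) {
        for (int dy = -r; dy <= r; ++dy) {
            for (int dx = -r; dx <= r; ++dx) {
                // Skip interior cells already visited at smaller radii;
                // only cells on the outer ring have |dx| == r or |dy| == r.
                if (std::abs(dx) != r && std::abs(dy) != r)
                    continue;
                if (cellOccupied(cx + dx, cy + dy))
                    return std::make_pair(cx + dx, cy + dy);
            }
        }
    }
    return std::nullopt;
}
```

One caveat: the first hit is the nearest cell in Chebyshev (square-ring) distance, not necessarily in Euclidean distance, so if you need the exact Euclidean nearest you have to search a ring or two further and compare candidates.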

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php


Oh, I suppose I should also throw out there that if you have a smallish number of items, a plain old linear search may be fastest thanks to cache effects.

Jumping around randomly through data structures is quite cache-unfriendly. There are ways to mitigate it if you repeatedly jump through the same patterns, but a direct linear search can have amazing cache friendliness. Access to a single cache line (64 bytes) is effectively free after the first use; that's 16 32-bit values for effectively the cost of 1. And if you're reading along sequentially, the prefetcher sees the addresses incrementing and pulls in the next line ahead of you, so the following elements are also nearly free.

Assuming you're on a modern processor, your data is packed well, and you can fit at least 2 of your structures to a cache line, sequential cache prefetching means you can probably scan at a rate of about 4 elements per nanosecond. If your data is integers rather than floats, you can probably scan around 12 per nanosecond thanks to the higher-throughput integer execution ports. In contrast, jumping around in memory incurs a penalty of roughly 10 nanoseconds per cache miss if the data is already in the L2 cache, and about 100 nanoseconds if it has to come from a layer further out, in addition to the time spent running through the instruction pipeline and doing the actual processing.

Since the relative cost of cache misses grows faster every year, and processors can slurp up and execute more and more instructions per cycle (currently 4 decoded per cycle per thread, with 7 micro-op execution ports, some handling several micro-ops per cycle), the data-set size at which binary searches and jump tables overtake a simple linear search keeps getting larger.

If you've got under 1,000 items, the simple linear scan will be faster. Somewhere between 1,000 and 10,000 items the tradeoff starts to occur and hopping around becomes slightly faster, with the exact crossover depending on your machine and your data. Only once you reach collections well beyond that can you be confident that hopping around in memory will be more efficient.
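To make the linear-scan option concrete, here is a sketch under the layout described above (the names and the structure-of-arrays choice are mine): coordinates packed into flat, contiguous arrays so the scan is purely sequential, which is exactly the access pattern the prefetcher rewards.

```cpp
#include <cstddef>
#include <vector>

// Brute-force nearest neighbour over tightly packed coordinate arrays.
// Assumes xs and ys are non-empty and the same length.
std::size_t nearestLinear(const std::vector<float>& xs,
                          const std::vector<float>& ys,
                          float qx, float qy) {
    std::size_t best = 0;
    float bestDist2 = (xs[0] - qx) * (xs[0] - qx) +
                      (ys[0] - qy) * (ys[0] - qy);
    for (std::size_t i = 1; i < xs.size(); ++i) {
        const float dx = xs[i] - qx;
        const float dy = ys[i] - qy;
        const float d2 = dx * dx + dy * dy;  // squared distance, no sqrt needed
        if (d2 < bestDist2) {
            bestDist2 = d2;
            best = i;
        }
    }
    return best;
}
```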

It seems to me there should be some math magic that can be done on the coordinates to arrange things properly in a data structure. Like maybe a doubly linked list, where the items to the left and right of any item are the ones close to it.

I thought about simply adding the X and Y coordinates together and using the sum as a sort of hash for the location of items. So, for example, an item at (40, 290) would hash to 330. But then I realized that (290, 40) would also hash to 330, even though the two are nowhere near each other. I also tried multiplying the coordinates together, and other math operations, but nothing seems to work.

The closest thing I'm aware of like this is to use a space-filling curve to order your objects in memory, such as a Morton or Hilbert curve (https://en.wikipedia.org/wiki/Space-filling_curve). If the list of objects is really large, a lot bigger than the cache, then iterating over the objects along such a curve is much more cache-efficient, since neighbouring objects are more LIKELY to be nearby in memory. I say more likely because it's impossible to have all neighbours nearby in memory (unless they were on a straight 1D line!). Occasionally an object will be close in space but far away in memory, but the average number of cache misses will definitely be lower than with a random ordering. However, if the number of objects is actually quite small, then the space-filling curve perhaps gives no real benefit (although ordering your data sensibly is never a bad idea). And you also have to factor in the cost of ordering the objects in memory (especially if you want to manage a dynamic list).
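For illustration, here is a minimal sketch of the Morton (Z-order) encoding, assuming 16-bit coordinates (the function names are mine; the bit-interleaving trick itself is standard). Unlike adding X and Y as tried above, interleaving their bits keeps (40, 290) and (290, 40) distinct:

```cpp
#include <cstdint>

// Spread the lower 16 bits of x so there is a zero bit between each
// original bit (the classic "part 1 by 1" bit trick).
static uint32_t part1by1(uint32_t x) {
    x &= 0x0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

// 2D Morton code: interleave the bits of x and y so that nearby (x, y)
// pairs usually map to nearby code values.
uint32_t morton2d(uint16_t x, uint16_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```

Sorting your objects by this code gives the Z-order layout; a Hilbert curve preserves locality a little better, but the encoding is fiddlier.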

You can also combine the space-filling curve with a simple uniform 2D/3D grid search or with a tree. The maths of performing the search is the same; you are just walking your objects in memory in a more cache-friendly order. I know you said you want something simpler than a tree, but if you want to avoid direct iteration through all objects (and you have a large number of them), then I think you'll have to use something like this.
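Here is a rough sketch of the uniform-grid half of that idea, with made-up names and no claim to being the canonical version: objects are binned into fixed-size cells, and a query only visits the 3x3 block of cells around the query point.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Uniform grid: bucket object indices by cell. cellSize should be on the
// order of the typical query radius so a 3x3 block of cells covers it.
struct UniformGrid {
    float cellSize = 1.0f;
    std::unordered_map<unsigned long long, std::vector<std::size_t>> cells;

    // Pack a pair of signed cell coordinates into one map key.
    static unsigned long long key(long long cx, long long cy) {
        return (static_cast<unsigned long long>(cx) << 32) ^
               (static_cast<unsigned long long>(cy) & 0xFFFFFFFFull);
    }

    long long cellOf(float v) const {
        return static_cast<long long>(std::floor(v / cellSize));
    }

    void insert(std::size_t id, float x, float y) {
        cells[key(cellOf(x), cellOf(y))].push_back(id);
    }

    // Gather candidate ids from the 3x3 block of cells around (x, y);
    // the caller still does an exact distance test on the candidates.
    std::vector<std::size_t> queryNearby(float x, float y) const {
        std::vector<std::size_t> out;
        const long long cx = cellOf(x), cy = cellOf(y);
        for (long long dy = -1; dy <= 1; ++dy)
            for (long long dx = -1; dx <= 1; ++dx) {
                auto it = cells.find(key(cx + dx, cy + dy));
                if (it != cells.end())
                    out.insert(out.end(), it->second.begin(), it->second.end());
            }
        return out;
    }
};
```

You could then lay the buckets (or the object arrays themselves) out in Morton order to get the cache-friendly traversal discussed above.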

I didn't know much about R-trees until I read your message and looked them up, and there is apparently also a Hilbert R-tree (https://en.wikipedia.org/wiki/Hilbert_R-tree), which I guess already merges the two concepts. But there are so many different types of trees, and so many different forms of space-filling curves, that I'm sure you won't be beholden to any one type of data structure!

Anyway, hope that helps even if it makes you realise what you want doesn't exist, lol :-)

