For nearest-neighbor search, a k-d tree or BSP tree is unlikely to work well in 256 dimensions. The reason is that there is a lower bound of Omega(n^((d-1)/d)) on the query time of any range-searching data structure with linear storage (i.e., all BSP-type data structures). As d grows, this bound quickly approaches linear cost, which means you would effectively be back to ordinary brute force in terms of performance. For small to intermediate dimensions, an r-tree is a better option: it requires more space but guarantees polylogarithmic range queries. However, its storage grows exponentially in d, so at d = 256 it would again be a terrible idea.
As a result, I would be inclined to recommend that you forgo a spatial indexing tree and instead default to a locality-sensitive hash. The behavior of locality-sensitive hashing under Hamming distance is now quite well understood and should give you pretty good results. It is also much easier to implement and tends to have good cache behavior. Here is a link from Wikipedia (scroll to the bottom to see how to use it to compute nearest neighbors):
http://en.wikipedia....amming_Distance
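To make this concrete, here is a minimal sketch of bit-sampling LSH for Hamming distance, the classic LSH family for 0/1 vectors. The function names and the parameter choices (16 sampled bits per hash, 8 tables) are my own illustrative picks, not from any particular library; you would tune them for your data.

```python
import random

def make_hash(dim, k, seed):
    """Bit-sampling LSH: a hash is the projection onto k random bit positions.

    Two vectors at Hamming distance d collide with probability
    (1 - d/dim)^k, so near vectors collide far more often than far ones.
    """
    rng = random.Random(seed)
    positions = rng.sample(range(dim), k)
    return lambda v: tuple(v[i] for i in positions)

def build_index(vectors, dim=256, k=16, n_tables=8):
    """Build n_tables hash tables, each keyed by one bit-sampling hash."""
    tables = []
    for t in range(n_tables):
        h = make_hash(dim, k, seed=t)
        buckets = {}
        for idx, v in enumerate(vectors):
            buckets.setdefault(h(v), []).append(idx)
        tables.append((h, buckets))
    return tables

def query(tables, vectors, q):
    """Return (index, distance) of the closest stored vector found
    among the bucket collisions of q across all tables."""
    candidates = set()
    for h, buckets in tables:
        candidates.update(buckets.get(h(q), []))
    # Brute force only over the (hopefully small) candidate set.
    best, best_d = None, float("inf")
    for idx in candidates:
        d = sum(a != b for a, b in zip(vectors[idx], q))
        if d < best_d:
            best, best_d = idx, d
    return best, best_d

# Example: 500 random 256-bit vectors; query with a one-bit perturbation
# of a stored vector and recover it.
rng = random.Random(0)
data = [tuple(rng.randint(0, 1) for _ in range(256)) for _ in range(500)]
index = build_index(data)
q = list(data[42])
q[0] ^= 1  # flip one bit
idx, dist = query(index, data, tuple(q))
```

Note that this gives an approximate answer: a true nearest neighbor can be missed if it collides with the query in none of the tables, and raising the number of tables trades memory for recall.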