nearest neighbour 1-bit bitmap

Started by
17 comments, last by AndersOredsson 13 years ago

alvaro: Yes, but I have no idea why the popcount returns matching bits. I notice that you don't have a loop, so I'll check that out :-) Thanks!


You can find similar code with comments here. It should be easy for you to try it out. I suspect it will be significantly faster.
Amazing.

Optimization #1: using alvaro's popcount took me from 2480 ms to 1138 ms in my benchmark. Unrolling the 8-step loop took me down to 1086 ms; that's 43.8% of the original time! I guess those while-branches really killed my performance :-) (even if I'm still at O(N)). BTW, I also checked that the results are exactly the same.

Now let's get on with the space partitioning!

...I suspect it will be significantly faster.


About twice the speed :-)
For nearest-neighbor search, a k-d tree or BSP is not likely to be a good idea in 256 dimensions. The reason is that there is a lower bound on the query time for any range-searching data structure with linear storage (i.e., all BSP-type data structures) of Omega(n^((d-1)/d)). As d grows larger, this quickly approaches linear cost, which basically means that you will be back to ordinary brute force in terms of performance. A better option for small/intermediate dimensions is an R-tree, which requires more space but guarantees polylog range queries. However, the storage there is exponentially large in d, and so it would again be a terrible idea.

As a result, I would be inclined to recommend that you forgo a spatial indexing tree and instead default to a locality-sensitive hash. The behavior of locality-sensitive hashes for Hamming distance computations is now quite well understood and should get you pretty good results. They are also much easier to implement, and tend to behave nicely with respect to cache performance. Here is a link from Wikipedia (scroll to the bottom to see how to use it to compute nearest neighbors):

http://en.wikipedia....amming_Distance
robotichrist: Thanks, I'll check that out :-) I have already implemented the BSP stuff (running it right now :-) ), so we will know in a really short time what the result is :-)
Hmm... BSP does not work :-) I clock in at 134% of the time of the normal brute-force run, and that's before I use my distance-map stuff, so I guess it runs at about 150% compared to that. I know that my BSP isn't super-optimized, but I was hoping for better results; there isn't too much to optimize in the BSP code.
I don't know if you are still working on this problem, but you may want to take a look at this paper:

Gionis, Indyk and Motwani, "Similarity Search in High Dimensions via Hashing", Proc. of VLDB (1999) PDF

As far as I know, it is the current best solution to the Hamming-distance nearest-neighbor problem.

For nearest-neighbor search, a k-d tree or BSP is not likely to be a good idea in 256 dimensions. The reason is that there is a lower bound on the query time for any range-searching data structure with linear storage (i.e., all BSP-type data structures) of Omega(n^((d-1)/d)). As d grows larger, this quickly approaches linear cost, which basically means that you will be back to ordinary brute force in terms of performance.


Wow, that's not obvious. You must be referring to this. I think my intuition is going to be having screaming matches with that paper for a little while, until I trust its assumptions.
robotichrist: Thanks! I'll check it out :-) I have been working on the problem for half a year, so no reason to stop now! (not just the nearest-neighbor stuff, but the whole system)

Emergent: I agree; it _SHOULD_ work :-)

This topic is closed to new replies.
