# nearest neighbour 1-bit bitmap

### #1
Members - Reputation: **100**

Posted 12 April 2011 - 03:16 AM

Question:

I'm doing a nearest-neighbor search with a couple of thousand 16x16 1-bit black-and-white images. I pack the 256 bits into 32 bytes and use a superfast bit comparator to count the number of matching bits; that way I find the closest match.

Without any optimization I can do a search with 2000 instances within the performance requirements.

My first optimization was to create a distance map from every instance to each other, an NxN array. While searching through the set, I keep track of the closest instance found so far, and for each remaining instance I can check in the distance map whether it could possibly be closer before actually computing the real distance. This makes the search run in about 66% of the original time.

I have also tried a lot of other approaches, but with no success.

The problem is that a k-d tree or anything like that doesn't work, since it's basically 256 dimensions with only two positions each; there's no way to "split" a dimension.

So, any idea on how to do this hierarchically? I have the feeling that I should be able to do this in about 10% of a linear search, and I have all the preprocessing time in the world :-)

### #3
Crossbones+ - Reputation: **19059**

Posted 12 April 2011 - 06:40 AM

Do you think this might work with your images?

### #4
Members - Reputation: **980**

Posted 12 April 2011 - 09:45 AM

Considering general BSP trees: A reasonable algorithm for building these would be to recursively split each data set with a plane through its mean with normal along its principal axis. A yet-better one would be to use k-means with k=2 to determine your splitting planes. If you look at the independent component analysis (ICA) literature, you might also find other ideas to consider.

Whatever splitting method you choose, for a general BSP-tree approach, you'd then at runtime be doing roughly O(log(N)) inner products (each of which is O(n^2)) (if you have N images, each of which is nxn), so you'd have O(n^2 log(N)) complexity, vs. the O(n^2 N) complexity you have now. That said, you'd be doing multiplications instead of bitwise comparisons, so the constants would be worse; there might be a point in the tree beyond which it's cheaper to brute-force.

Anyway I'm speculating at this point. Have you looked at e.g. this?

Ah hah! YES. Check out p. 52 of this pdf. I like this. Here, halfspaces are encoded by pairs of points (called "pivot pairs"): if you're closer to the first point, you're in the first halfspace; if you're closer to the second, you're in the second halfspace. What's nice about this is that (1) it lets you stick with your fast bitwise stuff, without requiring more general arithmetic (no MULs!), and (2) there are only finitely many planes to consider. Let's see... well, without even coming up with something cleverer, you can brute-force construct a near-optimal tree (by choosing, at each split, the pair of pivots that most nearly divides the points in half) in O(N^3) time...
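To make the pivot-pair idea concrete, here's a minimal sketch of such a tree (a generalized-hyperplane tree) over 256-bit images, assuming the bits are packed into `uint32_t[8]` and using GCC/Clang's `__builtin_popcount`. The pivot choice here is deliberately crude (first two points), not the near-optimal O(N^3) construction described above; all names are illustrative:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <vector>

using Image = std::array<uint32_t, 8>;  // 256 bits packed into 8 words

int hamming(const Image& a, const Image& b) {
    int d = 0;
    for (int i = 0; i < 8; ++i) d += __builtin_popcount(a[i] ^ b[i]);
    return d;
}

struct Node {
    Image p1{}, p2{};                    // the pivot pair
    std::vector<Image> bucket;           // leaf payload
    std::unique_ptr<Node> side1, side2;  // closer-to-p1 / closer-to-p2
};

std::unique_ptr<Node> build(std::vector<Image> pts, size_t leaf = 4) {
    auto n = std::make_unique<Node>();
    if (pts.size() <= leaf) { n->bucket = std::move(pts); return n; }
    n->p1 = pts[0];  // crude pivot choice; a real build would search for
    n->p2 = pts[1];  // the pair that splits the set most evenly
    std::vector<Image> s1, s2;
    for (const auto& p : pts)
        (hamming(p, n->p1) <= hamming(p, n->p2) ? s1 : s2).push_back(p);
    if (s1.empty() || s2.empty()) { n->bucket = std::move(pts); return n; }
    n->side1 = build(std::move(s1), leaf);
    n->side2 = build(std::move(s2), leaf);
    return n;
}

void nearest(const Node* n, const Image& q, int& best, Image& arg) {
    if (!n->side1) {  // leaf: brute-force the bucket
        for (const auto& p : n->bucket) {
            int d = hamming(q, p);
            if (d < best) { best = d; arg = p; }
        }
        return;
    }
    int d1 = hamming(q, n->p1), d2 = hamming(q, n->p2);
    const Node* near_side = d1 <= d2 ? n->side1.get() : n->side2.get();
    const Node* far_side  = d1 <= d2 ? n->side2.get() : n->side1.get();
    nearest(near_side, q, best, arg);
    // Triangle inequality: every point on the far side is at least
    // |d1 - d2| / 2 away from q, so only cross over when that can beat best.
    if (std::abs(d1 - d2) / 2 < best) nearest(far_side, q, best, arg);
}
```

The pruning step only uses the triangle inequality, so it's valid for Hamming distance as-is and keeps the hot loop purely popcount-based.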

There's other stuff out there, e.g. R-trees and M-trees, but I don't really know them well. They're all various kinds of bounding volume hierarchies.

Hope that's somewhat helpful.

### #5
Members - Reputation: **100**

Posted 13 April 2011 - 02:01 PM

Still the same great responses I remember from my old days here :-) I should look into these forums more often, even if my fixed-pipeline graphics knowledge is somewhat outdated :-) I also know a lot about AI, but more of the theoretical "not usable in realtime games" type of stuff than the "how do I make my soldiers cooperate as a realistic squad" stuff.

alvaro#1: Currently I do about 10 lookups in a set of 2600 images, 10 times per second = 100 checks per second against a 2600-image set. I do it on an iPhone 3GS (32-bit ARM). This is a brute-force check-everything strategy, so my feeling is that with optimizations I should be able to do about 100 checks against a 15000-image set. If I could do 200-300 checks against a set of about 5000, I would be more than happy.

alvaro#2: Good idea: rank them by the number of set pixels, and then divide on that. That should let me exclude a lot of the set, plus do some space splitting. I have a little pixel lab where I try stuff out and measure improvement. I promise to post results if I try this out; should be interesting.
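One way to read that idea: for binary images, the Hamming distance between A and B is at least |popcount(A) - popcount(B)|, so after precomputing each image's set-bit count, most candidates can be rejected with a single subtraction. A sketch under those assumptions (the `Image`/`Entry` names and packed 8-word format are illustrative):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Image = std::array<uint32_t, 8>;  // 256 bits in 8 words

int hamming(const Image& a, const Image& b) {
    int d = 0;
    for (int i = 0; i < 8; ++i) d += __builtin_popcount(a[i] ^ b[i]);
    return d;
}

int set_bits(const Image& a) {
    int n = 0;
    for (uint32_t w : a) n += __builtin_popcount(w);
    return n;
}

struct Entry { Image img; int bits; };  // bits precomputed at load time

int nearest_dist(const std::vector<Entry>& set, const Image& q) {
    int qbits = set_bits(q), best = 257;
    for (const auto& e : set) {
        // |popcount(a) - popcount(b)| <= hamming(a, b), so an entry whose
        // pixel count is too far from the query's can't beat `best`.
        if (std::abs(e.bits - qbits) >= best) continue;
        best = std::min(best, hamming(q, e.img));
    }
    return best;
}
```

Sorting the entries by `bits` and scanning outward from the query's count would additionally let the loop terminate early instead of merely skipping, which is presumably the "divide on that" part.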

Emergent: Ah, that's how you do it :-) You don't need to split along the axes. I did a lot of 3D BSP stuff back in the day, so I totally get this now (I sort of fell off the space stuff when going from 3 to 256 dimensions :-) ). As I read through your answer, I can follow it completely. The two-point axis makes perfect sense with my bit stuff. I could even do a sqrt table lookup for distance, since the squared distance is an int in 0-256. Also, thanks for the hint on how to find those points; the perfect-divider algorithm should be pretty simple to create.

Just to let you know:

Since last time, I optimized my distance-map system and my bit code. Even though I hate doing low-level optimizations, I got it down to 50% of the original time.

Distance-map optimization: For all instances in the set, I know the distance to every other instance. If my test image is X away from instance 1, and instance 1 is Y away from instance 2, I don't need to do a full distance test on instance 2 if Y > 2X.
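That rule is the triangle inequality in disguise: with d(q, 1) = X and d(1, 2) = Y, we get d(q, 2) >= Y - X, which exceeds the current best X whenever Y > 2X. The same bound works with the best-so-far instance as the anchor. A sketch against a precomputed NxN map (names and the `std::function` callback are illustrative; `full_dist` stands in for the expensive popcount comparison):

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <vector>

// dist_map[i][j]: precomputed distance between instances i and j.
// full_dist(j): the expensive bit-count distance from the query to j.
// Returns the index of the nearest instance.
int nearest_index(const std::vector<std::vector<int>>& dist_map,
                  const std::function<int(int)>& full_dist) {
    const int n = (int)dist_map.size();
    int best_i = 0, best = full_dist(0);
    for (int j = 1; j < n; ++j) {
        // Triangle inequality: d(q, j) >= |d(best_i, j) - d(q, best_i)|,
        // so j can't beat the current best when that bound is >= best.
        if (std::abs(dist_map[best_i][j] - best) >= best) continue;
        int d = full_dist(j);
        if (d < best) { best = d; best_i = j; }
    }
    return best_i;
}
```

Using the absolute value covers both directions of the bound, so it prunes slightly more than the Y > 2X rule alone.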

Bit-flick optimization: My old version had each case packed into bytes, so my 256 bits were in 32 bytes, and for each byte I did a bit-compare-count. My new version packs the bits into 8 32-bit ints. That way I get away with far fewer instructions, since I only need to execute most of them 8 times instead of 32.

I will implement the BSP solution and measure efficiency. Since the penalty for growing the set is lower with a BSP, I would actually be happy just to match the current speed. Hopefully I _should_ be able to do it in 5-10% of the time :-)

### #6
Members - Reputation: **100**

Posted 13 April 2011 - 02:09 PM

```cpp
int distance2 = 0;
for (int i = 0; i < 8; i++) {
    uint32_t value = A->data32[i] ^ B->data32[i];
    while (value) {
        distance2++;
        value = (value - 1) & value;  // clears the lowest set bit
    }
}
return distance2;
```

The code tag is killing me today :-)

### #7
Crossbones+ - Reputation: **19059**

Posted 13 April 2011 - 02:20 PM

Is your code anything like this?

```cpp
int popcount(unsigned x) {
    x -= (x >> 1) & 0x55555555u;
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    x += x >> 8;
    x += x >> 16;
    return x & 0x3f;
}

int hamming_distance(const char c1[32], const char c2[32]) {
    const unsigned *u1 = reinterpret_cast<unsigned const *>(c1);
    const unsigned *u2 = reinterpret_cast<unsigned const *>(c2);
    int result = 0;
    for (int i = 0; i < 8; ++i)
        result += popcount(u1[i] ^ u2[i]);
    return result;
}
```

EDIT: I didn't see your previous post. This method should be faster, unless your images are very very similar.

### #11
Crossbones+ - Reputation: **19059**

Posted 13 April 2011 - 02:43 PM

> alvaro: Yes, but I have no idea why the popcount returns the matching bits. I notice that you don't have a loop, so I'll check that out :-) Thanks!

You can find similar code with comments here. It should be easy for you to try it out. I suspect it will be significantly faster.

### #12
Members - Reputation: **100**

Posted 13 April 2011 - 03:04 PM

Optimization #1: using alvaro's popcount took me from 2480 ms to 1138 ms in my benchmark. Unrolling the 8-step loop took me down to 1086 ms; that's 43.8% of the original time! I guess those while-loop branches really killed my performance :-) (Even if I'm still at O(N).) BTW, I also checked that the results are exactly the same.

Now lets get on with the space partitioning!

### #14
Members - Reputation: **193**

Posted 13 April 2011 - 03:32 PM

As a result, I would be inclined to recommend that you forgo a spatial indexing tree and instead use a locality-sensitive hash. The behavior of locality-sensitive hashes for Hamming-distance computations is now quite well understood and should get you pretty good results. They are also much easier to implement, and tend to behave nicely with respect to cache performance. Here is a link from Wikipedia (scroll to the bottom to see how to use it to compute nearest neighbors):
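For Hamming distance, the textbook locality-sensitive hash is bit sampling: each of L hash functions reads k random bit positions out of the 256, and images that differ in few bits collide in at least one table with high probability. A rough sketch under those assumptions (the `LshIndex` name and the k, L values are illustrative and would need tuning):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

using Image = std::array<uint32_t, 8>;  // 256 bits in 8 words

int hamming(const Image& a, const Image& b) {
    int d = 0;
    for (int i = 0; i < 8; ++i) d += __builtin_popcount(a[i] ^ b[i]);
    return d;
}

bool get_bit(const Image& a, int pos) { return (a[pos >> 5] >> (pos & 31)) & 1u; }

struct LshIndex {
    int k, L;
    std::vector<std::vector<int>> positions;  // k sampled bit positions per table
    std::vector<std::unordered_map<uint64_t, std::vector<int>>> tables;
    std::vector<Image> data;

    LshIndex(int k_, int L_, unsigned seed = 42) : k(k_), L(L_), tables(L_) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<int> pick(0, 255);
        for (int t = 0; t < L; ++t) {
            positions.emplace_back();
            for (int i = 0; i < k; ++i) positions[t].push_back(pick(rng));
        }
    }
    uint64_t key(int t, const Image& img) const {
        uint64_t h = 0;  // k <= 64 sampled bits packed into one word
        for (int pos : positions[t]) h = (h << 1) | (uint64_t)get_bit(img, pos);
        return h;
    }
    void insert(const Image& img) {
        int id = (int)data.size();
        data.push_back(img);
        for (int t = 0; t < L; ++t) tables[t][key(t, img)].push_back(id);
    }
    // Approximate nearest neighbor: brute-force only over colliding buckets.
    int query(const Image& q) const {
        int best = 257, best_id = -1;
        for (int t = 0; t < L; ++t) {
            auto it = tables[t].find(key(t, q));
            if (it == tables[t].end()) continue;
            for (int id : it->second) {
                int d = hamming(q, data[id]);
                if (d < best) { best = d; best_id = id; }
            }
        }
        return best_id;  // -1 means no bucket collided
    }
};
```

Note this is approximate: a true nearest neighbor that collides in none of the L tables is missed entirely, which is exactly the trade-off the LSH literature quantifies via the choice of k and L.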

http://en.wikipedia....amming_Distance

### #16
Members - Reputation: **100**

Posted 14 April 2011 - 12:05 AM

### #17
Members - Reputation: **193**

Posted 14 April 2011 - 02:15 PM

Gionis, Indyk and Motwani, "Similarity Search in High Dimensions via Hashing", Proc. of VLDB (1999) PDF

As far as I know, it is the current best solution to the Hamming-distance nearest-neighbor problem.

### #18
Members - Reputation: **980**

Posted 14 April 2011 - 03:21 PM

> For nearest neighbor search, a k-d tree or BSP is not likely to be a good idea in 256 dimensions. The reason is that there is a lower bound on the query time for any range-searching data structure with linear storage (i.e., all BSP-type data structures) of O(n^((d-1)/d)). As d grows large, this quickly approaches linear cost, which basically means that you will be back to your ordinary brute force in terms of performance.

Wow, that's not obvious. You must be referring to this. I think my intuition is going to be having screaming matches with that paper for a little while, until I trust its assumptions.