Bit-counting trick - new to me

Started by
16 comments, last by Mantear 14 years, 7 months ago
Greetings, I was looking for the best way to count the number of bits set in a 32-bit number, and found this:
unsigned int BitCount(unsigned int v)
{
   v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
   v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
   return ((v + (v >> 4) & 0x0F0F0F0F) * 0x01010101) >> 24; // count
}
I had never seen this before and was wondering if it was a commonly known method or not. I think it's rather impressive. I also came across another method called MIT HAKMEM, but the modulo 63 seems to kill its performance.
Advertisement
Did you have a question about how the method works or did you just want to show us a cool hack?

If you find this sort of thing entertaining then I can warmly recommend Hacker's Delight.
It's fairly well known. See http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive for some other options.
I've seen something like that before, though it had more steps. Very similar though. I like Kernighan's method personally, easy to understand and doesn't check useless bits:

unsigned int val, cnt;for (cnt = 0; val; cnt++)  val &= val - 1;


Unfortunately they all pretty much suck since it's usually one long dependency chain, which is the justification behind the population count instruction. [grin]
Quote:Original post by Mantear
I also came across another method called MIT HAKMEM, but the modulo 63 seems to kill its performance.


That's not the name of the method; it's the name of a series of articles, one of which presented a method, among other things.
For what it's worth, the fastest way of computing the number of 1-bits in an integer is with a lookup table 256-byte lookup table that you index 4 times. I agree the method you posted is cool though.
Quote:Original post by cache_hit
For what it's worth, the fastest way of computing the number of 1-bits in an integer is with a lookup table 256-byte lookup table that you index 4 times. I agree the method you posted is cool though.


What about a 64k lookup table that you index into twice?

Also, some architectures have a popcount instruction, which is likely much faster.
Quote:Original post by cache_hit
For what it's worth, the fastest way of computing the number of 1-bits in an integer is with a lookup table 256-byte lookup table that you index 4 times. I agree the method you posted is cool though.
That would depend on just how fast the processor is at doing memory lookups, or didn't we learn anything from the "Can this be optimized?" thread? Besides 16-bit LUTs would be faster on quite a few systems.

Aside from processors with population count instructions you can also get a lot of performance from the parallel shift-and-add trick on large bit vectors. That is by unrolling using SIMD instructions you can get a high degree of instruction-level parallelism as well as conveniently process large words within a single instruction, not to mention that the later iterations for dealing with the larger bit chunks need only be done once for several input words.


Besides which I'm disappointed in you people, I thought I had found the perfect thread to sneak in that earth-shattering link but no one noticed. What is the Internet coming to if you can't get people to click on random links?

edit: Beaten to the punch by RDragon1. How did I not notice your post?
Quote:Original post by cache_hit
For what it's worth, the fastest way of computing the number of 1-bits in an integer is with a lookup table 256-byte lookup table that you index 4 times. I agree the method you posted is cool though.


The article already posted above (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) says the code from the OP really is the fastest, faster than the lookup table.

I still don't get how it works though, or what numbers you have to put in there to make it work with 64-bit or 128-bit integers.

Maybe those numbers are some sort of bit filters? Because 0x55555555 is 01010101010101010101010101010101, 0x33333333 is 00110011001100110011001100110011, 0x0F0F0F0F is 00001111000011110000111100001111, etc...
Quote:Original post by RDragon1
Quote:Original post by cache_hit
For what it's worth, the fastest way of computing the number of 1-bits in an integer is with a lookup table 256-byte lookup table that you index 4 times. I agree the method you posted is cool though.


What about a 64k lookup table that you index into twice?

Also, some architectures have a popcount instruction, which is likely much faster.


Alright alright you win. Just that 64k is orders of magnitude larger than 256 bytes, often the tradeoff is not worth it. You could also make a 32MB lookup table if you really wanted to :)


The link showing the different methods is interesting. I'd have to benchmark it to be convinced that the method is faster than a lookup table. I can see it possibly being faster when the lookup table is small relative to the number of bits in the integer being counted (for example a 256-byte lookup table on a 32-bit integer), but not when the lookup table is relatively near in size to the number of bits in the integer.

This topic is closed to new replies.

Advertisement