I feel this discussion is worthy of a separate thread, as it does not particularly pertain to the discussion in the original thread.
No one broke out the leet bit-twiddling skillz?
[font="Courier New"][size="1"]count = (((((v - ((v >> 1) & 0x55555555)) & 0x33333333) + (((v - ((v >> 1) & 0x55555555)) >> 2) & 0x33333333)) + ((((v - ((v >> 1) & 0x55555555)) & 0x33333333) + (((v - ((v >> 1) & 0x55555555)) >> 2) & 0x33333333)) >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;[/size][/font]
Okay, I need to know.
Here's my naive approach, in 20 operations (*):
unsigned count_bits( unsigned x ) {
    x = ((x & 0x55555555) + ((x >> 1) & 0x55555555));
    x = ((x & 0x33333333) + ((x >> 2) & 0x33333333));
    x = ((x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F));
    x = ((x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF));
    x = ((x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF));
    return x;
}
Since a bit is, in itself, the count of bits in that bit, we can treat the word as 16 two-bit accumulators and count how many bits are set in each 2-bit group. This is done by selecting every other bit and adding it to its neighbor. Then each pair of 2-bit accumulators is summed into a 4-bit accumulator, and so on, until the full count lands in the final accumulator.
Hodgman's sample code (taken from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive), however, reduces to 12 operations:
unsigned count_bits_reduced( unsigned x ) {
    unsigned z = (x - ((x >> 1) & 0x55555555));
    unsigned y = ((z & 0x33333333) + ((z >> 2) & 0x33333333));
    return ((y + (y >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
My questions are:
1) What advantage is gained by the subtraction in: [font="Courier New"](x - ((x >> 1) & 0x55555555))[/font]?
2) Why the multiplication by [font="Courier New"]0x1010101[/font], and the right-shift by 24?
3) Does this trickery scale to 64-bit integers? Or does it only apply to 32-bit integers?
[size="1"](*) Most likely not a very good base to make assumptions about speed, size, or performance[/size]