Sign in to follow this  
y2jsave

count bits set in a number

Recommended Posts

Hi , can someone please explain me this code ..

int bitcount(unsigned int n)
{
register unsigned int tmp;

tmp = n - ((n >> 1) & 033333333333)
- ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}

i found it in net ,, but can't get it ..

regards
y2jsave

Share this post


Link to post
Share on other sites
This kind of bithack is probably more magic than it is computer science, and likelihood less efficient than more straightforward algorithms on most architectures, but it is nevertheless an interesting method so I'll try to explain how I believe it works.


First off lets consider another, easier, way of counting the bits in a 32-bit word:
int bitcount(unsigned int n)
{
n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
n = (n & 0x00FF00FF) + ((n >> 8) & 0x00FF00FF);
n = (n & 0x0000FFFF) + ((n >> 16) & 0x0000FFFF);
return n;
}
This parallel divide-and-conquer method works on several smaller bit-fields at once within the same 32-bit integer. First the input is divided into two-bit pairs where the upper bit is added to the lower bit to calculate the total number of bits set within the pair (e.g. zero to two.) The same procedure is then repeated to combine the two-bit words into four-bit words, four-bit words into eight-bit words, and so on until we're left with a single bit count for the entire input.


Now, lets consider the first line of the algorithm you posted:
tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
This does actually divides the input into three-bit (or octal digits, hence the funky constants) in a similar way to the above solution. To see this consider what happens to a single digit in the two subtractions. First we subtract half of the digit from itself while masking off the least-significant bit so as not to spill over into the next lower digit. This yields:
(4·c + 2·b + 1·a) - (2·c + 1·b) = 2·c + b + a
We then subtract one quarter of the original digit in a similar fashion to get:
(2·c + b + a) - (1·c) = c + b + a
... and we're left with just the three-bit counts we wanted.

The part of the code, ((tmp + (tmp >> 3)) & 030707070707), simply combines pairs of three-bit digits into six-bit words as I described before. Only the single final and is necessary here since the maximum sum of two three-bit digits is six, which fits neatly within a digit without any risk of overflowing into the next.


The final modulo trick is then used to sum up each of the separate 6-bit fields into a single count. The idea here is that for any natural number, x, the sum of its' digits is congruent x modulo the base minus one (mod 63 in our particular case.) So the remainder will be equal to the total bit count as long as no more than 62-bits are set.

In other words if (xn ... x1 x0) are the base-64 digits of x then:
x ≡ (x0 + x1 + ... + xn) (mod 63)
You can see that this holds since subtracting RHS from LHS gives us an expression congruent with zero:
x - (x0 + x1 + ... + xn) =
(640x0 + 641x1 + 64nxn) - (x0 + x1 + ... + xn) =
(640 - 1)x0 + (641 - 1)x1 + ... + (64n - 1)xn
Where each (64n - 1) factor is clearly divisible by 63.

Share this post


Link to post
Share on other sites
I'd like to add that implicit's post is one of the most comprehensive and well-written explanations I have ever seen on these forums. Well done!

Share this post


Link to post
Share on other sites
A possibly better method can be found at:
support.amd.com/us/Processor_TechDocs/25112.PDF

See Section 8.6 - Efficient Implementation of Population-Count Function in 32-Bit Mode.

It can basically boil down to this:
unsigned int BitCount(unsigned int value)
{
value = value - ((value >> 1) & 0x55555555);
value = (value & 0x33333333) + ((value >> 2) & 0x33333333);
return (((value + (value >> 4) & 0x0F0F0F0F) * 0x01010101) >> 24);
}

No nasty '%' operator needed. Performance may vary depending upon platform, but this is the one I've been using for my auto-correlation code.

Share this post


Link to post
Share on other sites
Quote:
Original post by implicit
This kind of bithack is probably more magic than it is computer science, and likelihood less efficient than more straightforward algorithms on most architectures
It isn't magic, just basic bitwise manipulation.

You are right about different architectures; this code block only works with 32-bit unsigned integers.

As was pointed out, the mod operator can be removed through additional bitwise operations.



Be careful with this kind of processing because it can give poor performance on deeply pipelines processors, or those with out-of-order cores. Basically, that means any modern processor. :-)


There are CPU instructions on many architectures that do it in one instruction; GCC has inline functions that will use the optimal for your processor or a failsafe lookup table approach for processors that don't. It is named popcount (population count), I believe.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this