13 replies to this topic

Posted 05 October 2012 - 01:04 AM

Hello,

I have an array of integers (let us assume here 32 bit ints).

    N = SomeValue;
    int *ptr = new int[N];

Does there exist some clever way to sum all the ones (the set bits) instead of going through every bit, converting it to bool, and doing sum++ if it is true?
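For reference, the straightforward approach described in the question might look roughly like the sketch below. The name `count_bits_naive` is made up for illustration, and the array is assumed to hold `std::uint32_t` values rather than `int`, so that right shifts are well defined for every bit pattern.

```cpp
#include <cstddef>
#include <cstdint>

// Baseline: test each of the 32 bit positions of every element in turn.
// count_bits_naive is a hypothetical helper name, not something from the thread.
std::size_t count_bits_naive(const std::uint32_t* data, std::size_t n) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (int bit = 0; bit < 32; ++bit) {
            if ((data[i] >> bit) & 1u) {
                ++sum;
            }
        }
    }
    return sum;
}
```

This does 32 tests per element regardless of the value, which is the cost the replies below try to avoid.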

Posted 05 October 2012 - 01:42 AM

You might be able to use the popcnt SSE4 instruction on each integer, which counts the number of bits set. Not every CPU supports it.

Otherwise, there are some portable tricks for counting the number of set bits: http://en.wikipedia.org/wiki/Hamming_weight
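One portable trick of this kind is to keep clearing the lowest set bit until nothing is left. The sketch below assumes 32-bit unsigned input; the function name `popcount_clear_lowest` is only illustrative.

```cpp
#include <cstdint>

// Counts set bits by repeatedly clearing the lowest one.
// The loop runs once per set bit rather than once per bit position.
unsigned popcount_clear_lowest(std::uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;  // clears the lowest set bit of x
        ++count;
    }
    return count;
}
```

It tends to pay off when most values have only a few bits set; for dense values it still loops up to 32 times.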

Posted 05 October 2012 - 04:40 AM

Written in Notepad, it may or may not compile.

Also it's not tested, it may or may not be faster/slower, that's up to you to test.

    // initialize table, could be constant, but that's a lot to type out
    int numBits[256] = { };
    for(int i = 0; i < 256; ++i)
    {
        numBits[i] = ((i & 0x01) ? 1 : 0) + ((i & 0x02) ? 1 : 0) + ((i & 0x04) ? 1 : 0) + ((i & 0x08) ? 1 : 0) +
                     ((i & 0xF0) ? 1 : 0) + ((i & 0xF1) ? 1 : 0) + ((i & 0xF2) ? 1 : 0) + ((i & 0xF4) ? 1 : 0) + ((i & 0xF8) ? 1 : 0);
    }
    int *ptr=new int[N];
    int totalNumBits;
    for(int i = 0; i < N; ++i)
    {
        union { int integer; char charArray[4]; } int2char;
        int2char.integer = ptr[i];
        totalNumBits += numBits[int2char.charArray[0]] + numBits[int2char.charArray[1]] +
                        numBits[int2char.charArray[2]] + numBits[int2char.charArray[3]];
    }

Posted 05 October 2012 - 05:05 AM

I don't quite understand the question, but if you are talking about counting the set bits in an integer, or any similar bit algorithm, here is a very good collection of bit algorithms.

http://graphics.stanford.edu/~seander/bithacks.html

Bookmark/Delicious it if you haven't.

http://www.cpgf.org/

cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL, Box2D, SFML and Irrlicht.

v1.5.5 was released. Now supports tween and timeline for ease animation.

Posted 05 October 2012 - 07:58 AM

This code (or similar) is somewhere in that bithacks page wqking linked:

    unsigned popcount(unsigned x) {
        x -= (x >> 1) & 0x55555555u;
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
        x = (x + (x >> 4)) & 0x0f0f0f0fu;
        return (x * 0x01010101u) >> 24; // assumes unsigned is 32 bits
    }

gcc also has a function called __builtin_popcount, but on my computer the one above is three times faster. You can do it even faster using 64-bit arithmetic.

EDIT: If your hardware can take it, using __builtin_popcount and compiling with -msse4.2 issues the assembly instruction popcnt, which is extremely fast.

**Edited by alvaro, 05 October 2012 - 08:26 AM.**
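A minimal sketch of summing a whole array with `__builtin_popcount`, as suggested in the post above. It assumes GCC or Clang (the builtin is compiler-specific), and `count_bits_builtin` is just an illustrative name. Whether the loop turns into `popcnt` instructions depends on the compiler flags, for example the `-msse4.2` mentioned above.

```cpp
#include <cstddef>

// Sums the set bits of an array using the GCC/Clang builtin.
// Assumption: built with GCC or Clang; other compilers need a different intrinsic.
std::size_t count_bits_builtin(const unsigned* data, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // __builtin_popcount takes an unsigned int and returns an int
        total += static_cast<std::size_t>(__builtin_popcount(data[i]));
    }
    return total;
}
```

As with everything in this thread, it is worth timing this against the alternatives on your own hardware.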

Posted 05 October 2012 - 08:51 AM

Talking about performance, here is my personal experience (maybe off topic).

There is another lookup table algorithm for bit counting on the page I linked, but be careful when using it.

AFAIR, it may be slower because the table lookups can miss in the CPU data cache or push other data out of it.

The algorithm posted by alvaro should be better because it works on the value "in place" and doesn't access any data in memory.

**Edited by wqking, 05 October 2012 - 08:54 AM.**

Posted 05 October 2012 - 12:10 PM

That function scares me...

Posted 05 October 2012 - 12:24 PM

Yeah, it is kind of scary... Let me see if I can break it down for you:

The general plan is to treat the unsigned integer initially as 32 integers of size 1 bit each. We'll pair up these 1-bit numbers and add them. The result is 16 integers of size 2 bits. We'll then pair those up, add them together and get 8 integers of size 4 bits. Then pair them up, add the pairs together and we'll have 4 integers of size 8 bits. At this point we use one final trick to compute the sum of all 4, and that's our answer.

For the first step, we need to map every 2 bits by the following table:

00 -> 00

01 -> 01

10 -> 01

11 -> 10

Notice that in the first two cases we aren't changing anything, and in the other two, we are just subtracting one. So we can do that by just computing 'x -= x >> 1' (if x were a 2-bit integer). Since we are doing this 16 times in parallel, we need to select the appropriate bits after the shift:

x -= (x >> 1) & 0x55555555u; // 0x55555555u is 01010101010101010101010101010101 in binary
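To convince yourself of the first step, it can be isolated and tried on small inputs. The helper below is only a sketch for experimenting (the name `pair_counts` is made up); it applies that single line and nothing else.

```cpp
#include <cstdint>

// Applies only the first step: afterwards each 2-bit field of the result
// holds the number of set bits that the same field had in x.
std::uint32_t pair_counts(std::uint32_t x) {
    return x - ((x >> 1) & 0x55555555u);
}
```

Feeding it the four 2-bit values 0, 1, 2 and 3 reproduces the table above (0, 1, 1, 2), and an input with all bits set gives 0xAAAAAAAA, that is, the value 2 in every 2-bit field.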

So now x can be interpreted as 16 2-bit integers, whose sum we are trying to compute. The next two steps are very similar, and easier to understand than the previous one:

    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;

By now x can be seen as 4 8-bit integers. In order to compute their sum, let's see what happens when we multiply ABCD times 1111 (in base 256):

       ABCD
     x 1111
    -------
       ABCD
      ABCD
     ABCD
    ABCD
    -------
    abcdefg

Notice that there is no carry in this sum, because all the digits A, B, C and D are at most 8, so no column adds up to more than 32, far below 256. Since we are doing this whole thing in 32-bit arithmetic, abc is lost to overflow, and the result of the multiplication is really just defg. As you see d=A+B+C+D, and we can extract it with a shift 24 bits to the right.

return (x * 0x01010101u) >> 24; // assumes unsigned is 32 bits
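The multiplication trick can also be tried on its own. This is a small sketch with a made-up name, `sum_bytes`, and it assumes that `unsigned` is 32 bits and that the four bytes add up to at most 255 (true inside popcount, where each byte is at most 8).

```cpp
#include <cstdint>

// Sums the four bytes of x with one multiplication.
// Precondition (assumed): the byte sum fits in 8 bits, so no column carries.
std::uint32_t sum_bytes(std::uint32_t x) {
    return (x * 0x01010101u) >> 24;
}
```

For example, 0x01020304 times 0x01010101 is 0x0A090704 after the overflow is discarded, and the top byte 0x0A is 1 + 2 + 3 + 4.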

Are you less scared now? ;)

**Edited by alvaro, 05 October 2012 - 12:25 PM.**
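Earlier in the thread it was mentioned that 64-bit arithmetic can be faster still. One way to read that, and this is an assumption rather than code from the thread, is to widen every mask to 64 bits and shift by 56 at the end. The name `popcount64` is illustrative.

```cpp
#include <cstdint>

// Same steps as the 32-bit version, with the masks widened to 64 bits.
// After the third step x holds eight bytes, each at most 8; the multiply
// adds them into the top byte, which the final shift extracts.
unsigned popcount64(std::uint64_t x) {
    x -= (x >> 1) & 0x5555555555555555ull;
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0full;
    return static_cast<unsigned>((x * 0x0101010101010101ull) >> 56);
}
```

To use it on an array of 32-bit ints you would have to combine two elements into one 64-bit value first, and whether that ends up faster is something to measure.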

Posted 05 October 2012 - 01:01 PM

Written in Notepad, it may or may not compile.

Also it's not tested, it may or may not be faster/slower, that's up to you to test.

It's also incorrect. To build the table you would need to test against the single-bit masks:

0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80.

Oh my... What have I done. Sometimes I wish I could downvote myself.
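With that correction applied, the lookup-table idea could be sketched as follows. This is not the original poster's code: it swaps the union for shifts and masks (which avoids indexing the table with a possibly negative `char`), starts the total at zero, and the names `BitCountTable` and `count_bits_table` are made up.

```cpp
#include <cstddef>
#include <cstdint>

// 256-entry table where counts[i] is the number of set bits in the byte i.
// Built with the single-bit masks 0x01 .. 0x80 from the correction above.
struct BitCountTable {
    unsigned char counts[256];
    BitCountTable() {
        for (int i = 0; i < 256; ++i) {
            counts[i] = static_cast<unsigned char>(
                ((i & 0x01) ? 1 : 0) + ((i & 0x02) ? 1 : 0) +
                ((i & 0x04) ? 1 : 0) + ((i & 0x08) ? 1 : 0) +
                ((i & 0x10) ? 1 : 0) + ((i & 0x20) ? 1 : 0) +
                ((i & 0x40) ? 1 : 0) + ((i & 0x80) ? 1 : 0));
        }
    }
};

// Sums the set bits of n 32-bit values, one table lookup per byte.
std::size_t count_bits_table(const std::uint32_t* data, std::size_t n) {
    static const BitCountTable table;  // built once, on first call
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t v = data[i];
        total += table.counts[v & 0xFFu] + table.counts[(v >> 8) & 0xFFu] +
                 table.counts[(v >> 16) & 0xFFu] + table.counts[v >> 24];
    }
    return total;
}
```

The cache concern raised earlier in the thread still applies, so it is worth benchmarking against the arithmetic version.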