
Smart way to count positive entries in bitset

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

13 replies to this topic

#1Misery  Members

Posted 05 October 2012 - 01:04 AM

Hello,

I have an array of integers (let us assume here 32 bit ints).

 N=SomeValue;
int *ptr=new int[N];


Is there some clever way to count all the ones (set bits) instead of going through every bit, converting it to bool, and incrementing the sum if it is true?

#2doeme  Members

Posted 05 October 2012 - 01:23 AM

Aside from comparing every entry to 0 and then adding it, I don't see a better way unless you are able to store the entries sorted, so you only need to know where the first 0 is and then only iterate over a part of the array.

#3patrrr  Members

Posted 05 October 2012 - 01:42 AM

You might be able to use the popcnt SSE4 instruction on each integer, which counts the number of bits set. Not every CPU supports it.
Otherwise, there are some portable tricks for counting the number of set bits: http://en.wikipedia.org/wiki/Hamming_weight
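As a rough illustration of one such portable trick, here is a minimal sketch that clears the lowest set bit until the word is zero. The function names `popcount_kernighan` and `count_set_bits` are made up for this example, and the array is assumed to hold 32-bit unsigned words.

```cpp
#include <cstddef>
#include <cstdint>

// Counts set bits by repeatedly clearing the lowest set bit.
// The loop runs once per set bit, not once per bit position.
unsigned popcount_kernighan(std::uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;  // clears the lowest set bit of x
        ++count;
    }
    return count;
}

// Sums the set bits over an array of n 32-bit words.
std::size_t count_set_bits(const std::uint32_t* words, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += popcount_kernighan(words[i]);
    }
    return total;
}
```

This is cheap for sparse words and needs no special hardware, but it still branches per set bit, so it is worth measuring against the alternatives.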

#4Misery  Members

Posted 05 October 2012 - 01:44 AM

Pity
I thought that it could work similarly to filling the whole bitset with ones or zeros. There it is enough to assign ~0 or 0 to every element of the container.

#5Ripiz  Members

Posted 05 October 2012 - 04:40 AM

Written in Notepad, it may or may not compile.
Also it's not tested, it may or may not be faster/slower, that's up to you to test.

// initialize table, could be constant, but that's a lot to type out
int numBits[256] = { };
for(int i = 0; i < 256; ++i) {
numBits[i] =
((i & 0x01) ? 1 : 0) +
((i & 0x02) ? 1 : 0) +
((i & 0x04) ? 1 : 0) +
((i & 0x08) ? 1 : 0) +
((i & 0xF0) ? 1 : 0) +
((i & 0xF1) ? 1 : 0) +
((i & 0xF2) ? 1 : 0) +
((i & 0xF4) ? 1 : 0) +
((i & 0xF8) ? 1 : 0);
}

int *ptr=new int[N];
int totalNumBits = 0; // must start at zero before accumulating
for(int i = 0; i < N; ++i) {
// view the integer as four bytes; unsigned char keeps the table index in 0..255
union {
int integer;
unsigned char charArray[4];
} int2char;
int2char.integer = ptr[i];
totalNumBits += numBits[int2char.charArray[0]] + numBits[int2char.charArray[1]] + numBits[int2char.charArray[2]] + numBits[int2char.charArray[3]];
}


#6wqking  Members

Posted 05 October 2012 - 05:05 AM

I don't fully understand the question, but if you are talking about counting the set bits in an integer, or any similar bit algorithm, here is a very good collection of bit algorithms:
http://graphics.stanford.edu/~seander/bithacks.html
Bookmark/Delicious it if you haven't already.

http://www.cpgf.org/
cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL Box2D, SFML and Irrlicht.
v1.5.5 was released. Now supports tween and timeline for ease animation.

#7Waterlimon  Members

Posted 05 October 2012 - 05:49 AM

Apparently this is called "Hamming weight"

o3o

#8luca-deltodesco  Members

Posted 05 October 2012 - 06:12 AM

Written in Notepad, it may or may not compile.
Also it's not tested, it may or may not be faster/slower, that's up to you to test.

It's also incorrect, you would need to test against:

0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80.
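For reference, a lookup-table version that tests exactly those eight single-bit masks might look like the sketch below. The names `build_bit_table` and `count_bits_table` are hypothetical, and the bytes are extracted with shifts instead of a union, which is a substitution made here to avoid depending on signed `char` or type punning.

```cpp
#include <cstddef>
#include <cstdint>

// Builds a 256-entry table: table[b] = number of set bits in byte b.
// Each of the eight single-bit masks 0x01 .. 0x80 is tested once.
void build_bit_table(unsigned char table[256]) {
    for (int i = 0; i < 256; ++i) {
        int bits = 0;
        for (int mask = 0x01; mask <= 0x80; mask <<= 1) {
            if (i & mask) ++bits;
        }
        table[i] = static_cast<unsigned char>(bits);
    }
}

// Sums set bits over n 32-bit words, one byte lookup at a time.
// The table is rebuilt on every call to keep the sketch self-contained;
// real code would build it once.
std::size_t count_bits_table(const std::uint32_t* words, std::size_t n) {
    unsigned char table[256];
    build_bit_table(table);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t w = words[i];
        total += table[w & 0xFFu] + table[(w >> 8) & 0xFFu] +
                 table[(w >> 16) & 0xFFu] + table[w >> 24];
    }
    return total;
}
```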

#9Álvaro  Members

Posted 05 October 2012 - 07:58 AM

This code (or similar) is somewhere in that bithacks page wqking linked:
unsigned popcount(unsigned x) {
x -= (x >> 1) & 0x55555555u;
x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
x = (x + (x >> 4)) & 0x0f0f0f0fu;
return (x * 0x01010101u) >> 24; // assumes unsigned is 32 bits
}


gcc also has a function called __builtin_popcount, but on my computer the one above is three times faster. You can do it even faster using 64-bit arithmetic.

EDIT: If your hardware can take it, using __builtin_popcount and compiling with -msse4.2 issues the assembly instruction popcnt, which is extremely fast.
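A minimal sketch of applying the builtin to a whole array follows. It assumes GCC or Clang (the builtin is compiler-specific), and `count_bits_builtin` is a name invented for this example. Whether it compiles down to the popcnt instruction depends on the target flags, as noted above.

```cpp
#include <cstddef>

// Sums set bits over n unsigned words using the GCC/Clang builtin.
// Without a popcnt-capable target the compiler falls back to a
// software implementation, so the result is the same either way.
std::size_t count_bits_builtin(const unsigned* words, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<std::size_t>(__builtin_popcount(words[i]));
    }
    return total;
}
```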

Edited by alvaro, 05 October 2012 - 08:26 AM.

#10wqking  Members

Posted 05 October 2012 - 08:51 AM

gcc also has a function called __builtin_popcount, but on my computer the one above is three times faster. You can do it even faster using 64-bit arithmetic.

Talking about performance, here is my personal experience (maybe off topic).
There is another lookup-table algorithm for bit counting at the URL I posted, but be careful when using it.
AFAIR, it may be slower because the table lookups can evict other data from the CPU data cache.
The algorithm posted by alvaro should be better because it works on the value in place and doesn't read any table from memory.

Edited by wqking, 05 October 2012 - 08:54 AM.


#11Misery  Members

Posted 05 October 2012 - 08:56 AM

@Alvaro & @wqking:

That rocks!!
Thanks a lot :]

#12jwezorek  Members

Posted 05 October 2012 - 12:10 PM

This code (or similar) is somewhere in that bithacks page wqking linked:

unsigned popcount(unsigned x) {
x -= (x >> 1) & 0x55555555u;
x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
x = (x + (x >> 4)) & 0x0f0f0f0fu;
return (x * 0x01010101u) >> 24; // assumes unsigned is 32 bits
}


That function scares me...

#13Álvaro  Members

Posted 05 October 2012 - 12:24 PM


This code (or similar) is somewhere in that bithacks page wqking linked:

unsigned popcount(unsigned x) {
x -= (x >> 1) & 0x55555555u;
x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
x = (x + (x >> 4)) & 0x0f0f0f0fu;
return (x * 0x01010101u) >> 24; // assumes unsigned is 32 bits
}


That function scares me...

Yeah, it is kind of scary... Let me see if I can break it down for you:

The general plan is to treat the unsigned integer initially as 32 integers of size 1 bit each. We'll pair up these 1-bit numbers and add them. The result is 16 integers of size 2 bits. We'll then pair those up, add them together and get 8 integers of size 4 bits. Then pair them up, add the pairs together and we'll have 4 integers of size 8 bits. At this point we use one final trick to compute the sum of all 4, and that's our answer.

For the first step, we need to map every 2 bits by the following table:
00 -> 00
01 -> 01
10 -> 01
11 -> 10

Notice that in the first two cases we aren't changing anything, and in the other two, we are just subtracting one. So we can do that by just computing x -= x >> 1 (if x were a 2-bit integer). Since we are doing this 16 times in parallel, we need to select the appropriate bits after the shift:
x -= (x >> 1) & 0x55555555u; // 0x55555555u is 01010101010101010101010101010101 in binary

So now x can be interpreted as 16 2-bit integers, whose sum we are trying to compute. The next two steps are very similar, and easier to understand than the previous one:
x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
x = (x + (x >> 4)) & 0x0f0f0f0fu;
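To make the intermediate values concrete, here is a small sketch that records x after each of the three steps so far. The struct `PopcountStages` and the function `popcount_stages` are names made up for this illustration.

```cpp
#include <cstdint>

// Snapshots of x after each pairing step.
struct PopcountStages {
    std::uint32_t pairs;    // 16 fields of 2 bits, each holding 0..2
    std::uint32_t nibbles;  // 8 fields of 4 bits, each holding 0..4
    std::uint32_t bytes;    // 4 fields of 8 bits, each holding 0..8
};

PopcountStages popcount_stages(std::uint32_t x) {
    PopcountStages s;
    x -= (x >> 1) & 0x55555555u;
    s.pairs = x;
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    s.nibbles = x;
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    s.bytes = x;
    return s;
}
```

For the input 0xB5 (1011 0101 in binary) the snapshots are 0x65, 0x32 and 0x05: the pairs hold 1, 2, 1, 1, the nibbles hold 3 and 2, and the low byte holds 5.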

By now x can be seen as 4 8-bit integers. In order to compute their sum, let's see what happens when we multiply ABCD times 1111 (in base 256):
      ABCD
   x  1111
   -------
      ABCD
     ABCD
    ABCD
   ABCD
   -------
   abcdefg
Notice that there is no carry in this sum, because all the digits A, B, C and D are at most 8, so every column adds up to at most 32, which fits in a single base-256 digit. Since we are doing this whole thing in 32-bit arithmetic, abc is lost to overflow, and the result of the multiplication is really just defg. As you see d=A+B+C+D, and we can extract it with a shift 24 bits to the right.

return (x * 0x01010101u) >> 24; // assumes unsigned is 32 bits
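The multiplication trick can be tried in isolation, and the same sequence of steps carries over to 64-bit words with wider masks and a final shift of 56. The sketch below is one way to write both; `sum_bytes` and `popcount64` are hypothetical names, and fixed-width types are used so the overflow behaviour does not depend on the size of unsigned.

```cpp
#include <cstdint>

// Adds the four bytes of x, assuming their sum is below 256.
// Multiplying by 0x01010101 accumulates A+B+C+D in the top byte.
unsigned sum_bytes(std::uint32_t x) {
    return static_cast<unsigned>((x * 0x01010101u) >> 24);
}

// 64-bit variant of the same pairing scheme: 64 one-bit fields are
// folded into 8 bytes, then summed by one multiplication.
unsigned popcount64(std::uint64_t x) {
    x -= (x >> 1) & 0x5555555555555555ull;
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0full;
    return static_cast<unsigned>((x * 0x0101010101010101ull) >> 56);
}
```

For example, sum_bytes(0x01020304) gives 10. With an array of 32-bit values, two neighbouring elements could be combined into one 64-bit word before calling popcount64, though whether that is actually faster is something to measure.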

Are you less scared now? ;)

Edited by alvaro, 05 October 2012 - 12:25 PM.

#14Ripiz  Members

Posted 05 October 2012 - 01:01 PM

Written in Notepad, it may or may not compile.
Also it's not tested, it may or may not be faster/slower, that's up to you to test.

It's also incorrect, you would need to test against:

0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80.

Oh my... What have I done. Sometimes I wish I could downvote myself.
