Sign in to follow this  
zeelot

Help - How to get the most significant bit

Recommended Posts

We are developing game engine on ARM CPU... Here is a problem we have. There is a binary number (stuff like "00101011"). We need to know WHICH bit is its MSB (Most Significant Bit). In the above example, the answer is "6". The problem is: Is it possible to figure out the answer without FOR/WHILE or LOG... - bit operations are preferred - O(1) operation Thanks everybody zEELOt

Share this post


Link to post
Share on other sites
How about something like this... at least it's a start.


int done = 0;
int bit = 17;
int testPattern = 0xFF; // or whatever in hex code
int mask = 0x80;

while(!done)
{
done = testPattern & mask;
mask >>= 1;
bit--;
}
return bit;




Share this post


Link to post
Share on other sites

msb = number;
msb |= msb >> 1;
msb |= msb >> 2;
msb |= msb >> 4;
msb |= msb >> 8;
msb |= msb >> 16;
msb = (++msb)>>1;


ok?

There are faster ways to do it tho.... (many faster ways).

From,
Nice coder

Share this post


Link to post
Share on other sites
To explain.

Lets say you had the number 0010001101

First stage
0010001101 |= 0010001101 >> 1
= 0010001101 | 0001000110

0010001101
0001000110
----------
0011001111

Second stage
0011001111 | 0011001111 >> 2
0011001111
0000110011
0011111111

Next stage....
(they don't make any difference, because of the number. you could use some special cases to increase its speed a bit, but they might not work well (speed vs clutter)).

And the end.

0011111111
+ 1
=
0100000000
Leftshift one
0010000000
Compare to our number
0010001101

See any similarities?
Yes, its its msb.

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by ascorbic
A binary search type approach could yield O(log n). That's probably the best you can do without a huge lookup table.


My current version does 12 ops (all bitwise) for a 32 bit number. (two more for a 64 bit number, ect)

For your o(log n) solution to be better, you'd need a S of less then

S(5) < 12
S < 12/5
S < 2.4

So, you need 2.4 operations per iteration AT MOST. (ie. to break even).

A single if is more then 2.4 operations.

There was a thread about this.... (next biggest power of two... thread, i don't have a link and i'm not gonna battle google to find it).

From,
Nice coder

Share this post


Link to post
Share on other sites
ascorbic: How would you do a binary search?
Nice Coder: That's a clever idea, but you'd better test it to be sure that it's any faster than a simple loop. Also it doesn't answer the question.

zeelot: What do you need the msb for? Does it really make any difference if you use the method ascorbic posted?

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder

msb = number;
msb |= msb >> 1;
msb |= msb >> 2;
msb |= msb >> 4;
msb |= msb >> 8;
msb |= msb >> 16;
msb = (++msb)>>1;



Doesn't work btw.

100000b>>0 | 100000b>>1 | 100000b>>2b | 100000b>>4b =
100000b | 10000b | 1000b | 10b =
111010b

You need to have all of the bits:

msb |= msb >> 1;
msb |= msb >> 2;
msb |= msb >> 3; // <---
msb |= msb >> 4;
msb |= msb >> 5; // <---
....


On ARM I think you can code each line with one instruction. A loop would take 3 instructions, at an average of 1.5 per bit, assuming all integers are equally likely and would also give the answer that was asked for, ie. the MSB-bit. Due to instruction scheduling and caching, it would be best to test either case, or just use the loop for simplicity and smaller size.

Share this post


Link to post
Share on other sites
I figured there had to be a really, really fast way to do it.

For the binary search method, I was just thinking that for a 32 bit number, you "or" the first 16 bits and see if there is a 1 in there. Then narrow the search down to 8 bits in either the first or second half. Then repeat until you find a 1 or you don't. Five or six comparisons. I'll take the other way any day.

Share this post


Link to post
Share on other sites
Quote:
Original post by FlowingOoze
Quote:
Original post by Nice Coder

msb = number;
msb |= msb >> 1;
msb |= msb >> 2;
msb |= msb >> 4;
msb |= msb >> 8;
msb |= msb >> 16;
msb = (++msb)>>1;



Doesn't work btw.

100000b>>0 | 100000b>>1 | 100000b>>2b | 100000b>>4b =
100000b | 10000b | 1000b | 10b =
111010b

You need to have all of the bits:

msb |= msb >> 1;
msb |= msb >> 2;
msb |= msb >> 3; // <---
msb |= msb >> 4;
msb |= msb >> 5; // <---
....


On ARM I think you can code each line with one instruction. A loop would take 3 instructions, at an average of 1.5 per bit, assuming all integers are equally likely and would also give the answer that was asked for, ie. the MSB-bit. Due to instruction scheduling and caching, it would be best to test either case, or just use the loop for simplicity and smaller size.


Look closer.
Look at the worked eg i used.
Do what i did.

From,
Nice coder

Share this post


Link to post
Share on other sites
Thinking about the problem, I don't think any method, short of the look up table, is going to get you any better performance than O(b), where b is the number of bits that represent the number. The reason being that you have to look at every bit in the number, from MSB to LSB, in order to see whether it is 1 or 0. There's no way you can skip a bit without leaving the opportunity that your answer will be wrong.

So in summary, zeelot, I think you are either forced to use a lookup table or use one of the O(b) algorithms to get the answer.

Share this post


Link to post
Share on other sites
Quote:
Original post by ascorbic
I figured there had to be a really, really fast way to do it.

For the binary search method, I was just thinking that for a 32 bit number, you "or" the first 16 bits and see if there is a 1 in there. Then narrow the search down to 8 bits in either the first or second half.

That would work, but it would be hard on branch prediction, also you'd have a to generate the mask and keep track of the bit, probably would take more instructions total than a simple loop for 32-bits.

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder

Look closer.
Look at the worked eg i used.
Do what i did.

Oops. You're right. Anyways it doesn't solve the problem of finding the MSB. It just clears the rest of the bits. Also you've got an overflow problem if the MSB is the 31th bit (counting from 0). You need to test for zero, before and after.

Share this post


Link to post
Share on other sites
Quote:
Original post by ascorbic
I figured there had to be a really, really fast way to do it.

For the binary search method, I was just thinking that for a 32 bit number, you "or" the first 16 bits and see if there is a 1 in there. Then narrow the search down to 8 bits in either the first or second half. Then repeat until you find a 1 or you don't. Five or six comparisons. I'll take the other way any day.


Yep. Assuming a comparison ain't that expencive, its one of the best options.

The best option is a lut assisted binary search.

My best would be only a few comparisons. (ie. somewhere near three), and a 256 element lut.

Assume you have a 32 bit number.

First you test the first 16 bits. (jz, no cmp needed. Use the 16 bit registers)
Then if its true, then you try the first 8 bits. (another jz).
If that doesn't work, then you just use the lut on the output.

If the first 16 bits are false, then you just try the other way. (same thing, just using different registers).

i'm pretty sure an optimised asm version would be superfast.

From,
Nice coder

Share this post


Link to post
Share on other sites
The methods we've come up with so far do find the bit (ex: 0100 0000 0000 0000), but doesn't address the OP's need to know exactly which bit number it is.
Maybe O(n) is the best solution to the problem. Or not... I've reached my level of incompetence for the night.

EDIT: Working in hand coded assembly with all the branching you could return the bit number.

Share this post


Link to post
Share on other sites
Quote:
Original post by FlowingOoze
Quote:
Original post by Nice Coder

Look closer.
Look at the worked eg i used.
Do what i did.

Oops. You're right. Anyways it doesn't solve the problem of finding the MSB. It just clears the rest of the bits. Also you've got an overflow problem if the MSB is the 31th bit (counting from 0). You need to test for zero, before and after.


Hmm.

Ok, it looks like i misread the question...

This makes it a whole lot harder.

Wtf do you want the msb AS A NUMBER for?

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by FlowingOoze
ascorbic: How would you do a binary search?
Nice Coder: That's a clever idea, but you'd better test it to be sure that it's any faster than a simple loop. Also it doesn't answer the question.

zeelot: What do you need the msb for? Does it really make any difference if you use the method ascorbic posted?


My method can find it in o(log2(log2(n)) where n is the biggest number that you want it to use.

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
Yep. Assuming a comparison ain't that expencive, its one of the best options.

The best option is a lut assisted binary search.

My best would be only a few comparisons. (ie. somewhere near three), and a 256 element lut.

Assume you have a 32 bit number.

First you test the first 16 bits. (jz, no cmp needed. Use the 16 bit registers)
Then if its true, then you try the first 8 bits. (another jz).
If that doesn't work, then you just use the lut on the output.

If the first 16 bits are false, then you just try the other way. (same thing, just using different registers).

i'm pretty sure an optimised asm version would be superfast.

From,
Nice coder

ARM doesn't have 16-bit subregisters. Other than that, this approach is worth trying and does actually return the MSB bit number. However one should not forget the penalty incurring from a cache miss when accessing the lut.
Though I still don't understand why anyone would need something like this when a three instruction loop does the trick in no more that 32 iterations.

Share this post


Link to post
Share on other sites
Quote:
Original post by FlowingOoze
Quote:
Original post by Nice Coder
Yep. Assuming a comparison ain't that expencive, its one of the best options.

The best option is a lut assisted binary search.

My best would be only a few comparisons. (ie. somewhere near three), and a 256 element lut.

Assume you have a 32 bit number.

First you test the first 16 bits. (jz, no cmp needed. Use the 16 bit registers)
Then if its true, then you try the first 8 bits. (another jz).
If that doesn't work, then you just use the lut on the output.

If the first 16 bits are false, then you just try the other way. (same thing, just using different registers).

i'm pretty sure an optimised asm version would be superfast.

From,
Nice coder

ARM doesn't have 16-bit subregisters. Other than that, this approach is worth trying and does actually return the MSB bit number. However one should not forget the penalty incurring from a cache miss when accessing the lut.
Though I still don't understand why anyone would need something like this when a three instruction loop does the trick in no more that 32 iterations.



Neither do i. But its a nice problem.

Also 3 instructino loop (+ overhead, your've got two instructions loop overhad, with branch mispredictions, ect. with another 4-5 in the loop itself.)

Assume you've got 7 instructions in the main loop.
7*31 = 217 instructions.
Thats pretty big. (tho nothing major unless your doing it heaps)

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
My method can find it in o(log2(log2(n)) where n is the biggest number that you want it to use.

That's 5+1 operations vs. average of 16 for 32-bit numbers. In other words, it's only worth the extra lines of code if it's really performance critical. Still, it's a nice trick to remember, if ever needed.

Share this post


Link to post
Share on other sites
Quote:
Original post by zeelot
We need to know WHICH bit is its MSB (Most Significant Bit).
There are gazillion ways of doing this. Here is one:

int ilog2(unsigned int i) {
int b = 0;
if ((i & 0xAAAAAAAA) > (i & 0x55555555)) b++;
if ((i & 0xCCCCCCCC) > (i & 0x33333333)) b += 2;
if ((i & 0xF0F0F0F0) > (i & 0x0f0f0f0f)) b += 4;
if ((i & 0xFF00FF00) > (i & 0x00ff00ff)) b += 8;
if ((i & 0xFFFF0000) > 0) b += 16;
return b;
}


Here's another one:

int ilog2(unsigned int i) {
int b = 0;
if (i & 0xffff0000) b += 16; else i<<=16;
if (i & 0xff000000) b += 8; else i<<=8;
if (i & 0xf0000000) b += 4; else i<<=4;
if (i & 0xc0000000) b += 2; else i<<=2;
if (i & 0x80000000) b += 1;
return b;
}

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder

Also 3 instructino loop (+ overhead, your've got two instructions loop overhad, with branch mispredictions, ect. with another 4-5 in the loop itself.)

Assume you've got 7 instructions in the main loop.
7*31 = 217 instructions.
Thats pretty big. (tho nothing major unless your doing it heaps)

From,
Nice coder


I can't be bothered to lookup ARM instruction set right now, but I think you can do the entire loop in three:

1. Increment MSB bit counter
2. Shift left and set flags
3. Branch if not zero

So it's only 3*31 = 93 instructions and if all integers are equally likely, it's half of that on average, ie. 41.5 . If the MSB is more likely to be small, then it's faster than 41.5 on average. If it's more likely to be bigger, then the loop can be written to shift in the other direction and again get a better average.

I don't know how different ARM processor do branch prediction, but the branch is always taken, except on the last iteration, so it has a pretty good change of getting it right.


[Edited by - FlowingOoze on July 11, 2005 3:02:34 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Christer Ericson
Quote:
Original post by zeelot
We need to know WHICH bit is its MSB (Most Significant Bit).
There are gazillion ways of doing this. Here is one:

*** Source Snippet Removed ***

Here's another one:

*** Source Snippet Removed ***

This is the binary search that was mentioned. Roughly counting the instructions needed, for the latter: 19, of which five are branches, with 50% change of correct branch prediction. I'm inclined to think that this is somewhat faster than a loop.

EDIT: Actually on second though, one could write this using conditional codes in instructions and avoid branch misprediction. I'd say this is much faster, if optimised properly.


[Edited by - FlowingOoze on July 11, 2005 3:13:19 AM]

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this