Help - How to get the most significant bit

Started by
32 comments, last by zeelot 18 years, 9 months ago
Christer Ericson posted solution already.

Quote:Original post by FlowingOoze
Quote:Original post by Nice Coder
msb = number;msb |= msb >> 1;msb |= msb >> 2;msb |= msb >> 4;msb |= msb >> 8;msb |= msb >> 16;msb = (++msb)>>1;



Doesn't work btw.

100000b>>0 | 100000b>>1 | 100000b>>2b | 100000b>>4b =
100000b | 10000b | 1000b | 10b =
111010b

You need to have all of the bits:
msb |= msb >> 1;msb |= msb >> 2;msb |= msb >> 3;  // <---msb |= msb >> 4;msb |= msb >> 5;  // <---....


On ARM I think you can code each line with one instruction. A loop would take 3 instructions, at an average of 1.5 per bit, assuming all integers are equally likely and would also give the answer that was asked for, ie. the MSB-bit. Due to instruction scheduling and caching, it would be best to test either case, or just use the loop for simplicity and smaller size.

not, it sure _does_ work. (though it's not an answer to OP question, but anyway)

How it works: it sets to true all bits lower than most significant bit including it, then adds one and shifts right to get 1 in place of MSB.

How it works: consider history of how most significant bit affect the output after or-ing:
Initial state: we have one bit set to 1, rest is x (unknown)
i=1xxxxxxx
first step:
i=i|i>>1 = 1xxx xxxx | 01xx xxxx = 11xx xxxx;
second step:
i=i|i>>2 = 11xx xxxx | 0011 xxxx = 1111 xxxx;
third
i=i|i>>4 = 1111 xxxx | 0000 1111 = 1111 1111;

and so-on.


******************************************************
as about O(1), I'm unsure you know what it does mean in this context. Note that addition (as it done in processor) requirs O(log N) time and requirs O(N*log N) transistors, where N is number of bits.
If by "O(1)" you simply want to say that you need it to execute in constant time for 32 bit integer (for example), then _any_ sane algorithm (including slow ones) will be O(1) just because number of bits is constant. It's simply meaningless. What's important is how many operations.
For what nice coder's code does, it is better than this a|=a>>1 , a|=a>>2, a|=a>>3 , a|=a>>4 , a|=a>>5 and so-on

[Edited by - Dmytry on July 11, 2005 6:06:00 AM]
Advertisement
Quote:Original post by FlowingOoze
EDIT: Actually on second though, one could write this using conditional codes in instructions and avoid branch misprediction. I'd say this is much faster, if optimised properly.
Recall that the OP was asking about doing this on the ARM. The ARM has conditional execution on each instruction so there's no need to try to remove the branching -- the CPU architecture already handles that (because it's cool like that).
Quote:Original post by Nice Coder
I am sorry, but this opinion must be said.

OMG THAT IS SO 1337!!!

And also, Rate++.
Those weren't the 1337 versions. This is the 1337 version!

static char tab[64] = {    -1, 0,-1,15,-1, 1,28,-1,16,-1,-1,-1, 2,21,29,-1,    -1,-1,19,17,10,-1,12,-1,-1, 3,-1, 6,-1,22,30,-1,    14,-1,27,-1,-1,-1,20,-1,18, 9,11,-1, 5,-1,-1,13,    26,-1,-1, 8,-1, 4,-1,25,-1, 7,24,-1,23,-1,31,-1};int ilog2(unsigned int n) {    n |= n >> 1;    n |= n >> 2;    n |= n >> 4;    n |= n >> 8;    n |= n >> 16;    n = (n << 3) - n;    n = (n << 8) - n;    n = (n << 8) - n;    n = (n << 8) - n;    return tab[n >> 26]; }


BTW, I don't take any credit for these approaches; I picked them up from other people, many years ago. Once you've seen them they become pretty straightforward if you think about it for a bit. Although not this one! :)

Hello, everybody.

Sorry that I didn't expect this forum is so "active".

I sincerely thank you for your valuable help and suggestions.

I also have to apologize that I didn't describe the problem in enough detail...and sorry for the confusion.

Here is a little bit more background about the problem.

Our purpose is a simple calculation: 1.0/f, "f" is a float-point value.
Unfortunately, the ARM system we have been working on has no hardware float-point division support. So we have to simulate the division calculation by ourselves.

A common way to do this is by a look-up table, i.e., storing the pre-calculated results in a LARGE table. For a normal 32-bit float, there are 2^32 possibilities; the table would be HUGE. So we need to do some "pre-processing" on the float number, i.e., removing some lower digits from the original number.

So, if we need to preserve 10 digits and the total number of effective digits of the float is 14, we need to remove 14-10=4 digits, which can be easily achieved by ">>".

For example, for the number "0001 1011", the number of effective digits (or bits) is 5 (11011, the 3 zeros on the left-most shouldn't be counted in). If we only need 3 effecive bits, the "11" on the rear need to be removed (the positions will filled up with zero - "0001 1000")

Obviously, the key is HOW MANY digits we need to remove. In order to get that, since how many digits will be preserved is already known, the number of effective digits of the original number is required.

Because this operation will be used so frequently on 3D calculation, the performance is important, that's the reason why a FAST solution is necessary.

Thank you

zEELOt

This topic is closed to new replies.

Advertisement