Sign in to follow this  
Jengu

fastest ceil power of 2 function?

Recommended Posts

This is the fastest "get next largest number that is a power of 2" function that I've been able to come up with, and I'm curious if anyone has made a faster one. I googled a bit and most of the examples I've seen use bitscanning and are 6-7 instructions. This one is just 4. In pseudo assembly: 1. Shift X 1 bit left, store in register 1. 2. Xor X and Register 1, store in register 2. 3. Subtract register 1 from register 2, store in register 1. 4. Add register 1 to X, store in return result. Sample output from doing by hand in python: >>> 97 << 1 194 >>> 194 ^ 97 163 >>> 194-163 31 >>> 97+31 128

Share this post


Link to post
Share on other sites
Don't know which one is faster, but here is another one using the bitscan:

MOV dword [output], 1
BSR ecx, [input]
INC ecx
SHL [output], cl

Share this post


Link to post
Share on other sites
Your method has 1 fatal flaw: it doesn't work.

Say X=9.

Step 1, R1 = 18.
Step 2, R2 = 27.
Step 3, R1 = 9.
Step 4, return 18?

This algorithm on works properly in the case that the two highest bits of X are both 1; thus it works for numbers such as 24, 49, and 112, but fails for numbers such as 10 and 132.
As a side note, if X IS a power of 2, this function will indeed return the next highest power of 2 (so for X=2, it returns 4); whether that's what is desired is a different story entirely.

Share this post


Link to post
Share on other sites
Quote:
Original post by Anthony Serrano
This algorithm on works properly in the case that the two highest bits of X are both 1; thus it works for numbers such as 24, 49, and 112, but fails for numbers such as 10 and 132.


Good catch. I'm surprised that picking a few random numbers I happened to always hit ones that worked.

Share this post


Link to post
Share on other sites
I think I probably stole this from this site a long time ago. I've never bothered to try to understand it or even debug it to make sure it's 100% correct, but so far it's worked fine for me. No clue if it's fast.


inline unsigned int npow2(unsigned int n)
{
static const unsigned int MantissaMask = (1UL << 23UL) - 1UL;
*(float*)&n = (float)n;
n = n + MantissaMask & ~MantissaMask;
n = (unsigned int)(*(float*)&n);
return n;
}


Share this post


Link to post
Share on other sites
It's not fast, it does some pretty disgusting bit hackery (although that is true of all the methods here) and it improperly assumes that sizeof(int) == sizeof(float) when no such guarantee is made by the language specification.

It is converting the integer to its floating point representation, and then making use of the way that floating points are represented under the IEEE standard to extract the exponent (power of 2 when you express the value in "base-2 scientific notation). This in turn is done by clearing the mantissa (so that you have a floating-point value that's just 1.000000 * that exponent) and converting the floating point representation back to its integer representation. Both conversions are done re-writing the value into its original place.

Don't use this. If you're going to do bit-level hacks, you may as well at least pick an efficient one (one that only does the work you're really interested in).

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
It's not fast

Just to lend some validity to this, I ran a quick test.

Results of 5 runs of those implementations:

NextPowerOf2 result: 3466242731 in 0.0063902 seconds
npow2 result: 3466242731 in 0.038244 seconds

NextPowerOf2 result: 3466242731 in 0.0063821 seconds
npow2 result: 3466242731 in 0.0391567 seconds

NextPowerOf2 result: 3466242731 in 0.00667347 seconds
npow2 result: 3466242731 in 0.0381962 seconds

NextPowerOf2 result: 3466242731 in 0.00662486 seconds
npow2 result: 3466242731 in 0.0388066 seconds

NextPowerOf2 result: 3466242731 in 0.00641897 seconds
npow2 result: 3466242731 in 0.0382289 seconds

Here's the test code. It was built in release mode in VS7.1 with Full Optimizations on. The StopWatch class is just a QPC/QPF wrapper.


inline size_t NextPowerOf2(size_t n)
{
n -= 1;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
#ifdef _WIN64
n |= n >> 32;
#endif
n += 1;

return n;
}

inline unsigned int npow2(unsigned int n)
{
static const unsigned int MantissaMask = (1UL << 23UL) - 1UL;
*(float*)&n = (float)n;
n = n + MantissaMask & ~MantissaMask;
n = (unsigned int)(*(float*)&n);
return n;
}

void test_funcs(void)
{
size_t n;
StopWatch stopWatch;
n = 0;
for (int i = 0; i < 1000000; ++i)
n += NextPowerOf2(i);

std::cout << "NextPowerOf2 result: " << n << " in " << stopWatch.elapsed() << " seconds" << std::endl;

stopWatch.reset();
n = 0;
for (int i = 0; i < 1000000; ++i)
n += npow2(i);

std::cout << "npow2 result: " << n << " in " << stopWatch.elapsed() << " seconds" << std::endl;
}

int main(int argc, char *argv[])
{
test_funcs();
test_funcs();
test_funcs();
test_funcs();
test_funcs();
}


Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this