Jengu 122 Report post Posted June 11, 2006 This is the fastest "get next largest number that is a power of 2" function that I've been able to come up with, and I'm curious if anyone has made a faster one. I googled a bit and most of the examples I've seen use bitscanning and are 6-7 instructions. This one is just 4. In pseudo assembly: 1. Shift X 1 bit left, store in register 1. 2. Xor X and Register 1, store in register 2. 3. Subtract register 1 from register 2, store in register 1. 4. Add register 1 to X, store in return result. Sample output from doing by hand in python: >>> 97 << 1 194 >>> 194 ^ 97 163 >>> 194-163 31 >>> 97+31 128 0 Share this post Link to post Share on other sites
Winograd 380 Report post Posted June 12, 2006 Don't know which one is faster, but here is another one using the bitscan:MOV dword [output], 1BSR ecx, [input]INC ecxSHL [output], cl 0 Share this post Link to post Share on other sites
Anthony Serrano 3285 Report post Posted June 12, 2006 Your method has 1 fatal flaw: it doesn't work.Say X=9.Step 1, R1 = 18.Step 2, R2 = 27.Step 3, R1 = 9.Step 4, return 18?This algorithm on works properly in the case that the two highest bits of X are both 1; thus it works for numbers such as 24, 49, and 112, but fails for numbers such as 10 and 132.As a side note, if X IS a power of 2, this function will indeed return the next highest power of 2 (so for X=2, it returns 4); whether that's what is desired is a different story entirely. 0 Share this post Link to post Share on other sites
Marcus Speight 152 Report post Posted June 12, 2006 BSR takes 7-39 cycles. I useinline size_t NextPowerOf2(size_t n){ n -= 1; n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16;#ifdef _WIN64 n |= n >> 32;#endif n += 1; return n;} 0 Share this post Link to post Share on other sites
Jengu 122 Report post Posted June 12, 2006 Quote:Original post by Anthony SerranoThis algorithm on works properly in the case that the two highest bits of X are both 1; thus it works for numbers such as 24, 49, and 112, but fails for numbers such as 10 and 132.Good catch. I'm surprised that picking a few random numbers I happened to always hit ones that worked. 0 Share this post Link to post Share on other sites
mfawcett 373 Report post Posted June 12, 2006 I think I probably stole this from this site a long time ago. I've never bothered to try to understand it or even debug it to make sure it's 100% correct, but so far it's worked fine for me. No clue if it's fast.inline unsigned int npow2(unsigned int n){ static const unsigned int MantissaMask = (1UL << 23UL) - 1UL; *(float*)&n = (float)n; n = n + MantissaMask & ~MantissaMask; n = (unsigned int)(*(float*)&n); return n;} 0 Share this post Link to post Share on other sites
Zahlman 1682 Report post Posted June 12, 2006 It's not fast, it does some pretty disgusting bit hackery (although that is true of all the methods here) and it improperly assumes that sizeof(int) == sizeof(float) when no such guarantee is made by the language specification.It is converting the integer to its floating point representation, and then making use of the way that floating points are represented under the IEEE standard to extract the exponent (power of 2 when you express the value in "base-2 scientific notation). This in turn is done by clearing the mantissa (so that you have a floating-point value that's just 1.000000 * that exponent) and converting the floating point representation back to its integer representation. Both conversions are done re-writing the value into its original place.Don't use this. If you're going to do bit-level hacks, you may as well at least pick an efficient one (one that only does the work you're really interested in). 0 Share this post Link to post Share on other sites
mfawcett 373 Report post Posted June 12, 2006 Quote:Original post by ZahlmanIt's not fastJust to lend some validity to this, I ran a quick test.Results of 5 runs of those implementations:NextPowerOf2 result: 3466242731 in 0.0063902 secondsnpow2 result: 3466242731 in 0.038244 secondsNextPowerOf2 result: 3466242731 in 0.0063821 secondsnpow2 result: 3466242731 in 0.0391567 secondsNextPowerOf2 result: 3466242731 in 0.00667347 secondsnpow2 result: 3466242731 in 0.0381962 secondsNextPowerOf2 result: 3466242731 in 0.00662486 secondsnpow2 result: 3466242731 in 0.0388066 secondsNextPowerOf2 result: 3466242731 in 0.00641897 secondsnpow2 result: 3466242731 in 0.0382289 secondsHere's the test code. It was built in release mode in VS7.1 with Full Optimizations on. The StopWatch class is just a QPC/QPF wrapper.inline size_t NextPowerOf2(size_t n){ n -= 1; n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16;#ifdef _WIN64 n |= n >> 32;#endif n += 1; return n;}inline unsigned int npow2(unsigned int n){ static const unsigned int MantissaMask = (1UL << 23UL) - 1UL; *(float*)&n = (float)n; n = n + MantissaMask & ~MantissaMask; n = (unsigned int)(*(float*)&n); return n;}void test_funcs(void){ size_t n; StopWatch stopWatch; n = 0; for (int i = 0; i < 1000000; ++i) n += NextPowerOf2(i); std::cout << "NextPowerOf2 result: " << n << " in " << stopWatch.elapsed() << " seconds" << std::endl; stopWatch.reset(); n = 0; for (int i = 0; i < 1000000; ++i) n += npow2(i); std::cout << "npow2 result: " << n << " in " << stopWatch.elapsed() << " seconds" << std::endl;}int main(int argc, char *argv[]){ test_funcs(); test_funcs(); test_funcs(); test_funcs(); test_funcs();} 0 Share this post Link to post Share on other sites