Jump to content
  • Advertisement

Archived

This topic is now archived and is closed to further replies.

cnstrnd

Nearest power of 2

This topic is 5600 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

The next power of 2 is for the numbers from 0 to 15:




p=5;
if (i>=p) {
p=8;
labelcritticall15:;
if (i>p) {
p=p+p;
}
} else {
p=2;
if (i>=p) {
goto labelcritticall15;
}
p=1;
}





i stands for input and p stands for power.

Also developed by Critticall tool.

Share this post


Link to post
Share on other sites
Advertisement


p=8;
if (i>p) {
p=p+p;
} else {
p=5;
if (i>=p) {
p=8;
} else {
p=2;
if (i>=p) {
if (i!=p) {
p=4;
}
} else {
p=1;
}
}
}



Another one. Faster for 5%.

Share this post


Link to post
Share on other sites
And another 1% better.


p=5;
if (i>=p) {
p=8;
if (i>p) {
p=p+p;
}
} else {
p=2;
if (i>=p) {
if (i>p) {
p=4;
}
} else {
p=1;
}
}



To generalize it now, to any bit number is quite easy. Isn't it?

Share this post


Link to post
Share on other sites
How we can generalize it to 5 bits numbers?

This is an obvius way:

if (i>16) {p=32;goto over;}
if (i==16) {p=16;goto over;}
p=5;
if (i>=p) {
p=8;
if (i>p) {
p=p+p;
}
} else {
p=2;
if (i>=p) {
if (i>p) {
p=4;
}
} else {
p=1;
}
}

over:;




Critticall will smooth it to:



p=16;
if (i>p) {
p=p+p;
} else {
p=5;
if (i>=p) {
p=8;
if (i>p) {
p=p+p;
}
} else {
p=2;
if (i>=p) {
if (i>p) {
p=4;
}
} else {
p=1;
}
}
}


Share this post


Link to post
Share on other sites
Seems like it's generating the b-search approach and we've already beaten that one. Also would you mind using Charles exsiting framework so we can get comparable numbers?

Share this post


Link to post
Share on other sites
Well ... it's the same thing. _WE_ are faster, not this BSR version.

This b tree is better. I don't care what you say, until you run and see!

Share this post


Link to post
Share on other sites
Quote:
Original post by Thomas_CR

Well ... it's the same thing. _WE_ are faster, not this BSR version.

This b tree is better. I don't care what you say, until you run and see!


I said b-search as in "binarysearch" and honestly we've beaten hardcoded binarytrees just have a look at the 1kb LUT version.

And really you have no reason to be rude just because I asked you to use the excellent framework everyone else is using so we can compare speeds.

EDIT: added the Criticall version to the testing framework here's the numbers


nextpow2_Naive : 42.906250
nextpow2_Logical : 10.898438
nextpow2_Recursive : 67.859375
nextpow2_IEEE : 8.250000
nextpow2_BSR : 14.710938
nextpow2_3DNow : 9.523438
nextpow2_DD : 4.898438
nextpow2_DD2 :8.250000
nextpow2_DD3 : 6.281250
NextPow2_Critticall : 37.257813


note that the framework makes heavy used of inlineing to get
good performace so testing through function pointers can yield quite diffrent numbers.

I still consider my DD3 version using 1kb LUT to be the best practial solution, for lower numbers where memory is at a premium the IEEE version is a really good choice and for the general case the logical version is probably the best choice.

Share this post


Link to post
Share on other sites
What is the problem here?

You _have to_ measure for _all_ the numbers from 0 to 2^(N+1)-1. Where N denotes the bitness.

For example, if your function is meant to work for 10 bits numbers, you must measure it for 0 to 1023. There is no point to pick one number and measure for this one only.

Now, suppose that we are actually measuring for 21 bits number. Half of them are greater than 1048576. In all those cases we compare the number with 1048576 and set the result to 2097152. Few clock cycles and it is done. For the next half of the half, we have two comparisons. Several comparisons and one set for the waste majority of numbers. That also means several CPU cycles.

And yes, for some small numbers this can actually be a slower way. But they are just a tiny minority.

All in all, you can't beat the binary approach, toward which Critticall is also converting.

Here, you do a FLAW measurement. Do you understand that?



Share this post


Link to post
Share on other sites
Either you don't understand what Im saying or you're just ignorant but the fact remains that a hardcoded binarysearch isn't the fastest way to do this. All benchmarks will have flaws and that's the universal truth about benchmarking still you can see the numbers the Criticall goop just stinks and doesn't let the compiler carry out its job in a good fashion. Ill run some additional tests and report again.

Share this post


Link to post
Share on other sites
I understand you perfectly. I insist however, that you make test on ALL the numbers of a certain bit witdth.

Otherwise, your test is pointless. Worse, it's wrong.


Share this post


Link to post
Share on other sites

This topic is 5600 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Guest
This topic is now closed to further replies.

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!