Archived

This topic is now archived and is closed to further replies.

Guest Anonymous Poster

Bit Shifting... Yay or Nay? :)


Recommended Posts

I think it's another thing that newer Intel chips are just slow at doing... There have been other oddities like that, such as PUSHA and POPA being slower than pushing and popping all the registers manually, and the LOOP instruction being slower than decrementing and jumping manually.
On the other hand, check out: http://www.geocities.com/SiliconValley/Way/6422/optimizations.htm
It says a shift is 1 clock, and I'm assuming a multiply is more. I'm also assuming you had optimizations off, but you might want to see what code the compiler made for you, and check that it didn't substitute an ADD for the multiply. (Try multiplying by 64 and shifting << by 6.) I'd be interested to see what you get from doing that. It's also possible that the multiplication was better suited for the dual pipeline, but probably not if it used a MUL.
-ns


[This message has been edited by NightShade (edited December 22, 1999).]

I think you have too much loop overhead to get meaningful results. Unroll the loop about 32 times (just copy/paste) and see if you get different results. And yeah, disable optimizations like the guy said.

A few other things: do it in asm to make sure the compiler doesn't touch it, and don't assign to a variable in memory every time; that dilutes any results you might get. Go for the least loop overhead, like this:

code:

_asm {
    mov eax, 4566436
    mov ecx, 10000000/32
looptop:
    shl eax, 4
    shl eax, 4
    shl eax, 4
    shl eax, 4
    ; (repeat 32 times)
    dec ecx
    jnz looptop
}


and
code:

_asm {
    mov eax, 4566436
    mov ecx, 10000000/32
looptop:
    imul eax, 16
    imul eax, 16
    imul eax, 16
    imul eax, 16
    ; (repeat 32 times)
    dec ecx
    jnz looptop
}

[This message has been edited by foofightr (edited December 22, 1999).]

Guest Anonymous Poster
Hey

I just tried your suggested test (*64, << 6) and got exactly the same framerates. I just redid the first test and found them to be pretty equal as well, only varying by 0.01 fps. Pretty odd. And yes, I turned all optimisations off.

I guess it's still better to use shifting though, if only to aid any slower processors (although my P200 isn't exactly modern).

Thanks for the reply,

Russ

Hey. I was just playing around with generating assembly listings for console projects in VC++ 6, and seeing what C code would generate what assembly code. I had "i" as an integer variable. Anyway, I looked and saw that
code:

i = i * 2;
i *= 2;
i = i * 4;
i *= 4;


generated
code:

shl eax,1
shl eax,1
shl eax,2
shl eax,2


(I'm not sure if eax was the correct register or not, though.) So the compiler uses bit shifts when you multiply by a power of 2 anyway. I just thought I'd mention that.

------------------
http://qoy.tripod.com

[This message has been edited by Qoy (edited December 22, 1999).]

Trivial shifts (like multiplication by a power of 2) are still quicker, but it depends. If there are a lot of independent instructions, the PPro and above will overlap the short-latency instructions with the long-latency multiply.

When you get into two or more shifts and adds, multiplying is quicker.

If you're unsure, use QueryPerformanceCounter() around the shifting version and multiplying version of a function and see which is quicker.

Guest Anonymous Poster
Which is faster: directly bit shifting a variable, or accessing an array that has precalculated bit-shifted values?

Also, does the length of the register (32-bit, 64-bit) have anything to do with how long it takes to bit shift?

Dubious

My guess would be that since you're multiplying constants, the compiler just pre-calculates the answer and hard-codes it. There is no reason to calculate it every pass, and even without optimizations it should do this. A shift is as fast as any instruction, and should certainly beat an integer multiply (at the asm level).

Make sure to check the asm code to see whether it actually generated shifts and multiplies. In your example it shouldn't have.

Rock

Ah yes, I forgot, the compiler is probably just multiplying the 2 constants and then hardcoding the results. Shift or multiply a variable instead, and don't use any optimizations or the variable may be taken out of the loop and replaced with a constant if the compiler detects its value doesn't change.

------------------
--Shannon Schlomer, BLAZE Technologies, Inc.

[This message has been edited by blaze (edited December 24, 1999).]

Hands down, a single bit shift operation will beat a multiplication by a power of two. Your compiler knows this and will change any multiplication by a constant power of two into a bit shift. The real question is whether a multiplication by a number that is a sum of two or more powers of two will beat out the bit shift/addition combo (is i*6 faster than (i << 2) + (i << 1)?). On the 486 processors and earlier, shift/add was faster for a sum of two powers of two, but I haven't benchmarked it since then; on the newer processors, the superscalar architecture should make sure that even if you are doing a "slow" multiplication, no processor cycles are wasted.

Having the processor perform the bit shifts would also be faster than an array lookup. With the array lookup, you have to access memory, possibly multiple times, with the chance of a cache miss, page fault, etc.

Guest Anonymous Poster
Hi,

A few minutes ago I was reading some reviews of Andre's new book on Amazon. Some guy was denouncing all of Andre's claims, one of them being that 'bit shifting is faster than multiplying'. After years of being told to _always_ bit shift, I laughed.

I thought I'd check it out though, having never actually compared normal multiplication and bit shifting, just taking it on faith that shifting is faster... anyway.

I used the following code for bit shifting;

for(DWORD i=0; i<10000000; i++)
    temp = 6545345 << 1;

And this for normal multiplication;

for(DWORD i=0; i<10000000; i++)
    temp = 6545345 * 2;

10,000,000 seemed a large enough number of multiplications.

Upon checking the frame rates, I got 3.15 fps for the standard multiplication, and about 3.12 fps for the bit shifting... hmm.

Is this just me? Bear in mind I'm on a lowly P200 MMX, hence the slow fps. But in comparison, how come bit shifting is actually slower than standard multiplication??

Is this all just me?

Comments/Advice/Flames...

--
Russ
