
# Bit Shifting... Yay or Nay? :)


## Recommended Posts

I think it's another thing that newer Intel chips are just slow at doing... There have been other weird things like that, such as PUSHA and POPA being slower than pushing and popping all the registers manually. I think another one had to do with the LOOP instruction actually being slower than manually looping.
On the other hand, check out: http://www.geocities.com/SiliconValley/Way/6422/optimizations.htm
It says a shift is 1 clock. I'm assuming a multiply is more. I'm also assuming you had optimizations off, but you might want to look at the code the compiler generated for you and check that it didn't substitute an ADD for the multiply. (Try multiplying by 64 and shifting left by 6.) I'd be interested to see what you get from doing that. It's also possible that the multiplication was better suited to the dual pipeline, but probably not if it used a MUL.
-ns

[This message has been edited by NightShade (edited December 22, 1999).]

##### Share on other sites
I think you have too much loop overhead to get meaningful results. Unroll the loop about 32 times (just copy/paste) and see if you get different results. Yeah, and disable optimizations like the guy said.

A few other things: do it in asm to make sure the compiler doesn't touch it. Don't assign to a variable in memory every time; this dilutes any results you might get. Go for the least loop overhead, like this:

```asm
_asm {
    mov eax, 4566436
    mov ecx, 10000000/32
looptop:
    shl eax, 4
    shl eax, 4
    shl eax, 4
    shl eax, 4
    ; (repeat 32 times)
    dec ecx
    jnz looptop
}
```

and
```asm
_asm {
    mov eax, 4566436
    mov ecx, 10000000/32
looptop:
    imul eax, 16
    imul eax, 16
    imul eax, 16
    imul eax, 16
    ; (repeat 32 times)
    dec ecx
    jnz looptop
}
```

[This message has been edited by foofightr (edited December 22, 1999).]

##### Share on other sites
Hey

I just tried your suggested test (*64, << 6) and got exactly the same framerates. I just redid the first test and found them to be pretty equal as well, only varying by 0.01 fps. Pretty odd. And yes, I turned all optimisations off.

I guess it's still better to use shifting though, if only to aid any slower processors (although my P200 isn't exactly modern).

Thanks for the reply,

Russ

##### Share on other sites
Hey. I was just playing around with generating assembly listings for console projects in VC++ 6, and seeing what C code would generate what assembly code. I had "i" as an integer variable. Anyway, I looked and saw that
```c
i = i * 2;
i *= 2;
i = i * 4;
i *= 4;
```

generated
```asm
shl eax,1
shl eax,1
shl eax,2
shl eax,2
```

anyway. (I'm not sure if eax was the correct register or not, though). So the compiler uses bit shifts when you multiply by a power of 2 anyway. I just thought I'd mention that.

------------------
http://qoy.tripod.com

[This message has been edited by Qoy (edited December 22, 1999).]

##### Share on other sites
Trivial shifts (like multiplication by a power of 2) are still quicker, but it depends. If there are a lot of independent instructions, the PPro and above will overlap the short-latency instructions with the long-latency multiply.

When you get into two or more shifts and adds, multiplying is quicker.

If you're unsure, use QueryPerformanceCounter() around the shifting version and multiplying version of a function and see which is quicker.

##### Share on other sites
Which is faster? Directly bit shifting a variable or accessing an
array that has precalculated bit shifted values?

Also does the length of the register (32-bits, 64-bits) have
anything to do with how long it takes to bit shift?

Dubious

##### Share on other sites
Are you doing this test without optimizations? 'tis possible that the compiler may be changing the *2 to << 1.

##### Share on other sites
My guess would be that since you're multiplying constants, it just pre-calculates the answer and hardcodes it. There is no reason to calculate it every pass, and even without optimizations it should do this. A shift is as fast as any instruction, and should certainly beat an integer multiply (at the asm level).

Make sure to check the asm code to see whether it actually generated shifts and multiplies. It shouldn't have in your example.

Rock

##### Share on other sites
Ah yes, I forgot: the compiler is probably just multiplying the two constants and hardcoding the result. Shift or multiply a variable instead, and don't use any optimizations, or the variable may be hoisted out of the loop and replaced with a constant if the compiler detects its value doesn't change.

------------------
--Shannon Schlomer, BLAZE Technologies, Inc.

[This message has been edited by blaze (edited December 24, 1999).]

##### Share on other sites
Hands down, a single bit shift operation will beat a multiplication by a power of two. Your compiler knows this and will change any multiplication by a constant power of two into a bit shift. The real question is whether a multiplication by a number that is a sum of two or more powers of two will beat the bit-shift/addition combo (is i*6 faster than (i << 2) + (i << 1)?). On the 486 and earlier, shift/add was faster for sums of two powers of two, but I haven't benchmarked it since then; on the newer processors, though, the superscalar architecture should make sure that even if you are doing a "slow" multiplication, no processor cycles are wasted.

Having the processor perform the bit shifts would also be faster than an array lookup. In the array lookup, you have to access memory, possibly multiple times, with the chance of a cache miss, page fault, etc.
