# Divide or multiply? Is multiplying faster than dividing?

This topic is 4481 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I see in many codes things like instead of divide by two multiply by 0.5 (when working with float point). And another example is this when finding the roots of a quadratic:
.
.
.
if(delta > 0.0)
{
delta = sqrt(delta);
a = 1/(2*a);
double x1 = (-b+delta) * a;
double x2 = (-b-delta) * a;
.
.
.
instead of divide by a two times divide 1 by a and then multiply by a two times. Is it really faster?!?

##### Share on other sites
The optimizer is generally capable of making that transformation itself, if necessary. (At least in C/C++. Be careful with other languages, C# in particular.)

##### Share on other sites
Go here. Search for "r/m32/m64" (without quotes). This table lists instruction cycles. FMUL takes 3 cycles, while FDIV takes 19-39. Typically there is a fast instruction for 1/x as well.

2*a = 3 cycles
1/2*a = 3 + 3 cycles (I doubt it takes 3 for 1/x).
number*(1/2*a) = 3 + 3 + 3.

That 9 cycles, 1/2*a is likely cached for the next line, so the second number*(1/2*a) is just 3 cycles, for a total of 12.

With divides
2*a = 3 cycles
number/(2*a) = 19 + 3 (being generous assuming best FDIV time)
So 22 cycles. The 2*a is likely cached for the next line, so the second number/(2*a) is only 19 cycles, for a total of 31 cycles.

12 vs. 31, and I'm assuming 1/x is slower than it likely is, and I'm assuming FDIV is likely faster than it is.

Oh, and it's all much worse because you're using doubles.

##### Share on other sites
Double Integer division takes 39 cycles.
But is there really an optimised form for 1/x?
That would seem odd, wouldn't the "real" division become obsolete then?

##### Share on other sites
Quote:
 Original post by kiomeDouble Integer division takes 39 cycles.But is there really an optimised form for 1/x?That would seem odd, wouldn't the "real" division become obsolete then?

no, as a real division would then be a combination of multiplication and 1/x, means, an optimised concatenation of those two ops. and only where you need to divide more than 1 time trough the same value, the optimised 1/x could get used, with a combination of multiplications afterwards.

##### Share on other sites
Quote:
 Original post by PromitThe optimizer is generally capable of making that transformation itself, if necessary. (At least in C/C++. Be careful with other languages, C# in particular.)

Also be careful that VS2005 defaults to not doing that transformation - it uses /fp:precise by default, but only does the division-is-approximately-multiplication-by-reciprocal optimisation under /fp:fast, as far as I can see.

It also converts /2 into *0.5 in both /fp:fast and /fp:precise (though not /fp:strict), but converts /3 into *0.3333... only in /fp:fast.

##### Share on other sites
if I'm not mistaken, I read in one of my books that computers don't divide, they can only add, so none of them would be quicker according to taht...

##### Share on other sites
From what I know, computer hardware doesn´t subtract since it´s only a matter of complementing the byte/word/double word and then adding. Regarding multiplication/division, I guess it would be stupid not to implement circuits that do this, but maybe wasting cycles is cheaper?
What I know for sure is that x86 Assembly language provides these operations in some ways so things get optimized, and that makes me believe the way you do a division can make it a lot quicker.

##### Share on other sites
I've checked my old processor book, and there is no mention of a 1/x instruction, so I guess I was wrong there. SSE has such an instruction, but not the regular FPU.

Quote:
 Original post by CarolinaWhat I know for sure is that x86 Assembly language provides these operations in some ways so things get optimized

Not always true. Back on the 8088/8086 they had "loop", which decreased CX and jumped if it was not zero. I haven't kept up with the timings on the latest chips, but all the way up to the 80486 it was actually faster to do "dec cx, jnz label". Not only was it faster, but it made it possible to use other registers than CX to hold a loop counter. Ah, the balancing act of performance, vs. silicon complexity and cost, vs. engineering complexity and cost. I'm glad I don't make CPUs.

Intel ASM is a horrible kludge of ancient technology that they've miraculously been able to keep going. While a new processor design could beat it in performance, not a by a huge enough margin for everyone to give up backwards compatibility, experience, etc.

##### Share on other sites
Quote:
 Original post by brandonmanif I'm not mistaken, I read in one of my books that computers don't divide, they can only add, so none of them would be quicker according to taht...

You are mistaken.

CM

• ### What is your GameDev Story?

In 2019 we are celebrating 20 years of GameDev.net! Share your GameDev Story with us.

• 15
• 10
• 9
• 35
• 16
• ### Forum Statistics

• Total Topics
634125
• Total Posts
3015678
×