Divide or multiply? Is multiplying faster than dividing?

I often see code that multiplies by 0.5 instead of dividing by 2 (when working with floating point). Another example is this snippet for finding the roots of a quadratic:

...
if (delta > 0.0)
{
    delta = sqrt(delta);
    a = 1 / (2 * a);              // one divide to form the reciprocal of 2a
    double x1 = (-b + delta) * a;
    double x2 = (-b - delta) * a;
...

Instead of dividing by 2*a twice, it divides 1 by 2*a once and then multiplies by the result twice. Is it really faster?!
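For reference, here is a minimal sketch of the two formulations being compared (illustrative code, not the original; the names are made up):

#include <cmath>

// Version A: two divisions by the same denominator.
void rootsDivide(double a, double b, double delta, double& x1, double& x2)
{
    double s = std::sqrt(delta);
    x1 = (-b + s) / (2 * a);
    x2 = (-b - s) / (2 * a);
}

// Version B: one division to form the reciprocal, then two multiplications.
void rootsMultiply(double a, double b, double delta, double& x1, double& x2)
{
    double s = std::sqrt(delta);
    double inv2a = 1.0 / (2 * a);
    x1 = (-b + s) * inv2a;
    x2 = (-b - s) * inv2a;
}

Version B trades the second division for a multiplication; the results can differ in the last bit, as discussed further down the thread.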

The optimizer is generally capable of making that transformation itself, if necessary. (At least in C/C++. Be careful with other languages, C# in particular.)

Look at an instruction timing table and search for "r/m32/m64" (without quotes). FMUL takes 3 cycles, while FDIV takes 19-39. Typically there is a fast instruction for 1/x as well.

2*a = 3 cycles
1/(2*a) = 3 + 3 cycles (I doubt 1/x really takes only 3).
number*(1/(2*a)) = 3 + 3 + 3 cycles.

That's 9 cycles; 1/(2*a) is likely kept around for the next line, so the second number*(1/(2*a)) is just 3 more cycles, for a total of 12.

With divides:
2*a = 3 cycles
number/(2*a) = 19 + 3 cycles (being generous and assuming the best FDIV time).
So 22 cycles. The 2*a is likely kept around for the next line, so the second number/(2*a) is only another 19 cycles, for a total of 41 cycles.

12 vs. 41, and that's assuming 1/x is slower than it probably is and FDIV is faster than it probably is.

Oh, and it's all much worse because you're using doubles.
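If you would rather measure than count cycles, a rough timing sketch (my own, hypothetical) looks like this; exact numbers vary by CPU and compiler flags, and you must make sure the compiler is not doing the reciprocal substitution for you (e.g. no fast-math):

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<double> data(1 << 20, 3.14159);
    const double d = 7.0;
    const double inv = 1.0 / d;
    double sink = 0.0;   // keeps the loops from being optimised away

    auto t0 = std::chrono::steady_clock::now();
    for (double x : data) sink += x / d;        // divide every time
    auto t1 = std::chrono::steady_clock::now();
    for (double x : data) sink += x * inv;      // multiply by the precomputed reciprocal
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::printf("div: %lld us, mul: %lld us (sink=%f)\n",
        (long long)std::chrono::duration_cast<us>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<us>(t2 - t1).count(),
        sink);
}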

Double Integer division takes 39 cycles.
But is there really an optimised form for 1/x?
That would seem odd, wouldn't the "real" division become obsolete then?

Quote:
Original post by kiome
Double Integer division takes 39 cycles.
But is there really an optimised form for 1/x?
That would seem odd, wouldn't the "real" division become obsolete then?


No. A "real" division would then just be a combination of 1/x and a multiplication, i.e. an optimised concatenation of those two ops. The optimised 1/x only pays off where you need to divide by the same value more than once, followed by a series of multiplications.

Quote:
Original post by Promit
The optimizer is generally capable of making that transformation itself, if necessary. (At least in C/C++. Be careful with other languages, C# in particular.)

Also be careful that VS2005 defaults to not doing that transformation - it uses /fp:precise by default, but only does the division-is-approximately-multiplication-by-reciprocal optimisation under /fp:fast, as far as I can see.

It also converts /2 into *0.5 in both /fp:fast and /fp:precise (though not /fp:strict), but converts /3 into *0.3333... only in /fp:fast.
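A small test case (my own, hypothetical) for checking this yourself: compile the following with the different /fp switches and look at the generated assembly, e.g. cl /O2 /fp:precise /FAs test.cpp versus cl /O2 /fp:fast /FAs test.cpp:

// Division by 2 should become a multiply by 0.5 under both /fp:precise and /fp:fast.
double half(double x)  { return x / 2.0; }

// Division by 3 should stay a real divide under /fp:precise,
// and become a multiply by 0.333... only under /fp:fast.
double third(double x) { return x / 3.0; }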

If I'm not mistaken, I read in one of my books that computers don't divide, they can only add, so neither would be quicker according to that...

From what I know, computer hardware doesn't subtract, since it's only a matter of complementing the byte/word/double word and then adding. Regarding multiplication/division, I guess it would be stupid not to implement circuits that do this, but maybe wasting cycles is cheaper?
What I know for sure is that x86 Assembly language provides these operations in some ways so things get optimized, and that makes me believe the way you do a division can make it a lot quicker.
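As an illustration of the complement-and-add idea (my own example, not from the post): a - b can be computed by adding the two's complement of b:

#include <cstdint>
#include <cstdio>

int main()
{
    uint8_t a = 9, b = 3;
    // a - b computed as a + (~b + 1), i.e. adding the two's complement of b.
    uint8_t diff = static_cast<uint8_t>(a + static_cast<uint8_t>(~b + 1));
    std::printf("%u\n", static_cast<unsigned>(diff));  // prints 6
}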

I've checked my old processor book, and there is no mention of a 1/x instruction, so I guess I was wrong there. SSE has such an instruction, but not the regular FPU.
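For the curious, that SSE approximation is exposed through intrinsics; a minimal sketch (assuming an SSE-capable compiler) of the RCPSS route, which gives roughly 12 bits of precision:

#include <xmmintrin.h>

// Approximate 1/x via the RCPSS instruction: fast, but only ~12 bits accurate.
float approxReciprocal(float x)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}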

Quote:
Original post by Carolina
What I know for sure is that x86 Assembly language provides these operations in some ways so things get optimized

Not always true. Back on the 8088/8086 they had "loop", which decreased CX and jumped if it was not zero. I haven't kept up with the timings on the latest chips, but all the way up to the 80486 it was actually faster to do "dec cx, jnz label". Not only was it faster, but it made it possible to use other registers than CX to hold a loop counter. Ah, the balancing act of performance, vs. silicon complexity and cost, vs. engineering complexity and cost. I'm glad I don't make CPUs.

Intel ASM is a horrible kludge of ancient technology that they've miraculously been able to keep going. A new processor design could beat it in performance, but not by a big enough margin for everyone to give up backwards compatibility, experience, etc.

Quote:
Original post by brandonman
If I'm not mistaken, I read in one of my books that computers don't divide, they can only add, so neither would be quicker according to that...

You are mistaken.

CM

Quote:
Original post by Excors
Quote:
Original post by Promit
The optimizer is generally capable of making that transformation itself, if necessary. (At least in C/C++. Be careful with other languages, C# in particular.)

Also be careful that VS2005 defaults to not doing that transformation - it uses /fp:precise by default, but only does the division-is-approximately-multiplication-by-reciprocal optimisation under /fp:fast, as far as I can see.

It also converts /2 into *0.5 in both /fp:fast and /fp:precise (though not /fp:strict), but converts /3 into *0.3333... only in /fp:fast.


GCC follows roughly the same strategy. I don't believe any serious compiler would apply such unsafe optimizations by default. With a few exceptions (as shown above), division does not equal multiplication by the inverse. It might work for real numbers, but it certainly does not for floating-point. And compilers are usually acutely aware of that.
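A quick way to see this for yourself (illustrative snippet, not from the thread): count how often x/3.0 and x*(1.0/3.0) disagree over a range of values:

#include <cstdio>

int main()
{
    const double inv3 = 1.0 / 3.0;
    int mismatches = 0;
    for (int i = 1; i <= 1000000; ++i)
    {
        double x = i * 0.001;
        if (x / 3.0 != x * inv3)   // division vs. multiplication by the rounded reciprocal
            ++mismatches;
    }
    std::printf("%d of 1000000 values differ\n", mismatches);
}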

So is this curiosity, or is there a portion of your code that isn't performing as it should and you've run out of algorithm optimizations?

Note that doing things like multiplying by 0.5 when you want to do a division, while a neat trick (assuming the compiler/jitter doesn't do that sort of optimization itself), will make your code harder to read and in the process reduce maintainability. It should be done only if you're out of options.

It is probably a good idea to keep the cycle count as low as possible, of course, but the impact is probably going to be unnoticeable. It depends on what you're doing.

It would be nice to see the difference in a game, for example, and check whether there is a real performance difference between the two.

Quote:
Original post by xissburg
Instead of dividing by 2*a twice, it divides 1 by 2*a once and then multiplies by the result twice. Is it really faster?!
It most certainly is faster! Even for two divisions it is faster to make this optimisation, but the bigger payoff comes when there are 3 or more divisions by the same value, e.g. matrix inverse calculation.

The typical use of this optimisation is in 3D engines: calculating a matrix inverse, normalising a vector (see the sketch below), perspective-correct software rendering, and so on.

Similar optimisations include:
Replacing pow(x, 3) with x*x*x
Replacing if (sqrt(x) < radius) with if (x < radius*radius)
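For example, a vector normalisation sketch (my own illustration of the one-divide-many-multiplies idea mentioned above):

#include <cmath>

struct Vec3 { float x, y, z; };

// One reciprocal of the length replaces three divisions.
Vec3 normalise(Vec3 v)
{
    float invLen = 1.0f / std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x * invLen, v.y * invLen, v.z * invLen };
}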

Quote:
Original post by Spoonbender
I don't believe any serious compiler would apply such unsafe optimizations by default.

I forgot to mention it in my post, but VS2003 does actually do those optimisations by default - it's about the same as VS2005's /fp:fast, unless you add the /Op option to force more correct floating point behaviour. Whether that makes it not a serious compiler is probably a matter of opinion, but at least VS2005 has moved in the right direction [smile].

(I think I'm happiest having numerical correctness as the default, then using a profiler to find bottlenecks in heavy floating-point code and explicitly enabling more compiler optimisations for that section, or rewriting the code to get some improvement (like using multiplication rather than division). That profiling approach doesn't work when there are no bottlenecks left, it's still too slow, and you just need an overall faster compiler (e.g. if a program is written in Perl instead of C++); but I'd expect slow floating-point code to be isolated in nice small spots that you can focus on easily, and the compiler shouldn't needlessly waste accuracy in the rest of the program where performance doesn't matter.)

Quote:
Original post by xissburg
I often see code that multiplies by 0.5 instead of dividing by 2 (when working with floating point). Another example is this snippet for finding the roots of a quadratic:
Instead of dividing by 2*a twice, it divides 1 by 2*a once and then multiplies by the result twice. Is it really faster?!


Multiplication is 1-2 cycles; division on SSE2 can take up to 80 cycles. All Core 2 Duo and newer CPUs do their fastest division on SSE2 registers, and 64-bit code uses SSE2 registers for floating point anyway (whether you want it or not).

It makes sense, actually: division is 58 subtraction operations, followed by normalization.

Of course there are faster ways. For example, I have used a precalculated 1/a followed by three correction steps (on a fixed-point integer).
If you are using ASM there is an imprecise reciprocal instruction that takes about 3 CPU cycles, and a much more precise one that is still cheap at about 15 cycles with only a small error. Of course these instructions are not visible to a programmer working in a higher-level language.
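The "correction steps" are presumably Newton-Raphson iterations; here is a floating-point sketch of the idea (my own, whereas the post above applied it to fixed-point integers):

// Each step r = r * (2 - d*r) roughly doubles the number of correct bits in r ~ 1/d.
double refineReciprocal(double d, double roughEstimate)
{
    double r = roughEstimate;
    for (int i = 0; i < 3; ++i)   // three correction steps, as described above
        r = r * (2.0 - d * r);
    return r;
}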

Quote:
The optimizer is generally capable of making that transformation itself, if necessary.


That's not true. Multiplication has different precision implications than division.

A defined constant will be folded:

static const float x = 1.0f / 2; // folded to 0.5f at compile time

However, a literal division will not be folded:

float res = val / 2; // not folded


The latency of a division is a lot higher than the latency of a multiplication, but the cycle counts quoted above are highly dependent on CPU and instruction stream; don't treat them as gospel. However, for fixed divisions, it pays to multiply by the inverse as a constant, rather than divide.

Quote:
Original post by alexmoura
Note that doing things like multiplying by 0.5 when you want to do a division, while a neat trick (assuming the compiler/jitter doesn't do that sort of optimization itself), will make your code harder to read and in the process reduce maintainability. It should be done only if you're out of options.


Actually, I would consider 0.5f quite readable compared to (1.0f/2). If you have something more serious, like the inverse of a large prime number, you can easily use named constants k1, k2, and so on.

Any game developer should treat floating-point multiplication by a power of two as a simple, commonly used operation; 0.5 = 2^-1, and multiplying by an exact power of two loses no precision.

Don't forget about pipelining in most processors.

It could take 39 cycles for a division, but if the processor doesn't need the result for another 48 cycles, it will happily send the operation off to be calculated in parallel on the math coprocessor and have it done well before it is ever used. In that case, the overhead is really only the time it takes the CPU to issue the instruction.
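A sketch of what that looks like in practice (my own, hypothetical example): issue the long-latency divide early, do independent work, and only consume the result at the end:

double divideAndSum(double num, double denom, const double* extra, int n)
{
    double q = num / denom;          // long-latency divide starts here
    double acc = 0.0;
    for (int i = 0; i < n; ++i)      // independent work the CPU can overlap with the divide
        acc += extra[i];
    return q + acc;                  // the divide's result is first needed here
}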
