Multiplication vs Division?

Started by
8 comments, last by iMalc 13 years, 3 months ago
As I mentioned in another thread, the Arx Fatalis source code has been released. Poking through it, I noticed this magnificent little tidbit:


#define DIV2 0.5f
#define DIV3 0.3333333333333333333333f
#define DIV4 0.25f
#define DIV5 0.2f
#define DIV6 0.1666666666666666666666f
#define DIV7 0.1428571428571428571428f
#define DIV8 0.125f
#define DIV9 0.1111111111111111111111f
#define DIV10 0.1f
#define DIV11 0.0909090909090909090909f
#define DIV12 0.0833333333333333333333f
#define DIV13 0.0769230769230769230769f
#define DIV14 0.0714285714285714285714f
#define DIV15 0.0666666666666666666667f
#define DIV16 0.0625f
#define DIV17 0.0588235294117647058823f
#define DIV18 0.0555555555555555555555f
...


Resulting in code such as:

float myFloat = someVal * DIV2;

instead of

float myFloat = someVal / 2.0f;

I figured "damn, that's a clever yet simple way of optimizing code!" But, of course, I decided to benchmark it out of curiosity, and multiplication proved to be some 5 times faster than division. WHOA... in Debug, that is. When I switched to Release, the results were pretty much identical.

Out of curiosity, any thoughts on why this "optimization" might have been used if it doesn't produce any real benefit while complicating the code? Just speeding up Debug builds? Or maybe some older processors did handle mul faster than div?
Comrade, Listen! The Glorious Commonwealth's first Airship has been compromised! Who is the saboteur? Who can be saved? Uncover what the passengers are hiding and write the grisly conclusion of its final hours in an open-ended, player-driven adventure. Dziekujemy! -- Karaski: What Goes Up...
A modern optimizing compiler should do the faster thing by itself, which will usually be a multiplication instead of a division. As you noticed, Release builds (which are generally configured to do such optimizations) have identical speeds, while Debug builds (where such optimizations are typically disabled) have different speeds. Since Arx Fatalis is quite old, it is quite likely that they were using a compiler which was unable to do such optimizations by itself.
That's what I was thinking as well, actually. Coding it up 2002 style, yeee!

A modern optimizing compiler should do the faster thing by itself, which will usually be a multiplication instead of a division.

Some useful reading on floating point optimization here. Floating point optimization may not be as trivial as you think, so don't count on this happening: transforming the division into its reciprocal multiplication may not produce the same result. If the result differs, the behavior of the program has been altered, and the compiler has not done its job properly (in a strict sense). The article concerns VS 2005, but as far as I have seen it still applies in VS 2010, and the default project settings won't perform the division-to-multiplication transformation.
I wouldn't bother. It is a micro-optimisation, which can be applied where needed if profiling indicates a problem. In other cases, it just reduces the readability of the code. At the very least, don't use the pre-processor for this; use the "const" keyword if you must. That said, I do generally multiply by 0.5f rather than divide by 2, but that is partly to make accidental integer division less likely, and I don't feel it impacts the code readability too much.

As Brother Bob mentioned, there are compiler settings which you can play with to make it more likely that this optimisation will be used. But heed the warning over the correctness issue.
Apart from all the issues already mentioned, it's also problematic that the constants are hard-coded as single precision floats. What if, for some reason, you need to calculate something in double precision and use one of those macros, forgetting this little detail? Depending on the ordering of the expression, the compiler might cast the doubles to float, do the calculation, and cast back to double to store the result, throwing away the benefit of using doubles in the first place.
any thoughts why this "optimization" might have been used
There was a time when division was absurdly slow. There was also a time when multiplication was so slow that lookup tables were used instead.

It may be worth it, it might not be. This type of optimization is a moving target. Sometimes it works, sometimes it doesn't.

But at least you benchmarked, which is more than 99% of people do, and that answered the question. This is the correct approach.

-------
Going into the realm of mostly speculation here:

The running time of the above operations is probably dominated by loads/stores and other memory accesses. Most operations today are I/O bound, either by DRAM or by the CPU caches. Load/store means a variable is first loaded from memory into a register, manipulated in the register, then written back. This alone can introduce a ton of side-effects which slow things down.

Ideally, one would convert the above into SIMD, then apply a single multiplication/division to thousands of elements in a row.

Also, if optimizing for throughput, you always want to maximize the reads-to-writes ratio. Try to keep values that change in registers for as long as possible and minimize writes. Reading is mostly fine, especially if streamed.


This works for bulk operations, but for very fine-grained, conditional logic, things simply aren't cost-effective to optimize this way.
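The bulk approach suggested above can be sketched like this (my example; `scale_all` is a made-up name). Written as a plain loop, an optimizing compiler will typically auto-vectorize the multiply into SIMD; the equivalent loop with a division per element is much harder to vectorize cheaply:

```cpp
#include <vector>

// Scale a whole array by 1/divisor: one division, hoisted out of the loop,
// then a stream of multiplies that the compiler can turn into SIMD ops.
void scale_all(std::vector<float>& data, float divisor) {
    const float inv = 1.0f / divisor;  // single division, outside the loop
    for (float& x : data) x *= inv;    // vectorizable multiply per element
}
```

This also matches the reads:writes advice: the data is streamed through once, and the only state kept hot is the reciprocal in a register.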
There is something that a lot of people seem to not understand: "Do not optimize until you have found a performance issue." In most cases, the part of the code that people think will have a performance issue is not the area that is really the problem. Next, it is a waste of time and effort until you find there is a real problem. Finally, most of the optimization that people do doesn't really help much.

theTroll

In most cases, the part of the code that people think will have a performance issue is not the area that is really the problem.

Gather 'round children, and I shall read to ye from the good book...Code Complete 2.

And brother McConnell sayeth in regards to performance "...programmers are notoriously poor at guessing which code really causes problems."

Amen.
Instead of this

double x = y / 7.0;

you can write this

double x = y * (1.0 / 7);

and you can be fairly sure that the correct constant double value will be calculated for you at compile time, probably even in a debug configuration.
The #defines are a waste of time.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

