**3**

# Normalization Approximation in 30 operations

###
#1
Members - Reputation: **270**

Posted 17 January 2012 - 08:50 AM

squaredvector = position.x * position.x + position.y * position.y + position.z * position.z

onediv = 1 / squaredvector

position.x *= signx

position.y *= signy

position.z *= signz

xper = position.x * onediv

yper = position.y * onediv

zper = position.z * onediv

posyz = position.y + position.z

posxz = position.x + position.z

posxy = position.x + position.y

xmul = 1.0 - (posyz * onediv)

ymul = 1.0 - (posxz * onediv)

zmul = 1.0 -(posxy * onediv)

normal.x = xper * (xmul * posyz)* signx

normal.y = yper * (ymul * posxz)* signy

normal.z = zper * (zmul * posxy) * signz

Think this is the final version.

###
#2
Members - Reputation: **2084**

Posted 17 January 2012 - 09:08 AM

Compare this to the code

float x,y,z; float r = 1.0f / sqrt(x*x + y*y + z*z); x *= r; y *= r; z *= r;which has one sqrt, div, six muls and two adds. See here for an example.

Typically sqrt and divisions dominate in a code snippet like that. To see whether your version is a good approximation, I recommend you profile these two versions for runtime performance (clock cycles taken to normalize), and by how much the normalization is off in the worst case.

###
#4
Members - Reputation: **270**

Posted 17 January 2012 - 09:12 AM

1 division, 3 multiplies and, 5 additions.

As far as I've noticed the most it's off by is about 15%

###
#6
Crossbones+ - Reputation: **2352**

Posted 17 January 2012 - 12:42 PM

Typically sqrt and divisions dominate in a code snippet like that.

true for std::sqrt, but not true for _mm_sqrt_ss. Unless you are using an Atom, div and sqrt will take just as long as an add or sub. On an i3/i5/i7, doing 1/sqrt is actually slower (an additional dependency chain) and less accurate than simply dividing through by the length (since rcp + mul is less accurate than div). [As an aside, you could just use rsqrt instead]

Of course, if you actually want to take account of the fact that normalization can cause a division by zero, then mul will probably still make more sense...

With AVX (sandy bridge) it is possible to normalise 8 x Vector3's with nothing more than: 6 muls, 2 adds, and an rsqrt. The OP's method produces nonsense results, and uses 9 adds, 3 mul, 3 div. That's 15 ops vs 9 for something that is at least 8 times slower.

###
#7
Crossbones+ - Reputation: **10103**

Posted 17 January 2012 - 07:31 PM

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.

###
#8
Members - Reputation: **270**

Posted 17 January 2012 - 09:40 PM

most recent version.

if(position.x < 0)

position.x *= -1

signx = -1

}

addvector = position.x + position.y + position.z

positionerror = 1.0 - addvector * 0.01

onediv = 1/ addvector

normal.x = (onediv * (position.x + position.x)) * positionerror) * signx;

At 20 operations. 1 divide, 13 muls and 5 adds and 1 subtraction

###
#9
Members - Reputation: **270**

Posted 17 January 2012 - 09:41 PM

On a GPU I wouldn't touch this code with a 10-foot pole, no matter how happy the OP may be with it as an approximation. Normalization on a GPU is just a dp3, an rsq and a mul - 1 instruction slot each, 3 total, and all the benefits of parallel processing.

This is meant more for people who do software graphics.

###
#10
Crossbones+ - Reputation: **1422**

Posted 17 January 2012 - 10:19 PM

And people with software graphics will choose that over any other strange code (with strange limitations on max values) any day and any time.

One implementation of Newton-Raphson method (with smart choice of initial value) is "Carmack's inverse sqrt": http://en.wikipedia....iew_of_the_code

Anway - for a long time any of these methods are slower that simply one assembler instruction - rsqrtss from SSE instruction set. And that is from year 1999! Welcome to the future!

Here are some performance numbers on FPU sqrt vs Newton method vs Carmack's invsqrt: http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/

###
#12
Members - Reputation: **921**

Posted 18 January 2012 - 02:18 AM

Although if new things are never tried, nothing is ever advanced.

Trying arbitrary permutations of things without a decent mathematical foundation is not only unproductive, but counter-productive as you cannot really reason about the behavior and properties of the "approximation" you have produced.

This reflects in your confused statements (now some edited away) about just simply applying scaling factors and misuse of terms.

Things like the Carmack sqrt approximation have a foundation in mathematics and architecture (Newton-Rhapson iterations and IEEE754 vs. integer interaction), while your approach just substitutes a function with something with completely different properties and proclaim it to be The Next Best Things without a proper analysis of it.

Heck, a simple mesh plot of your function would have revealed the massive oscillations and regions of infinite results.

###
#14
Members - Reputation: **921**

Posted 20 January 2012 - 09:23 PM

There is nothing about your operation that is "normalization". You do not preserve the direction, nor produce a magnitude remotely approaching unity.

The topic should be named "Mutilation in way too many operations", as it doesn't have any of the properties desirable by division by a norm.

Note that the first line is your "final" version is the square of the length of a vector. If you've gone the length of computing that dot product already, you're just a square root away from a properly working regular normalization, with all the excellent properties it guarantees.

###
#16
Members - Reputation: **1300**

Posted 20 January 2012 - 10:10 PM

It might be slow, but I'm not sure where you got the impression that the sqrt operation is the anti-christ.

###
#17
Members - Reputation: **270**

Posted 20 January 2012 - 10:39 PM

This could work fine for lighting, generating normals as they are needed, but whatever, it's obvious no one here appreciates what I'm trying to do so I'm done.