Normalization Approximation in 30 operations

26 comments, last by Chris_F 12 years, 2 months ago
I think I finally figured out a version that works... it is this


// signx, signy, signz are assumed to be +/-1, recording the original sign of each component
squaredvector = position.x * position.x + position.y * position.y + position.z * position.z
onediv = 1 / squaredvector
position.x *= signx   // strip the signs so the components are non-negative
position.y *= signy
position.z *= signz
xper = position.x * onediv
yper = position.y * onediv
zper = position.z * onediv
posyz = position.y + position.z
posxz = position.x + position.z
posxy = position.x + position.y
xmul = 1.0 - (posyz * onediv)
ymul = 1.0 - (posxz * onediv)
zmul = 1.0 - (posxy * onediv)
normal.x = xper * (xmul * posyz) * signx
normal.y = yper * (ymul * posxz) * signy
normal.z = zper * (zmul * posxy) * signz


Think this is the final version.
Your code has 3 divs, 3 muls, and 6 adds.
Compare this to the code

float x,y,z;
float r = 1.0f / sqrt(x*x + y*y + z*z);
x *= r;
y *= r;
z *= r;

which has one sqrt, one div, six muls, and two adds. See here for an example.

Typically sqrt and divisions dominate in a code snippet like that. To see whether your version is a good approximation, I recommend you profile these two versions for runtime performance (clock cycles taken to normalize), and by how much the normalization is off in the worst case.
The above formula is off by up to 15.47% (a factor of 2/sqrt(3)), so multiplying the result by 0.8660254 (i.e. sqrt(3)/2) compensates for the worst case, though a constant scale cannot make the normal exact for every direction.
This code doesn't need a square root, and there are only 9 operations, because dividing 1.0 by the sum of the three positions only needs to be done once per vector.

1 division, 3 multiplies, and 5 additions.


As far as I've noticed, the most it's off by is about 15%.
Could you please "normalize" the two vectors <1,0,0> and <-1,1,0> for me?

To make it is hell. To fail is divine.


Typically sqrt and divisions dominate in a code snippet like that.


True for std::sqrt, but not for _mm_sqrt_ss. Unless you are using an Atom, div and sqrt will take just as long as an add or sub. On an i3/i5/i7, doing 1/sqrt is actually slower (an additional dependency chain) and less accurate than simply dividing through by the length (since rcp + mul is less accurate than div). [As an aside, you could just use rsqrt instead.]
Of course, if you actually want to take account of the fact that normalization can cause a division by zero, then mul will probably still make more sense...

With AVX (sandy bridge) it is possible to normalise 8 x Vector3's with nothing more than: 6 muls, 2 adds, and an rsqrt. The OP's method produces nonsense results, and uses 9 adds, 3 mul, 3 div. That's 15 ops vs 9 for something that is at least 8 times slower.
On a GPU I wouldn't touch this code with a 10-foot pole, no matter how happy the OP may be with it as an approximation. Normalization on a GPU is just a dp3, an rsq and a mul - 1 instruction slot each, 3 total, and all the benefits of parallel processing.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

*EDIT*

Most recent version:

if(position.x < 0) {
    position.x *= -1
    signx = -1
}
addvector = position.x + position.y + position.z
positionerror = 1.0 - addvector * 0.01
onediv = 1 / addvector
normal.x = ((onediv * (position.x + position.x)) * positionerror) * signx;


At 20 operations: 1 divide, 13 muls, 5 adds, and 1 subtraction.
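Only the x component and the x sign test are shown above. A compilable sketch that fills in the y and z branches by symmetry (an assumption on my part, since the post truncates them):

```c
#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Sketch of the edited method; the y and z components are assumed to
   mirror the x component shown in the post. */
Vec3 approx_normalize_v2(Vec3 p)
{
    float signx = 1.0f, signy = 1.0f, signz = 1.0f;
    if (p.x < 0) { p.x *= -1; signx = -1.0f; }
    if (p.y < 0) { p.y *= -1; signy = -1.0f; }
    if (p.z < 0) { p.z *= -1; signz = -1.0f; }

    float addvector = p.x + p.y + p.z;
    float positionerror = 1.0f - addvector * 0.01f; /* magnitude-dependent fudge factor */
    float onediv = 1.0f / addvector;

    Vec3 n;
    n.x = ((onediv * (p.x + p.x)) * positionerror) * signx;
    n.y = ((onediv * (p.y + p.y)) * positionerror) * signy;
    n.z = ((onediv * (p.z + p.z)) * positionerror) * signz;
    return n;
}
```

Note that for (1, 0, 0) this yields roughly (1.98, 0, 0), which illustrates the replies' point that the result is not actually unit length.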
On a GPU I wouldn't touch this code with a 10-foot pole, no matter how happy the OP may be with it as an approximation. Normalization on a GPU is just a dp3, an rsq and a mul - 1 instruction slot each, 3 total, and all the benefits of parallel processing.


This is meant more for people who do software graphics.
So basically you are trying to invent a faster sqrt approximation? That has already been done; one such method is the Newton-Raphson method.
And people doing software graphics will choose that over strange code with strange limitations on value ranges, any day and any time.
One implementation of the Newton-Raphson method (with a smart choice of initial value) is "Carmack's inverse sqrt": http://en.wikipedia....iew_of_the_code
Anyway - for a long time now, any of these methods has been slower than a single assembler instruction - rsqrtss from the SSE instruction set. And that dates from 1999! Welcome to the future!

Here are some performance numbers on FPU sqrt vs Newton method vs Carmack's invsqrt: http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/
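For completeness, the fast inverse square root referenced above looks like this. This sketch uses memcpy for the bit reinterpretation instead of the original pointer cast (which violates strict aliasing); the function name is mine:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Fast inverse square root: a bit-level initial guess refined by one
   Newton-Raphson step, as popularised by Quake III Arena. */
float fast_rsqrt(float number)
{
    float half = number * 0.5f;
    float y = number;
    uint32_t i;
    memcpy(&i, &y, sizeof i);       /* reinterpret the float's bits */
    i = 0x5f3759df - (i >> 1);      /* magic initial guess */
    memcpy(&y, &i, sizeof i);
    y = y * (1.5f - half * y * y);  /* one Newton-Raphson iteration */
    return y;
}
```

With the single refinement step the relative error stays below roughly 0.2%, which is why this was competitive before rsqrtss became ubiquitous.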

This topic is closed to new replies.
